Skip to content

adjustments made to getting unique row id#1

Merged
tmaiaroto merged 8 commits intomainfrom
fix-duplicate-rows
Aug 16, 2025
Merged

adjustments made to getting unique row id#1
tmaiaroto merged 8 commits intomainfrom
fix-duplicate-rows

Conversation

@tmaiaroto
Copy link

@tmaiaroto tmaiaroto commented Aug 16, 2025

Summary by CodeRabbit

  • New Features

    • Basic-slice mappings now preserve duplicates and original row order by treating each row as a distinct element.
  • Bug Fixes

    • Corrected de-duplication for slices of basic types, including nested cases.
    • Improved and clearer error messages when basic-slice mappings have zero or multiple matching columns.
  • Documentation

    • Added design philosophy and deduplication guidance; README updated to explain fail-fast behavior and mapping rules.
  • Tests

    • Added tests covering basic-slice and nested-basic mapping scenarios and updated row-loading tests.

@coderabbitai
Copy link

coderabbitai bot commented Aug 16, 2025

Walkthrough

Enforces single-column results for basic-type slice mappings and changes deduplication identity: basic mappers use a per-row position-based UID propagated to nested mappers. Updates column allocation, row loading (adds rowCount), trims tag whitespace, adds tests, and updates documentation and design notes.

Changes

Cohort / File(s) Summary
Design / Plan
ai_plans/FIX_DUPLICATE_ROWS.md
New plan describing root cause and final approach: require one column for basic slices, use position-based per-row identity for basic mappings, handle nested basics, and add tests/docs.
Column allocation logic
column.go
For m.IsBasic: distinguish top-level vs nested basics. Top-level requires exactly one remaining column; nested basics must match exactly one ancestor-qualified column. Chosen column is mapped to PresentColumn and removed from remaining columns. Added doc comment and improved error messages.
Row loading / identity
load.go
loadRow gains a rowCount parameter; loadRows tracks/increments rowCount per scanned row. For m.IsBasic, uid = "row-"+strconv.Itoa(rowCount); non-basic use getUniqueId. rowCount is propagated into subMap recursive calls. Replaced errors.New(fmt.Sprintf(...)) with fmt.Errorf(...).
Mapper tag parsing tweak
mapper.go
Trim whitespace when parsing carta/db tag first segment (use strings.TrimSpace) to properly derive ancestor-qualified names.
Tests updated / added
load_test.go, mapper_test.go
load_test.go updated to pass new rowCount arg (0). Added tests covering top-level and nested basic-slice mapping, positive and negative cases, and a PostWithTags test type; ensures duplicates preserved for []string and errors for invalid column matches.
Documentation
README.md, DESIGN_PHILOSOPHIES.md
README reworked to state dedup rules (struct slices dedupe by identity; basic-type slices preserve rows) and fail-fast guidance. New DESIGN_PHILOSOPHIES.md documents design choices and single-column rule for basic slices.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Mapper
  participant Loader as loadRows/loadRow
  participant Resolver

  Client->>Mapper: Map(rows, &dest)
  Mapper->>Loader: loadRows(...)
  loop for each row (index n)
    Loader->>Loader: Scan row
    alt mapper.IsBasic
      Loader->>Loader: uid = "row-" + n (position-based)
    else
      Loader->>Loader: uid = getUniqueId(row, mapper)
    end
    Loader->>Resolver: lookup/create entry by uid
    Loader->>Loader: populate fields, handle nulls
    Loader->>Loader: recurse into sub-maps (pass rowCount n)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I hop through rows, each step a beat,
One column per hop, duplicates I meet.
I mark each place with a tiny tag,
No accidental joins, no lost rag.
A rabbit's map—order kept, neat 🐇✨

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-duplicate-rows

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
column.go (3)

21-23: Enforce-single-column rule for basic mappers: good, but include actual count in error

The guard is correct. Consider including the actual number of columns to aid debugging and use fmt.Errorf.

Apply this diff:

-		if len(columns) != 1 {
-			return errors.New("carta: when mapping to a slice of a basic type, the query must return exactly one column")
-		}
+		if len(columns) != 1 {
+			return fmt.Errorf("carta: when mapping to a slice of a basic type, the query must return exactly one column (got %d)", len(columns))
+		}

And import fmt:

 import (
 	"database/sql"
-	"errors"
+	"errors"
+	"fmt"
 	"sort"
 	"strings"
 )

24-31: Minor: break after mapping the single present column

Since the basic path requires exactly one column, you can break after handling it to avoid iterating unnecessarily.

 		for cName, c := range columns {
 			presentColumns[cName] = column{
 				typ:         c.typ,
 				name:        cName,
 				columnIndex: c.columnIndex,
 			}
 			delete(columns, cName)
+			break
 		}

3-8: fmt import needed if adopting fmt.Errorf above

If you take the previous suggestion, add fmt to imports here.

 import (
 	"database/sql"
-	"errors"
+	"errors"
+	"fmt"
 	"sort"
 	"strings"
 )
load.go (2)

71-75: Avoid fmt.Sprintf in hot path

Micro-optimization: replace fmt.Sprintf with strconv.Itoa to reduce allocations in a per-row path.

Add strconv to imports:

 import (
 	"database/sql"
 	"errors"
 	"fmt"
+	"strconv"
 	"reflect"
 
 	"github.com/hackafterdark/carta/value"
 )

Change UID generation:

-	if m.IsBasic {
-		uid = uniqueValId(fmt.Sprintf("row-%d", rowCount))
-	} else {
+	if m.IsBasic {
+		uid = uniqueValId("row-" + strconv.Itoa(rowCount))
+	} else {
 		uid = getUniqueId(row, m)
 	}

111-114: Use fmt.Errorf instead of errors.New(fmt.Sprintf(...))

More idiomatic and avoids an extra allocation.

-				if !(isDstPtr || nullable) {
-					return errors.New(fmt.Sprintf("carta: cannot load null value to type %s for column %s", typ, col.name))
-				}
+				if !(isDstPtr || nullable) {
+					return fmt.Errorf("carta: cannot load null value to type %s for column %s", typ, col.name)
+				}
mapper_test.go (1)

512-549: Add a negative test for multi-column basic mappings

To lock in the new allocateColumns rule, add a test that mapping to []string with more than one column returns an error.

Example:

func TestMapToBasicSlice_MultipleColumnsError(t *testing.T) {
	db, mock, err := sqlmock.New()
	if err != nil { t.Fatal(err) }
	defer db.Close()

	rows := sqlmock.NewRows([]string{"tag", "extra"}).
		AddRow("tag1", "x")

	mock.ExpectQuery("SELECT (.+) FROM tags").WillReturnRows(rows)

	sqlRows, err := db.Query("SELECT * FROM tags")
	if err != nil { t.Fatal(err) }

	var tags []string
	err = Map(sqlRows, &tags)
	if err == nil {
		t.Fatalf("expected error when mapping to []string with multiple columns, got nil")
	}
}
ai_plans/FIX_DUPLICATE_ROWS.md (1)

25-31: Align plan wording with implementation (rowCount vs flags) and tighten phrasing

The implementation uses a row-count-based UID; update the doc to reflect the parameter and clean up phrasing.

Suggested edits:

  • Replace “loadRow signature updated … flags int” with “loadRow signature updated … rowCount int”.
  • Clarify: “A new, position-based unique ID will be generated for each row using a monotonically increasing row index (rowCount).”
  • Minor grammar: add missing articles and commas.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 65d1a2d and 8b4f5c3.

📒 Files selected for processing (5)
  • ai_plans/FIX_DUPLICATE_ROWS.md (1 hunks)
  • column.go (2 hunks)
  • load.go (3 hunks)
  • load_test.go (6 hunks)
  • mapper_test.go (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
mapper_test.go (1)
mapper.go (1)
  • Map (92-146)
load.go (2)
value/cell.go (2)
  • NewCell (37-39)
  • Cell (20-27)
mapper.go (1)
  • Mapper (41-88)
🪛 LanguageTool
ai_plans/FIX_DUPLICATE_ROWS.md

[grammar] ~9-~9: There might be a mistake here.
Context: ...tests exist for mapping to basic slices. 2. Attempts to write such tests lead to inf...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ...ix Column Allocation (allocateColumns) The logic will be modified to enforce a ...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...bypass the existing name-matching logic. - It will validate that only one column is...

(QB_NEW_EN)


[grammar] ~25-~25: There might be a mistake here.
Context: ... 2. Fix Unique ID Generation (loadRow) The logic will be modified to generate a...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ... getUniqueId(row, m) will be bypassed. - A new, position-based unique ID will be ...

(QB_NEW_EN)

🔇 Additional comments (4)
load.go (2)

21-33: Row index tracking for basic mapping: LGTM

Using a per-row counter to produce per-row identities fixes the duplicate-collapsing bug for basic mappers.


222-229: Passing the row index into submaps: LGTM

Propagating rowCount ensures nested basic submaps also get per-row identity while non-basic submaps keep content-based identities.

load_test.go (1)

52-52: No remaining 3-arg loadRow calls detected — safe to merge.

Verified across the repo with

ast-grep --pattern 'loadRow($_, $_, $_)'

which returned no matches.

mapper_test.go (1)

512-549: New basic-slice mapping test preserves duplicates: LGTM

Good coverage for the basic-mapper path and duplicate handling.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
mapper_test.go (2)

512-549: Solid coverage for basic slice mapping; consider parallelizing and closing rows

The test correctly asserts order preservation and duplicate retention when mapping to []string. Minor nits:

  • Run in parallel to speed up the suite.
  • Defer closing sqlRows to avoid leaking resources in longer test runs.

Apply this diff:

 func TestMapToBasicSlice(t *testing.T) {
+	t.Parallel()
 	db, mock, err := sqlmock.New()
 	if err != nil {
 		t.Fatalf("an error '%s' was not expected when opening a stub database connection", err)
 	}
 	defer db.Close()

 	rows := sqlmock.NewRows([]string{"tag"}).
 		AddRow("tag1").
 		AddRow("tag2").
 		AddRow("tag1") // Duplicate tag

 	mock.ExpectQuery("SELECT (.+) FROM tags").WillReturnRows(rows)

 	sqlRows, err := db.Query("SELECT * FROM tags")
 	if err != nil {
 		t.Fatalf("error '%s' was not expected when querying rows", err)
 	}
+	defer sqlRows.Close()

 	var tags []string
 	err = Map(sqlRows, &tags)
 	if err != nil {
 		t.Errorf("error was not expected while mapping rows: %s", err)
 	}

551-573: Good negative case; verify expectations and ensure no partial writes

This test correctly ensures an error is returned for multiple columns. Two small improvements:

  • Verify sqlmock expectations to catch any unmet expectations in failure paths.
  • Assert the destination slice remains untouched on error.
  • Optionally run in parallel and close rows.

Apply this diff:

 func TestMapToBasicSlice_MultipleColumnsError(t *testing.T) {
+	t.Parallel()
 	db, mock, err := sqlmock.New()
 	if err != nil {
 		t.Fatal(err)
 	}
 	defer db.Close()

 	rows := sqlmock.NewRows([]string{"tag", "extra"}).
 		AddRow("tag1", "x")

 	mock.ExpectQuery("SELECT (.+) FROM tags").WillReturnRows(rows)

 	sqlRows, err := db.Query("SELECT * FROM tags")
 	if err != nil {
 		t.Fatal(err)
 	}
+	defer sqlRows.Close()

 	var tags []string
 	err = Map(sqlRows, &tags)
 	if err == nil {
 		t.Fatalf("expected error when mapping to []string with multiple columns, got nil")
 	}
+
+	// Ensure nothing was written to the destination on error.
+	if len(tags) != 0 {
+		t.Fatalf("expected no tags to be mapped on error, got %d", len(tags))
+	}
+
+	// Verify mock expectations even on error paths.
+	if err := mock.ExpectationsWereMet(); err != nil {
+		t.Errorf("there were unfulfilled expectations: %s", err)
+	}
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8b4f5c3 and 0553cf8.

📒 Files selected for processing (3)
  • column.go (2 hunks)
  • load.go (5 hunks)
  • mapper_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • load.go
  • column.go

Docstrings generation was requested by @tmaiaroto.

* #1 (comment)

The following files were modified:

* `column.go`
* `load.go`
@coderabbitai
Copy link

coderabbitai bot commented Aug 16, 2025

Note

Generated docstrings for this pull request at #2

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
README.md (1)

239-242: Clarify identity semantics for structs and constraints for basic types; add pointer to the fix doc

Recommend tightening the language and documenting the single-column requirement and error behavior for basic-type mappings. Also add the cross-reference to the FIX_DUPLICATE_ROWS plan for discoverability.

-When mapping to **slices of structs**, Carta removes duplicate entities. This is a side effect of the data mapping process, which merges rows that identify the same entity (e.g., a `Blog` with the same ID appearing in multiple rows due to a `JOIN`). To ensure correct mapping, you should always include uniquely identifiable columns (like a primary key) in your query for each struct entity.
+When mapping to **slices of structs**, Carta removes duplicate entities by identity: rows that refer to the same entity (e.g., the same primary key across JOIN-expanded rows) are merged into a single struct. Include a stable, uniquely identifying column (typically the primary key) for each struct entity (including nested associations) in your SELECT.

-When mapping to **slices of basic types** (e.g., `[]string`, `[]int`), every row from the query is treated as a unique element, and **no de-duplication occurs**.
+When mapping to **slices of basic types** (e.g., `[]string`, `[]int`), Carta expects exactly one column in the result set. It preserves row order and duplicates; no de-duplication occurs. If multiple columns are returned, mapping fails with an error. See [ai_plans/FIX_DUPLICATE_ROWS.md](ai_plans/FIX_DUPLICATE_ROWS.md) for details.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0553cf8 and 5306111.

📒 Files selected for processing (1)
  • README.md (1 hunks)
🔇 Additional comments (1)
README.md (1)

239-241: AI summary claims a README cross-reference that isn’t present in the text

The AI-generated summary mentions adding a cross-reference to ai_plans/FIX_DUPLICATE_ROWS.md in the README, but no such link appears in the current content. Consider adding it where you describe basic-type mappings.

📝 Add docstrings to `fix-duplicate-rows`
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
column.go (1)

18-32: Docstring is clear; consider clarifying top-level vs. nested basic mappers.
You did a good job documenting mutability and the basic-mapper constraint. If nested basic mappers are supported (e.g., struct fields that are []string), clarify that the single-column constraint applies to the basic mapper’s column, not necessarily the entire unclaimed column set. See my note below on potential misbehavior for nested basic sub-maps.

Apply this small wording tweak:

-// For a mapper marked IsBasic, the function requires exactly one remaining column in
-// columns and binds that single column to the mapper; otherwise it returns an error.
+// For a mapper marked IsBasic, the function binds exactly one column to the mapper.
+// At the top level (no ancestors), the query must return exactly one column overall.
+// For nested basic mappers (with ancestors), exactly one matching column must exist
+// by name/prefix within the remaining unclaimed columns; otherwise an error is returned.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5306111 and d20e33d.

📒 Files selected for processing (2)
  • column.go (2 hunks)
  • load.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • load.go
🧰 Additional context used
🧬 Code Graph Analysis (1)
column.go (1)
mapper.go (1)
  • Mapper (41-88)
🔇 Additional comments (1)
column.go (1)

5-5: fmt import is appropriate and used in new error path.
Nothing to change here.

…. update the ai plan documentation for historic documentation. adjust the readme to add a new design philosophies section to further help illustrate, document, and guide.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🔭 Outside diff range comments (2)
README.md (2)

23-23: Fix typo: “exmple” → “example” and improve phrasing

Public-facing docs should avoid typos.

-Assume that in above exmple, we are using a schema containing has-one and has-many relationships:
+Assume that in the above example, we are using a schema containing has-one and has-many relationships:

27-27: Fix typo: “corresponging” → “corresponding”

-And here is our SQL query along with the corresponging Go struct:
+And here is our SQL query along with the corresponding Go struct:
♻️ Duplicate comments (1)
column.go (1)

36-79: Correct fix: handle nested basic sub-mappers by ancestor-qualified matching

This resolves the false “exactly one column” error for nested basic slices and enforces the single-column rule only for top-level basics. Matches prior guidance and the new tests.

🧹 Nitpick comments (14)
mapper.go (1)

260-264: Consider trimming whitespace for db tags as well

For consistency and to prevent edge-case mismatches, apply the same TrimSpace to db tags.

Outside selected lines, the minimal change would look like:

// inside determineFieldsNames, db-tag branch
if tag := nameFromTag(field.Tag, DbTagKey); tag != "" {
    name = strings.TrimSpace(tag)
} else {
    name = field.Name
}
column.go (2)

18-33: Doc comment does not reflect nested-basic behavior

The comment states “IsBasic requires exactly one remaining column,” but the implementation distinguishes top-level vs nested basic mappers. Update the comment to prevent confusion.

Apply this diff to clarify:

-// For a mapper marked IsBasic, the function requires exactly one remaining column in
-// columns and binds that single column to the mapper; otherwise it returns an error.
+// For a mapper marked IsBasic:
+// - Top-level (no ancestors): requires exactly one remaining column overall and binds it.
+// - Nested (has ancestors): requires exactly one matching ancestor-qualified column among
+//   the remaining columns and binds it.
+// Otherwise it returns an error.

129-159: Potential enhancement: reduce accidental multi-match sensitivity

getColumnNameCandidates generates both case- and snake-variants, which is correct, but it can intentionally flag errors if both Tags and tags exist. That’s desired per philosophy. If you ever need lenient behavior per dialect, consider using driver-provided canonicalization or EqualFold-based matching behind a feature flag. No action needed now.

DESIGN_PHILOSOPHIES.md (2)

26-31: Optional style tweak: avoid split infinitive

Not critical, but reads cleaner.

-A "graceful" solution might be to arbitrarily pick the first column,
+A "graceful" solution might be to pick the first column arbitrarily,

34-39: Clarify NULL-to-non-nullable wording

Improve clarity while keeping intent intact.

-**Behavior:** `carta.Map` **returns an error during scanning**: `carta: cannot load null value to type string for column name`.
+**Behavior:** `carta.Map` **returns an error during scanning** (e.g., `carta: cannot load NULL into non-nullable type string for column name`).
mapper_test.go (1)

575-579: Tag casing is fine; consider adding a whitespace-trim test

Since Carta-tags for sub-maps now TrimSpace, add a test to cover stray whitespace in the carta tag value.

Example test to add:

func TestNestedBasicSliceMap_TagWhitespace(t *testing.T) {
	type PostWithTagsWS struct {
		ID   int      `db:"id"`
		Tags []string `carta:"   Tags  "` // intentional whitespace
	}

	db, mock, err := sqlmock.New()
	if err != nil {
		t.Fatal(err)
	}
	defer db.Close()

	rows := sqlmock.NewRows([]string{"id", "Tags"}).
		AddRow(1, "tag1").
		AddRow(1, "tag2")

	mock.ExpectQuery("SELECT (.+) FROM posts").WillReturnRows(rows)

	sqlRows, err := db.Query("SELECT * FROM posts")
	if err != nil {
		t.Fatal(err)
	}

	var posts []PostWithTagsWS
	if err := Map(sqlRows, &posts); err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if len(posts) != 1 || len(posts[0].Tags) != 2 {
		t.Fatalf("unexpected result: %#v", posts)
	}
	if err := mock.ExpectationsWereMet(); err != nil {
		t.Errorf("unfulfilled expectations: %s", err)
	}
}
ai_plans/FIX_DUPLICATE_ROWS.md (8)

3-7: Add a minimal before/after example to ground the problem statement.

A concise SQL + mapping example (showing duplicates being lost before vs preserved after) will make the problem and success criteria unambiguous.

You could append something like:

 ## 1. Problem Summary
 The `carta` library was incorrectly de-duplicating rows when mapping to a slice of a basic type (e.g., `[]string`). The logic, designed to merge `JOIN`ed rows for slices of structs, was misapplied, causing data loss. This also meant the `m.IsBasic` code path was entirely untested.

 The goal was to modify the library to correctly preserve all rows, including duplicates, when mapping to a basic slice, and to add the necessary test coverage.
+### Example
+Given:
+```sql
+SELECT tag FROM post_tags WHERE post_id = 1 ORDER BY tag;
+-- rows: ("go"), ("go"), ("orm")
+```
+Mapping to `[]string` previously returned `["go","orm"]` (incorrectly collapsed).
+After this change it returns `["go","go","orm"]` (correct).

12-16: Clarify lifecycle of the position-based UID (reset per result set/pagination).

Position-based row identity is great for preserving duplicates, but it relies on the UID being scoped to a single result set. If rows are streamed/merged across pages or multiple queries into one destination slice, position-based UIDs may collide and reintroduce dedup/overwrite behaviors.

Please confirm:

  • rowCount resets to 0 for each new top-level mapping operation (not shared across queries/pagination).
  • For any append/merge APIs, UIDs for basic slices are not compared across batches.

If needed, document the scope explicitly:

-1.  **Position-Based Unique IDs:** In `load.go`, the `loadRow` function was modified. When `m.IsBasic` is true, it now generates a unique ID based on the row's position in the result set (e.g., "row-0", "row-1") instead of its content. This ensures every row is treated as a unique element.
+1.  **Position-Based Unique IDs:** In `load.go`, when `m.IsBasic` is true, `loadRow` generates a per-result-set unique ID (e.g., "row-0", "row-1") from the zero-based row index, ensuring every row is treated as unique within that mapping operation.

17-23: Tighten grammar/style in the review refinements bullets.

Minor copy edits for flow and concision.

-Feedback from a code review (via Coderabbit) prompted several improvements:
--   **Performance:** In `load.go`, `fmt.Sprintf` was replaced with the more performant `strconv.Itoa` for generating the position-based unique ID.
--   **Idiomatic Go:** Error creation was changed from `errors.New(fmt.Sprintf(...))` to the more idiomatic `fmt.Errorf`.
--   **Clearer Errors:** The error message for the single-column rule was improved to include the actual number of columns found, aiding debugging.
--   **Test Coverage:** A negative test case was added to `mapper_test.go` to ensure the single-column rule correctly returns an error.
+Feedback from code review prompted several improvements:
+-   **Performance:** In `load.go`, `fmt.Sprintf` was replaced with `strconv.Itoa` for generating the position-based unique ID.
+-   **Idiomatic Go:** Error creation now uses `fmt.Errorf` instead of `errors.New(fmt.Sprintf(...))`.
+-   **Clearer errors:** The single-column rule error message includes the actual number of columns found.
+-   **Test coverage:** A negative test was added to `mapper_test.go` to ensure the single-column rule correctly returns an error.

33-50: Cross-check symbol names and add compatibility notes.

  • Verify the function name “determineFieldsNames” (plural/plural) is correct; “determineFieldNames” reads more idiomatic. If it’s a typo here or in code, align both.
  • Add a compatibility/behavior-change note since identity semantics for basic slices changed; this can affect users relying on previous dedup behavior.
 3.  **Modified `mapper.go`**:
-    -   Corrected the logic in `determineFieldsNames` to properly handle casing in `carta` tags, ensuring ancestor names are generated correctly.
+    -   Corrected the logic in `determineFieldNames` (verify name) to properly handle casing in `carta` tags, ensuring ancestor names are generated correctly.

 5.  **Updated Documentation**:
     -   Updated `README.md` to clarify the difference in de-duplication behavior.
-    -   Created `DESIGN_PHILOSOPHIES.md` to document the "fail-fast" error handling approach.
+    -   Created `DESIGN_PHILOSOPHIES.md` to document the "fail-fast" error handling approach.
+    -   Added a Compatibility Notes section explaining the new per-row identity semantics for basic slices and potential impact when upgrading.

44-47: Vary the phrasing in successive “Added … test …” bullets.

Avoid repetitive sentence starts; list test names or purposes succinctly.

-    -   Added a test for a top-level basic slice (`[]string`) to verify that duplicates are preserved.
-    -   Added a negative test to ensure an error is returned for a multi-column query to a top-level basic slice.
-    -   Added a test for a nested basic slice (`PostWithTags.Tags []string`) to verify correct mapping.
-    -   Added negative tests to ensure errors are returned for nested basic slices with zero or multiple matching columns.
+    -   Top-level `[]string`: verifies duplicates are preserved.
+    -   Top-level `[]string` (negative): multi-column queries produce an error.
+    -   Nested `PostWithTags.Tags []string`: verifies correct column matching and mapping.
+    -   Nested (negative): zero or multiple matching columns produce an error.

35-37: Call out that rowCount propagation also affects nested basic mappers.

You mention propagation earlier; reiterate here that nested basics receive the correct per-row UID without reusing parent content-based IDs.

-    -   Updated `loadRow` to accept a `rowCount` parameter.
-    -   Implemented logic to generate a unique ID from `rowCount` when `m.IsBasic` is true.
+    -   Updated `loadRow` to accept a `rowCount` parameter and propagate it to nested mappers.
+    -   For `m.IsBasic`, generate a per-row unique ID from `rowCount` to preserve duplicates (applies to nested basics as well).

28-30: Clarify “exactly one column overall” for top-level basics (driver column labels vs expressions).

State whether expressions/aliases count as “columns,” and that the rule applies to the result set width as returned by the driver (post-alias).

--   **For top-level basic slices** (`len(m.AncestorNames) == 0`), the query must still contain exactly one column overall.
+-   **For top-level basic slices** (`len(m.AncestorNames) == 0`), the result set must contain exactly one projected column (as labeled by the driver after alias resolution). Expressions are allowed if aliased to a single column.

49-50: Consider adding a “Risks and Trade-offs” section.

Call out observable behavior changes, test coverage boundaries, and known limitations (e.g., ambiguity when multiple identically named columns appear without aliases).

Proposed section:

+## 4. Risks and Trade-offs
+- Behavior change: basic slices no longer de-duplicate by content; duplicates are preserved by design.
+- Column naming: nested basic slices require exactly one matching column label; ambiguous or missing aliases will error.
+- Pagination/streaming: position-based IDs are scoped to a single mapping operation; do not merge across batches using these IDs.
+- Performance: negligible impact; replacing `fmt.Sprintf` with `strconv.Itoa` avoids small allocations on hot paths.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d20e33d and e8a2522.

📒 Files selected for processing (6)
  • DESIGN_PHILOSOPHIES.md (1 hunks)
  • README.md (2 hunks)
  • ai_plans/FIX_DUPLICATE_ROWS.md (1 hunks)
  • column.go (2 hunks)
  • mapper.go (1 hunks)
  • mapper_test.go (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
column.go (1)
mapper.go (1)
  • Mapper (41-88)
mapper_test.go (1)
mapper.go (1)
  • Map (92-146)
🪛 LanguageTool
DESIGN_PHILOSOPHIES.md

[grammar] ~5-~5: There might be a mistake here.
Context: ...er when using this package. ## Approach Carta adopts the "database mapping" appr...

(QB_NEW_EN)


[grammar] ~10-~10: There might be a mistake here.
Context: ...omparison to Related Projects #### GORM Carta is NOT an an object-relational map...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ...ted Projects #### GORM Carta is NOT an an object-relational mapper(ORM). #### sq...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...must return exactly one column (got 2)`. - Why this is Protection: The library ha...

(QB_NEW_EN)


[style] ~29-~29: Try moving the adverb to make the sentence clearer.
Context: ...il` column. A "graceful" solution might be to arbitrarily pick the first column, but this could lead to the wrong data ...

(SPLIT_INFINITIVE)


[grammar] ~37-~37: There might be a mistake here.
Context: ...l value to type string for column name`. - Why this is Protection: A standard Go ...

(QB_NEW_EN)

ai_plans/FIX_DUPLICATE_ROWS.md

[grammar] ~19-~19: There might be a mistake here.
Context: ...generating the position-based unique ID. - Idiomatic Go: Error creation was chang...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...c Go:** Error creation was changed from errors.New(fmt.Sprintf(...)) to the more idiomatic fmt.Errorf. - ...

(QB_NEW_EN)


[style] ~46-~46: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ery to a top-level basic slice. - Added a test for a nested basic slice (`PostW...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

README.md

[grammar] ~237-~237: Ensure spelling is correct
Context: ...ructure of your struct using the column mames of your query response as well as the t...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (8)
mapper.go (1)

247-247: Trimmed Carta tag names: good defensive parsing

Using strings.TrimSpace avoids subtle tag mismatches due to stray whitespace and aligns with stricter matching elsewhere.

column.go (1)

5-5: fmt import addition is appropriate

New error paths in allocateColumns justify the fmt import.

mapper_test.go (5)

512-549: Top-level []string mapping preserves ordering and duplicates: LGTM

Solid test that validates the per-row identity semantics for basic slices.


551-574: Error on multiple columns for []string: LGTM

Confirms the single-column rule for top-level basic mappers.


580-620: Nested basic slice mapping: LGTM

Verifies the ancestor-qualified column selection for a single matching column.


622-644: No matching column error: LGTM

Correctly exercises the failure path when zero candidates remain.


646-668: Multiple matching columns error: LGTM

Ensures ambiguity is surfaced when both case-variants exist.

README.md (1)

233-236: Deduplication guidance reads well

Clear distinction between struct-slice vs basic-slice behavior matches the new tests and implementation.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
DESIGN_PHILOSOPHIES.md (1)

10-12: Resolved: ORM wording and spacing are now correct

“Carta is NOT an object-relational mapper (ORM).” fixes the prior double-article and spacing issues.

ai_plans/FIX_DUPLICATE_ROWS.md (1)

27-31: Document the exact column-matching semantics for nested basic slices

This bullet still leaves ambiguity. Spell out how a “match” is determined (source of the name, case sensitivity, aliasing, qualification) to set user expectations and align with actual code.

Apply this diff:

--   **For nested basic slices**, the function now searches the remaining columns for exactly one that matches the ancestor-qualified name (e.g., `tags`). It returns an error if zero or more than one match is found.
+-   **For nested basic slices**, `allocateColumns` searches the remaining columns for exactly one column that matches the field’s resolved name:
+    -   Source: the first segment of the `carta` tag if present (whitespace-trimmed), otherwise the field name.
+    -   Matching: case-insensitive match on the unqualified column label (i.e., the alias as returned by the driver); qualified prefixes like `posts.tags` are ignored for matching.
+    -   Requirement: exactly one match must exist; zero or multiple matches error out.
+    -   Example: For `PostWithTags.Tags []string` with `carta:"tags"`, among columns `[posts.id, posts.title, tags]`, only `tags` matches.

To ensure this matches the actual behavior, run the script below to inspect allocateColumns and its comparisons (case sensitivity, qualification handling):

#!/bin/bash
set -euo pipefail

echo "Locate allocateColumns implementation:"
rg -nC3 -P '\ballocateColumns\s*\(' --type go

echo
echo "Check how column names are compared (case-insensitive, qualification ignored, etc.):"
rg -nC2 -P '\b(strings\.EqualFold|strings\.ToLower|strings\.TrimSpace|SplitN|Split)\b' --type go carta/column.go carta/mapper.go || true

echo
echo "Show columnsByName usages and any filtering for nested basic mappers:"
rg -nC3 -P 'columnsByName|PresentColumns|AncestorNames|IsBasic|len\(m\.AncestorNames\)' --type go carta/column.go carta/load.go carta/mapper.go || true
README.md (1)

4-6: LGTM: positioning and ORM clarification

The refreshed description and explicit non-ORM stance read clearly and align with the design philosophy.

🧹 Nitpick comments (4)
mapper.go (1)

243-256: Trim whitespace for carta tag options (key and value) and guard against extra "="

You already trim the tag name. Extend this to options to avoid silent misses when users add spaces, e.g., carta:"posts, delimiter = _". Also prefer SplitN to handle values containing '='.

Apply this diff:

-					if len(parts) > 1 {
-						for _, part := range parts[1:] {
-							option := strings.Split(part, "=")
-							if len(option) == 2 && option[0] == "delimiter" {
-								subMap.Delimiter = option[1]
-							}
-						}
-					}
+					if len(parts) > 1 {
+						for _, part := range parts[1:] {
+							part = strings.TrimSpace(part)
+							option := strings.SplitN(part, "=", 2)
+							if len(option) == 2 {
+								key := strings.TrimSpace(option[0])
+								val := strings.TrimSpace(option[1])
+								if key == "delimiter" && val != "" {
+									subMap.Delimiter = val
+								}
+							}
+						}
+					}

If you’d like, I can add a small test covering carta:"posts, delimiter = _" to lock this in.

DESIGN_PHILOSOPHIES.md (2)

13-15: Tighten wording and fix minor grammar (sqlx paragraph)

Lowercase “sqlx” for consistency, fix double space, and use “i.e.,”. Also simplify the latter clause.

Apply this diff:

-#### sqlx
-Sqlx does not track has-many relationships when mapping SQL data. This works fine when all your relationships are at most has-one (Blog has one Author) ie, each SQL row corresponds to one struct. However, handling has-many relationships (Blog has many Posts), requires  running many queries or running manual post-processing of the result. Carta handles these complexities automatically.
+#### sqlx
+sqlx does not track has-many relationships when mapping SQL data. This works fine when all your relationships are at most has-one (Blog has one Author), i.e., each SQL row corresponds to one struct. However, handling has-many relationships (Blog has many Posts) requires running many queries or manual post-processing of the result. Carta handles these complexities automatically.

5-7: Optional: Clarify tone in “Approach” sentence

Minor style tweak to read more naturally.

Apply this diff:

-Carta adopts the "database mapping" approach (described in Martin Fowler's [book](https://books.google.com/books?id=FyWZt5DdvFkC&lpg=PA1&dq=Patterns%20of%20Enterprise%20Application%20Architecture%20by%20Martin%20Fowler&pg=PT187#v=onepage&q=active%20record&f=false)) which is useful among organizations with strict code review processes.
+Carta adopts the "database mapping" approach (described in Martin Fowler's [book](https://books.google.com/books?id=FyWZt5DdvFkC&lpg=PA1&dq=Patterns%20of%20Enterprise%20Application%20Architecture%20by%20Martin%20Fowler&pg=PT187#v=onepage&q=active%20record&f=false)), which is well suited to organizations with strict code review processes.
README.md (1)

227-237: Capitalize “Carta” consistently

Keep the project name capitalized for consistency with the rest of the doc.

Apply this diff:

-## Design Philosophy
+## Design Philosophy
@@
-The `carta` package follows a "fail-fast" philosophy to ensure that mapping operations are unambiguous and to protect users from silent bugs. For a detailed explanation of the error handling approach and the balance between user protection and graceful handling, please see the [Design Philosophies](./DESIGN_PHILOSOPHIES.md) document.
+The `Carta` package follows a "fail-fast" philosophy to ensure that mapping operations are unambiguous and to protect users from silent bugs. For a detailed explanation of the error handling approach and the balance between user protection and graceful handling, please see the [Design Philosophies](./DESIGN_PHILOSOPHIES.md) document.
@@
-When mapping to **slices of basic types** (e.g., `[]string`, `[]int`), every row from the query is treated as a unique element, and **no de-duplication occurs**.
- 
-To prevent relatively expensive reflect operations, carta caches the structure of your struct using the column names of your query response as well as the type of your struct.
+When mapping to **slices of basic types** (e.g., `[]string`, `[]int`), every row from the query is treated as a unique element, and **no de-duplication occurs**.
+
+To prevent relatively expensive reflect operations, Carta caches the structure of your struct using the column names of your query response as well as the type of your struct.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e8a2522 and 2293378.

📒 Files selected for processing (6)
  • DESIGN_PHILOSOPHIES.md (1 hunks)
  • README.md (3 hunks)
  • ai_plans/FIX_DUPLICATE_ROWS.md (1 hunks)
  • column.go (2 hunks)
  • mapper.go (2 hunks)
  • mapper_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • column.go
  • mapper_test.go
🧰 Additional context used
🪛 LanguageTool
DESIGN_PHILOSOPHIES.md

[grammar] ~5-~5: There might be a mistake here.
Context: ...er when using this package. ## Approach Carta adopts the "database mapping" appr...

(QB_NEW_EN)


[grammar] ~10-~10: There might be a mistake here.
Context: ...omparison to Related Projects #### GORM Carta is NOT an object-relational mapper...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...must return exactly one column (got 2)`. - Why this is Protection: The library ha...

(QB_NEW_EN)


[grammar] ~37-~37: There might be a mistake here.
Context: ...-nullable type string for column name`). - Why this is Protection: A standard Go ...

(QB_NEW_EN)

ai_plans/FIX_DUPLICATE_ROWS.md

[grammar] ~14-~14: There might be a mistake here.
Context: ...as unique within that mapping operation. 2. Single-Column Rule: In column.go, th...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ...generating the position-based unique ID. - Idiomatic Go: Error creation now uses ...

(QB_NEW_EN)

🔇 Additional comments (2)
mapper.go (1)

260-264: LGTM: robust handling of whitespace in db tag names

Trimming db tag values removes a common source of subtle mapping bugs (e.g., trailing spaces in tags). Looks good.

README.md (1)

23-28: LGTM: added context improves onboarding

The brief schema/setup context before the SQL and struct examples helps readers connect the dots.

@tmaiaroto tmaiaroto merged commit b676bb0 into main Aug 16, 2025
3 checks passed
@tmaiaroto tmaiaroto deleted the fix-duplicate-rows branch August 16, 2025 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant