Conversation

**Warning: Rate limit exceeded**

⌛ **How to resolve this issue?** After the wait time has elapsed, a review can be triggered using the
We recommend that you space out your commits to avoid hitting the rate limit.

🚦 **How do rate limits work?** CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

📝 **Walkthrough**

Adds a new CSV file-based connector (implementation, utilities, tests), CLI registration and flags, documentation/checklists for creating dsync connectors, a test-harness flag to skip WriteUpdates tests, and gosec suppression comments in two connector files.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant FileConn as FileConnector
    participant FS as FileSystem
    participant Test as TestHarness
    Client->>FileConn: GeneratePlan(request)
    FileConn->>FS: Stat / list files (dir or single file)
    FS-->>FileConn: file list + sizes
    FileConn-->>Client: Partitions (namespace, cursor, estimatedCount)
    Client->>FileConn: ListData(partition, cursor)
    FileConn->>FS: Open/read file from offset
    FS-->>FileConn: CSV bytes
    FileConn-->>Client: Documents (JSON/BSON) + next cursor
    Test->>FileConn: WriteData(namespace, documents)
    FileConn->>FS: Create/append CSV, write header/rows
    FS-->>FileConn: ack
    FileConn-->>Test: WriteResult
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 **Pre-merge checks** | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches

🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@connectors/file/connector.go`:
- Around line 418-437: The CSV header construction in the loop over docs uses
map iteration (for k := range doc) which is nondeterministic; to fix, collect
the non-id keys from doc into a slice, sort.Strings that slice before appending
to writer.header (preserve special handling for "id" or "_id" so id appears
first if present), then set writer.header and call
writer.writer.Write(writer.header) as before; update the block around the docs
iteration that computes header (references: writer.header, docs, doc, header
slice, namespace, writer.writer.Write) so CSV columns are produced in a stable,
deterministic order.
🧹 Nitpick comments (7)
internal/app/options/connectorflags.go (1)
**667-690:** `delimiterStr` on line 668 is only written to but never read.

The `Destination: &delimiterStr` writes the CLI string into the local variable, but the actual assignment to `settings.Delimiter` happens in the `Action` callback. The `delimiterStr` variable itself is never used after that. This works because the `Action` receives the value `v` directly, but it's slightly misleading.

Note that if the user doesn't pass `--delimiter`, the `Action` won't fire and `settings.Delimiter` stays `0`. This is safe only because `NewConn` (line 115 in connector.go) defaults `0` to `DefaultDelimiter`. This implicit coupling is worth being aware of.

Also, a multi-byte UTF-8 character (e.g., `—`) passed as `--delimiter` would be rejected by the `len(v) != 1` byte-length check despite being a single Unicode code point. If only ASCII delimiters are intended, this is fine; otherwise, consider using `utf8.RuneCountInString(v) != 1`.

connectors/file/connector_test.go (1)
**405-405:** Fragile type assertion for `Teardown()`.

`conn.(interface{ Teardown() }).Teardown()` relies on the concrete type having a `Teardown()` method that isn't part of the `ConnectorServiceHandler` interface. If the method signature ever changes, this will panic at runtime rather than failing at compile time. This pattern is repeated on lines 405, 446, 487, and 532.

Consider adding `Teardown` to a small interface or at least adding a comment explaining why this cast is necessary.

connectors/file/connector.go (5)
**97-136:** Dead code: line 104 path-empty check is unreachable.

`parseFileConnectionString` already returns an error when the path is empty (lines 70-72), so `settings.Path` can never be empty at line 104. The check and `ErrPathRequired` are unreachable after a successful parse.
**241-260:** `countRecords` reads the entire file into memory.

`reader.ReadAll()` loads the full CSV content. This method is called from both `GeneratePlan` (which walks a directory, calling it per file) and `GetNamespaceMetadata`. For large CSV files or directories with many files, this can cause significant memory pressure during planning.

A line-counting approach (reading row by row and discarding) would be more memory-efficient if this becomes an issue.
**290-352:** All data returned in a single response — no pagination.

`ListData` loads the entire CSV with `ReadAll()` and returns everything at once with `NextCursor: nil`. For very large files, this means the entire dataset must fit in memory simultaneously (both as CSV records and serialized bytes).

This is acceptable for a v1 CSV connector, but worth noting that it will not scale to large files. If pagination is added later, the cursor bytes (currently holding the file path) would need to also encode an offset.
**491-499:** `Teardown` ignores errors from `file.Close()`.

On certain filesystems (e.g., NFS), `Close()` can fail even after a successful `Flush()`. Since `Teardown()` has no return value, consider at least logging errors:

🔧 Proposed fix

```diff
 func (c *connector) Teardown() {
 	c.writeMutex.Lock()
 	defer c.writeMutex.Unlock()
-	for _, writer := range c.writers {
+	for ns, writer := range c.writers {
 		writer.writer.Flush()
-		writer.file.Close()
+		if err := writer.file.Close(); err != nil {
+			slog.Warn("failed to close CSV file", "namespace", ns, "err", err)
+		}
 	}
 	c.writers = make(map[string]*csvFileWriter)
 }
```
**80-95:** Namespace-to-path round-trip may break on namespaces that contain filesystem-sensitive characters.

`pathToNamespace` replaces path separators with `.` and `namespaceToPath` reverses it. This creates an ambiguity: a file named `a.b.csv` in the root directory produces namespace `a.b`, which `namespaceToPath` converts to `a/b.csv` (a subdirectory). This could silently misroute writes for flat files with dots in their names.

Given that this is a known design choice (dots = hierarchy), just worth keeping in mind as a documented limitation.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@connectors/file/util.go`:
- Around line 35-43: pathToNamespace currently maps file names with dots into
ambiguous namespaces (e.g., pathToNamespace -> "my.data") that namespaceToPath
can't round-trip; modify the mapping to be reversible by encoding dots in file
names before replacing path separators (or choose a reversible delimiter
scheme), e.g., escape or percent-encode literal '.' characters from base file
names inside pathToNamespace and decode them in namespaceToPath; update the
implementations of pathToNamespace and namespaceToPath together (referencing the
functions pathToNamespace and namespaceToPath) so that any encoding/escaping
rule is applied consistently and ensures round-trip fidelity between namespace
and file path.
🧹 Nitpick comments (4)
internal/app/options/connectorflags.go (1)
**667-696:** Delimiter validation uses byte length, not rune/character length.

`len(v) != 1` on line 682 checks byte count. A single multi-byte UTF-8 character (e.g., `·` or `→`) would be rejected even though it's one character. If only ASCII delimiters are intended, this is fine but could be made explicit. If arbitrary single-character delimiters should be supported, use `utf8.RuneCountInString(v) != 1` instead.

Additionally, `rune(v[0])` on line 685 only converts the first byte, which would be incorrect for multi-byte characters — though currently guarded by the `len(v) != 1` check.

Suggested fix for multi-byte support (if desired)

```diff
+	"unicode/utf8"
 ...
 	Action: func(_ *cli.Context, v string) error {
-		if len(v) != 1 {
+		if utf8.RuneCountInString(v) != 1 {
 			return fileconnector.ErrInvalidDelimiter
 		}
-		settings.Delimiter = rune(v[0])
+		r, _ := utf8.DecodeRuneInString(v)
+		settings.Delimiter = r
 		return nil
 	},
```

connectors/file/util.go (1)
**52-91:** `countCSVRows` does not handle bare `\r` (classic Mac) line endings.

Lines 72-73 treat `\r` outside quotes as a non-newline character. Files using bare `\r` as line endings (no `\n`) will be counted as a single row. This is acceptable per RFC 4180 (which specifies CRLF), but worth noting if the connector may ingest files from varied sources.
443-463: Header inference from first document only — subsequent docs with extra keys lose data silently.If a later batch contains documents with keys not present in the first document, those columns are silently dropped since the header was already written. This is a reasonable CSV constraint, but could surprise users with heterogeneous data. Consider logging a warning when a document has keys not in the header.
**489-491:** Missing blank line before `WriteUpdates`.

Line 488 ends `writeCSV` and line 489 starts `WriteUpdates` with no separator.

Nitpick fix

```diff
 	writer.writer.Flush()
 	return writer.writer.Error()
 }
+
 func (c *connector) WriteUpdates(context.Context, *connect.Request[adiomv1.WriteUpdatesRequest]) (*connect.Response[adiomv1.WriteUpdatesResponse], error) {
```
```go
func pathToNamespace(basePath, filePath, extension string) string {
	relPath, err := filepath.Rel(basePath, filePath)
	if err != nil {
		relPath = filepath.Base(filePath)
	}
	relPath = strings.TrimSuffix(relPath, extension)
	relPath = strings.ReplaceAll(relPath, string(filepath.Separator), ".")
	return relPath
}
```
Namespace ↔ path mapping is ambiguous when filenames contain dots.

A file like `/base/my.data.csv` produces namespace `my.data`, which `namespaceToPath` would resolve back to `/base/my/data.csv` — a different path. This round-trip asymmetry could cause issues when writing data to a namespace derived from a source file with dots in its name.
🤖 Prompt for AI Agents
In `@connectors/file/util.go` around lines 35 - 43, pathToNamespace currently maps
file names with dots into ambiguous namespaces (e.g., pathToNamespace ->
"my.data") that namespaceToPath can't round-trip; modify the mapping to be
reversible by encoding dots in file names before replacing path separators (or
choose a reversible delimiter scheme), e.g., escape or percent-encode literal
'.' characters from base file names inside pathToNamespace and decode them in
namespaceToPath; update the implementations of pathToNamespace and
namespaceToPath together (referencing the functions pathToNamespace and
namespaceToPath) so that any encoding/escaping rule is applied consistently and
ensures round-trip fidelity between namespace and file path.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@connectors/file/connector.go`:
- Around line 7-8: Remove the invalid top-of-file gosec directive
`//gosec:disable G304` in connectors/file/connector.go (same issue as in
connectors/s3/connector.go); rely on the existing inline suppressions
(`//nolint:gosec`) that are already placed on the specific os.Open/os.Create
calls (the sites around the code that open/create files) instead of a no-op
global directive.
In `@connectors/s3/connector.go`:
- Around line 7-8: The gosec suppression "//gosec:disable G117" is using a
non-existent rule ID (G117) so it is ineffective; update the directive by either
removing the suppression line or replacing "G117" with a valid gosec rule ID
(e.g., "G304" or "G404") that matches the actual false positive, ensuring the
comment text remains "//gosec:disable <VALID_RULE>" and leaving any surrounding
code and behavior unchanged.
🧹 Nitpick comments (2)
connectors/file/connector.go (2)
**479-481:** Nested/complex values will serialize as Go-syntax strings (e.g., `map[k:v]`).

When `val` is a nested map or slice (common in BSON/JSON data), `fmt.Sprintf("%v", val)` produces Go's default representation, not valid JSON. Consider using `json.Marshal` for non-scalar values to produce a more useful output in the CSV cell.

♻️ Proposed fix

```diff
 			if val != nil {
-				row[i] = fmt.Sprintf("%v", val)
+				switch v := val.(type) {
+				case string:
+					row[i] = v
+				case nil:
+					// leave empty
+				default:
+					// For complex types (maps, slices), use JSON encoding
+					if b, err := json.Marshal(v); err == nil {
+						row[i] = string(b)
+					} else {
+						row[i] = fmt.Sprintf("%v", v)
+					}
+				}
```
**495-506:** Flush errors are silently lost during teardown.

`csv.Writer.Flush()` stores errors internally, retrievable via `writer.writer.Error()`. If the flush fails (e.g., disk full), the error is never checked or logged here.

♻️ Proposed fix

```diff
 	for ns, writer := range c.writers {
 		writer.writer.Flush()
+		if err := writer.writer.Error(); err != nil {
+			slog.Warn("failed to flush CSV writer", "namespace", ns, "err", err)
+		}
 		if err := writer.file.Close(); err != nil {
 			slog.Warn("failed to close CSV file", "namespace", ns, "err", err)
 		}
 	}
```
```go
//gosec:disable G304 -- This is a false positive
```
Same invalid gosec directive syntax as noted in connectors/s3/connector.go.

`//gosec:disable G304` is not recognized by gosec. The actual suppression is handled correctly via inline `//nolint:gosec` comments on the specific `os.Open`/`os.Create` calls (lines 223, 291, 429). This top-of-file directive is a no-op.
🤖 Prompt for AI Agents
In `@connectors/file/connector.go` around lines 7 - 8, Remove the invalid
top-of-file gosec directive `//gosec:disable G304` in
connectors/file/connector.go (same issue as in connectors/s3/connector.go); rely
on the existing inline suppressions (`//nolint:gosec`) that are already placed
on the specific os.Open/os.Create calls (the sites around the code that
open/create files) instead of a no-op global directive.
```go
//gosec:disable G117 -- This is a false positive
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, let's look at the actual file and context
cat -n connectors/s3/connector.go | head -20
```

Repository: adiom-data/dsync
Length of output: 446
🏁 Script executed:
```shell
# Search for gosec directives in the codebase
rg "//gosec:" --type=go -n
rg "#nosec" --type=go -n
```

Repository: adiom-data/dsync
Length of output: 1113
🏁 Script executed:
```shell
# Check for G117 references
rg "G117" --type=go -n
rg -i "g117" --type=go -n
```

Repository: adiom-data/dsync
Length of output: 215
🌐 Web query:
gosec rule IDs complete list official documentation
💡 Result:
Web search failed: Server error: no LLM provider could handle the message
🌐 Web query:
gosec suppression syntax //gosec:disable vs #nosec``
💡 Result:
In gosec, you can suppress findings inline using either #nosec or //gosec:disable.
**`#nosec` (primary / documented form)**

- Format: `#nosec [RuleList] [-- Justification]`
- Placement: on the same line where the issue is reported (or as a block comment to suppress within a section).
- Examples:
  - `InsecureSkipVerify: true, // #nosec G402 -- controlled test environment`
  - `/* #nosec G401 -- acceptable in this context */` (pkg.go.dev)

**`//gosec:disable` (alternative directive)**

- Works similarly to `#nosec` and supports rule + justification.
- Example: `//gosec:disable G101 -- This is a false positive` (pkg.go.dev)

**Auditing note**

You can run gosec while ignoring in-source suppressions via:
`gosec -nosec=true ./...` (pkg.go.dev)
Citations:
- https://pkg.go.dev/github.com/securego/gosec/v2
🌐 Web query:
gosec G117 rule identifier what does G117 mean
💡 Result:
In gosec, a rule ID like G### is just the tool’s internal identifier for a specific check. In the official securego/gosec rule list, the “G1xx” series currently includes rules like G101–G115 (with G113 retired), and there is no official rule named G117. [1]
So if you’re seeing G117 in output, it typically means one of these is true:
- You’re not actually running upstream securego/gosec (you’re using a fork/wrapper that adds or renumbers rules).
- Another scanner is emitting a “gosec-compatible” ID that doesn’t exist in upstream gosec.
You can confirm what rules your installed binary knows by checking the upstream list and comparing it to your tool/version. [1]
Sources: official gosec README rule list. [1]
[1] (github.com)
🌐 Web query:
complete list gosec rule IDs G101 G102 G103 G104 G105 G106 G107 G108 G109 G110 G111 G112 G113 G114 G115 G116 G117 G118 G119
💡 Result:
gosec rule IDs G101–G119 (upstream securego/gosec)
| Rule ID | Meaning / status |
|---|---|
| G101 | Look for hard coded credentials (github.com) |
| G102 | Bind to all interfaces (github.com) |
| G103 | Audit the use of unsafe block (github.com) |
| G104 | Audit errors not checked (github.com) |
| G105 | Retired: math/big.Int.Exp (CVE fixed) (github.com) |
| G106 | Audit the use of ssh.InsecureIgnoreHostKey (github.com) |
| G107 | URL provided to HTTP request as taint input (github.com) |
| G108 | Profiling endpoint automatically exposed on /debug/pprof (github.com) |
| G109 | Potential integer overflow via strconv.Atoi result conversion to int16/32 (github.com) |
| G110 | Potential DoS via decompression bomb (github.com) |
| G111 | Potential directory traversal (github.com) |
| G112 | Potential slowloris attack (github.com) |
| G113 | Retired: math/big.Rat.SetString overflow (CVE-2022-23772; old Go versions) (github.com) |
| G114 | Use of net/http serve function without support for setting timeouts (github.com) |
| G115 | Potential integer overflow when converting between integer types (github.com) |
| G116 | Trojan Source / bidirectional Unicode text control characters (scans full file content) (pkg.go.dev) |
| G117 | Not defined in upstream gosec rule list/docs (no current rule with this ID) (github.com) |
| G118 | Not defined in upstream gosec rule list/docs (no current rule with this ID) (github.com) |
| G119 | Not defined in upstream gosec rule list/docs (no current rule with this ID) (github.com) |
Citations:
- https://github.com/securego/gosec
- https://pkg.go.dev/github.com/securego/gosec/v2/testutils
**`//gosec:disable` is valid syntax, but G117 is not a recognized gosec rule.**

The `//gosec:disable` directive format is documented and supported by gosec, so that part of the comment is fine. However, G117 does not exist in the official gosec rule list (which currently ranges from G101–G116, with some retired). This means the suppression is ineffective — it's attempting to suppress a non-existent rule. Either reference a valid rule ID (like G304, G404) or remove the suppression if the false positive is not actually triggered by any real gosec check.
🤖 Prompt for AI Agents
In `@connectors/s3/connector.go` around lines 7 - 8, The gosec suppression
"//gosec:disable G117" is using a non-existent rule ID (G117) so it is
ineffective; update the directive by either removing the suppression line or
replacing "G117" with a valid gosec rule ID (e.g., "G304" or "G404") that
matches the actual false positive, ensuring the comment text remains
"//gosec:disable <VALID_RULE>" and leaving any surrounding code and behavior
unchanged.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@connectors/file/connector.go`:
- Around line 516-527: The Teardown method currently calls writer.writer.Flush()
but never checks its error; update connector.Teardown to mirror writeCSV by
calling writer.writer.Flush() then checking writer.writer.Error() and logging
any error (use slog.Warn with namespace and error) before attempting
writer.file.Close(), and continue closing remaining writers; ensure c.writers is
still cleared at the end and reference the csvFileWriter type, the
writer.writer.Flush()/writer.writer.Error() calls, writer.file.Close(), and the
writeCSV behavior as the model for logging/handling.
🧹 Nitpick comments (2)
connectors/file/connector.go (2)
**447-472:** Header is inferred from only the first document — keys in later documents will be silently dropped.

If documents across different `WriteData` batches have heterogeneous schemas, any key not present in the very first document will never appear in the CSV. The warning on line 481 fires, but data is lost. This is an inherent CSV limitation, but consider documenting this constraint prominently (e.g., in the `ConnectorSettings` godoc or the error message itself) so users are not surprised by silent data loss.
**480-480:** Nit: prefer `!writer.headerSet[normalizedKey]` over `== false`.

Idiomatic Go for boolean map lookups.

Proposed fix

```diff
-		if writer.headerSet[normalizedKey] == false {
+		if !writer.headerSet[normalizedKey] {
```
```go
func (c *connector) Teardown() {
	c.writeMutex.Lock()
	defer c.writeMutex.Unlock()

	for ns, writer := range c.writers {
		writer.writer.Flush()
		if err := writer.file.Close(); err != nil {
			slog.Warn("failed to close CSV file", "namespace", ns, "err", err)
		}
	}
	c.writers = make(map[string]*csvFileWriter)
}
```
Teardown ignores flush errors — data loss can go unnoticed.

`writer.writer.Flush()` may fail (e.g., disk full), but `writer.writer.Error()` is never checked. Compare with `writeCSV` (line 510) which correctly checks the flush error.
Proposed fix

```diff
 	for ns, writer := range c.writers {
 		writer.writer.Flush()
+		if err := writer.writer.Error(); err != nil {
+			slog.Warn("failed to flush CSV writer", "namespace", ns, "err", err)
+		}
 		if err := writer.file.Close(); err != nil {
 			slog.Warn("failed to close CSV file", "namespace", ns, "err", err)
 		}
 	}
```

🤖 Prompt for AI Agents
In `@connectors/file/connector.go` around lines 516 - 527, The Teardown method
currently calls writer.writer.Flush() but never checks its error; update
connector.Teardown to mirror writeCSV by calling writer.writer.Flush() then
checking writer.writer.Error() and logging any error (use slog.Warn with
namespace and error) before attempting writer.file.Close(), and continue closing
remaining writers; ensure c.writers is still cleared at the end and reference
the csvFileWriter type, the writer.writer.Flush()/writer.writer.Error() calls,
writer.file.Close(), and the writeCSV behavior as the model for
logging/handling.
Force-pushed from 4a05b82 to ef683ab
Summary by CodeRabbit

- New Features
- Documentation
- Tests
- Chores
- Style