feat: tx account cleanup extension#127
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a new tx-account-cleanup extension: config toggle, DB migration and GORM status model, exported address extractor, cache deduplication, extension manager integration, and batch reconciliation logic that reconciles CollectedTxAccount rows against expected accounts per transaction. Changes
Sequence DiagramsequenceDiagram
participant Manager as ExtensionManager
participant Ext as TxAccountCleanupExtension
participant DB as Database
participant Addr as AddressExtractor
participant Cache as AccountIDCache
Manager->>Ext: New(cfg, logger, db)
Manager->>Ext: Run(ctx)
Ext->>DB: Load or create cleanup status
loop Reverse batches
Ext->>DB: Query CollectedTx range (startSeq..endSeq)
Ext->>DB: Load CollectedTxAccounts for range
loop Per CollectedTx
Ext->>DB: Load CollectedTx (stored Data)
Ext->>Addr: GrepAddressesFromTx(events)
Addr-->>Ext: addresses[]
Ext->>Cache: GetOrCreateAccountIds(addresses)
Cache-->>Ext: accountID map
Ext->>DB: Reconcile (delete extras, insert missing, update signer) in transaction
DB-->>Ext: deleted/inserted counts
end
Ext->>DB: Update cleanup status counters
end
Ext-->>Manager: nil or error
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
orm/migrations/20260324000000_add_tx_account_cleanup_status.sql (1)
2-6: Consider adding NOT NULL constraints with defaults.The columns are nullable, but the Go struct
CollectedTxAccountCleanupStatususes non-pointerint64fields. While the extension initializes with zeros, adding explicit NOT NULL constraints with default values would make the schema more defensive.Suggested schema improvement
CREATE TABLE "public"."tx_account_cleanup_status" ( - "last_cleaned_sequence" bigint NULL, - "deleted_records" bigint NULL, - "inserted_records" bigint NULL + "last_cleaned_sequence" bigint NOT NULL DEFAULT 0, + "deleted_records" bigint NOT NULL DEFAULT 0, + "inserted_records" bigint NOT NULL DEFAULT 0 );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@orm/migrations/20260324000000_add_tx_account_cleanup_status.sql` around lines 2 - 6, The table tx_account_cleanup_status currently allows NULLs while the Go struct CollectedTxAccountCleanupStatus uses non-pointer int64 fields; update the CREATE TABLE statement for tx_account_cleanup_status to make last_cleaned_sequence, deleted_records, and inserted_records NOT NULL and set DEFAULT 0 so the DB representation matches the non-nullable Go types and avoids surprises on read/insert.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@indexer/extension/txaccountcleanup/extension.go`:
- Around line 109-119: ProcessBatch currently returns (deleted, inserted, err)
but hides per-transaction reconcileTx failures while the caller unconditionally
advances status.LastCleanedSequence to endSeq; change the behavior so the caller
only checkpoints up to the highest contiguous successful sequence (or fails the
batch) instead of endSeq: have ProcessBatch return the highestSuccessfulSeq (or
an explicit partial-success error) alongside deleted/inserted and any error,
then in the caller (where ProcessBatch is invoked) set
status.LastCleanedSequence = highestSuccessfulSeq and only call e.updateStatus
when there is at least one contiguous success; alternatively propagate a non-nil
error from ProcessBatch when any reconcileTx failed so the outer loop does not
advance LastCleanedSequence past failures. Ensure you update the signatures and
handling of ProcessBatch, reconcileTx error propagation, and the use of
status.LastCleanedSequence and e.updateStatus accordingly.
---
Nitpick comments:
In `@orm/migrations/20260324000000_add_tx_account_cleanup_status.sql`:
- Around line 2-6: The table tx_account_cleanup_status currently allows NULLs
while the Go struct CollectedTxAccountCleanupStatus uses non-pointer int64
fields; update the CREATE TABLE statement for tx_account_cleanup_status to make
last_cleaned_sequence, deleted_records, and inserted_records NOT NULL and set
DEFAULT 0 so the DB representation matches the non-nullable Go types and avoids
surprises on read/insert.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1146ce37-f06e-4d73-8599-3831fa4cba37
⛔ Files ignored due to path filters (1)
orm/migrations/atlas.sumis excluded by!**/*.sum
📒 Files selected for processing (9)
config/config.goconfig/txaccountcleanup.goindexer/collector/tx/address.goindexer/collector/tx/collect.goindexer/extension/manager.goindexer/extension/txaccountcleanup/cleanup.goindexer/extension/txaccountcleanup/extension.goorm/migrations/20260324000000_add_tx_account_cleanup_status.sqltypes/table.go
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@indexer/extension/txaccountcleanup/cleanup.go`:
- Around line 132-159: The reconciliation currently only checks membership by
AccountId and misses rows with stale Signer flags; after building actualSet
(from actual) iterate actualSet and detect entries where entry.Signer !=
(entry.AccountId == collectedTx.SignerId), collect those accountIds (or
CollectedTxAccount structs) as updateEntries, and apply UPDATEs for those rows
inside the same transaction (using AccountId and Sequence/collectedTx.Sequence
as keys) so existing rows with wrong Signer are corrected; keep the existing
extraIds and missingEntries logic but add this new update path referencing
actualSet, expectedSet, collectedTx, and types.CollectedTxAccount.
- Around line 86-123: The calls to tx.GrepAddressesFromTx,
cache.GetOrCreateAccountIds and evmret.FindRetOnlyAddresses use the base DB
handle so they won’t respect request cancellations; in reconcileTx create a
context-bound DB once (ctxDB := db.WithContext(ctx)) and pass ctxDB instead of
db into those helper functions (replace db in calls to tx.GrepAddressesFromTx,
cache.GetOrCreateAccountIds and the evmret.FindRetOnlyAddresses-related cache
call) so all helper lookups use the parent context.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: facddff2-5897-4655-ad9a-88e29bf50edb
📒 Files selected for processing (2)
indexer/extension/txaccountcleanup/cleanup.goindexer/extension/txaccountcleanup/extension.go
🚧 Files skipped from review as they are similar to previous changes (1)
- indexer/extension/txaccountcleanup/extension.go
There was a problem hiding this comment.
♻️ Duplicate comments (1)
indexer/extension/txaccountcleanup/extension.go (1)
117-142:⚠️ Potential issue | 🟠 MajorPartial batch failure checkpoints incorrectly, skipping failed sequences on restart.
When
ProcessBatchfails mid-batch (e.g., batch [100, 200] fails at sequence 150), the code:
- Returns
lastProcessedSeq=149,batchErr!=nil- Condition
149 >= 100is true, so we enter the checkpoint block- Persists
LastCleanedSequence = 99(startSeq - 1)- Returns with error
On restart,
currentSeq = 99, and the next batch processes[1, 99]— sequences 100-200 are never retried.The checkpoint should only be persisted on full batch success (
batchErr == nil), or at minimum, the failed batch should be retried.🐛 Proposed fix: Only checkpoint on successful batch completion
lastProcessedSeq, deleted, inserted, batchErr := ProcessBatch(ctx, e.db.DB, e.logger, startSeq, currentSeq) - // Checkpoint progress if any sequences were successfully processed - if lastProcessedSeq >= startSeq { + // Only checkpoint on full batch success to ensure failed sequences are retried + if batchErr != nil { + return fmt.Errorf("failed to process batch [%d-%d]: %w", startSeq, currentSeq, batchErr) + } + + // Batch completed successfully - checkpoint progress + if lastProcessedSeq >= startSeq { status.LastCleanedSequence = startSeq - 1 status.DeletedRecords += deleted status.InsertedRecords += inserted if err := e.updateStatus(ctx, status); err != nil { return fmt.Errorf("failed to update status: %w", err) } if deleted > 0 || inserted > 0 { e.logger.Info("tx account cleanup processed batch", slog.Int64("start_sequence", startSeq), slog.Int64("end_sequence", currentSeq), slog.Int64("batch_deleted", deleted), slog.Int64("batch_inserted", inserted)) } currentSeq = startSeq - 1 } - - if batchErr != nil { - return fmt.Errorf("failed to process batch [%d-%d]: %w", startSeq, currentSeq, batchErr) - } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@indexer/extension/txaccountcleanup/extension.go` around lines 117 - 142, The code is checkpointing progress even when ProcessBatch returned a partial failure (batchErr != nil), causing failed sequences to be skipped on restart; change the checkpoint logic so it only persists status and updates currentSeq when the batch completed successfully — e.g., require batchErr == nil (or require lastProcessedSeq == currentSeq) in addition to lastProcessedSeq >= startSeq before setting status.LastCleanedSequence, incrementing DeletedRecords/InsertedRecords, calling e.updateStatus, logging the batch, and updating currentSeq; leave error return behavior unchanged so partial failures are retried on restart.
🧹 Nitpick comments (2)
indexer/extension/txaccountcleanup/cleanup.go (1)
184-194: Consider batching stale signer updates.The current implementation executes N separate UPDATE queries for N stale signers. While typically small (0-1 entries), this could be optimized with a single query when there are multiple entries to update.
♻️ Optional: Batch update for stale signers
If performance becomes a concern with larger batches:
// Fix stale signer flags if len(staleSigner) > 0 { - for _, accountId := range staleSigner { - expectedSigner := accountId == collectedTx.SignerId - result := txDB. - Model(&types.CollectedTxAccount{}). - Where("account_id = ? AND sequence = ?", accountId, collectedTx.Sequence). - Update("signer", expectedSigner) - if result.Error != nil { - return fmt.Errorf("failed to update signer flag: %w", result.Error) - } - } + // Since only one account can be the signer, set all to false first, then set the signer to true + if err := txDB. + Model(&types.CollectedTxAccount{}). + Where("account_id IN ? AND sequence = ?", staleSigner, collectedTx.Sequence). + Update("signer", false).Error; err != nil { + return fmt.Errorf("failed to update signer flags: %w", err) + } + if collectedTx.SignerId != 0 { + for _, accountId := range staleSigner { + if accountId == collectedTx.SignerId { + if err := txDB. + Model(&types.CollectedTxAccount{}). + Where("account_id = ? AND sequence = ?", accountId, collectedTx.Sequence). + Update("signer", true).Error; err != nil { + return fmt.Errorf("failed to update signer flag: %w", err) + } + break + } + } + } logger.Info("fixed stale signer flags",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@indexer/extension/txaccountcleanup/cleanup.go` around lines 184 - 194, The loop issues N separate UPDATEs for staleSigner; instead batch them: collect the account IDs from staleSigner and perform one UPDATE on types.CollectedTxAccount with Where("account_id IN ? AND sequence = ?", ids, collectedTx.Sequence) to set signer = false, then (if collectedTx.SignerId is present in the list) run a single UPDATE to set signer = true for that specific account_id, or alternatively use a single UPDATE with a SQL expression (e.g., signer = account_id = ?) via GORM's Expr to set the flag correctly in one query; update references: staleSigner, collectedTx, txDB, types.CollectedTxAccount, and the Update("signer", ...) calls.indexer/extension/txaccountcleanup/extension.go (1)
146-155: Consider using primary key instead ofWHERE 1 = 1.The current approach works for a singleton table, but using the model's primary key would be more conventional.
♻️ Optional: Use primary key for update
func (e *TxAccountCleanupExtension) updateStatus(ctx context.Context, status *types.CollectedTxAccountCleanupStatus) error { return e.db.WithContext(ctx). - Model(&types.CollectedTxAccountCleanupStatus{}). - Where("1 = 1"). - Updates(map[string]any{ - "last_cleaned_sequence": status.LastCleanedSequence, - "deleted_records": status.DeletedRecords, - "inserted_records": status.InsertedRecords, - }).Error + Save(status).Error }This assumes
statushas a valid primary key set fromInitialize().🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@indexer/extension/txaccountcleanup/extension.go` around lines 146 - 155, The updateStatus function currently uses a blanket Where("1 = 1"); change it to update by the model's primary key instead: use the CollectedTxAccountCleanupStatus primary key from the passed-in status (set during Initialize()) when calling e.db.WithContext(ctx).Model(&types.CollectedTxAccountCleanupStatus{}).Where(...).Updates(...) so the update targets the correct row; reference the status object's primary key field (e.g., ID or the actual PK field name on types.CollectedTxAccountCleanupStatus) in the WHERE clause or simply use Model(status).Updates(...) to let GORM use the PK automatically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@indexer/extension/txaccountcleanup/extension.go`:
- Around line 117-142: The code is checkpointing progress even when ProcessBatch
returned a partial failure (batchErr != nil), causing failed sequences to be
skipped on restart; change the checkpoint logic so it only persists status and
updates currentSeq when the batch completed successfully — e.g., require
batchErr == nil (or require lastProcessedSeq == currentSeq) in addition to
lastProcessedSeq >= startSeq before setting status.LastCleanedSequence,
incrementing DeletedRecords/InsertedRecords, calling e.updateStatus, logging the
batch, and updating currentSeq; leave error return behavior unchanged so partial
failures are retried on restart.
---
Nitpick comments:
In `@indexer/extension/txaccountcleanup/cleanup.go`:
- Around line 184-194: The loop issues N separate UPDATEs for staleSigner;
instead batch them: collect the account IDs from staleSigner and perform one
UPDATE on types.CollectedTxAccount with Where("account_id IN ? AND sequence =
?", ids, collectedTx.Sequence) to set signer = false, then (if
collectedTx.SignerId is present in the list) run a single UPDATE to set signer =
true for that specific account_id, or alternatively use a single UPDATE with a
SQL expression (e.g., signer = account_id = ?) via GORM's Expr to set the flag
correctly in one query; update references: staleSigner, collectedTx, txDB,
types.CollectedTxAccount, and the Update("signer", ...) calls.
In `@indexer/extension/txaccountcleanup/extension.go`:
- Around line 146-155: The updateStatus function currently uses a blanket
Where("1 = 1"); change it to update by the model's primary key instead: use the
CollectedTxAccountCleanupStatus primary key from the passed-in status (set
during Initialize()) when calling
e.db.WithContext(ctx).Model(&types.CollectedTxAccountCleanupStatus{}).Where(...).Updates(...)
so the update targets the correct row; reference the status object's primary key
field (e.g., ID or the actual PK field name on
types.CollectedTxAccountCleanupStatus) in the WHERE clause or simply use
Model(status).Updates(...) to let GORM use the PK automatically.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 176990bd-39ed-4c8a-a73b-b6d3ef3b5ade
📒 Files selected for processing (4)
indexer/collector/tx/address.goindexer/extension/txaccountcleanup/cleanup.goindexer/extension/txaccountcleanup/extension.goutil/cache/cache.go
🚧 Files skipped from review as they are similar to previous changes (1)
- indexer/collector/tx/address.go
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@indexer/extension/txaccountcleanup/cleanup.go`:
- Around line 90-119: cache.GetOrCreateAccountIds is being called with
createNew=false which can silently drop unresolved addresses and cause valid
types.CollectedTxAccount rows to be flagged as "extra" and deleted; before
building expectedSet/extraIds (around use of accountIdMap, expectedSet,
actualSet, extraIds and collectedTx.SignerId) detect partial resolution (e.g.
any requested address missing from accountIdMap or mapped to 0) and abort or
skip the delete-reconciliation path (return without deleting or re-run
resolution with createNew=true/log and bail) so deletion only proceeds when all
input addresses are resolved.
In `@indexer/extension/txaccountcleanup/extension.go`:
- Around line 142-150: The updateStatus function currently issues a table-wide
update by using Where("1 = 1") on
Model(&types.CollectedTxAccountCleanupStatus{}); change it to target the
specific status row instead of all rows — e.g., scope the WHERE to the primary
key of types.CollectedTxAccountCleanupStatus (use the appropriate key field from
the status instance, e.g. status.ID or status.SomePrimaryKey) or call an
instance-scoped update/Save on the status object so only that row is updated;
update the Updates call to include the same fields (LastCleanedSequence,
DeletedRecords, InsertedRecords) but with a WHERE that filters by the primary
key.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b823a108-3369-49e0-8b73-9b71b04110ef
📒 Files selected for processing (2)
indexer/extension/txaccountcleanup/cleanup.goindexer/extension/txaccountcleanup/extension.go
… checkpointing partial progress in cleanup.
…se, fixing stale signer flags, and centralizing address deduplication.
…optimize stale signer flag corrections.
…and implement logic to retrieve and expose them.
9c07ec7 to
e2b9d02
Compare
Summary by CodeRabbit
New Features
Improvements