Skip to content

feat: tx account cleanup extension#127

Merged
songwongtp merged 9 commits into
mainfrom
feat/tx-account-cleanup-extension
Apr 16, 2026
Merged

feat: tx account cleanup extension#127
songwongtp merged 9 commits into
mainfrom
feat/tx-account-cleanup-extension

Conversation

@songwongtp
Copy link
Copy Markdown
Collaborator

@songwongtp songwongtp commented Mar 24, 2026

Summary by CodeRabbit

  • New Features

    • Transaction account cleanup background extension to reconcile account records.
    • New config flag TX_ACCOUNT_CLEANUP (enabled by default) to toggle the feature.
    • Persistent cleanup status tracking (last cleaned sequence, deleted/inserted counters) surfaced in GET /status.
  • Improvements

    • Deduplicated account lookups to reduce redundant work and improve performance.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a new tx-account-cleanup extension: config toggle, DB migration and GORM status model, exported address extractor, cache deduplication, extension manager integration, and batch reconciliation logic that reconciles CollectedTxAccount rows against expected accounts per transaction.

Changes

Cohort / File(s) Summary
Configuration
config/config.go, config/txaccountcleanup.go
Added TxAccountCleanupConfig and txAccountCleanupConfig on Config, default env TX_ACCOUNT_CLEANUP=true, and accessor TxAccountCleanupEnabled().
Address & Transaction Collection
indexer/collector/tx/address.go, indexer/collector/tx/collect.go
Exported GrepAddressesFromTx (was grepAddressesFromTx) and changed internal map type to map[string]any; updated collector call site.
Extension Manager Integration
indexer/extension/manager.go
Imported txaccountcleanup extension, instantiate and conditionally append it during manager construction.
TX Account Cleanup Extension
indexer/extension/txaccountcleanup/...
indexer/extension/txaccountcleanup/extension.go, indexer/extension/txaccountcleanup/cleanup.go
New extension tx-account-cleanup with New, Initialize, Run, ProcessBatch, and reconcileTx: reverse-batch processing, per-transaction reconciliation (compute extras/missing/stale signer), DB transactions for deletes/inserts/updates, and status persistence.
Database Schema & Types
orm/migrations/20260324000000_add_tx_account_cleanup_status.sql, types/table.go
Added migration and GORM model CollectedTxAccountCleanupStatus mapping last_cleaned_sequence, deleted_records, inserted_records.
Cache Optimization
util/cache/cache.go
GetOrCreateAccountIds now deduplicates input accounts before cache/DB lookup to avoid redundant work.
Status Endpoint
util/common-handler/status/status.go, util/common-handler/status/types.go
Added optional TxAccountCleanup status fields to StatusResponse, handler reads latest cleanup status, exposes sequence/deleted/inserted, and maintains an atomic in-memory cached status.

Sequence Diagram

sequenceDiagram
    participant Manager as ExtensionManager
    participant Ext as TxAccountCleanupExtension
    participant DB as Database
    participant Addr as AddressExtractor
    participant Cache as AccountIDCache

    Manager->>Ext: New(cfg, logger, db)
    Manager->>Ext: Run(ctx)
    Ext->>DB: Load or create cleanup status
    loop Reverse batches
        Ext->>DB: Query CollectedTx range (startSeq..endSeq)
        Ext->>DB: Load CollectedTxAccounts for range
        loop Per CollectedTx
            Ext->>DB: Load CollectedTx (stored Data)
            Ext->>Addr: GrepAddressesFromTx(events)
            Addr-->>Ext: addresses[]
            Ext->>Cache: GetOrCreateAccountIds(addresses)
            Cache-->>Ext: accountID map
            Ext->>DB: Reconcile (delete extras, insert missing, update signer) in transaction
            DB-->>Ext: deleted/inserted counts
        end
        Ext->>DB: Update cleanup status counters
    end
    Ext-->>Manager: nil or error
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I nibble through sequences, tidy whiskers keen,
Prune stray accounts and plant the missing green,
Duplicates hop away, caches hum in tune,
Counters saved beneath the moon,
A rabbit's cleanup, tidy and serene.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'feat: tx account cleanup extension' directly and accurately summarizes the main change: adding a new tx account cleanup extension feature.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/tx-account-cleanup-extension

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
orm/migrations/20260324000000_add_tx_account_cleanup_status.sql (1)

2-6: Consider adding NOT NULL constraints with defaults.

The columns are nullable, but the Go struct CollectedTxAccountCleanupStatus uses non-pointer int64 fields. While the extension initializes with zeros, adding explicit NOT NULL constraints with default values would make the schema more defensive.

Suggested schema improvement
 CREATE TABLE "public"."tx_account_cleanup_status" (
-  "last_cleaned_sequence" bigint NULL,
-  "deleted_records" bigint NULL,
-  "inserted_records" bigint NULL
+  "last_cleaned_sequence" bigint NOT NULL DEFAULT 0,
+  "deleted_records" bigint NOT NULL DEFAULT 0,
+  "inserted_records" bigint NOT NULL DEFAULT 0
 );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@orm/migrations/20260324000000_add_tx_account_cleanup_status.sql` around lines
2 - 6, The table tx_account_cleanup_status currently allows NULLs while the Go
struct CollectedTxAccountCleanupStatus uses non-pointer int64 fields; update the
CREATE TABLE statement for tx_account_cleanup_status to make
last_cleaned_sequence, deleted_records, and inserted_records NOT NULL and set
DEFAULT 0 so the DB representation matches the non-nullable Go types and avoids
surprises on read/insert.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@indexer/extension/txaccountcleanup/extension.go`:
- Around line 109-119: ProcessBatch currently returns (deleted, inserted, err)
but hides per-transaction reconcileTx failures while the caller unconditionally
advances status.LastCleanedSequence to endSeq; change the behavior so the caller
only checkpoints up to the highest contiguous successful sequence (or fails the
batch) instead of endSeq: have ProcessBatch return the highestSuccessfulSeq (or
an explicit partial-success error) alongside deleted/inserted and any error,
then in the caller (where ProcessBatch is invoked) set
status.LastCleanedSequence = highestSuccessfulSeq and only call e.updateStatus
when there is at least one contiguous success; alternatively propagate a non-nil
error from ProcessBatch when any reconcileTx failed so the outer loop does not
advance LastCleanedSequence past failures. Ensure you update the signatures and
handling of ProcessBatch, reconcileTx error propagation, and the use of
status.LastCleanedSequence and e.updateStatus accordingly.

---

Nitpick comments:
In `@orm/migrations/20260324000000_add_tx_account_cleanup_status.sql`:
- Around line 2-6: The table tx_account_cleanup_status currently allows NULLs
while the Go struct CollectedTxAccountCleanupStatus uses non-pointer int64
fields; update the CREATE TABLE statement for tx_account_cleanup_status to make
last_cleaned_sequence, deleted_records, and inserted_records NOT NULL and set
DEFAULT 0 so the DB representation matches the non-nullable Go types and avoids
surprises on read/insert.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1146ce37-f06e-4d73-8599-3831fa4cba37

📥 Commits

Reviewing files that changed from the base of the PR and between 09662eb and 6263ac9.

⛔ Files ignored due to path filters (1)
  • orm/migrations/atlas.sum is excluded by !**/*.sum
📒 Files selected for processing (9)
  • config/config.go
  • config/txaccountcleanup.go
  • indexer/collector/tx/address.go
  • indexer/collector/tx/collect.go
  • indexer/extension/manager.go
  • indexer/extension/txaccountcleanup/cleanup.go
  • indexer/extension/txaccountcleanup/extension.go
  • orm/migrations/20260324000000_add_tx_account_cleanup_status.sql
  • types/table.go

Comment thread indexer/extension/txaccountcleanup/extension.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@indexer/extension/txaccountcleanup/cleanup.go`:
- Around line 132-159: The reconciliation currently only checks membership by
AccountId and misses rows with stale Signer flags; after building actualSet
(from actual) iterate actualSet and detect entries where entry.Signer !=
(entry.AccountId == collectedTx.SignerId), collect those accountIds (or
CollectedTxAccount structs) as updateEntries, and apply UPDATEs for those rows
inside the same transaction (using AccountId and Sequence/collectedTx.Sequence
as keys) so existing rows with wrong Signer are corrected; keep the existing
extraIds and missingEntries logic but add this new update path referencing
actualSet, expectedSet, collectedTx, and types.CollectedTxAccount.
- Around line 86-123: The calls to tx.GrepAddressesFromTx,
cache.GetOrCreateAccountIds and evmret.FindRetOnlyAddresses use the base DB
handle so they won’t respect request cancellations; in reconcileTx create a
context-bound DB once (ctxDB := db.WithContext(ctx)) and pass ctxDB instead of
db into those helper functions (replace db in calls to tx.GrepAddressesFromTx,
cache.GetOrCreateAccountIds and the evmret.FindRetOnlyAddresses-related cache
call) so all helper lookups use the parent context.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: facddff2-5897-4655-ad9a-88e29bf50edb

📥 Commits

Reviewing files that changed from the base of the PR and between 6263ac9 and 2793965.

📒 Files selected for processing (2)
  • indexer/extension/txaccountcleanup/cleanup.go
  • indexer/extension/txaccountcleanup/extension.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • indexer/extension/txaccountcleanup/extension.go

Comment thread indexer/extension/txaccountcleanup/cleanup.go Outdated
Comment thread indexer/extension/txaccountcleanup/cleanup.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
indexer/extension/txaccountcleanup/extension.go (1)

117-142: ⚠️ Potential issue | 🟠 Major

Partial batch failure checkpoints incorrectly, skipping failed sequences on restart.

When ProcessBatch fails mid-batch (e.g., batch [100, 200] fails at sequence 150), the code:

  1. Returns lastProcessedSeq=149, batchErr!=nil
  2. Condition 149 >= 100 is true, so we enter the checkpoint block
  3. Persists LastCleanedSequence = 99 (startSeq - 1)
  4. Returns with error

On restart, currentSeq = 99, and the next batch processes [1, 99] — sequences 100-200 are never retried.

The checkpoint should only be persisted on full batch success (batchErr == nil), or at minimum, the failed batch should be retried.

🐛 Proposed fix: Only checkpoint on successful batch completion
 		lastProcessedSeq, deleted, inserted, batchErr := ProcessBatch(ctx, e.db.DB, e.logger, startSeq, currentSeq)
 
-		// Checkpoint progress if any sequences were successfully processed
-		if lastProcessedSeq >= startSeq {
+		// Only checkpoint on full batch success to ensure failed sequences are retried
+		if batchErr != nil {
+			return fmt.Errorf("failed to process batch [%d-%d]: %w", startSeq, currentSeq, batchErr)
+		}
+
+		// Batch completed successfully - checkpoint progress
+		if lastProcessedSeq >= startSeq {
 			status.LastCleanedSequence = startSeq - 1
 			status.DeletedRecords += deleted
 			status.InsertedRecords += inserted
 
 			if err := e.updateStatus(ctx, status); err != nil {
 				return fmt.Errorf("failed to update status: %w", err)
 			}
 
 			if deleted > 0 || inserted > 0 {
 				e.logger.Info("tx account cleanup processed batch",
 					slog.Int64("start_sequence", startSeq),
 					slog.Int64("end_sequence", currentSeq),
 					slog.Int64("batch_deleted", deleted),
 					slog.Int64("batch_inserted", inserted))
 			}
 
 			currentSeq = startSeq - 1
 		}
-
-		if batchErr != nil {
-			return fmt.Errorf("failed to process batch [%d-%d]: %w", startSeq, currentSeq, batchErr)
-		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@indexer/extension/txaccountcleanup/extension.go` around lines 117 - 142, The
code is checkpointing progress even when ProcessBatch returned a partial failure
(batchErr != nil), causing failed sequences to be skipped on restart; change the
checkpoint logic so it only persists status and updates currentSeq when the
batch completed successfully — e.g., require batchErr == nil (or require
lastProcessedSeq == currentSeq) in addition to lastProcessedSeq >= startSeq
before setting status.LastCleanedSequence, incrementing
DeletedRecords/InsertedRecords, calling e.updateStatus, logging the batch, and
updating currentSeq; leave error return behavior unchanged so partial failures
are retried on restart.
🧹 Nitpick comments (2)
indexer/extension/txaccountcleanup/cleanup.go (1)

184-194: Consider batching stale signer updates.

The current implementation executes N separate UPDATE queries for N stale signers. While typically small (0-1 entries), this could be optimized with a single query when there are multiple entries to update.

♻️ Optional: Batch update for stale signers

If performance becomes a concern with larger batches:

 // Fix stale signer flags
 if len(staleSigner) > 0 {
-    for _, accountId := range staleSigner {
-        expectedSigner := accountId == collectedTx.SignerId
-        result := txDB.
-            Model(&types.CollectedTxAccount{}).
-            Where("account_id = ? AND sequence = ?", accountId, collectedTx.Sequence).
-            Update("signer", expectedSigner)
-        if result.Error != nil {
-            return fmt.Errorf("failed to update signer flag: %w", result.Error)
-        }
-    }
+    // Since only one account can be the signer, set all to false first, then set the signer to true
+    if err := txDB.
+        Model(&types.CollectedTxAccount{}).
+        Where("account_id IN ? AND sequence = ?", staleSigner, collectedTx.Sequence).
+        Update("signer", false).Error; err != nil {
+        return fmt.Errorf("failed to update signer flags: %w", err)
+    }
+    if collectedTx.SignerId != 0 {
+        for _, accountId := range staleSigner {
+            if accountId == collectedTx.SignerId {
+                if err := txDB.
+                    Model(&types.CollectedTxAccount{}).
+                    Where("account_id = ? AND sequence = ?", accountId, collectedTx.Sequence).
+                    Update("signer", true).Error; err != nil {
+                    return fmt.Errorf("failed to update signer flag: %w", err)
+                }
+                break
+            }
+        }
+    }

     logger.Info("fixed stale signer flags",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@indexer/extension/txaccountcleanup/cleanup.go` around lines 184 - 194, The
loop issues N separate UPDATEs for staleSigner; instead batch them: collect the
account IDs from staleSigner and perform one UPDATE on types.CollectedTxAccount
with Where("account_id IN ? AND sequence = ?", ids, collectedTx.Sequence) to set
signer = false, then (if collectedTx.SignerId is present in the list) run a
single UPDATE to set signer = true for that specific account_id, or
alternatively use a single UPDATE with a SQL expression (e.g., signer =
account_id = ?) via GORM's Expr to set the flag correctly in one query; update
references: staleSigner, collectedTx, txDB, types.CollectedTxAccount, and the
Update("signer", ...) calls.
indexer/extension/txaccountcleanup/extension.go (1)

146-155: Consider using primary key instead of WHERE 1 = 1.

The current approach works for a singleton table, but using the model's primary key would be more conventional.

♻️ Optional: Use primary key for update
 func (e *TxAccountCleanupExtension) updateStatus(ctx context.Context, status *types.CollectedTxAccountCleanupStatus) error {
 	return e.db.WithContext(ctx).
-		Model(&types.CollectedTxAccountCleanupStatus{}).
-		Where("1 = 1").
-		Updates(map[string]any{
-			"last_cleaned_sequence": status.LastCleanedSequence,
-			"deleted_records":       status.DeletedRecords,
-			"inserted_records":      status.InsertedRecords,
-		}).Error
+		Save(status).Error
 }

This assumes status has a valid primary key set from Initialize().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@indexer/extension/txaccountcleanup/extension.go` around lines 146 - 155, The
updateStatus function currently uses a blanket Where("1 = 1"); change it to
update by the model's primary key instead: use the
CollectedTxAccountCleanupStatus primary key from the passed-in status (set
during Initialize()) when calling
e.db.WithContext(ctx).Model(&types.CollectedTxAccountCleanupStatus{}).Where(...).Updates(...)
so the update targets the correct row; reference the status object's primary key
field (e.g., ID or the actual PK field name on
types.CollectedTxAccountCleanupStatus) in the WHERE clause or simply use
Model(status).Updates(...) to let GORM use the PK automatically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@indexer/extension/txaccountcleanup/extension.go`:
- Around line 117-142: The code is checkpointing progress even when ProcessBatch
returned a partial failure (batchErr != nil), causing failed sequences to be
skipped on restart; change the checkpoint logic so it only persists status and
updates currentSeq when the batch completed successfully — e.g., require
batchErr == nil (or require lastProcessedSeq == currentSeq) in addition to
lastProcessedSeq >= startSeq before setting status.LastCleanedSequence,
incrementing DeletedRecords/InsertedRecords, calling e.updateStatus, logging the
batch, and updating currentSeq; leave error return behavior unchanged so partial
failures are retried on restart.

---

Nitpick comments:
In `@indexer/extension/txaccountcleanup/cleanup.go`:
- Around line 184-194: The loop issues N separate UPDATEs for staleSigner;
instead batch them: collect the account IDs from staleSigner and perform one
UPDATE on types.CollectedTxAccount with Where("account_id IN ? AND sequence =
?", ids, collectedTx.Sequence) to set signer = false, then (if
collectedTx.SignerId is present in the list) run a single UPDATE to set signer =
true for that specific account_id, or alternatively use a single UPDATE with a
SQL expression (e.g., signer = account_id = ?) via GORM's Expr to set the flag
correctly in one query; update references: staleSigner, collectedTx, txDB,
types.CollectedTxAccount, and the Update("signer", ...) calls.

In `@indexer/extension/txaccountcleanup/extension.go`:
- Around line 146-155: The updateStatus function currently uses a blanket
Where("1 = 1"); change it to update by the model's primary key instead: use the
CollectedTxAccountCleanupStatus primary key from the passed-in status (set
during Initialize()) when calling
e.db.WithContext(ctx).Model(&types.CollectedTxAccountCleanupStatus{}).Where(...).Updates(...)
so the update targets the correct row; reference the status object's primary key
field (e.g., ID or the actual PK field name on
types.CollectedTxAccountCleanupStatus) in the WHERE clause or simply use
Model(status).Updates(...) to let GORM use the PK automatically.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 176990bd-39ed-4c8a-a73b-b6d3ef3b5ade

📥 Commits

Reviewing files that changed from the base of the PR and between 2793965 and 97c8e1a.

📒 Files selected for processing (4)
  • indexer/collector/tx/address.go
  • indexer/extension/txaccountcleanup/cleanup.go
  • indexer/extension/txaccountcleanup/extension.go
  • util/cache/cache.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • indexer/collector/tx/address.go

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@indexer/extension/txaccountcleanup/cleanup.go`:
- Around line 90-119: cache.GetOrCreateAccountIds is being called with
createNew=false which can silently drop unresolved addresses and cause valid
types.CollectedTxAccount rows to be flagged as "extra" and deleted; before
building expectedSet/extraIds (around use of accountIdMap, expectedSet,
actualSet, extraIds and collectedTx.SignerId) detect partial resolution (e.g.
any requested address missing from accountIdMap or mapped to 0) and abort or
skip the delete-reconciliation path (return without deleting or re-run
resolution with createNew=true/log and bail) so deletion only proceeds when all
input addresses are resolved.

In `@indexer/extension/txaccountcleanup/extension.go`:
- Around line 142-150: The updateStatus function currently issues a table-wide
update by using Where("1 = 1") on
Model(&types.CollectedTxAccountCleanupStatus{}); change it to target the
specific status row instead of all rows — e.g., scope the WHERE to the primary
key of types.CollectedTxAccountCleanupStatus (use the appropriate key field from
the status instance, e.g. status.ID or status.SomePrimaryKey) or call an
instance-scoped update/Save on the status object so only that row is updated;
update the Updates call to include the same fields (LastCleanedSequence,
DeletedRecords, InsertedRecords) but with a WHERE that filters by the primary
key.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b823a108-3369-49e0-8b73-9b71b04110ef

📥 Commits

Reviewing files that changed from the base of the PR and between 97c8e1a and 8e9248f.

📒 Files selected for processing (2)
  • indexer/extension/txaccountcleanup/cleanup.go
  • indexer/extension/txaccountcleanup/extension.go

Comment thread indexer/extension/txaccountcleanup/cleanup.go
Comment thread indexer/extension/txaccountcleanup/extension.go
@songwongtp songwongtp requested a review from Benzbeeb March 25, 2026 08:38
Comment thread indexer/extension/txaccountcleanup/cleanup.go
@songwongtp songwongtp force-pushed the feat/tx-account-cleanup-extension branch from 9c07ec7 to e2b9d02 Compare April 8, 2026 10:11
@songwongtp songwongtp merged commit ac08c8c into main Apr 16, 2026
6 checks passed
@songwongtp songwongtp deleted the feat/tx-account-cleanup-extension branch April 16, 2026 08:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants