Skip to content

perf: replace O(n²) transaction matching with O(n) Map/Set lookups#205

Merged
hoiekim merged 4 commits intohoiekim:mainfrom
moltboie:perf/fix-on2-transaction-matching-200
Mar 28, 2026
Merged

perf: replace O(n²) transaction matching with O(n) Map/Set lookups#205
hoiekim merged 4 commits intohoiekim:mainfrom
moltboie:perf/fix-on2-transaction-matching-200

Conversation

@moltboie
Copy link
Copy Markdown
Contributor

Summary

Replaces nested Array.find() inside forEach loops with pre-built Map/Set lookups in the Plaid and SimpleFin sync paths.

Changes

sync-simple-fin.ts

  • getRemovedTransactions: pre-build Set<transaction_id> → O(1) lookup
  • getRemovedInvestmentTransactions: pre-build Set<investment_transaction_id> → O(1) lookup

sync-plaid.ts

  • modelize: pre-build three maps before added.map(modelize):
    • Map<transaction_id, JSONTransaction>
    • Map<pending_transaction_id, JSONTransaction>
    • Map<${account_id}:${name}:${amount}, JSONTransaction> (for the compound-key fallback)
  • Investment removed detection: pre-build Set<investment_transaction_id> → O(1) lookup

Impact

With hundreds of transactions per sync and multi-year history, the O(n²) pattern could cause noticeable slowdown. The Maps/Sets are built once per sync call (O(n)) rather than for each incoming transaction.

Testing

  • bun run build passes ✅

Closes #200

@moltboie
Copy link
Copy Markdown
Contributor Author

Self-Review

Discussion thread status:

  • New PR. No prior feedback. Performance fix replacing O(n²) Array.find() loops with O(n) Map/Set lookups in Plaid and SimpleFin sync.

Checked:

  • Logic: Three separate O(n²) → O(n) conversions. Maps are built once before the inner loop runs — correct.
  • Plaid transaction matching: Original used find() with a compound OR condition (idMatches || (accountMatches && nameMatches && amountMatches)). New version uses three Map lookups with ?? chaining — semantically equivalent. Correct.
  • Plaid investment transactions: find() replaced by Set.has(). Correct.
  • SimpleFin getRemovedTransactions: find() replaced by Set.has(). Correct.
  • SimpleFin getRemovedInvestmentTransactions: Same pattern. Correct.
  • Label preservation: if (existing) result.label = existing.label is unchanged — no behavioral regression.
  • Edge cases: Duplicate transaction_id values in storedTransactions would cause Map entries to be overwritten (last-write-wins). The original find() would return the first match. This is a potential subtle behavior change, but duplicate IDs shouldn't exist in practice (unique constraint in DB).
  • Types: Clean. No any.

E2E Testing:

  • No functional change — performance optimization only. Verified logic equivalence by reading both old and new code. CI passing.

Issues found:

  • Theoretical: Map overwrites on duplicate transaction_id (vs find returning first match). Not a real concern given DB uniqueness constraints.

Confidence: High

@hoiekim
Copy link
Copy Markdown
Owner

hoiekim commented Mar 26, 2026

@moltboie Add unit testing to ensure the same behavior

@moltboie moltboie force-pushed the perf/fix-on2-transaction-matching-200 branch from 29f749b to 160b2d0 Compare March 26, 2026 08:39
@moltboie
Copy link
Copy Markdown
Contributor Author

Added unit tests. Exported getRemovedTransactions and getRemovedInvestmentTransactions and added 12 tests in sync-simple-fin.test.ts covering: both items present, removals identified, items before startDate ignored, items from different accounts ignored, multiple removals, empty stored list. All 12 pass.

Comment thread src/server/lib/compute-tools/sync-plaid.ts
moltboie and others added 2 commits March 27, 2026 21:48
getRemovedTransactions and getRemovedInvestmentTransactions in
sync-simple-fin.ts used Array.find() inside forEach, making them O(n²).
The Plaid modelize and investment matching had the same pattern.

Fix: pre-build lookup Maps/Sets before the loops so each lookup is O(1).

- sync-simple-fin.ts: Set<transaction_id> and Set<investment_transaction_id>
- sync-plaid.ts modelize: Map by transaction_id, pending_transaction_id,
  and compound key (account_id:name:amount) for the fallback case
- sync-plaid.ts investments: Set<investment_transaction_id>

Closes hoiekim#200
…entTransactions

Exports both functions and adds 12 tests covering:
- No removals (all present)
- Identifying removed items
- Ignoring items before startDate
- Ignoring items from accounts not in incoming
- Multiple removals
- Empty stored list
@moltboie moltboie force-pushed the perf/fix-on2-transaction-matching-200 branch from 160b2d0 to ed47eed Compare March 28, 2026 04:48
investmentTransactions is PlaidInvestmentTransaction[] but the function
expects JSONInvestmentTransaction[]. filledInvestments is the correctly
typed mapped version (date strings filled in).
@hoiekim hoiekim merged commit a91c103 into hoiekim:main Mar 28, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: O(n²) transaction matching in sync causes slowdown with large datasets

2 participants