[Slice 3] Sync/Query Performance Optimization #64

@sneg55

Description

Summary

Performance optimization slice for the sync/scoring paths and the expensive contact-list endpoints.

Why this slice

Current sync jobs and list endpoints contain avoidable per-contact (N-query) patterns and in-memory filtering that will degrade as tenant size grows.

Problem statements

  • Sync tasks recompute scores contact-by-contact even though a batch scorer exists.
    • backend/app/services/task_jobs/gmail.py
    • backend/app/services/task_jobs/google.py
    • backend/app/services/task_jobs/twitter.py
    • backend/app/services/scoring.py (batch_update_scores already available)
  • Birthdays/overdue APIs load large contact sets and then filter them in Python.
    • backend/app/api/contacts_routes/listing.py
  • LinkedIn push preloads all contacts into memory on every request.
    • backend/app/api/linkedin.py
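A minimal sketch of the batched rescoring shape, assuming each sync job can collect the IDs of the contacts it touched and hand them to the batch scorer once at the end of the run. The real `batch_update_scores` signature in `backend/app/services/scoring.py` is not shown in this issue, so it is passed in as a callable taking a list of contact IDs; `upsert_interaction`, `SyncResult`, and `sync_with_batched_scoring` are hypothetical names. Deferring rescoring to the end is only "semantically safe" if nothing inside the loop reads the updated scores.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class SyncResult:
    touched_ids: set[int] = field(default_factory=set)
    rows_processed: int = 0


def sync_with_batched_scoring(
    items: Iterable[dict],
    upsert_interaction: Callable[[dict], int],  # hypothetical: stores one item, returns its contact id
    batch_update_scores: Callable[[list[int]], None],  # assumed batch scorer interface
) -> SyncResult:
    result = SyncResult()
    for item in items:
        # Write the synced row but do NOT rescore here (the old per-contact pattern).
        contact_id = upsert_interaction(item)
        result.touched_ids.add(contact_id)
        result.rows_processed += 1
    # One batched rescoring call covering every distinct contact touched in this run.
    if result.touched_ids:
        batch_update_scores(sorted(result.touched_ids))
    return result
```

A contact that receives many messages in one run is rescored once rather than once per message.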

Scope

  • Replace per-contact rescoring loops with batched update paths where semantically safe.
  • Move birthdays/overdue selection and sorting into SQL.
  • Reduce LinkedIn push matching overhead with a narrowed prefetch or index-driven lookup strategy.
  • Add lightweight metrics/logging for sync runtime and rows processed.
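For the birthdays endpoint, both the filter and the "next occurrence" ordering can be pushed into SQL. The sketch below uses the standard-library `sqlite3` module so it runs on its own; the `contacts` schema, its column names, and the `upcoming_birthdays` helper are hypothetical, and the production database may need different date functions (if it is PostgreSQL, which this issue does not state, `to_char(birthday, 'MM-DD')` plays the role of `strftime('%m-%d', ...)`).

```python
import sqlite3
from datetime import date

# Hypothetical schema; the real contacts table may differ.
SCHEMA = """
CREATE TABLE contacts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    birthday TEXT  -- ISO 'YYYY-MM-DD', NULL when unknown
);
"""

# Filter and sort in the database: birthdays whose month-day is today or later
# come first, then those that wrap into next year; id breaks ties.
UPCOMING_BIRTHDAYS_SQL = """
SELECT id
FROM contacts
WHERE user_id = :user_id AND birthday IS NOT NULL
ORDER BY
    CASE WHEN strftime('%m-%d', birthday) >= :today_md THEN 0 ELSE 1 END,
    strftime('%m-%d', birthday),
    id
LIMIT :limit
"""


def upcoming_birthdays(
    conn: sqlite3.Connection, user_id: int, today: date, limit: int = 20
) -> list[int]:
    params = {"user_id": user_id, "today_md": today.strftime("%m-%d"), "limit": limit}
    return [row[0] for row in conn.execute(UPCOMING_BIRTHDAYS_SQL, params)]
```

The explicit `id` tie-break keeps the order deterministic. Any day-window filter the current Python implementation applies would still need to be carried over into the `WHERE` clause.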

Out of scope

  • Re-architecting the whole sync pipeline.
  • A new queueing system.

Work items

  • Update the Gmail/Google/Twitter task jobs to use the batch scoring strategy.
  • Add SQL-based birthday and overdue queries with tested ordering semantics.
  • Optimize LinkedIn push matching path for large contact sets.
  • Add timing and row-count instrumentation per sync job.
  • Benchmark before/after on representative datasets.
  • Add regression tests for correctness and query behavior.
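One way to add per-job timing and row-count instrumentation is a context manager wrapped around each job body. This is a sketch under assumptions: the logger name `sync.metrics`, the `SyncStats` fields, and the log format are hypothetical, not existing project conventions.

```python
import logging
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator

logger = logging.getLogger("sync.metrics")  # hypothetical logger name


@dataclass
class SyncStats:
    job: str
    rows_processed: int = 0
    duration_s: float = 0.0


@contextmanager
def sync_metrics(job: str) -> Iterator[SyncStats]:
    """Time a sync job body; the caller increments rows_processed."""
    stats = SyncStats(job=job)
    start = time.perf_counter()
    try:
        yield stats
    finally:
        # Runs on success and on exceptions, so failed runs are logged too.
        stats.duration_s = time.perf_counter() - start
        logger.info(
            "sync job=%s rows_processed=%d duration_s=%.3f",
            stats.job, stats.rows_processed, stats.duration_s,
        )
```

Inside a job this reads as `with sync_metrics("gmail") as stats:` followed by `stats.rows_processed += 1` per row. Because the log call sits in `finally`, a run that raises still reports how far it got.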

Acceptance criteria

  • Sync jobs avoid per-contact score loops in the targeted paths.
  • Birthday/overdue endpoints no longer require full in-memory filtering.
  • LinkedIn push memory footprint and latency improve on large-user fixtures.
  • No behavior regression in response ordering and filtering.
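For the LinkedIn push, a narrowed prefetch queries only the contacts whose keys appear in the pushed batch instead of loading every contact. A minimal sketch, assuming matching is done on a normalized `linkedin_url` column backed by an index on `(user_id, linkedin_url)`; the column, the normalization rule, and both function names are hypothetical, and the real endpoint may match on other fields.

```python
import sqlite3
from typing import Iterable


def normalize_linkedin_url(url: str) -> str:
    # Hypothetical normalization; must match how the column is stored.
    return url.strip().lower().rstrip("/")


def match_pushed_profiles(
    conn: sqlite3.Connection,
    user_id: int,
    pushed_urls: Iterable[str],
    chunk_size: int = 500,
) -> dict[str, int]:
    """Map each pushed URL that matches an existing contact to that contact's id."""
    keys = sorted({normalize_linkedin_url(u) for u in pushed_urls})
    matches: dict[str, int] = {}
    for i in range(0, len(keys), chunk_size):
        chunk = keys[i : i + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        # Reads only rows for keys in this batch; an (assumed) index on
        # (user_id, linkedin_url) lets the database avoid a full scan.
        rows = conn.execute(
            f"SELECT linkedin_url, id FROM contacts "
            f"WHERE user_id = ? AND linkedin_url IN ({placeholders}) ORDER BY id",
            [user_id, *chunk],
        )
        for url, contact_id in rows:
            matches.setdefault(url, contact_id)  # lowest id wins on duplicates
    return matches
```

Chunking keeps each `IN (...)` list well under SQLite's default bound-parameter limit, and memory now scales with the pushed batch rather than with the user's whole contact list.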

Test plan

  • Backend tests for the scoring service and the listing endpoints.
  • Add/expand performance-focused tests or benchmark scripts.
  • Manual profiling with seeded dataset sizes (small/medium/large).
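For query-count assertions, the standard-library `sqlite3.Connection.set_trace_callback` can record every statement a code path sends. The sketch below compares a per-contact loader with a batched `GROUP BY` loader; the `interactions` table and both loaders are hypothetical stand-ins for the real scoring inputs, and a test suite running against another database would need an equivalent statement hook from its own driver or ORM.

```python
import sqlite3
from collections.abc import Callable


def count_selects(conn: sqlite3.Connection, fn: Callable[[], object]) -> int:
    """Run fn() and count the SELECT statements it sends to SQLite."""
    seen: list[str] = []
    conn.set_trace_callback(seen.append)
    try:
        fn()
    finally:
        conn.set_trace_callback(None)
    return sum(1 for sql in seen if sql.lstrip().upper().startswith("SELECT"))


def load_counts_per_contact(conn: sqlite3.Connection, contact_ids: list[int]) -> dict[int, int]:
    # Hypothetical legacy shape: one query per contact.
    return {
        cid: conn.execute(
            "SELECT COUNT(*) FROM interactions WHERE contact_id = ?", (cid,)
        ).fetchone()[0]
        for cid in contact_ids
    }


def load_counts_batched(conn: sqlite3.Connection, contact_ids: list[int]) -> dict[int, int]:
    # Hypothetical batched shape: one grouped query for all contacts.
    if not contact_ids:
        return {}
    placeholders = ", ".join("?" for _ in contact_ids)
    rows = conn.execute(
        f"SELECT contact_id, COUNT(*) FROM interactions "
        f"WHERE contact_id IN ({placeholders}) GROUP BY contact_id",
        list(contact_ids),
    ).fetchall()
    counts = {cid: 0 for cid in contact_ids}  # contacts with no rows still appear
    counts.update(dict(rows))
    return counts
```

Asserting both equal results and a lower statement count is the "correctness + query behavior" pairing the work items call for.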

Risks / rollout

  • Query rewrites can subtly change ordering/edge-case semantics.
  • Use snapshot and golden-case tests before replacing existing paths.
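A golden-case parity check can pin down ordering before the in-memory path is replaced: run a legacy-style Python filter and the new SQL query on the same fixture and require identical ID lists. The "overdue" rule used here (days since `last_interaction` exceed a per-contact `cadence_days`, most overdue first, ties by `id`) is a hypothetical stand-in; the real definition in `listing.py` is not shown in this issue.

```python
import sqlite3
from datetime import date

# Hypothetical "overdue" rule: days since last_interaction exceed cadence_days.
# Most overdue first; id breaks ties so both paths order identically.
OVERDUE_SQL = """
SELECT id
FROM contacts
WHERE last_interaction IS NOT NULL
  AND julianday(:today) - julianday(last_interaction) > cadence_days
ORDER BY julianday(:today) - julianday(last_interaction) - cadence_days DESC, id
"""


def overdue_in_python(rows, today: date) -> list[int]:
    """Legacy-style reference path: filter and sort (id, last, cadence) rows in memory."""
    overdue = []
    for contact_id, last, cadence in rows:
        if last is None:
            continue
        days_over = (today - date.fromisoformat(last)).days - cadence
        if days_over > 0:
            overdue.append((-days_over, contact_id))
    return [contact_id for _, contact_id in sorted(overdue)]


def overdue_in_sql(conn: sqlite3.Connection, today: date) -> list[int]:
    """Candidate replacement: the same rule evaluated by the database."""
    return [row[0] for row in conn.execute(OVERDUE_SQL, {"today": today.isoformat()})]
```

Fixtures should deliberately include NULL dates, rows exactly on the cadence boundary, and ties, since those are where the ordering and edge-case drift flagged in this section tends to appear.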

Definition of done

  • Measurable runtime/query-count reductions documented in PR notes.
  • Correctness parity validated by tests.

Metadata

Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
