fix: coherent demo data — compliance, audit trail, currency, charts#33

Merged: jeffgicharu merged 2 commits into main from fix/data-realism-reseed on May 19, 2026

@jeffgicharu (Owner)

Why

A visitor judging the live demo currently hits numbers that read as bugs:

| Screen | Before | After |
| --- | --- | --- |
| Dashboard | 0% Compliance Rate | ~85% (active/suspended only) |
| Audit Log | 5 rows, all notifications / read | ~360 events over ~5 months, 7 entity types |
| Settings | Default Currency EUR while every amount renders $ | USD, consistent with display |
| Contractor → Overview vs Documents | "W-9 on file" while Documents tab shows it Expired | Cannot disagree (expiry-aware) + coherent seed |
| Dashboard revenue chart | Tapers to $0 (only months with data returned) | Clean fixed 6-month spine, zero-filled |
| Dashboard risk distribution | Low 1 / Medium 23 / High 1 (stale) | Realistic spread (most low, few high/critical) |

These were verified against the live site before this change (0% compliance, EUR currency, audit log with only "read" rows confirmed via screenshots).

What

  • Compliance: generateContractorDocuments builds a coherent per-contractor document set with a deterministic ~85/12/13 compliant/expiring/missing distribution. getComplianceReport is now scoped to engaged (active/suspended) contractors, because a contractor still in onboarding is not "non-compliant".
  • W-9 consistency: contractors.repository documentStatus now treats an expired doc as not-on-file, so Overview and Documents tabs can never contradict.
  • Audit trail: new generateAuditEvents seeds a believable ~5-month stream (logins, the full invoice lifecycle pinned to invoice timestamps, and contractor/document/offboarding/settings/classification events) with real actors and no future-dated rows.
  • Currency: full seed sets org defaultCurrency to USD.
  • Invoices: generateInvoice rebuilt so the submit→approve→schedule→pay chain always lands in the past (no future paid_at); net-30 due date.
  • Charts: monthly revenue / contractor-growth / contractor-earnings zero-fill a fixed N-month spine via generate_series.
  • Tooling: new seed:demo-accounts script; the README documents the two-step reseed.

Verification

  • Local fresh Postgres reseed: 31 onboarded → 27 compliant (87%); 0 W-9 Overview/Documents contradictions; 0 future-dated invoice/audit rows; 0 invoice lifecycle order violations; audit spans ~5 months with 7 entity types; revenue spine returns all 6 months populated.
  • tsc, eslint, and affected unit suites (documents/contractors/organizations/audit/classification — 187 tests) green.
  • AFTER live screenshots will be captured post-deploy reseed and included in the final verification report.
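
The "0 future-dated" and "0 lifecycle order violations" counts above could be reproduced with a check along these lines. This is a minimal sketch: the `SeedInvoice` shape, its field names, and `checkInvoiceInvariants` are assumptions made for illustration, not the repository's schema or tooling.

```typescript
// Hypothetical invoice shape; field names are illustrative assumptions.
interface SeedInvoice {
  id: string;
  submittedAt: Date;
  approvedAt: Date | null;
  scheduledAt: Date | null;
  paidAt: Date | null;
}

interface InvariantReport {
  futureDated: string[];
  orderViolations: string[];
}

// Collects the ids of invoices that break either invariant:
//  - any lifecycle timestamp is later than `now`
//  - timestamps are not in submit -> approve -> schedule -> pay order
function checkInvoiceInvariants(invoices: SeedInvoice[], now: Date): InvariantReport {
  const report: InvariantReport = { futureDated: [], orderViolations: [] };
  for (const inv of invoices) {
    // Keep only the steps that actually happened, in lifecycle order.
    const steps = [inv.submittedAt, inv.approvedAt, inv.scheduledAt, inv.paidAt].filter(
      (d): d is Date => d !== null,
    );
    if (steps.some((d) => d.getTime() > now.getTime())) {
      report.futureDated.push(inv.id);
    }
    let prev: Date | null = null;
    let ordered = true;
    for (const d of steps) {
      if (prev !== null && d.getTime() < prev.getTime()) ordered = false;
      prev = d;
    }
    if (!ordered) report.orderViolations.push(inv.id);
  }
  return report;
}
```

An empty `futureDated` and an empty `orderViolations` list correspond to the two zero counts reported for the local reseed.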

The demo seed rebuilds every org, so the idempotent seed:demo-accounts script must be re-run after pnpm --filter @contractor-os/api seed (this is documented, and is required to keep the E2E live-smoke accounts).

The seeded demo dataset produced several numbers that read as bugs rather
than a working business. This reseeds the demo data so every screen tells
one coherent story, and hardens a few queries so the views can't disagree.

Compliance
- Documents were assigned random types across contractors, so almost no
  contractor held both required documents and the dashboard showed
  "0% Compliance Rate". Documents are now generated as a coherent per-
  contractor set with a deterministic ~85% compliant / ~12% expiring /
  ~13% missing-one-doc distribution (stable across reseeds).
- getComplianceReport is scoped to active/suspended contractors: someone
  still in onboarding is not "non-compliant", they're just not done.
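
The scoping rule can be illustrated with a small in-memory sketch. The `ContractorCompliance` shape, the `hasAllRequiredDocs` flag, and `complianceRate` are hypothetical names for illustration; the real getComplianceReport runs against the database and may compute this differently.

```typescript
// Statuses named in this PR; the real enum may contain more values.
type ContractorStatus = 'onboarding' | 'active' | 'suspended';

// Hypothetical per-contractor summary, assumed for illustration.
interface ContractorCompliance {
  status: ContractorStatus;
  hasAllRequiredDocs: boolean;
}

// Compliance rate as a whole-number percentage over engaged contractors only.
// Contractors still in onboarding are excluded from numerator and denominator.
function complianceRate(contractors: readonly ContractorCompliance[]): number {
  const engaged = contractors.filter(
    (c) => c.status === 'active' || c.status === 'suspended',
  );
  if (engaged.length === 0) return 0;
  const compliant = engaged.filter((c) => c.hasAllRequiredDocs).length;
  return Math.round((compliant / engaged.length) * 100);
}
```

With 27 compliant contractors out of 31 engaged, this returns 87 no matter how many contractors are still onboarding.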

W-9 consistency
- contractors.repository documentStatus now treats an expired document as
  not-on-file, so the Overview tab can no longer say "W-9 on file" while
  the Documents tab shows it "Expired". Seed data is coherent on both
  views (compliant = far-future expiry; expiring = inside 30 days).
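
An expiry-aware status check might look like the sketch below. The `ContractorDocument` shape, the status labels, and `resolveDocumentStatus` are assumptions for illustration (only `documentType` and the 30-day expiring window come from this PR); the actual logic lives in a repository query.

```typescript
// Hypothetical document shape; a null expiresAt is assumed to mean "never expires".
interface ContractorDocument {
  documentType: string;
  expiresAt: Date | null;
}

type DocumentStatus = 'on_file' | 'expiring' | 'expired' | 'missing';

const EXPIRING_WINDOW_MS = 30 * 24 * 60 * 60 * 1000; // "inside 30 days"

// Resolves one status per document type from the document with the latest expiry.
function resolveDocumentStatus(
  docs: readonly ContractorDocument[],
  documentType: string,
  now: Date,
): DocumentStatus {
  const matching = docs.filter((d) => d.documentType === documentType);
  if (matching.length === 0) return 'missing';
  let latest = -Infinity;
  for (const d of matching) {
    if (d.expiresAt === null) return 'on_file';
    latest = Math.max(latest, d.expiresAt.getTime());
  }
  const remainingMs = latest - now.getTime();
  if (remainingMs <= 0) return 'expired';
  if (remainingMs <= EXPIRING_WINDOW_MS) return 'expiring';
  return 'on_file';
}

// An expired document never counts as on file.
function isOnFile(status: DocumentStatus): boolean {
  return status === 'on_file' || status === 'expiring';
}
```

If both tabs derive their label from the same resolved status, an expired W-9 cannot be reported as on file in one view and as expired in the other.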

Audit trail
- audit_events was empty except for "notifications / read". A believable
  ~5-month activity stream is now seeded: staff logins, the full invoice
  lifecycle (pinned to each invoice's own timestamps so nothing is "paid"
  before it was "submitted"), contractor/document/offboarding/settings/
  classification events, with real actors and no future-dated rows.
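
Pinning audit rows to each invoice's own timestamps can be sketched as follows. The `userId`, `entityType`, `entityId`, and `action` fields match the audit rows quoted in the review on this PR; `createdAt`, the `InvoiceTimestamps` shape, the action names, the `'invoices'` entity type, and `invoiceAuditEvents` itself are assumptions for illustration and not the actual generateAuditEvents.

```typescript
// Hypothetical invoice shape; field names are illustrative assumptions.
interface InvoiceTimestamps {
  id: string;
  submittedAt: Date;
  approvedAt: Date | null;
  scheduledAt: Date | null;
  paidAt: Date | null;
}

interface AuditEvent {
  userId: string;
  entityType: string;
  entityId: string;
  action: string;
  createdAt: Date; // assumed column name
}

// Emits one audit event per lifecycle step that has happened, stamped with
// the invoice's own timestamp, so the audit log cannot disagree with the invoice.
function invoiceAuditEvents(
  invoice: InvoiceTimestamps,
  actorId: string,
  now: Date,
): AuditEvent[] {
  const steps: Array<[string, Date | null]> = [
    ['submit', invoice.submittedAt],
    ['approve', invoice.approvedAt],
    ['schedule', invoice.scheduledAt],
    ['pay', invoice.paidAt],
  ];
  const events: AuditEvent[] = [];
  for (const [action, at] of steps) {
    // Skip steps that have not happened and never emit a future-dated row.
    if (at === null || at.getTime() > now.getTime()) continue;
    events.push({
      userId: actorId,
      entityType: 'invoices',
      entityId: invoice.id,
      action,
      createdAt: at,
    });
  }
  return events;
}
```

Because the events reuse the invoice timestamps rather than drawing new random ones, a "pay" event can only precede a "submit" event if the invoice itself is out of order.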

Currency
- The full seed sets the organization's default currency to USD, matching
  the USD-everywhere display formatting.

Invoices & charts
- generateInvoice anchored the lifecycle off a small offset, so recent
  "paid" invoices could land a future paid_at. The chain is now built
  with guaranteed past headroom and a net-30 due date.
- Monthly revenue / contractor-growth / contractor-earnings queries only
  returned months that had data, making the dashboard revenue chart taper
  to $0. They now zero-fill a fixed N-month spine via generate_series.
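
The PR does the zero-fill in SQL with generate_series. The same idea is sketched below in memory, purely to show the shape of the result; `MonthlyTotal`, `monthSpine`, and `zeroFill` are hypothetical names and this is not the query used in the repository.

```typescript
// Hypothetical row shape: month as 'YYYY-MM' and an aggregated total.
interface MonthlyTotal {
  month: string;
  total: number;
}

// Builds a fixed spine of the last `months` calendar months ending at `end` (UTC).
function monthSpine(end: Date, months: number): string[] {
  const spine: string[] = [];
  for (let i = months - 1; i >= 0; i--) {
    // Date.UTC normalises negative month indexes into the previous year.
    const d = new Date(Date.UTC(end.getUTCFullYear(), end.getUTCMonth() - i, 1));
    const m = d.getUTCMonth() + 1;
    spine.push(`${d.getUTCFullYear()}-${m < 10 ? '0' : ''}${m}`);
  }
  return spine;
}

// Left-joins the sparse rows onto the spine, filling missing months with 0.
function zeroFill(rows: readonly MonthlyTotal[], end: Date, months: number): MonthlyTotal[] {
  const byMonth = new Map(rows.map((r) => [r.month, r.total] as const));
  return monthSpine(end, months).map((month) => ({
    month,
    total: byMonth.get(month) ?? 0,
  }));
}
```

The result always has exactly `months` entries in chronological order, so a chart fed from it keeps a stable x-axis even when some months have no data.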

Tooling
- Add seed:demo-accounts script and document the two-step reseed (the full
  seed rebuilds every org, so the idempotent demo/E2E accounts must be
  re-seeded immediately after).
4 comment threads on apps/api/src/database/seeds/fixtures/generators.ts, all marked Fixed

@gemini-code-assist (bot) left a comment

Code Review

This pull request significantly enhances the demo data seeding process by introducing more realistic invoice lifecycles, coherent document compliance distributions, and a comprehensive audit trail. It also improves dashboard data consistency by using SQL spines to ensure months with zero activity are represented and updates repository queries to better handle document expiration and contractor statuses. My feedback focuses on improving the robustness of the seeding logic by addressing brittle array indexing, replacing unsafe non-null assertions with explicit error handling, and ensuring data coherence in the audit logs.

createdDaysAgo: 430 + (idx % 60),
}),
);
docs[0]!.version = 2;

medium

Accessing the document by a hardcoded index docs[0] is brittle and assumes the tax document is always the first element pushed to the array in the compliant bucket. It's safer to find the document by type or keep a reference to the object when it's created.

      const taxDoc = docs.find((d) => d.documentType === taxType);
      if (taxDoc) taxDoc.version = 2;

Comment on lines +430 to +431
const admin = users.find((u) => u.role === 'admin')!;
const manager = users.find((u) => u.role === 'manager')!;

medium

Using non-null assertions (!) on the results of find can cause the seed script to crash with a TypeError if the expected users are missing from the input array. Adding an explicit check with a descriptive error message would improve the robustness of the seeding process.

Suggested change
- const admin = users.find((u) => u.role === 'admin')!;
- const manager = users.find((u) => u.role === 'manager')!;
+ const admin = users.find((u) => u.role === 'admin');
+ const manager = users.find((u) => u.role === 'manager');
+ if (!admin || !manager) {
+   throw new Error('Audit seed requires both an admin and a manager user');
+ }

userId: admin.id,
entityType: 'organizations',
entityId: admin.id,
action: 'update',

medium

The entityId for the organization update event is currently set to admin.id. For data coherence in the audit log, this should ideally be the actual organization ID.

CodeQL flagged the new seed code's Math.random() as insecure randomness.
It's seed-only fixture data, but route all fixture randomness through
crypto.randomInt (via the existing randomPick/randomBetween helpers and a
new randomBool) so static analysis is clean and the alert can't recur.
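
Helpers of this kind could be built on Node's `crypto.randomInt` roughly as follows. The helper names come from the commit message above, but their signatures and bodies here are assumptions for illustration, not the repository's implementation.

```typescript
import { randomInt } from 'node:crypto';

// Integer in the inclusive range [min, max].
// crypto.randomInt treats its upper bound as exclusive, hence the + 1.
function randomBetween(min: number, max: number): number {
  return randomInt(min, max + 1);
}

// Uniformly picks one element; throws on an empty array rather than returning undefined.
function randomPick<T>(items: readonly T[]): T {
  if (items.length === 0) {
    throw new Error('randomPick requires a non-empty array');
  }
  return items[randomInt(items.length)] as T;
}

// Returns true with roughly the given probability (resolution of 0.1%).
function randomBool(probabilityTrue = 0.5): boolean {
  return randomInt(1000) < Math.round(probabilityTrue * 1000);
}
```

Funnelling every draw through a few helpers like these keeps Math.random() out of the fixture code, so the static-analysis rule has nothing to flag. Values drawn this way are not reproducible between runs, so any distribution that must stay stable across reseeds has to be derived from something deterministic instead.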
@jeffgicharu jeffgicharu merged commit 712e583 into main May 19, 2026
19 checks passed
@jeffgicharu jeffgicharu deleted the fix/data-realism-reseed branch May 19, 2026 01:25