Baseline is a political intelligence platform that tracks 106 public figures using 4 independent AI providers. Each statement goes through structured extraction (Gemini), then independent analysis by three providers (Claude, GPT, Grok). A consensus engine reconciles the results and flags disagreements between models. Built solo using a documented multi-model orchestration methodology across 500+ build sessions.
- Architecture documentation and system design (SYSTEM_DESIGN.md, ARCHITECTURE.mermaid)
- 10 Flutter screens with full UI logic (src/flutter/screens/)
- 15 custom widgets including hand-painted icon sets, CustomPainter data visualizations, and animated components (src/flutter/widgets/)
- 9 data models with defensive JSON parsing, factory constructors, and null safety (src/flutter/models/)
- 5 service classes showing Supabase RPC integration patterns (src/flutter/services/)
- 2 state management providers (src/flutter/providers/)
- All 22 Supabase Edge Functions: analysis pipeline, consensus engine, legislative tracking, feed API, subscription management (src/edge-functions/)
- 30 PostgreSQL migrations showing schema evolution from MVP to production (src/sql/)
- App configuration: constants, routing, theme tokens (src/config/)
This is a public showcase representing the majority of the production codebase. Remaining private code is available for hiring review.
- AI prompt text sent to each provider (redacted in-place, control flow visible)
- Consensus scoring weights and threshold values (redacted as constants)
- 18 additional Flutter screens (settings, onboarding, auth, paywall)
- Production environment configuration (API keys, Supabase URLs)
- Trademarked icon assets
Contact preston@baseline.marketing for private repo access.
Political discourse analysis is dominated by single-model tools with no bias correction. When one AI provider hallucinates or carries a political lean, there is no check on the output. Baseline's multi-provider consensus approach treats model disagreement as signal, not noise. Statements where providers fundamentally disagree are surfaced, not hidden.
Official Sources
|
v
Ingestion Pipeline (n8n workflows, source hashing, dedup)
|
v
Extraction (Gemini structured output)
|
v
Multi-Provider Analysis (Claude + GPT + Grok in parallel)
| | |
v v v
[Each returns: repetition, novelty, affect, entropy, framing]
| | |
v v v
Consensus Engine (agreement measurement, variance detection, outlier flagging)
|
v
Scored Statement (signal rank, agreement level, framing consensus)
|
v
Feed / Profiles / Trends / Legislative Correlation
See ARCHITECTURE.mermaid for the full component diagram.
Decision: Route every statement to Claude, GPT, and Grok independently for analysis. Gemini handles structured extraction.
Why: Single models hallucinate, carry political bias, and degrade silently between versions. With 2 providers, a disagreement is a coin flip. With 3+ independent analyses, you can identify the outlier and measure confidence.
What we observed: GPT and Claude agreed on framing classification roughly 78% of the time. When they disagreed, Grok broke the tie in a consistent direction roughly 60% of the time, which was enough to flag genuinely ambiguous statements rather than forcing a binary call.
Tradeoff: 3x the API cost per statement. Cost tracking (11,224 log entries) revealed GPT-4 was roughly 10x more expensive per analysis than Gemini, which led to using Gemini for extraction (where structured output matters more than reasoning depth) and reserving GPT/Claude/Grok for analysis.
Decision: 22 Deno Edge Functions deployed globally via Supabase.
Why: Serverless, TypeScript-native, globally distributed, scales to zero. No server management for a solo developer.
Tradeoff: Cold start latency on infrequently-called functions. No long-running processes, so the analysis pipeline is event-driven rather than streaming. Functions that need to orchestrate multiple steps (like the full ingestion pipeline) rely on n8n workflows to chain calls.
Decision: Flutter 3.x with Dart.
Why: Dart's type safety caught schema mismatches at compile time during rapid iteration. Custom painting (CustomPainter) was essential for the consensus badge: 37 visual treatments based on agreement level, model count, and variance. React Native would have required a bridge to native canvas.
Tradeoff: Smaller ecosystem, fewer third-party packages. Hiring pool is smaller for Flutter than React Native.
Decision: Row Level Security enabled on all 30 tables, even for a solo developer.
Why: RLS pushes access control into PostgreSQL. Every query is filtered at the database level. There is no class of bug where a missing middleware check exposes data.
Tradeoff: Query complexity increases. Debugging RLS policy interactions is harder than debugging application-layer auth. Some queries required restructuring to work within RLS constraints.
Decision: Every AI API call logs to cost_log with provider, model, input/output tokens, and estimated USD.
Why: Without per-call cost tracking, budget overruns are invisible until the invoice arrives. Early logging revealed that GPT-4 was roughly 10x more expensive per analysis than Gemini. This data drove the decision to use Gemini for extraction and reserve expensive providers for analysis only.
Tradeoff: Additional write per API call. At 11,224 log entries, storage cost is negligible, but it adds latency to the analysis path (mitigated by logging asynchronously after the response is stored).
Decision: Source hashing, content fingerprinting, ingestion-time dedup, statement-level dedup, and analysis-level dedup.
Why: Early versions ingested duplicates that corrupted consensus scores. A statement analyzed twice would double-count one provider's output, skewing the consensus. Each dedup layer catches a different failure mode: same URL re-crawled, same text from different sources, same statement re-triggered by automation, same analysis re-run after a partial failure.
Tradeoff: Pipeline complexity. Five dedup checks add latency to ingestion. But the cost of a false duplicate (missed statement) is lower than the cost of a false unique (corrupted consensus).
Decision: Compute inter-provider agreement, variance detection, outlier flagging, and framing consensus rather than averaging scores.
Why: Averaging hides disagreement. If Claude scores a statement at 80 and GPT scores it at 20, the average (50) tells you nothing. Variance detection surfaces these cases as high-signal: the providers fundamentally disagree about what the statement means, which is often more interesting than the cases where they agree.
Tradeoff: More complex data model (stddev columns, framing_split JSONB, variance_detected boolean). More complex frontend rendering (37 visual treatments in the consensus badge). But the alternative, a single number, would lose the most valuable signal in the system.
| File | What it shows |
|---|---|
analyze-statement.sample.ts |
Multi-provider routing: 4 providers called via Promise.allSettled, response normalization, cost logging. Prompt text redacted. |
compute-consensus.sample.ts |
Consensus algorithm: per-metric averages/stddev, variance detection, outlier flagging, framing majority vote. Threshold values redacted. |
get-feed.ts |
Smart feed ranking: signal x recency_decay x variance_boost x novelty_boost, figure diversity cap, 5 sort modes. |
get-narrative-sync.ts |
865-line narrative consistency engine. Measures how a figure's positions shift over time. |
compute-bill-mutation.ts |
857-line bill version differ. Tracks legislative changes between bill revisions. |
summarize-bill.ts |
AI-powered bill summarization with provision extraction and category classification. Prompt text redacted. |
check-entitlement.ts |
Rate limiting, feature gating, signed entitlement tokens with HMAC-SHA256. |
revenuecat-webhook.ts |
Subscription lifecycle: idempotency, PII sanitization, constant-time auth, store mapping. |
| File | What it shows |
|---|---|
screens/figure_profile.dart |
3,900+ line figure profile with historical trends, framing radar, and signal timeline. |
screens/today_feed.dart |
3,200+ line main feed screen with smart ranking, pull-to-refresh, and infinite scroll. |
screens/lens_lab.dart |
2,600+ line multi-model comparison view. Side-by-side provider analysis. |
widgets/baseline_icons.dart |
1,200+ lines of hand-painted custom icons via CustomPainter. |
widgets/convergence_painter.dart |
1,600+ line narrative convergence visualization. |
widgets/framing_fingerprint.dart |
1,900+ line framing fingerprint data visualization. |
widgets/constellation_nav.dart |
1,300+ line custom navigation system. |
consensus_badge.dart |
1,100+ line CustomPainter with 37 visual treatments and animated ring gauge. |
| File | What it shows |
|---|---|
models/ |
9 data models: analysis, bill_summary, consensus, feed_statement, figure, framing, lens_lab, trends, vote. |
services/ |
5 service classes: Supabase RPC integration, error handling, response parsing. |
providers/ |
State management for feed and figures. |
utils/gate_state_machine.dart |
Feature gating state machine for tier-based access control. |
sql/ |
30 PostgreSQL migrations: schema evolution from MVP to production, RLS policies, custom RPCs. |
A statement enters the system. analyze-statement.ts validates the request, then sends the extracted statement to Claude, GPT, and Grok in parallel via Promise.allSettled. Each provider returns structured JSON (repetition, novelty, affective language rate, topic entropy, framing label). Responses are validated, clamped to 0-100 ranges, and normalized into a common schema. Each result is inserted into the analyses table with a mirror row in analyses_audit. Token usage and estimated USD cost are logged to cost_log. Then compute-consensus.ts loads all provider results, computes per-metric averages and standard deviations, measures inter-provider spread, detects variance (any metric stddev above threshold), identifies the outlier provider, determines framing consensus via majority vote, and upserts the result to the consensus table.
A feed request hits get-feed.ts. The function fetches 3x the requested limit from v_feed_ranked, then scores each row: signal x recency_decay (36-hour half-life) x variance_boost (+30% if variance detected) x novelty_boost (+15% max). A diversification pass caps any single figure at 20% of results (minimum 3 per figure). The paginated slice is returned with metadata. Five sort modes available: smart (default), recency, signal, novelty, divergence.
consensus_badge.dart receives a Consensus model. Based on agreement level, model count, and whether variance was detected, it selects one of 37 visual treatments. The widget renders an animated ring gauge via CustomPainter, with the ring fill representing consensus score, color encoding agreement level, and optional variance indicators. The painting logic handles edge cases: missing consensus (shimmer placeholder), single-provider results (partial ring), and split decisions (segmented ring with per-provider colors).
30 tables across 5 domains, all with Row Level Security enabled.
Core Intelligence:
figures (106 rows), statements (5,454), analyses (16,251), analyses_audit (16,251), consensus (5,361)
Legislative Tracking:
bill_versions (364), bill_summaries (354), version_comparisons, mutation_diffs (109), bill_spending_summary (50)
Data Pipeline:
raw_ingestion_jobs (20,921), pipeline_events (38,658), source_hashes (2,277), gemini_structured_output (2,644), cost_log (11,224)
Platform:
user_profiles, subscriptions, subscription_events, tier_features (52), feature_flags (16), rate_limit_entries, votes (161), annotations, waitlist
Social:
posted_tweets, trending_topics
- Same statement sent to Claude, GPT, and Grok independently (Gemini handles extraction, not analysis)
- Each provider returns structured metrics: repetition score, novelty score, affect intensity, entropy, framing label
- Providers never see each other's output
- Consensus engine loads all results, measures inter-model agreement, flags high-variance cases
- Every API call is cost-tracked: provider, model, input/output tokens, estimated USD (11,224 log entries across all providers)
- 4 providers: Claude (Anthropic), GPT-4 (OpenAI), Gemini (Google), Grok (xAI)
| Metric | Count |
|---|---|
| Tracked figures | 106 |
| Statements analyzed | 5,454 |
| AI analyses computed | 16,251 |
| Consensus scores | 5,361 |
| Pipeline events | 38,658 |
| Cost log entries | 11,224 |
| Bill versions tracked | 364 |
| Edge Functions | 22 |
| Database tables (all RLS) | 30 |
| Flutter screens | 28 |
| PostgreSQL version | 17 |
| n8n workflows (724 executions) | 10 |
- Add Gemini as a 4th analysis provider (currently handles extraction only)
- Add automated regression tests for Edge Functions (audit flagged this gap)
- Add structured logging and observability beyond pipeline_events (audit flagged this gap)
- Implement real-time WebSocket feed updates (currently polling)
- Build a public API for researchers
- Performance optimization on feed query for 50K+ statements
- Formalize the tradeoffs section into an ADR (Architecture Decision Record) pack
Built solo by Preston Winters using multi-model AI orchestration (Claude, GPT, Gemini, Grok) with 390+ documented institutional knowledge rules across 500+ build sessions.
Previously: CEO of Cyber Hornets Colony Club (2021-2023). Built a Web3 community from zero to 40,900 followers and 10,000+ Discord members. Led product vision, creative direction, and go-to-market strategy for an 8,888-piece NFT collection that generated 7-figure primary sales revenue with 2,000+ holders. 3-person founding team.
Current portfolio: 6 application projects. 1 published (StainSlayer AI, iOS + Android), 1 in closed testing (Baseline, Google Play), 4 in development.
Methodology: github.com/Preston2012/ai-council
Baseline is in closed testing on Google Play. Contact preston@baseline.marketing for reviewer access.
- trading-toolkit - Python automation: options scanner, trading bots, VPS infrastructure
- ai-council - Multi-model AI orchestration methodology (how this was built)
- baseline.marketing - Marketing site and portfolio
- Portfolio: baseline.marketing/built
- LinkedIn: linkedin.com/in/prestonwinters
- GitHub: github.com/Preston2012
- Contact: preston@baseline.marketing