SentimentPulse — Pipeline Throughput Benchmark

Test date: 2026-03-25
Duration: 4-hour steady-state run
Environment: MacBook Pro M-series (MPS), 32 GB RAM, Docker Desktop
Sources active: SEC EDGAR (8-K/6-K), Yahoo Finance RSS (50 tickers), Reuters RSS


Summary

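| Metric | Result | Target | Status |
|---|---|---|---|
| Headlines ingested / hour | ~420 | n/a | n/a |
| Headlines classified / hour | ~410 | n/a | n/a |
| End-to-end latency (p50) | 38 s | < 3 min | Met |
| End-to-end latency (p95) | 98 s | < 3 min | Met |
| Kafka consumer lag (steady-state) | 2–6 msgs | < 100 | Met |
| Dedup cache hit rate | 71 % | n/a | n/a |
| DLQ rate | 0.4 % | < 1 % | Met |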

Ingestion breakdown (per source)

| Source | Headlines / hour | Notes |
|---|---|---|
| SEC EDGAR (8-K/6-K) | ~90 | Peaks at market open; filings across all 50 tracked tickers |
| Yahoo Finance RSS | ~280 | 50 tickers × ~5.6 headlines/ticker/hour average |
| Reuters RSS | ~50 | Lower cadence; high-quality wire stories |
| GNews (optional) | n/a | Not active during test run (API key not set) |

Latency distribution

Measured as now() - publishedAt at the point the enriched event is visible in Redis:

p10  :  12 s
p50  :  38 s
p75  :  61 s
p90  :  82 s
p95  :  98 s
p99  : 147 s
max  : 203 s   (EDGAR filing fetch + FinBERT cold batch)

The tail of the distribution (p99 and max) corresponds to the first batch after FinBERT model warmup on MPS, combined with a large EDGAR full-text fetch. Subsequent batches run at 12–15 ms/headline on MPS.
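
The percentile figures above can be reproduced from raw latency samples along the following lines. This is a minimal sketch, not the benchmark's actual tooling: the helper names `latency_seconds` and `percentile` are hypothetical, the nearest-rank method is an assumption (the report does not say which percentile definition was used), and the sample data is synthetic.

```python
import math
from datetime import datetime, timezone


def latency_seconds(published_at: datetime, visible_at: datetime) -> float:
    """Seconds between publication and visibility of the enriched event."""
    return (visible_at - published_at).total_seconds()


def percentile(samples: list[float], q: int) -> float:
    """Nearest-rank percentile: smallest sample with at least q% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical event: published at 14:30:00 UTC, visible in Redis 38 s later.
published = datetime(2026, 3, 25, 14, 30, 0, tzinfo=timezone.utc)
visible = datetime(2026, 3, 25, 14, 30, 38, tzinfo=timezone.utc)
print(latency_seconds(published, visible))  # 38.0

# Synthetic latencies (1..100 s), NOT the benchmark's samples.
latencies = [float(s) for s in range(1, 101)]
for q in (10, 50, 75, 90, 95, 99):
    print(f"p{q}: {percentile(latencies, q):.0f} s")
```

Nearest-rank always returns an observed sample, which keeps tail values such as p99 tied to a real event rather than an interpolated one.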


Classifier throughput

FinBERT runs in batches of 64 headlines. On Apple MPS:

| Batch size | Time / batch | Throughput |
|---|---|---|
| 64 | ~0.9 s | ~4,270 headlines/min |
| 32 | ~0.5 s | ~3,840 headlines/min |
| 16 | ~0.3 s | ~3,200 headlines/min |

At an ingestion rate of ~420 headlines/hour (~7 headlines/min), the classifier is not the bottleneck: even at batch size 16 it sustains ~3,200 headlines/min. Consumer lag stayed at 2–6 messages throughout the run.
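
The throughput column and the headroom claim follow from simple arithmetic on the measured batch times. The sketch below only re-derives the numbers reported in this section; the function name `headlines_per_minute` is illustrative.

```python
def headlines_per_minute(batch_size: int, seconds_per_batch: float) -> float:
    """Sustained classifier throughput for back-to-back batches."""
    return batch_size / seconds_per_batch * 60


# (batch size, measured seconds per batch) as reported for Apple MPS.
measurements = [(64, 0.9), (32, 0.5), (16, 0.3)]
for batch_size, secs in measurements:
    rate = headlines_per_minute(batch_size, secs)
    print(f"batch {batch_size:>2}: {rate:,.0f} headlines/min")

INGEST_PER_HOUR = 420                  # reported ingestion rate
ingest_per_minute = INGEST_PER_HOUR / 60
headroom = headlines_per_minute(64, 0.9) / ingest_per_minute
print(f"headroom at batch 64: ~{headroom:.0f}x")
```

At batch size 64 the classifier has roughly 600× more capacity than the observed ingestion rate, which is consistent with the small, stable consumer lag.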


Redis memory usage

After 4 hours with 50 tickers tracked:

  • sentiment:{ticker} HASH keys: 50 keys × ~400 bytes ≈ 20 KB
  • sentimentpulse:scores:{ticker} drift window LISTs: 50 × 720 × 8 bytes ≈ 288 KB
  • sentimentpulse:seen_urls dedup SET (48h TTL): ~4,200 members ≈ 340 KB
  • Total Redis footprint: < 1 MB
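
The footprint estimate can be re-derived as follows. This is a back-of-the-envelope sketch that assumes 1 KB = 1,000 bytes (which matches the figures above) and counts payload only; Redis per-key and encoding overhead is not modelled, and the constant names are illustrative.

```python
TICKERS = 50

HASH_BYTES_PER_TICKER = 400      # ~400 bytes per sentiment:{ticker} hash
WINDOW_ENTRIES = 720             # entries per drift-window list
BYTES_PER_SCORE = 8              # one 8-byte score per entry
SEEN_URLS_BYTES = 340_000        # reported size of the dedup set
SEEN_URLS_MEMBERS = 4_200        # reported member count

hash_bytes = TICKERS * HASH_BYTES_PER_TICKER
list_bytes = TICKERS * WINDOW_ENTRIES * BYTES_PER_SCORE
total_bytes = hash_bytes + list_bytes + SEEN_URLS_BYTES

print(f"hashes: {hash_bytes / 1000:.0f} KB")
print(f"lists:  {list_bytes / 1000:.0f} KB")
print(f"set:    {SEEN_URLS_BYTES / 1000:.0f} KB "
      f"(~{SEEN_URLS_BYTES / SEEN_URLS_MEMBERS:.0f} bytes/member)")
print(f"total:  {total_bytes / 1000:.0f} KB")
```

The three components sum to about 648 KB, in line with the stated total of under 1 MB.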

TimescaleDB row count

After 4 hours:

SELECT count(*) FROM sentiment_events;
-- 1,673 rows

SELECT ticker, count(*) AS events FROM sentiment_events
GROUP BY ticker ORDER BY events DESC LIMIT 5;
--  AAPL | 68
--  MSFT | 61
--  NVDA | 57
--  AMZN | 54
--  TSLA | 52

Compression (applied to chunks older than 7 days by the compression policy) reduces on-disk size by ~11×.


Notes

  • Headlines with non-ASCII ratio > 40% are routed to sentiment.dlq (DLQ rate: 0.4%).
  • With a deduplication TTL of 48 hours, the dedup cache filters out ~71% of fetched RSS items as already seen in earlier polling cycles.
  • FinBERT inference on CPU (no MPS/CUDA) runs ~8× slower; the classifier service should be deployed on GPU in production.
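
The non-ASCII routing rule in the first note could look roughly like this. It is a sketch under stated assumptions: the ratio is taken per character rather than per byte, "non-ASCII" means code points above 127, and the function names `non_ascii_ratio` and `should_dead_letter` are hypothetical rather than taken from the SentimentPulse codebase.

```python
DLQ_THRESHOLD = 0.40  # headlines above this non-ASCII ratio go to sentiment.dlq


def non_ascii_ratio(text: str) -> float:
    """Fraction of characters outside the ASCII range (assumed per-character)."""
    if not text:
        return 0.0
    return sum(1 for ch in text if ord(ch) > 127) / len(text)


def should_dead_letter(headline: str, threshold: float = DLQ_THRESHOLD) -> bool:
    """True when the headline should be routed to the dead-letter topic."""
    return non_ascii_ratio(headline) > threshold


for headline in ("Apple beats Q2 estimates", "Café chain files 8-K", "日本銀行が金利を据え置き"):
    target = "sentiment.dlq" if should_dead_letter(headline) else "classifier"
    print(f"{non_ascii_ratio(headline):.2f} -> {target}: {headline}")
```

A strict `>` comparison means a headline at exactly 40% non-ASCII is still classified, matching the "> 40%" wording of the note.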