SentimentPulse — Pipeline Throughput Benchmark

Test date: 2026-03-25
Duration: 4-hour steady-state run
Environment: MacBook Pro M-series (MPS), 32 GB RAM, Docker Desktop
Sources active: SEC EDGAR (8-K/6-K), Yahoo Finance RSS (50 tickers), Reuters RSS

Summary

Metric	Result	Target	Status
Headlines ingested / hour	~420	—	✓
Headlines classified / hour	~410	—	✓
End-to-end latency (p50)	38 s	< 3 min	✓
End-to-end latency (p95)	98 s	< 3 min	✓
Kafka consumer lag (steady-state)	2–6 msgs	< 100	✓
Dedup cache hit rate	71 %	—	—
DLQ rate	0.4 %	< 1 %	✓

Ingestion breakdown (per source)

Source	Headlines / hour	Notes
SEC EDGAR (8-K/6-K)	~90	Peaks at market open; filings across all 50 tracked tickers
Yahoo Finance RSS	~280	50 tickers × ~5.6 headlines/ticker/hour on average
Reuters RSS	~50	Lower cadence; high-quality wire stories
GNews (optional)	—	Not active during test run (API key not set)

Latency distribution

Measured as now() - publishedAt at the point the enriched event is visible in Redis:

p10  :  12 s
p50  :  38 s
p75  :  61 s
p90  :  82 s
p95  :  98 s
p99  : 147 s
max  : 203 s   (EDGAR filing fetch + FinBERT cold batch)

The tail of the distribution (p99 and the 203 s maximum) corresponds to the first batch after FinBERT model warmup on MPS, combined with a large EDGAR full-text fetch. Subsequent batches run at 12–15 ms/headline on MPS.
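As a rough illustration of how figures like these can be derived, the sketch below computes latency as observation time minus publishedAt and reads percentiles with the nearest-rank method. The function names and the nearest-rank choice are assumptions made for this example; the benchmark does not state which percentile method was used.

```python
import math
from datetime import datetime, timedelta, timezone


def latency_seconds(published_at: datetime, observed_at: datetime) -> float:
    """End-to-end latency: time the enriched event was observed minus publishedAt."""
    return (observed_at - published_at).total_seconds()


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p in 0..100) of a non-empty list of samples."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    # Nearest-rank: smallest value with at least p% of samples at or below it.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    published = datetime(2026, 3, 25, 14, 0, 0, tzinfo=timezone.utc)
    observed = published + timedelta(seconds=38)
    print(latency_seconds(published, observed))  # 38.0
```

Both timestamps need to be timezone-aware (or both naive) for the subtraction to be valid; feed timestamps are often published with an offset, so normalizing to UTC first is a reasonable precaution.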

Classifier throughput

FinBERT runs in batches of 64 headlines. On Apple MPS:

Batch size	Time / batch	Throughput
64	~0.9 s	~4,270 headlines/min
32	~0.5 s	~3,840 headlines/min
16	~0.3 s	~3,200 headlines/min

At an ingestion rate of ~420 headlines/hour (~7 headlines/min), the classifier is far from being the bottleneck: even at batch size 16 it sustains ~3,200 headlines/min. Consumer lag stayed at 2–6 messages throughout the run.
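The throughput column follows directly from batch size and time per batch. A minimal sketch of that arithmetic, with helper names chosen for this example only:

```python
def headlines_per_minute(batch_size: int, seconds_per_batch: float) -> float:
    """Classifier capacity implied by one batch taking seconds_per_batch."""
    return batch_size / seconds_per_batch * 60


def classifier_utilization(ingest_per_hour: float, capacity_per_minute: float) -> float:
    """Fraction of classifier capacity consumed by the ingestion rate."""
    return (ingest_per_hour / 60) / capacity_per_minute


if __name__ == "__main__":
    # Batch sizes and timings taken from the table above.
    for batch_size, seconds in [(64, 0.9), (32, 0.5), (16, 0.3)]:
        print(batch_size, round(headlines_per_minute(batch_size, seconds)))
    capacity = headlines_per_minute(64, 0.9)
    print(f"{classifier_utilization(420, capacity):.2%}")
```

With the measured numbers, ingestion uses well under 1% of classifier capacity at batch size 64, which is consistent with the low consumer lag.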

Redis memory usage

After 4 hours with 50 tickers tracked:

sentiment:{ticker} HASH keys: 50 keys × ~400 bytes ≈ 20 KB
sentimentpulse:scores:{ticker} drift window LISTs: 50 × 720 × 8 bytes ≈ 288 KB
sentimentpulse:seen_urls dedup SET (48h TTL): ~4,200 members ≈ 340 KB
Total Redis footprint: ~650 KB (< 1 MB)
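The estimate above is simple multiplication, sketched here so it can be re-run for a different ticker count or window length. The parameter names are hypothetical, and the per-item sizes are the payload figures quoted above; they do not model Redis's own per-key overhead.

```python
def redis_footprint_bytes(
    tickers: int = 50,
    hash_bytes: int = 400,           # approx. size of one sentiment:{ticker} HASH
    window_len: int = 720,           # entries per drift-window LIST
    score_bytes: int = 8,            # bytes per stored score
    seen_urls_bytes: int = 340_000,  # measured size of the dedup SET
) -> dict[str, int]:
    """Back-of-envelope Redis footprint, broken down by key family."""
    hashes = tickers * hash_bytes
    windows = tickers * window_len * score_bytes
    return {
        "hashes": hashes,
        "windows": windows,
        "seen_urls": seen_urls_bytes,
        "total": hashes + windows + seen_urls_bytes,
    }


if __name__ == "__main__":
    for name, size in redis_footprint_bytes().items():
        print(f"{name}: {size / 1000:.0f} KB")
```

Scaling to 500 tickers with the same window length would put the drift windows alone near 2.9 MB, so the LISTs, not the HASHes, dominate growth.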

TimescaleDB row count

After 4 hours:

SELECT count(*) FROM sentiment_events;
-- 1,673 rows

SELECT ticker, count(*) FROM sentiment_events
GROUP BY ticker ORDER BY count(*) DESC LIMIT 5;
--  AAPL | 68
--  MSFT | 61
--  NVDA | 57
--  AMZN | 54
--  TSLA | 52

Compression (applied to chunks older than 7 days) reduces on-disk size ~11×.

Notes

Headlines with a non-ASCII character ratio above 40% are routed to sentiment.dlq (DLQ rate: 0.4%).
With a 48-hour TTL, the deduplication cache filters out ~71% of fetched RSS items as already seen in earlier polling cycles.
FinBERT inference on CPU (no MPS/CUDA) runs ~8× slower; the classifier service should be deployed on GPU in production.
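
The first two notes describe a filtering rule and a TTL-based dedup check. The sketch below shows one way each could look. It is an in-memory illustration under stated assumptions, not the pipeline's actual code: the function names, the "classify" route label, and the SeenUrls class are hypothetical, and the real dedup store is the Redis SET mentioned earlier.

```python
def non_ascii_ratio(text: str) -> float:
    """Fraction of characters outside the 7-bit ASCII range."""
    if not text:
        return 0.0
    return sum(1 for ch in text if ord(ch) > 127) / len(text)


def route(headline: str, threshold: float = 0.40) -> str:
    """Return the DLQ topic for mostly non-ASCII headlines, else a hypothetical 'classify' route."""
    return "sentiment.dlq" if non_ascii_ratio(headline) > threshold else "classify"


class SeenUrls:
    """In-memory stand-in for a dedup cache whose entries expire after a TTL."""

    def __init__(self, ttl_seconds: float = 48 * 3600) -> None:
        self.ttl = ttl_seconds
        self._expiry: dict[str, float] = {}

    def is_duplicate(self, url: str, now: float) -> bool:
        """True if url was recorded within the TTL; otherwise record it and return False."""
        expires = self._expiry.get(url)
        if expires is not None and now < expires:
            return True
        self._expiry[url] = now + self.ttl
        return False


if __name__ == "__main__":
    print(route("Apple beats earnings expectations"))  # classify
    seen = SeenUrls()
    print(seen.is_duplicate("https://example.com/a", now=0.0))    # False
    print(seen.is_duplicate("https://example.com/a", now=600.0))  # True
```

Passing the clock in as `now` keeps the expiry logic deterministic and easy to test. In this sketch a duplicate hit does not extend the entry's expiry; whether the real cache refreshes the TTL on a hit is not stated in the benchmark.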
