A real-time data streaming pipeline inspired by CERN's LHC data acquisition infrastructure. I built this to teach myself how detector data actually flows from readout to analysis: from event generation all the way through streaming, triggering, and live monitoring.
- Phase 1: Core C++ Event Generator & Kafka Producer.
- Phase 2: Python-based Trigger Engine & SQLite Storage.
- Phase 3: Real-time Dashboard with Chart.js visualization.
- Phase 4: CI/CD Integration and Physics Validation (Z-mass peak).
- Phase 5: Scalability testing with Kubernetes (K8s) deployment.
- Phase 6: Integrating secondary AI-based anomaly detection (see QuarkStream).
The pipeline simulates proton-proton collision events in C++, streams them through Apache Kafka, applies physics trigger logic (similar to ATLAS/CMS HLT paths), stores results in SQLite, and visualizes everything on a live dashboard.
Technologies: C++17 · Apache Kafka · Python · SQLite · JavaScript · Chart.js · Flask · Docker
See also: LHCEventAnalysis, my batch analysis pipeline (C++/ROOT/FastJet) that complements this streaming project.
graph LR
subgraph "Event Generation"
A["C++ Event Generator<br/><small>Z→μμ, tt̄, QCD</small>"] -->|JSON stdout| B["Python Producer"]
end
subgraph "Streaming Layer"
B -->|publish| C[("Apache Kafka<br/><small>lhc-raw-events</small>")]
end
subgraph "Processing"
C -->|consume| D["Stream Processor<br/><small>Trigger Logic</small>"]
D -->|store| E[("SQLite<br/><small>Events + Stats</small>")]
end
subgraph "Monitoring"
E -->|query| F["Flask + SocketIO<br/><small>Dashboard Server</small>"]
F -->|WebSocket| G["Live Dashboard<br/><small>Chart.js</small>"]
end
style A fill:#6366f1,stroke:#4f46e5,color:#fff
style C fill:#f59e0b,stroke:#d97706,color:#fff
style D fill:#10b981,stroke:#059669,color:#fff
style G fill:#06b6d4,stroke:#0891b2,color:#fff
┌──────────────────┐     ┌───────────┐     ┌───────────────────┐     ┌───────────┐     ┌───────────┐
│  C++ Generator   │────▶│   Kafka   │────▶│ Stream Processor  │────▶│  SQLite   │────▶│ Dashboard │
│   ~10⁵ evt/s     │     │ Producer  │     │  Trigger Engine   │     │  Storage  │     │  Live UI  │
└──────────────────┘     └───────────┘     └───────────────────┘     └───────────┘     └───────────┘
- Simulates three physics processes: Z→μ⁺μ⁻, tt̄→ℓ+jets, QCD multi-jet
- Uses Breit-Wigner mass distributions and physically motivated kinematics
- Configurable pileup, event rates, and random seeds
- Modern C++17 with CMake build system (FetchContent for nlohmann/json)
- Outputs JSON at ~10⁵ events/second
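The Breit-Wigner mass sampling can be sketched in Python (the real generator is C++; the constants below are standard PDG reference values, not necessarily the generator's exact ones). A resonance mass is drawn from a Cauchy/Breit-Wigner shape via the inverse CDF:

```python
import math
import random

# PDG reference values for the Z boson, in GeV (illustrative constants).
Z_MASS = 91.1876
Z_WIDTH = 2.4952

def sample_breit_wigner(m0: float, gamma: float, rng: random.Random) -> float:
    """Draw one mass from a (non-relativistic) Breit-Wigner, i.e. a
    Cauchy distribution with location m0 and half-width gamma/2,
    using inverse-CDF sampling."""
    u = rng.random()
    return m0 + (gamma / 2.0) * math.tan(math.pi * (u - 0.5))

rng = random.Random(12345)
masses = sorted(sample_breit_wigner(Z_MASS, Z_WIDTH, rng) for _ in range(100_000))
median = masses[len(masses) // 2]  # the Cauchy mean is undefined; use the median
print(f"median dimuon mass: {median:.2f} GeV")
```

Most draws land inside the 75–105 GeV trigger window, with the long Cauchy tails accounting for the rest.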
- Producer: Reads JSON events from stdin/file, publishes to Kafka with gzip compression
- Consumer: Subscribes to raw events, applies trigger logic, stores to SQLite
- Retry logic and configurable batch sizes for production-grade reliability
- Docker Compose setup with KRaft mode (no Zookeeper required)
- HLT_mu25: Single isolated muon with pT > 25 GeV
- HLT_2mu_Zmass: Opposite-sign dimuon pair in Z mass window (75–105 GeV)
- HLT_4j50: ≥ 4 jets with pT > 50 GeV
- HLT_met50: Missing transverse energy > 50 GeV
- Computes invariant masses and ΔR separation
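The dimuon trigger path can be sketched as follows (a minimal illustration, not the actual `pipeline/trigger.py` API; function and field names are assumptions). The pair mass uses the massless approximation m² = 2·pT1·pT2·(cosh Δη − cos Δφ):

```python
import math

def dimuon_mass(mu1: dict, mu2: dict) -> float:
    """Invariant mass of a muon pair in the massless approximation:
    m^2 = 2 * pt1 * pt2 * (cosh(d_eta) - cos(d_phi))."""
    d_eta = mu1["eta"] - mu2["eta"]
    d_phi = mu1["phi"] - mu2["phi"]
    m2 = 2.0 * mu1["pt"] * mu2["pt"] * (math.cosh(d_eta) - math.cos(d_phi))
    return math.sqrt(max(m2, 0.0))

def hlt_2mu_zmass(event: dict, lo: float = 75.0, hi: float = 105.0) -> bool:
    """Fire if any opposite-sign muon pair falls in the Z mass window."""
    muons = [p for p in event["particles"] if abs(p["pdg_id"]) == 13]
    for i, m1 in enumerate(muons):
        for m2 in muons[i + 1:]:
            if m1["pdg_id"] * m2["pdg_id"] < 0:  # opposite charge
                if lo <= dimuon_mass(m1, m2) <= hi:
                    return True
    return False
```

For back-to-back muons (Δφ = π, equal η) with pT = 45.6 GeV each, the pair mass comes out to 2 × 45.6 = 91.2 GeV, right on the Z peak.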
- Live event rate monitoring with 30-second rolling window
- Dimuon invariant mass spectrum (Z boson peak visualization)
- Missing transverse energy distribution
- Per-trigger efficiency bars with live percentages
- Dark theme inspired by CERN control rooms
- WebSocket + REST API with automatic fallback
- C++ compiler with C++17 support (g++ ≥ 7 or clang++ ≥ 5)
- CMake ≥ 3.14
- Python ≥ 3.8
- Docker & Docker Compose (for Kafka)
git clone https://github.com/Divij-Bhoj/lhc-data-pipeline.git
cd lhc-data-pipeline
chmod +x scripts/setup.sh
./scripts/setup.sh
docker compose up -d
# Terminal 1: Generate events and publish to Kafka
./event_generator/build/event_generator -n 50000 --rate 500 | python -m pipeline.producer
# Terminal 2: Consume from Kafka, apply triggers, store in SQLite
python -m pipeline.consumer
# Terminal 3: Start the monitoring dashboard
python -m dashboard.server
# Open http://localhost:5000
lhc-data-pipeline/
├── event_generator/            # C++17 event simulation
│   ├── CMakeLists.txt          # CMake build with FetchContent
│   ├── include/
│   │   └── event_generator.h   # Core classes and physics constants
│   └── src/
│       ├── event_generator.cpp # Physics process implementations
│       └── main.cpp            # CLI entry point
├── pipeline/                   # Python streaming pipeline
│   ├── config.py               # Centralized configuration
│   ├── producer.py             # Kafka producer (stdin → Kafka)
│   ├── consumer.py             # Kafka consumer + SQLite writer
│   └── trigger.py              # Physics trigger algorithms
├── dashboard/                  # Real-time monitoring UI
│   ├── server.py               # Flask + SocketIO backend
│   ├── templates/
│   │   └── index.html          # Dashboard layout
│   └── static/
│       ├── css/style.css       # Dark theme styling
│       └── js/app.js           # Chart.js visualizations
├── database/
│   └── schema.sql              # SQLite schema
├── scripts/
│   └── setup.sh                # Automated setup
├── docker-compose.yml          # Kafka (KRaft mode)
├── requirements.txt            # Python dependencies
└── data/                       # Runtime data (git-ignored)
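Stored events can be inspected directly with Python's stdlib `sqlite3` driver. The columns below are a simplified, assumed subset of `database/schema.sql`, shown only to illustrate the query pattern the dashboard backend relies on:

```python
import sqlite3

# Illustrative mini-schema; the real one lives in database/schema.sql.
conn = sqlite3.connect(":memory:")  # point at data/lhc_events.db in the real pipeline
conn.execute("""
    CREATE TABLE IF NOT EXISTS events (
        event_id        INTEGER PRIMARY KEY,
        event_type      TEXT NOT NULL,
        met             REAL,
        passed_triggers TEXT
    )
""")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, "z_mumu", 18.4, "HLT_2mu_Zmass"),
        (2, "qcd", 7.1, ""),
        (3, "ttbar", 64.9, "HLT_met50"),
    ],
)

# Per-process event counts, the kind of aggregate the dashboard queries live.
rows = conn.execute(
    "SELECT event_type, COUNT(*) FROM events GROUP BY event_type ORDER BY event_type"
).fetchall()
print(rows)  # [('qcd', 1), ('ttbar', 1), ('z_mumu', 1)]
```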
# Generate 50,000 events at max speed
./event_generator/build/event_generator -n 50000
# Stream infinite events at 100 Hz
./event_generator/build/event_generator -n 0 --rate 100
# Custom run number and seed
./event_generator/build/event_generator -n 10000 -r 42 -s 12345
# Adjust pileup conditions
./event_generator/build/event_generator -n 10000 --pileup 40
{
"event_id": 1,
"run_number": 1,
"timestamp_ms": 1708800000000,
"event_type": "z_mumu",
"num_particles": 28,
"num_jets": 2,
"num_muons": 2,
"met": 18.432,
"sum_et": 342.17,
"particles": [
{"pdg_id": -13, "pt": 45.2, "eta": -0.83, "phi": 1.24, "energy": 52.1, "mass": 0.106, "is_isolated": true},
{"pdg_id": 13, "pt": 42.8, "eta": 1.12, "phi": -1.89, "energy": 68.3, "mass": 0.106, "is_isolated": true}
]
}
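For instance, the ΔR separation between the two muons in the sample record above can be computed from their (η, φ) coordinates. The helper below is a sketch, not the pipeline's actual code; note the φ difference must be wrapped into [−π, π]:

```python
import math

def delta_r(p1: dict, p2: dict) -> float:
    """Angular separation dR = sqrt(d_eta^2 + d_phi^2),
    with d_phi wrapped into [-pi, pi]."""
    d_eta = p1["eta"] - p2["eta"]
    d_phi = (p1["phi"] - p2["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(d_eta, d_phi)

# The two muons from the sample event above.
mu1 = {"eta": -0.83, "phi": 1.24}
mu2 = {"eta": 1.12, "phi": -1.89}
print(f"dR(mu1, mu2) = {delta_r(mu1, mu2):.2f}")  # ~3.69, a well-separated pair
```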
All pipeline parameters are configurable via pipeline/config.py or environment variables:
| Parameter | Default | Env Variable |
|---|---|---|
| Kafka Bootstrap | localhost:9092 | KAFKA_BOOTSTRAP |
| Database Path | data/lhc_events.db | LHC_DB_PATH |
| Dashboard Port | 5000 | DASHBOARD_PORT |
| Muon pT Threshold | 25 GeV | — |
| Z Mass Window | 75–105 GeV | — |
| Jet pT Threshold | 50 GeV | — |
| MET Threshold | 50 GeV | — |
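A minimal pattern for how `pipeline/config.py` might read these overrides (an illustrative sketch; the real module may be organized differently):

```python
import os

# Environment-overridable settings, with defaults from the table above.
KAFKA_BOOTSTRAP = os.environ.get("KAFKA_BOOTSTRAP", "localhost:9092")
DB_PATH = os.environ.get("LHC_DB_PATH", "data/lhc_events.db")
DASHBOARD_PORT = int(os.environ.get("DASHBOARD_PORT", "5000"))

# Physics thresholds are fixed in code, not environment-driven.
MUON_PT_THRESHOLD = 25.0       # GeV
Z_MASS_WINDOW = (75.0, 105.0)  # GeV
JET_PT_THRESHOLD = 50.0        # GeV
MET_THRESHOLD = 50.0           # GeV
```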
| Component | Technology | Purpose |
|---|---|---|
| Event Generator | C++17, CMake, nlohmann/json | High-performance physics simulation |
| Message Broker | Apache Kafka (KRaft) | Distributed event streaming |
| Stream Processing | Python, kafka-python | Trigger logic and filtering |
| Storage | SQLite | Event and statistics persistence |
| Dashboard Backend | Flask, Flask-SocketIO | REST API + WebSocket server |
| Dashboard Frontend | JavaScript, Chart.js | Real-time data visualization |
| Infrastructure | Docker Compose | Container orchestration |
- Author: Divij Bhoj
- Purpose: CERN Technical Studentship Application Portfolio
- Technologies: C++17, Apache Kafka, Python, Flask, SQLite, JavaScript, Chart.js, Docker
- Disclaimer: This project uses simulated data for educational purposes. Not affiliated with official LHC experiments.
This is a personal portfolio project, but suggestions are welcome:
- Fork the repository
- Create a feature branch (`git checkout -b feature/new-trigger`)
- Commit changes (`git commit -am 'Add dielectron trigger path'`)
- Push to branch (`git push origin feature/new-trigger`)
- Open a Pull Request
MIT License: see LICENSE for details.
If used in academic work, please cite:
Divij Bhoj (2026). LHC Data Streaming & Monitoring Pipeline.
GitHub: https://github.com/Divij-Bhoj/lhc-data-pipeline
Divij Bhoj · divijbhoj@gmail.com · GitHub