A Python analytics project that scrapes CS2 statistics from HLTV.org and serves them through an interactive Streamlit dashboard.
Status: v1.0 — functional with the limitations described below.
cs_matches_analysys/
├── app.py # Streamlit dashboard (entry point)
├── pipeline.py # Data-fetch orchestrator (also runnable standalone)
├── config.py # All URLs, file paths, and scraper settings
├── build_ca_bundle.py # One-time script to build the corporate CA bundle
├── requirements.txt
├── scraper/
│   ├── __init__.py
│   └── hltv_scraper.py # Selenium + BeautifulSoup parsers
└── data/ # Auto-created; holds cached CSV files + last_update.txt
    ├── ca_bundle.pem # Merged CA bundle (generated by build_ca_bundle.py)
    ├── recent_results.csv
    ├── top_teams.csv
    ├── events.csv
    └── last_update.txt
cd cs_matches_analysys
python3 -m ensurepip --upgrade # only needed if pip is not yet available
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt

If you are behind a corporate HTTPS proxy (e.g. a Zscaler or PG proxy), you must run this once before fetching data, and again whenever the corporate CA rotates:
python build_ca_bundle.py

This merges the default certifi trust store with all certificates in your macOS Keychain and saves the result to data/ca_bundle.pem.
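The merge step described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of build_ca_bundle.py: the function names `merge_pem` and `build_bundle` are hypothetical, and it assumes the Keychain certificates are exported with the macOS `security find-certificate -a -p` command and that the base trust store comes from `certifi.where()`.

```python
import re
import subprocess
from pathlib import Path

# Matches one PEM-encoded certificate, including its BEGIN/END markers.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def merge_pem(*pem_texts: str) -> str:
    """Concatenate certificates from several PEM texts, dropping exact duplicates."""
    seen = set()
    merged = []
    for text in pem_texts:
        for block in PEM_BLOCK.findall(text):
            key = "".join(block.split())  # ignore whitespace differences
            if key not in seen:
                seen.add(key)
                merged.append(block.strip())
    return "\n".join(merged) + "\n"


def build_bundle(out_path: Path = Path("data/ca_bundle.pem")) -> Path:
    """Hypothetical helper: merge certifi's store with the macOS Keychain certs."""
    import certifi  # third-party package; imported lazily so merge_pem works without it

    base = Path(certifi.where()).read_text(encoding="utf-8")
    # macOS only: print every certificate in the default keychain search list as PEM.
    keychain = subprocess.run(
        ["security", "find-certificate", "-a", "-p"],
        capture_output=True, text=True, check=True,
    ).stdout
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(merge_pem(base, keychain), encoding="utf-8")
    return out_path
```

On Linux or Windows, the same `merge_pem` helper could be fed the certifi store plus a manually exported proxy CA PEM in place of the Keychain output.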
Download the chromedriver that matches your installed Chrome version and place it in ~/bin/:
# Example for Chrome 149 on Apple Silicon
curl -L -o /tmp/cd.zip \
"https://storage.googleapis.com/chrome-for-testing-public/149.0.7827.116/mac-arm64/chromedriver-mac-arm64.zip"
unzip /tmp/cd.zip -d /tmp/cd/ && mkdir -p ~/bin
cp /tmp/cd/chromedriver-mac-arm64/chromedriver ~/bin/chromedriver
chmod +x ~/bin/chromedriver
xattr -d com.apple.quarantine ~/bin/chromedriver # remove macOS quarantine

The scraper looks for ~/bin/chromedriver first and falls back to chromedriver on $PATH.
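The lookup order described above might look something like this. It is a sketch under assumptions: `find_chromedriver` is a hypothetical name, and the real logic in scraper/hltv_scraper.py may differ in detail.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_chromedriver(
    preferred: Path = Path.home() / "bin" / "chromedriver",
) -> Optional[str]:
    """Return the preferred driver if it is an executable file, else search $PATH."""
    if preferred.is_file() and os.access(preferred, os.X_OK):
        return str(preferred)
    # shutil.which returns None when no chromedriver is found on $PATH.
    return shutil.which("chromedriver")
```

A `None` result means no driver was found in either location, which is a good point to fail with a clear message instead of letting Selenium raise later.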
python pipeline.py

This writes CSV files to data/ and a data/last_update.txt timestamp.
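The timestamp file makes it easy to check how old the cached data is. The sketch below assumes that last_update.txt holds a single naive ISO 8601 timestamp (for example `2024-01-01T08:00:00`); the actual format written by pipeline.py is not documented here, and `is_stale` is a hypothetical helper with an arbitrary seven-day default.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional


def is_stale(
    stamp_file: Path,
    max_age: timedelta = timedelta(days=7),
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the timestamp is missing, unreadable, or older than max_age."""
    now = now or datetime.now()
    try:
        # Assumption: the file contains one naive ISO 8601 timestamp.
        last = datetime.fromisoformat(stamp_file.read_text(encoding="utf-8").strip())
    except (OSError, ValueError):
        return True  # treat a missing or malformed stamp as stale
    return now - last > max_age
```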
streamlit run app.py

Open the URL printed in the terminal (usually http://localhost:8501).
| Page | Content |
|---|---|
| Overview | KPI cards, top-10 teams bar chart, latest results snippet |
| Best Players | Not populated (see limitations) |
| Best Teams | Win % ranking derived from recent results, full team table |
| Recent Results | Win distribution, matches-per-event pie, filterable results table |
| Events | Ongoing and upcoming events with location |
Click 🔄 Refresh Data in the sidebar at any time to pull fresh stats from HLTV.
| Limitation | Cause | Workaround |
|---|---|---|
| Player stats not available | HLTV /stats/players is behind a Cloudflare Managed Challenge that blocks headless Chrome when going through a corporate HTTPS proxy (TLS fingerprint changes) | Not addressed in v1; the dashboard shows an empty state for the Players page |
| Team stats are derived, not official | Same Cloudflare block on /stats/teams | Win % and match counts are computed from the scraped /results page (~100 most recent matches) |
| Chrome must be restarted between pages | Loading the heavy events page crashes Chrome if run in the same session as the results page | The pipeline automatically restarts the driver before fetching events; no user action needed |
| macOS only | CA bundle generation uses the security CLI tool from macOS | On Linux/Windows, run build_ca_bundle.py without Keychain support and manually add your proxy CA PEM |
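To illustrate how team statistics can be derived from scraped results, here is a small sketch in plain Python. The field names `team1`, `team2`, and `winner` are assumptions rather than the actual columns of recent_results.csv, and the team names in the usage example are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List


def win_rates(matches: Iterable[Dict[str, str]]) -> List[dict]:
    """Compute matches played and win % per team from a list of match records."""
    played: Counter = Counter()
    won: Counter = Counter()
    for match in matches:
        for team in (match["team1"], match["team2"]):
            played[team] += 1
        won[match["winner"]] += 1
    table = [
        {"team": team, "matches": n, "win_pct": round(100 * won[team] / n, 1)}
        for team, n in played.items()
    ]
    # Best win % first; ties broken by matches played, then by name.
    return sorted(table, key=lambda r: (-r["win_pct"], -r["matches"], r["team"]))


# Made-up sample data, for illustration only.
sample = [
    {"team1": "Alpha", "team2": "Bravo", "winner": "Alpha"},
    {"team1": "Alpha", "team2": "Charlie", "winner": "Alpha"},
    {"team1": "Bravo", "team2": "Charlie", "winner": "Bravo"},
]
ranking = win_rates(sample)
```

Because only about 100 recent matches are available, teams with very few matches can land at the top or bottom of such a ranking on thin evidence; a minimum-matches filter is one way to dampen that.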
Because HLTV data changes slowly (tournament by tournament), a refresh frequency of once per week or less is recommended.
You can schedule the pipeline with cron:
# refresh every Sunday at 08:00
0 8 * * 0 /path/to/.venv/bin/python /path/to/cs_matches_analysys/pipeline.py

HLTV is protected by Cloudflare. The scraper uses Selenium with headless Chrome, which renders JavaScript and presents a real browser fingerprint, bypassing the Cloudflare challenge on most pages.
A randomised 3–7 s delay between requests is applied to avoid rate-limiting.
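A randomised delay of this kind takes only a few lines. In the sketch below, `polite_pause` is a hypothetical name, and the `sleep` parameter exists only so the function can be exercised without actually waiting.

```python
import random
import time
from typing import Callable


def polite_pause(
    min_s: float = 3.0,
    max_s: float = 7.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for a uniformly random duration in [min_s, max_s] and return it."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

Calling `polite_pause()` between page loads spreads requests out irregularly, which looks less like automated traffic than a fixed interval would.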
If HLTV changes its HTML structure, update the CSS selectors in scraper/hltv_scraper.py. If a fetch fails, the pipeline keeps the last successfully cached CSV, so a page that has loaded data at least once does not go blank after a failed refresh.
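One way to get this "keep the last good cache" behaviour is to overwrite a CSV only after a fetch has succeeded and returned rows. The following is a sketch under assumptions, not the code in pipeline.py: `refresh_csv` and its `fetch` callback are hypothetical names, and rows are assumed to be dictionaries with identical keys.

```python
import csv
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def refresh_csv(path: Path, fetch: Callable[[], List[Dict[str, str]]]) -> bool:
    """Replace the CSV at `path` only if `fetch` succeeds and returns rows."""
    try:
        rows = fetch()
    except Exception as exc:  # broad on purpose: any scraper failure keeps the cache
        print(f"fetch failed, keeping cached {path.name}: {exc}")
        return False
    if not rows:
        return False  # an empty result is treated like a failure

    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then swap it in with os.replace,
    # so an interrupted write never leaves a half-written CSV behind.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    os.replace(tmp_name, path)
    return True
```

The return value lets the caller decide whether to update data/last_update.txt, so the timestamp only moves forward when fresh data was actually written.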
| Package | Purpose |
|---|---|
| streamlit | Dashboard framework |
| plotly | Interactive charts |
| pandas | Data wrangling |
| selenium | Headless Chrome automation |
| beautifulsoup4 + lxml | HTML parsing |
| requests | Used internally by the CA bundle builder |
| requests | HTTP (used internally by cloudscraper) |