CS2 Matches Analytics

A Python analytics project that scrapes CS2 statistics from HLTV.org and serves them through an interactive Streamlit dashboard.

Status: v1.0 — functional with the limitations described below.

Project structure

cs_matches_analysys/
├── app.py                # Streamlit dashboard (entry point)
├── pipeline.py           # Data-fetch orchestrator (also runnable standalone)
├── config.py             # All URLs, file paths, and scraper settings
├── build_ca_bundle.py    # One-time script to build the corporate CA bundle
├── requirements.txt
├── scraper/
│   ├── __init__.py
│   └── hltv_scraper.py   # Selenium + BeautifulSoup parsers
└── data/                 # Auto-created; holds cached CSV files + last_update.txt
    ├── ca_bundle.pem       # Merged CA bundle (generated by build_ca_bundle.py)
    ├── recent_results.csv
    ├── top_teams.csv
    ├── events.csv
    └── last_update.txt

Quick start

1. Create a virtual environment

cd cs_matches_analysys
python3 -m ensurepip --upgrade      # only needed if pip is not yet available
python3 -m venv .venv
source .venv/bin/activate            # Windows: .venv\Scripts\activate

2. Install dependencies

pip install -r requirements.txt

3. Build the CA bundle (corporate network only)

If you are behind a corporate HTTPS proxy (e.g. a Zscaler or PG proxy) you must run this once before fetching data, and again whenever the corporate CA rotates:

python build_ca_bundle.py

This merges the default certifi trust store with all certificates in your macOS Keychain and saves the result to data/ca_bundle.pem.
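The merge step can be pictured with a minimal sketch. It assumes the two trust stores are already available as PEM text; the function name `merge_pem_bundles` is hypothetical and is not taken from `build_ca_bundle.py`.

```python
import re

# Matches one PEM-encoded certificate, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def merge_pem_bundles(*bundles: str) -> str:
    """Concatenate PEM bundles, dropping duplicate certificates.

    Order is preserved: certificates from earlier bundles come first.
    (Hypothetical helper, not the project's actual implementation.)
    """
    seen = set()
    merged = []
    for bundle in bundles:
        for cert in PEM_CERT.findall(bundle):
            # Ignore whitespace so the same certificate wrapped at a
            # different column is still recognised as a duplicate.
            key = "".join(cert.split())
            if key not in seen:
                seen.add(key)
                merged.append(cert.strip())
    return ("\n".join(merged) + "\n") if merged else ""
```

In a real run the inputs would presumably be the file at `certifi.where()` and the PEM output of `security find-certificate -a -p` on macOS, with the result written to data/ca_bundle.pem.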

4. Install chromedriver

Download the chromedriver that matches your installed Chrome version and place it in ~/bin/:

# Example for Chrome 149 on Apple Silicon
curl -L -o /tmp/cd.zip \
  "https://storage.googleapis.com/chrome-for-testing-public/149.0.7827.116/mac-arm64/chromedriver-mac-arm64.zip"
unzip /tmp/cd.zip -d /tmp/cd/ && mkdir -p ~/bin
cp /tmp/cd/chromedriver-mac-arm64/chromedriver ~/bin/chromedriver
chmod +x ~/bin/chromedriver
xattr -d com.apple.quarantine ~/bin/chromedriver   # remove macOS quarantine

The scraper looks for ~/bin/chromedriver first and falls back to chromedriver on $PATH.
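That lookup order could be implemented roughly as follows. This is a sketch under assumptions: `find_chromedriver` is a hypothetical name, and the scraper's actual code may differ.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_chromedriver(home: Optional[Path] = None) -> Optional[str]:
    """Return a chromedriver path, preferring ~/bin over $PATH.

    `home` can be overridden for testing; it defaults to the user's home.
    Returns None when no chromedriver is found in either place.
    """
    home = home or Path.home()
    local = home / "bin" / "chromedriver"
    if local.is_file() and os.access(local, os.X_OK):
        return str(local)
    # Fall back to whatever `chromedriver` resolves to on $PATH.
    return shutil.which("chromedriver")
```

A `None` result is a good point to fail early with a message that points back to this step, rather than letting Selenium raise later.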

5. Fetch data (first run)

python pipeline.py

This writes CSV files to data/ and a data/last_update.txt timestamp.

6. Launch the dashboard

streamlit run app.py

Open the URL printed in the terminal (usually http://localhost:8501).

Dashboard pages

| Page | Content |
| --- | --- |
| Overview | KPI cards, top-10 teams bar chart, latest results snippet |
| Best Players | Not populated (see Known limitations) |
| Best Teams | Win % ranking derived from recent results, full team table |
| Recent Results | Win distribution, matches-per-event pie, filterable results table |
| Events | Ongoing and upcoming events with location |

Click 🔄 Refresh Data in the sidebar at any time to pull fresh stats from HLTV.

Known limitations (v1.0)

| Limitation | Cause | Workaround |
| --- | --- | --- |
| Player stats not available | HLTV `/stats/players` is behind a Cloudflare Managed Challenge that blocks headless Chrome when traffic goes through a corporate HTTPS proxy (the proxy changes the TLS fingerprint) | Not addressed in v1; the dashboard shows an empty state for the Best Players page |
| Team stats are derived, not official | Same Cloudflare block on `/stats/teams` | Win % and match counts are computed from the scraped `/results` page (~100 most recent matches) |
| Chrome must be restarted between pages | Loading the heavy events page crashes Chrome if run in the same session as the results page | The pipeline automatically restarts the driver before fetching events; no user action is needed |
| macOS only | CA bundle generation uses the macOS `security` CLI tool | On Linux/Windows, run `build_ca_bundle.py` without Keychain support and manually add your proxy CA PEM |
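To illustrate how team stats can be derived from results alone, here is a minimal sketch. It assumes each scraped match reduces to a (winner, loser) pair; the function name, the field names, and the team labels in the usage example are hypothetical, not the project's actual schema.

```python
from collections import Counter


def win_rates(results):
    """Compute per-team match counts and win % from (winner, loser) pairs."""
    wins = Counter()
    played = Counter()
    for winner, loser in results:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    table = [
        {"team": team, "matches": n, "win_pct": round(100 * wins[team] / n, 1)}
        for team, n in played.items()
    ]
    # Highest win % first; ties are broken by number of matches played.
    return sorted(table, key=lambda row: (-row["win_pct"], -row["matches"]))


# Placeholder team names, for illustration only.
ranking = win_rates([("A", "B"), ("A", "C"), ("B", "C")])
```

Because only about 100 recent matches feed the calculation, teams with very few matches can show extreme percentages, so a minimum-matches filter may be worth applying before ranking.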

Updating data

Because HLTV data changes slowly (tournament by tournament), refreshing once a week, or even less often, is sufficient.

You can schedule the pipeline with cron:

# refresh every Sunday at 08:00
0 8 * * 0  /path/to/.venv/bin/python /path/to/cs_matches_analysys/pipeline.py

Notes on scraping

HLTV is protected by Cloudflare. The scraper drives headless Chrome through Selenium, so pages are rendered with JavaScript and a real browser fingerprint, which gets past the Cloudflare challenge on most pages (the exceptions are listed under Known limitations).

A randomised 3–7 s delay between requests is applied to avoid rate-limiting.
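A delay of this kind can be sketched in a few lines. The helper name `polite_sleep` and its parameters are assumptions for illustration; the `sleep` argument exists only so the function can be exercised without actually waiting.

```python
import random
import time


def polite_sleep(min_s=3.0, max_s=7.0, sleep=time.sleep):
    """Pause for a random duration between min_s and max_s seconds.

    Returns the chosen delay so callers can log it.
    """
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

Calling `polite_sleep()` between page loads gives each request a different gap, which looks less mechanical than a fixed interval.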

If HLTV changes its HTML structure, update the CSS selectors in scraper/hltv_scraper.py. If a fetch fails, the pipeline keeps the last successfully cached CSV, so pages that already have data continue to show it instead of going blank.
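One way to get this keep-the-old-file behaviour is to write the new CSV to a temporary file and swap it in only after the fetch succeeds. The sketch below assumes the fetch returns CSV text; `refresh_csv` and its `fetch` callback are hypothetical names, not the pipeline's actual API.

```python
import os
import tempfile


def refresh_csv(path, fetch):
    """Replace `path` with fresh CSV text from `fetch()`.

    Returns True if the cache was updated, False if the previous file
    was kept because the fetch raised or returned nothing.
    """
    try:
        text = fetch()
        if not text.strip():
            raise ValueError("empty result")
    except Exception as exc:
        print(f"fetch failed, keeping cached {path}: {exc}")
        return False
    # Write next to the target, then swap in one step so a crash
    # mid-write cannot leave a truncated CSV behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)
    os.replace(tmp, path)
    return True
```

Updating data/last_update.txt only when this returns True would keep the timestamp honest about how old the displayed data is.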

Dependencies

| Package | Purpose |
| --- | --- |
| `streamlit` | Dashboard framework |
| `plotly` | Interactive charts |
| `pandas` | Data wrangling |
| `selenium` | Headless Chrome automation |
| `beautifulsoup4` + `lxml` | HTML parsing |
| `requests` | Used internally by the CA bundle builder |
