thalicsouza/cs2-dash

CS2 Matches Analytics

A Python analytics project that scrapes CS2 statistics from HLTV.org and serves them through an interactive Streamlit dashboard.

Status: v1.0 — functional with the limitations described below.


Project structure

cs_matches_analysys/
├── app.py                # Streamlit dashboard (entry point)
├── pipeline.py           # Data-fetch orchestrator (also runnable standalone)
├── config.py             # All URLs, file paths, and scraper settings
├── build_ca_bundle.py    # One-time script to build the corporate CA bundle
├── requirements.txt
├── scraper/
│   ├── __init__.py
│   └── hltv_scraper.py   # Selenium + BeautifulSoup parsers
└── data/                 # Auto-created; holds cached CSV files + last_update.txt
    ├── ca_bundle.pem       # Merged CA bundle (generated by build_ca_bundle.py)
    ├── recent_results.csv
    ├── top_teams.csv
    ├── events.csv
    └── last_update.txt

Quick start

1. Create a virtual environment

cd cs_matches_analysys
python3 -m ensurepip --upgrade      # only needed if pip is not yet available
python3 -m venv .venv
source .venv/bin/activate            # Windows: .venv\Scripts\activate

2. Install dependencies

pip install -r requirements.txt

3. Build the CA bundle (corporate network only)

If you are behind a corporate HTTPS proxy (e.g. Zscaler or a PG proxy), run this once before fetching data, and again whenever the corporate CA certificate rotates:

python build_ca_bundle.py

This merges the default certifi trust store with all certificates in your macOS Keychain and saves the result to data/ca_bundle.pem.
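
The merge step can be pictured roughly as follows. This is a minimal sketch, not the contents of build_ca_bundle.py: the helper names split_pem_certs and merge_pem_bundles are hypothetical, and it assumes the Keychain certificates are exported as PEM text (for example with the macOS command security find-certificate -a -p).

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
_PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem_certs(pem_text: str) -> list[str]:
    """Return every certificate block found in a PEM string."""
    return _PEM_CERT.findall(pem_text)


def merge_pem_bundles(*pem_texts: str) -> str:
    """Concatenate several PEM bundles, dropping exact duplicate certificates."""
    seen: set[str] = set()
    merged: list[str] = []
    for text in pem_texts:
        for cert in split_pem_certs(text):
            # Ignore whitespace differences when comparing certificates.
            key = "".join(cert.split())
            if key not in seen:
                seen.add(key)
                merged.append(cert.strip())
    return "\n".join(merged) + "\n"


# Hypothetical usage on macOS (needs the third-party certifi package):
#   import pathlib, subprocess, certifi
#   keychain = subprocess.run(
#       ["security", "find-certificate", "-a", "-p"],
#       capture_output=True, text=True, check=True,
#   ).stdout
#   base = pathlib.Path(certifi.where()).read_text()
#   pathlib.Path("data/ca_bundle.pem").write_text(merge_pem_bundles(base, keychain))
```

On Linux or Windows, the same merge function could be fed the proxy CA PEM file directly instead of Keychain output.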

4. Install chromedriver

Download the chromedriver that matches your installed Chrome version and place it in ~/bin/:

# Example for Chrome 149 on Apple Silicon
curl -L -o /tmp/cd.zip \
  "https://storage.googleapis.com/chrome-for-testing-public/149.0.7827.116/mac-arm64/chromedriver-mac-arm64.zip"
unzip /tmp/cd.zip -d /tmp/cd/ && mkdir -p ~/bin
cp /tmp/cd/chromedriver-mac-arm64/chromedriver ~/bin/chromedriver
chmod +x ~/bin/chromedriver
xattr -d com.apple.quarantine ~/bin/chromedriver   # remove macOS quarantine

The scraper looks for ~/bin/chromedriver first and falls back to chromedriver on $PATH.
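
The lookup order described above could be sketched like this. The function name find_chromedriver and its home parameter are hypothetical and exist only for illustration; the actual logic in scraper/hltv_scraper.py may differ.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_chromedriver(home: Optional[Path] = None) -> Optional[str]:
    """Prefer ~/bin/chromedriver, then fall back to chromedriver on $PATH."""
    home = home or Path.home()
    preferred = home / "bin" / "chromedriver"
    # Only accept the preferred path if it exists and is executable.
    if preferred.is_file() and os.access(preferred, os.X_OK):
        return str(preferred)
    # shutil.which returns None when nothing is found on $PATH.
    return shutil.which("chromedriver")
```

The returned path could then be handed to Selenium, for example through the executable_path argument of selenium.webdriver.chrome.service.Service in Selenium 4.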

5. Fetch data (first run)

python pipeline.py

This writes CSV files to data/ and a data/last_update.txt timestamp.

6. Launch the dashboard

streamlit run app.py

Open the URL printed in the terminal (usually http://localhost:8501).


Dashboard pages

| Page | Content |
| --- | --- |
| Overview | KPI cards, top-10 teams bar chart, latest results snippet |
| Best Players | Not populated (see limitations) |
| Best Teams | Win % ranking derived from recent results, full team table |
| Recent Results | Win distribution, matches-per-event pie, filterable results table |
| Events | Ongoing and upcoming events with location |

Click 🔄 Refresh Data in the sidebar at any time to pull fresh stats from HLTV.


Known limitations (v1.0)

| Limitation | Cause | Workaround |
| --- | --- | --- |
| Player stats not available | HLTV /stats/players is behind a Cloudflare Managed Challenge that blocks headless Chrome when going through a corporate HTTPS proxy (TLS fingerprint changes) | Not addressed in v1; the dashboard shows an empty state for the Players page |
| Team stats are derived, not official | Same Cloudflare block on /stats/teams | Win % and match counts are computed from the scraped /results page (~100 most recent matches) |
| Chrome must be restarted between pages | Loading the heavy events page crashes Chrome if run in the same session as the results page | The pipeline automatically restarts the driver before fetching events; no user action needed |
| macOS only | CA bundle generation uses the security CLI tool from macOS | On Linux/Windows, run build_ca_bundle.py without Keychain support and manually add your proxy CA PEM |
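
To illustrate how win % and match counts can be derived from a results list, here is a minimal sketch in plain Python. It assumes each scraped result has been reduced to a (winner, loser) pair; that format, the function name win_rates, and the placeholder team names are assumptions for illustration, not the project's actual schema.

```python
from collections import Counter


def win_rates(results):
    """Compute matches played and win % per team from (winner, loser) pairs."""
    wins, played = Counter(), Counter()
    for winner, loser in results:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    table = [
        {"team": team, "matches": n, "win_pct": round(100 * wins[team] / n, 1)}
        for team, n in played.items()
    ]
    # Highest win rate first; ties are broken by number of matches played.
    return sorted(table, key=lambda r: (r["win_pct"], r["matches"]), reverse=True)


# Placeholder data, not real match results.
sample = [("Team A", "Team B"), ("Team A", "Team C"), ("Team B", "Team C")]
for row in win_rates(sample):
    print(row)
```

With only about 100 recent matches as input, most teams appear a handful of times, so a ranking built this way is noisy and is not a substitute for official stats.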

Updating data

Because HLTV data changes slowly (tournament by tournament), refreshing once per week, or less often, is usually enough.

You can schedule the pipeline with cron:

# refresh every Sunday at 08:00
0 8 * * 0  /path/to/.venv/bin/python /path/to/cs_matches_analysys/pipeline.py
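
A script or the dashboard could also decide for itself whether a refresh is due by reading data/last_update.txt. The sketch below assumes the file holds a single ISO 8601 timestamp, which is an assumption about the file format; the function name is_stale is hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import Path


def is_stale(stamp_file: Path, max_age: timedelta = timedelta(days=7), now=None) -> bool:
    """Return True if the timestamp file is missing, malformed, or too old."""
    now = now or datetime.now()
    try:
        last = datetime.fromisoformat(stamp_file.read_text().strip())
    except (OSError, ValueError):
        return True  # treat a missing or unreadable stamp as stale
    if last.tzinfo is not None:
        # Convert aware timestamps to naive local time before comparing.
        last = last.astimezone().replace(tzinfo=None)
    return now - last > max_age


# Hypothetical usage:
#   if is_stale(Path("data/last_update.txt")):
#       ...run the pipeline...
```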

Notes on scraping

HLTV is protected by Cloudflare. The scraper uses Selenium with headless Chrome, which renders JavaScript and presents a real browser fingerprint. That is enough to pass the Cloudflare challenge on most pages; the /stats/players and /stats/teams pages can remain blocked (see Known limitations).

A randomised 3–7 s delay between requests is applied to avoid rate-limiting.
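
A delay of this kind takes only a few lines. The following is a sketch with a hypothetical helper name, polite_sleep; the sleep parameter is there so the pause can be stubbed out in tests.

```python
import random
import time


def polite_sleep(min_s: float = 3.0, max_s: float = 7.0, sleep=time.sleep) -> float:
    """Pause for a random duration between min_s and max_s seconds."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay  # returned so callers can log the delay that was used
```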

If HLTV changes its HTML structure, update the CSS selectors in scraper/hltv_scraper.py. If a fetch fails, the pipeline keeps the last successfully cached CSV, so pages that have been fetched at least once continue to show their previous data instead of an empty state.
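
Keeping the last good file when a fetch fails could be implemented along these lines. This is a sketch under stated assumptions, not the code in pipeline.py: refresh_csv is a hypothetical name, and fetch stands for any callable that returns CSV text or raises on failure.

```python
import os
import tempfile
from pathlib import Path


def refresh_csv(target: Path, fetch) -> bool:
    """Replace target with fresh CSV text from fetch(); keep the old file on failure."""
    try:
        text = fetch()
        if not text.strip():
            raise ValueError("fetch returned no data")
    except Exception as exc:
        print(f"fetch failed, keeping cached {target.name}: {exc}")
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then swap it in atomically,
    # so a crash mid-write cannot leave a truncated CSV behind.
    fd, tmp = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.replace(tmp, target)
    return True
```

The existing CSV is only touched after the new data has been fetched and fully written, which is what makes a failed or empty fetch harmless to the cache.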


Dependencies

| Package | Purpose |
| --- | --- |
| streamlit | Dashboard framework |
| plotly | Interactive charts |
| pandas | Data wrangling |
| selenium | Headless Chrome automation |
| beautifulsoup4 + lxml | HTML parsing |
| requests | Used internally by the CA bundle builder |


About

Repo created to build a dashboard that collects and stores data from recent CS2 match results.
