CNaught-Inc/coding-agent-emissions

Coding Agent Adoption

Multi-signal estimate of AI coding agent adoption from publicly observable GitHub data. Tracks ~30 agents (Cursor, Claude Code, OpenAI Codex, GitHub Copilot, Devin, Aider, Jules, OpenHands, and more) across ~450 days of commit attribution, branch-prefix activity, push volume, and a worked intervention analysis around the October 20, 2025 Claude Code web launch.

Originally built at CNaught for AI-coding-adoption research.

What's in the box

Data (~2 MB, all CSV):

  • daily_ai_commits.csv — daily commit counts per tool, ~450 days × 32 tools.
  • branch_activity_daily.csv, branch_creates_daily.csv, agent_branch_creates_daily.csv — branch-prefix signals from GH Archive via BigQuery.
  • push_events_daily.csv — total daily push volume (denominator for AI share %).
  • daily_carbon_estimates.csv — derived carbon/energy estimates (Jegham et al. 2025 framework).
  • bot_donors_daily.csv — never-treated donor pool for the intervention analysis (Dependabot, Renovate, pre-commit-ci, etc.).

Pipeline scripts:

  • github_ai_daily.py — daily commit-attribution fetcher (GitHub Search API).
  • fetch_branch_activity.py, fetch_branch_creates.py, fetch_daily_totals.py — BigQuery fetchers.
  • estimate_carbon.py — carbon/energy estimation.
  • dashboard.py — Plotly HTML dashboard generator.
  • run_pipeline.py — orchestrator (daily / weekly modes, Slack alerting, git commit-back).
  • anomaly_analysis.py — signature-drift scanner.

Intervention analysis:

  • run_intervention_analysis.py and the intervention_*.py modules — event-window CAR (primary), BEAST changepoint detection, first-difference ITS, BSTS, synthetic DiD. The Oct 20 Claude Code web launch is the worked example.
  • vendor/synthdid/ — vendored synthdid.py (PyPI build is broken on Python 3.14).
  • events/model_releases.csv — dated event calendar.

Documentation:

  • METHODOLOGY.md — full methodology, signal definitions, caveats.
  • AGENTS.md — operating manual for AI coding agents reproducing the analysis.
  • references/ — 9 PDFs of cited academic papers.

Quick look (no setup)

The CSV files are self-describing — open daily_ai_commits.csv in any spreadsheet, or:

import pandas as pd
df = pd.read_csv("daily_ai_commits.csv")
print(df.groupby("tool")["commits"].sum().sort_values(ascending=False).head(10))
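The push-volume file is described above as the denominator for AI share. A minimal sketch of that calculation follows. The `tool` and `commits` columns appear in the snippet above; the `date` and `pushes` column names are assumptions, so check the CSV headers before reusing it. The rows are synthetic and only illustrate the arithmetic.

```python
import pandas as pd


def ai_share(commits: pd.DataFrame, pushes: pd.DataFrame) -> pd.Series:
    """Daily AI-attributed commits as a percentage of total push events."""
    # Sum across tools so there is one AI commit count per day.
    daily_ai = commits.groupby("date")["commits"].sum()
    daily_pushes = pushes.set_index("date")["pushes"]
    # Division aligns on the date index; a day missing on either side gives NaN.
    return (100 * daily_ai / daily_pushes).rename("ai_share_pct")


# Synthetic rows, not real data. Column names `date` and `pushes` are assumed.
commits = pd.DataFrame({
    "date": ["2025-10-20", "2025-10-20", "2025-10-21"],
    "tool": ["tool_a", "tool_b", "tool_a"],
    "commits": [30, 20, 40],
})
pushes = pd.DataFrame({
    "date": ["2025-10-20", "2025-10-21"],
    "pushes": [1000, 2000],
})

share = ai_share(commits, pushes)
print(share)
```

Commits and push events are different units (one push can carry several commits), so a ratio like this is best read as a relative trend rather than a literal share.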

Quick dashboard (Python only, no credentials)

pip install -r requirements.txt
python dashboard.py
open dashboard.html        # macOS; on Linux: xdg-open dashboard.html

This works against the shipped CSVs — no GitHub or GCP credentials needed.

Quick intervention analysis (Python only, no credentials)

pip install -r requirements.txt
python run_intervention_analysis.py

The script writes results and charts to outputs/intervention/. The headline finding (the Oct 20 Claude Code web launch) is in the CAR section: cumulative abnormal commits of roughly +60K over the 8-day event window, with z ≈ 5.6.
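To make the CAR figure easier to read, here is a minimal sketch of an event-window calculation. It assumes a constant-mean baseline and independent daily noise; the repo's intervention_*.py modules may model the expected value differently. The function name, parameters, and toy series are illustrative, not taken from the repo.

```python
import math
import statistics


def event_window_car(series, event_start, window, baseline_len):
    """Cumulative abnormal count over an event window, plus a z-score.

    Expected daily value is the mean of the pre-event baseline
    (a constant-mean model, which is an assumption of this sketch).
    """
    if event_start - baseline_len < 0:
        raise ValueError("baseline extends before the start of the series")
    baseline = series[event_start - baseline_len:event_start]
    event = series[event_start:event_start + window]
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    # Abnormal value per day = observed minus expected; CAR is their sum.
    car = sum(x - mu for x in event)
    # With independent daily noise, the sum of n abnormal values
    # has standard deviation sigma * sqrt(n).
    z = car / (sigma * math.sqrt(len(event)))
    return car, z


# Toy series: 8 baseline days around 100, then a 4-day event window.
daily = [100, 102, 98, 101, 99, 100, 102, 98, 110, 112, 111, 113]
car, z = event_window_car(daily, event_start=8, window=4, baseline_len=8)
print(f"CAR = {car}, z = {z:.2f}")
```

The z-score scales the cumulative excess by how noisy the baseline was, which is why a large raw CAR on a volatile series can still be unremarkable.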

Full setup (refresh data from source)

Full setup is required only if you want to update the CSVs beyond their shipped end date.

1. Python environment

On macOS, you may need to use python3 and pip3 in place of python and pip.

python --version       # 3.12 recommended; 3.13/3.14 work but see AGENTS.md footguns for caveats
pip install -r requirements.txt

2. Credentials

Copy the template and fill it in:

cp .env.example .env

You need a GITHUB_TOKEN and a GCP_PROJECT. Slack is optional.
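A quick pre-flight check can catch a missing setting before a long fetch starts. The sketch below is a hypothetical helper, not part of the repo. It only inspects a mapping of settings, and it assumes the values from .env have already been loaded into the process environment by whatever mechanism the scripts use.

```python
import os

# Names taken from this README; Slack is optional, so it is not required here.
REQUIRED = ("GITHUB_TOKEN", "GCP_PROJECT")


def missing_settings(env=None):
    """Return the required setting names that are absent or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```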

3. GitHub fine-grained PAT (for github_ai_daily.py and fetch_bot_donors.py)

  1. Go to https://github.com/settings/personal-access-tokens/new
  2. Set a name, expiration, and "Public repositories" access.
  3. No additional permissions needed — public-search reads work without them.
  4. Copy the token (starts with github_pat_...) into .env as GITHUB_TOKEN=....

Rate limit: 30 req/min with a token (vs 10 req/min unauthenticated). A full historical refresh takes a few hours.
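At 30 requests per minute, a client has to space calls about 2 seconds apart. One way to stay under the limit is a minimum-interval throttle, sketched below. This is an illustration of the idea, not the throttling logic in github_ai_daily.py; the class name and the injectable `clock` and `sleep` parameters are assumptions made so the behavior can be tested without real waiting.

```python
import time


class MinIntervalThrottle:
    """Space calls so that at most `max_per_minute` happen per minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call has been made yet

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()
        self._next_allowed = now + self.interval


# Usage sketch: call throttle.wait() immediately before each search request.
throttle = MinIntervalThrottle(max_per_minute=30)
print(f"Minimum spacing between requests: {throttle.interval:.1f} s")
```

A fixed interval is more conservative than a burst-friendly token bucket, which is a reasonable trade for a long-running backfill.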

4. GCP project + BigQuery (for fetch_branch_activity.py, fetch_branch_creates.py, fetch_daily_totals.py)

  1. Create a GCP project at https://console.cloud.google.com/projectcreate (or use an existing one).
  2. Enable BigQuery: https://console.cloud.google.com/apis/library/bigquery.googleapis.com (select your project, click Enable).
  3. Attach a billing account: https://console.cloud.google.com/billing/linkedaccount (BigQuery has a free tier — 1 TB/month — but a billing account must be attached even to use the free tier).
  4. Put the project ID in .env as GCP_PROJECT=your-project-id.
  5. Authenticate locally:
    gcloud auth application-default login
    (Or set GOOGLE_APPLICATION_CREDENTIALS=/abs/path/to/sa-key.json if you'd rather use a service account.)

Cost expectation: the GH Archive dataset (githubarchive.day.*) is public; you pay for the scan, not the storage. A full backfill of all four BigQuery signals is ~$5–15. Incremental weekly runs are pennies. Always bound your date ranges (--start-date / --end-date).
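Because the daily GH Archive tables are named by date, bounding a scan comes down to restricting which tables a wildcard query touches. The sketch below builds such a query string using BigQuery's `_TABLE_SUFFIX` pseudo-column. It is not the query the repo's fetchers run; the function name and the output column names are illustrative assumptions.

```python
from datetime import date


def bounded_push_count_query(start: date, end: date) -> str:
    """Build a daily PushEvent count query limited to [start, end]."""
    if end < start:
        raise ValueError("end date precedes start date")
    # The wildcard `20*` leaves the last six digits (YYMMDD) as the suffix.
    lo = start.strftime("%Y%m%d")[2:]
    hi = end.strftime("%Y%m%d")[2:]
    return (
        "SELECT _TABLE_SUFFIX AS table_day, COUNT(*) AS push_events\n"
        "FROM `githubarchive.day.20*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{lo}' AND '{hi}'\n"
        "  AND type = 'PushEvent'\n"
        "GROUP BY table_day\n"
        "ORDER BY table_day"
    )


query = bounded_push_count_query(date(2025, 10, 13), date(2025, 10, 27))
print(query)
```

Only tables whose suffix falls inside the range are scanned, so the billed bytes grow with the number of days requested rather than with the size of the whole archive.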

5. Optional: Slack webhook

If you want anomaly/failure alerts:

  1. Create an incoming webhook: https://api.slack.com/messaging/webhooks
  2. Put the URL in .env as SLACK_WEBHOOK_URL=https://hooks.slack.com/services/....

Leave SLACK_WEBHOOK_URL blank to disable alerts. Clean pipeline runs are quiet either way.
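For reference, a Slack incoming webhook accepts a JSON body with a `text` field sent by HTTP POST. The sketch below shows that shape using only the standard library. The function names are hypothetical and this is not the alerting code in run_pipeline.py; treating a blank URL as "alerts disabled" mirrors the behavior described above.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for an incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(webhook_url: str, text: str, timeout: float = 10.0) -> bool:
    """Send an alert; return False without any network call if the URL is blank."""
    if not webhook_url:
        return False
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200
```

Splitting request construction from sending keeps the payload easy to inspect in tests without touching the network.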

Running the pipeline

Daily refresh (commits only)

python run_pipeline.py --mode daily --dry-run

Fetches the last 7 days of commit data, regenerates the carbon estimates and dashboard, scans for anomalies, and (without --dry-run) commits the updated CSVs back to git.

Weekly refresh (adds BigQuery signals)

python run_pipeline.py --mode weekly --dry-run

Adds the three BigQuery fetchers (branch activity, branch creates, daily totals) on a 14-day window before the daily steps. Requires GCP auth.

Individual scripts

Each script accepts --help:

python github_ai_daily.py --help
python fetch_branch_creates.py --help
python estimate_carbon.py --help
python dashboard.py

Tests

python -m pytest tests/ -v

CI runs the same on every push and PR.

Methodology

See METHODOLOGY.md for the full treatment. Three things are worth flagging up front:

  1. Commit attribution measures autonomous agent coding, not "tools developers use." Copilot autocomplete and Cursor editor mode use the developer's git identity and leave no trace in commit metadata. Only tools that make commits with their own author/committer identity are detectable here. That is why the multi-signal approach (branch prefixes, push volume) exists: to catch tools that are invisible in commit data.

  2. Tool signatures occasionally change. Aider switched to Co-authored-by trailers in v0.85.0 (May 2025), making ~95% of its commits invisible to this method. Copilot SWE Agent was renamed in March 2026. Warp's CLI was renamed to Oz around the same time. The anomaly scanner flags suspected drift, but new changes will happen — if you notice a cliff in the data and there's no announcement on file, please open an issue.

  3. GH Archive (BigQuery source data) has three known issues in 2025: a permanent ~35% push-volume drop on May 24, a one-day brownout on Sep 8, and a 99.5% outage on Oct 8–14. Those dates are nulled in the shipped CSVs and flagged on the dashboard.
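If you rebuild a BigQuery-derived series yourself, the outage days need the same treatment or they will read as real drops. Below is a minimal sketch of masking them with pandas. It assumes that only the Sep 8 brownout and the Oct 8–14 outage are nulled (the May 24 drop is a permanent level shift), and that the frame has a `date` column; both are assumptions, so check METHODOLOGY.md and the shipped CSVs for the exact rule.

```python
import pandas as pd

# Inclusive ranges from the note above. Which dates the shipped CSVs
# actually null is an assumption of this sketch.
OUTAGE_RANGES = [("2025-09-08", "2025-09-08"), ("2025-10-08", "2025-10-14")]


def null_outages(df, value_cols, date_col="date"):
    """Return a copy of df with value columns set to NaN on outage days."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    mask = pd.Series(False, index=out.index)
    for start, end in OUTAGE_RANGES:
        mask |= dates.between(pd.Timestamp(start), pd.Timestamp(end))
    for col in value_cols:
        # Keep values outside outages; everything else becomes NaN.
        out[col] = out[col].where(~mask)
    return out


# Synthetic rows for illustration only.
frame = pd.DataFrame({
    "date": ["2025-10-07", "2025-10-08", "2025-10-14", "2025-10-15"],
    "pushes": [1, 2, 3, 4],
})
cleaned = null_outages(frame, ["pushes"])
print(cleaned)
```

Using NaN rather than zero keeps the gap out of sums and means while leaving it visible as a break in plots.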

Working with an AI coding agent

If you point Claude Code, Codex, Cursor, or another agent at this repo, see AGENTS.md — it's the operating manual: file map, setup checklist, reproduction walkthrough, guardrails, and a step-the-user-through-it script for when a human is in the loop.

Contributing

See CONTRIBUTING.md. The most valuable contributions are flagging tool-signature changes and proposing new tool detectors — see that doc for the format.

License

MIT. See LICENSE.

Attribution

Originally built at CNaught for AI-coding-adoption research. If you build on this work, a citation or link is appreciated but not required.
