
Add tech-stats to repo-intel: languages, frameworks, timeline & activity patterns #14

Merged
tyom merged 14 commits into master from tech-stats
May 21, 2026

Conversation

Owner

@tyom tyom commented May 21, 2026

Summary

Builds out the repo-intel dashboard with a richer set of repository insights and interactions.

  • Languages & frameworks — per-contributor and repo-wide language breakdowns plus detected frameworks (Technologies section), with improved language detection.
  • Commit timeline — a 14-year scrollable timeline with a years bar, drag-to-zoom rubber band, pan/inertia, and hover details. Click a commit to open it on GitHub.
  • Activity patterns — commit-time (hour of day) and day-of-week histograms per contributor, with weekends marked. Clicking a bar opens a popover listing that bucket's commits, each linking to its GitHub commit page. Buckets are reproduced client-side from the commit's author-local wall-clock so they match the server-side histogram.
  • Contributions heatmap — weekends marked.
  • Resilience & cleanup — tolerate GitHub history-fetch 502s, avoid collapsing non-retryable HTTP errors, dedupe HEAD tree listing, cache path classification, and unify framework detection.

Test plan

  • Generated dashboards from local and remote repos via make repo-intel-dev / the bundled stow/bin/repo-intel.
  • Verified in a headless browser that pattern-chart bar clicks open the popover, popover row counts exactly match each bar's value across all hour/day buckets, rows link to valid /commit/<sha> URLs, and the displayed times fall within the clicked bucket.
  • Rebuilt the bundled stow/bin/repo-intel via make repo-intel-build.

Summary by CodeRabbit

  • New Features

    • Dashboard now bundles comprehensive technology detection: per-repo languages, frameworks and per-author language churn; new Technologies sidebar with stacked language bars and framework/tool groups. Timeline and interactions improved (year-band overlay, refined zoom/selection, enriched tooltips and commit-bucket popovers).
  • Documentation

    • README clarifies how detection data is produced, embedded for offline single-file use, and how to refresh it; explains develop-against-source live mode and embedding behaviour.
  • Chores

    • Added a make target to regenerate the detection dataset and updated developer help text.


tyom added 9 commits May 21, 2026 19:05
Surface what kind of work was done and what stack a repo uses:

- Per-commit file types in the timeline tooltip.
- Per-author language bar in the contributor popover.
- A repo-wide "Technologies" section: whole-repo language bar plus
  frameworks grouped by language. Always shown; each column explains
  itself when empty (e.g. the remote path's language note).

Detection data is generated, not hand-maintained:

- Languages come from GitHub Linguist (extension/filename -> language,
  official colors, vendored-path noise filter), with fine-grained
  languages folded into their group (TSX -> TypeScript) and a small
  ambiguous-extension override (.md -> Markdown, .h -> C, ...).
- Frameworks are a curated dependency -> framework map. Vercel/Netlify
  target deploy presets, not the libraries a repo uses, so they were a
  poor fit and aren't scraped.
- gen_techdata.py (make repo-intel-techdata) regenerates techdata.json
  from Linguist; the committed JSON is embedded into the artifact by
  build.py, keeping the shipped tool offline and single-file. Not a
  build dependency, so normal builds stay offline/reproducible.

Coverage: languages need a clone (numstat), so they're local/bare-clone
only; frameworks read dependency manifests, which are cheap to fetch via
the REST tree + GraphQL blobs, so they work on the remote path too.

Also enable git rename detection (-M) in the commit walk, so renamed
files are counted once instead of as a delete + add. This slightly
shifts added/deleted totals on repos where diff.renames was disabled.

- Count only files present at HEAD, so churn against deleted files (e.g.
  vendored bash_completion.d scripts) no longer inflates "Other"
- Detect extensionless scripts by shebang (bin/* → Shell, repo-intel → Python)
- Surface Docker/Make as detected tools; rename the section to
  "Frameworks & tools" and give the Tools group a distinct color
- Link each language in the bar to GitHub code search for the repo
- Tighten framework row spacing; move Technologies above Summary
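The shebang rule in the list above could look roughly like the sketch below. `SHEBANG_LANG` and `language_from_shebang` are hypothetical names, and the interpreter table is illustrative rather than the tool's actual mapping.

```python
import os

# Hypothetical interpreter -> language table; the real mapping may differ.
SHEBANG_LANG = {
    "sh": "Shell",
    "bash": "Shell",
    "zsh": "Shell",
    "python": "Python",
    "python3": "Python",
    "node": "JavaScript",
    "ruby": "Ruby",
}


def language_from_shebang(first_line):
    """Return a language for a '#!' line, or None if it is not recognised."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    interpreter = os.path.basename(parts[0])
    # '#!/usr/bin/env python3' names the real interpreter in a later token;
    # skip env flags such as '-S'.
    if interpreter == "env":
        args = [p for p in parts[1:] if not p.startswith("-")]
        if not args:
            return None
        interpreter = os.path.basename(args[0])
    return SHEBANG_LANG.get(interpreter)
```

Only the first line of an extensionless file would need to be read for this check, which keeps it cheap to run over the HEAD tree.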
Overlay a faint orange tint on Sat/Sun cells, matching the weekend bands in the commit timeline.

The local path listed `git ls-tree -r HEAD` twice (head_languages +
detect_frameworks) and re-ran the Linguist vendor regex for every
numstat row, re-scanning the same paths across thousands of commits.

- List the HEAD tree once in collect_local; pass the path set into both
  helpers. Rename head_languages -> head_shebangs (no longer owns the tree).
- Cache classify_path per unique numstat field (present/shebang are fixed
  per run), turning per-row work into per-unique-path work.
- Extract the duplicated fetch_failed bail-out into a bail_partial closure.
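One way to get the per-unique-path behaviour described above is to memoise the classifier. This is a sketch only: the `classify_path` body here is a stand-in that looks at the extension, `language_churn` is a hypothetical helper, and the actual change may well use a plain dict rather than `functools.lru_cache`.

```python
import os
from functools import lru_cache

# Stand-in extension table, for illustration only.
_EXT_LANG = {"py": "Python", "ts": "TypeScript", "md": "Markdown"}


@lru_cache(maxsize=None)
def classify_path(path):
    """Stand-in classifier: extension lookup only (the real one does more)."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return _EXT_LANG.get(ext, "Other")


def language_churn(numstat_rows):
    """Sum added + deleted lines per language over (added, deleted, path) rows."""
    totals = {}
    for added, deleted, path in numstat_rows:
        lang = classify_path(path)  # computed once per unique path
        totals[lang] = totals.get(lang, 0) + added + deleted
    return totals
```

Caching on the path alone is only safe because, as the commit message notes, the present set and shebang map do not change during a run. `classify_path.cache_info()` reports hits and misses if you want to confirm the saving.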
Render a per-year band between the tags strip and the histogram, with
centered labels that stay on-screen at any zoom and contrasting
alternating fills. Year-boundary verticals now run uninterrupted from
the lanes through the tag strip into the bar (inter-row margins removed,
lines drawn on each surface). Drop the redundant year suffix from axis
labels and the TAGS label, and prefix the tag popover with "TAG".

When the timeline is fully zoomed out (nothing to pan), dragging the
main view now draws a selection rectangle and zooms to that time
window on release, with a crosshair cursor — mirroring the histogram
minimap's select-zoom. Once zoomed in, dragging pans as before.

Rebuilds stow/bin/repo-intel from the template.

Replace per-framework pill chips with comma-separated names in
divider-separated per-language rows, a calmer layout for the
Technologies section. Rebuild the bundled artifact to match.

- Collapse the four near-identical Python/Ruby/Go/Rust framework blocks
  into one config-driven loop in _frameworks_from_files
- Reuse encodeBranch() in the heatmap instead of inlining the same split
- Cache the .year-label NodeList in rebuildYears rather than re-querying
  the DOM every scroll frame in positionYearLabels
- Trim two comments that narrated what the code does

Rebuild stow/bin/repo-intel artifact.

Clicking a bar in the commit-time or day-of-week charts opens a popover
listing that bucket's commits, each row linking to its GitHub commit
page. Buckets are reproduced client-side from the ISO string's
author-local wall-clock to match the server-side histogram, rather than
new Date().getHours() which would shift to the viewer's timezone.
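The bucketing idea can be illustrated in Python, although the dashboard itself does this in the browser. The sketch assumes commit timestamps are ISO 8601 strings that carry the author's UTC offset; `author_local_bucket` is a hypothetical name.

```python
from datetime import datetime


def author_local_bucket(iso):
    """Return (hour, weekday) as the author's clock showed them.

    fromisoformat keeps the offset found in the string and does not convert
    to the machine's timezone, so .hour is the author-local wall-clock hour.
    weekday() is 0 for Monday through 6 for Sunday.
    """
    stamp = datetime.fromisoformat(iso)
    return stamp.hour, stamp.weekday()


# The same instant written with two different offsets lands in two
# different hour buckets, which is why the offset must not be normalised.
late_evening = author_local_bucket("2026-05-21T23:30:00+02:00")
same_instant_utc = author_local_bucket("2026-05-21T21:30:00+00:00")
```

Converting to the viewer's timezone before reading the hour would move commits between buckets, so the popover counts would stop matching the bar heights.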
@coderabbitai
Contributor

coderabbitai Bot commented May 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bc49d4fe-e219-4f34-96c9-e345b8c2e063

📥 Commits

Reviewing files that changed from the base of the PR and between f62c5d9 and 34d8cb7.

📒 Files selected for processing (4)
  • src/repo-intel/gen_techdata.py
  • src/repo-intel/repo-intel.py
  • src/repo-intel/techdata.json
  • stow/bin/repo-intel
💤 Files with no reviewable changes (1)
  • src/repo-intel/techdata.json
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🧰 Additional context used
🪛 Ruff (0.15.13)
src/repo-intel/repo-intel.py

[error] 470-470: subprocess call: check for execution of untrusted input

(S603)


[error] 471-471: Starting a process with a partial executable path

(S607)

🔇 Additional comments (6)
src/repo-intel/gen_techdata.py (1)

46-51: LGTM!

Also applies to: 260-261

src/repo-intel/repo-intel.py (5)

8-8: LGTM!

Also applies to: 23-25, 40-40, 52-53, 181-181, 207-210, 266-266


467-475: LGTM!


728-728: LGTM!


774-779: LGTM!


1820-1821: LGTM!

Also applies to: 1828-1835, 1841-1842, 1852-1855, 1870-1870


📝 Walkthrough

Walkthrough

Adds a Linguist-backed techdata generator and committed dataset, embeds techdata into the bundled repo-intel script, integrates per-file language classification and manifest-driven framework detection (local and remote) into data generation, and updates the dashboard template with a Technologies section, timeline year-band, drag-to-zoom selection, and enriched tooltips.

Changes

Language and Framework Detection System

| Layer / File(s) | Summary |
|---|---|
| Techdata generation infrastructure<br>Makefile, src/repo-intel/build.py, src/repo-intel/gen_techdata.py | New repo-intel-techdata make target regenerates techdata.json; build.py now requires and embeds both template.html and techdata.json via distinct placeholders; gen_techdata.py fetches Linguist YAML, builds extension/filename/vendor tables, and emits curated framework mappings. |
| Techdata JSON dataset<br>src/repo-intel/techdata.json | Committed snapshot containing _source, fw_deps, fw_sentinels_js, fw_sentinels_other, lang.color, ext, filename, and vendor patterns used for offline detection. |
| Detection core and utilities<br>src/repo-intel/repo-intel.py (embedded TECHDATA & utilities) | Loads embedded TECHDATA, compiles vendor regexes, defines shebang mappings, implements rename-aware numstat path resolution and classify_path heuristics, and provides top_languages aggregation. |
| Local repository language aggregation<br>src/repo-intel/repo-intel.py (collect_local changes) | collect_local builds HEAD present set and shebang map, classifies numstat paths to produce per-commit lang_stats and HEAD-only frameworks, and returns extras including those values. |
| Remote framework detection via manifests<br>src/repo-intel/repo-intel.py (REST/GraphQL helpers) | Adds gh_rest_get/fetch_blob_texts and remote fetch paths to list repo tree, fetch relevant manifest blobs, detect frameworks without cloning, and fetch repo-wide Linguist breakdown. |
| Consistent partial-fetch bail handling<br>src/repo-intel/repo-intel.py (collect_remote) | Introduces bail_partial(nodes) to persist a contiguous partial prefix and standardise early-exit caching/exit behaviour for repeated fetch failures. |
| Language and framework data integration<br>src/repo-intel/repo-intel.py (build_data/main) | build_data accepts extras, merges per-commit lang_stats into per-author and repo totals, adds per-commit file-type metadata, and includes repoLanguages, repoLanguagesBasis, and frameworks in the final dashboard model; main() forwards extras. |
| Technologies dashboard section and author language display<br>src/repo-intel/template.html | Adds Technologies sidebar link/section, renderTech() and langBarHtml(), per-author stacked language bars in popovers, and empty-state messaging when per-file data is unavailable. |
| Timeline year-band overlay and rendering<br>src/repo-intel/template.html | Adds years-band DOM/CSS, builds/caches year labels, repositions labels per scroll frame, and draws continuous year-boundary verticals on lane and tag canvases. |
| Timeline drag-to-zoom selection interaction<br>src/repo-intel/template.html | Implements drag-to-zoom selection when zoomed out (SEL_THRESHOLD), draws selection overlay during drag, converts selection to new zoom on release, and updates cursor/interaction modes. |
| Enhanced commit tooltips and pattern-chart popovers<br>src/repo-intel/template.html | Commit hover tooltips aggregate per-file-type counts (top-N + remainder); tag tooltips gain a "TAG" kicker; pattern charts add commit-bucket popovers with UTC bucketing and interactive dismissal handlers. |
| Documentation updates<br>src/repo-intel/README.md | Clarifies local vs remote detection differences, documents techdata generation from Linguist, adds make repo-intel-techdata guidance, and explains how build.py embeds TEMPLATE and TECHDATA placeholders. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • tyom/dotfiles#12: Both PRs modify src/repo-intel/repo-intel.py data-model and build integration points; this PR adds techdata-backed language stats, remote framework detection, and UI rendering for technologies.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 54.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarises the main changes: adding tech-stats (languages, frameworks) and interactive features (timeline, activity patterns) to repo-intel. It is specific, concise, and directly reflects the primary objectives of the pull request. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/repo-intel/gen_techdata.py`:
- Around line 203-224: The code assigns extensions and filenames to the folded
group name stored in eff (from info.get("group") or name) which may not exist in
name_color, creating language entries without colors; update the logic in the
loops that set ext_lang and filename_lang so you only assign ext_lang[key] = eff
and filename_lang.setdefault(fn.lower(), eff) when eff is present in name_color
(or otherwise fall back to name instead of the group), and ensure ext_meta
updates still use the same guarded eff check; adjust the code paths around eff,
ext_lang, ext_meta, filename_lang and name_color to perform this presence check
before writing mappings.
- Around line 141-144: Add a network timeout to the remote fetch to avoid hangs
by updating fetch() to pass a sensible timeout (e.g., timeout=10) into
urllib.request.urlopen(req, timeout=...) and handle exceptions as needed; and
fix group-folding logic by changing the computation of eff (where currently eff
= info.get("group") or name) to only fold to the group when that group exists in
the name_color mapping (e.g., eff = group if group and group in name_color else
name), referencing the eff variable, info.get("group") call, and the name_color
map so invalid group names (like the "Checksums" mismatch) are not used.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b52fb6b6-8fd5-4f5d-94c1-6968729bb6db

📥 Commits

Reviewing files that changed from the base of the PR and between d17302e and 20bbecd.

📒 Files selected for processing (8)
  • Makefile
  • src/repo-intel/README.md
  • src/repo-intel/build.py
  • src/repo-intel/gen_techdata.py
  • src/repo-intel/repo-intel.py
  • src/repo-intel/techdata.json
  • src/repo-intel/template.html
  • stow/bin/repo-intel
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🧰 Additional context used
🪛 LanguageTool
src/repo-intel/README.md

[grammar] ~151-~151: Make sure that the adjective ‘small’ is correct. Possibly, it should be an adverb (typically ~ly) that modifies ‘curated’. Possibly, it should be the first word in a compound adjective (hyphenated adjective). Possibly, it is correct.
Context: ...pt) and vendor.yml`. Frameworks are a small curated dependency → framework map maintained i...

(ADVERB_OR_HYPHENATED_ADJECTIVE)

🪛 Ruff (0.15.13)
src/repo-intel/build.py

[warning] 34-34: Loop control variable name not used within loop body

Rename unused name to _name

(B007)

src/repo-intel/gen_techdata.py

[error] 142-142: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


[error] 143-143: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)

src/repo-intel/repo-intel.py

[error] 458-458: subprocess call: check for execution of untrusted input

(S603)


[error] 458-458: Starting a process with a partial executable path

(S607)


[error] 473-473: subprocess call: check for execution of untrusted input

(S603)


[error] 474-474: Starting a process with a partial executable path

(S607)


[error] 1131-1131: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)

🔇 Additional comments (33)
src/repo-intel/README.md (1)

11-20: LGTM!

Also applies to: 138-155, 167-177, 186-204

src/repo-intel/repo-intel.py (16)

279-352: LGTM!


355-369: LGTM!


372-385: LGTM!


388-416: LGTM!


419-454: LGTM!


457-479: LGTM!


482-600: LGTM!


603-622: LGTM!


773-848: LGTM!


1121-1132: LGTM!


1135-1182: LGTM!


1185-1212: LGTM!


1357-1366: LGTM!


1465-1480: LGTM!


1532-1691: LGTM!


1786-1846: LGTM!

src/repo-intel/template.html (14)

137-262: LGTM!


273-292: LGTM!


433-435: LGTM!


502-530: LGTM!


535-561: LGTM!


632-633: LGTM!


774-778: LGTM!

Also applies to: 996-1035


936-951: LGTM!


1098-1123: LGTM!


1515-1598: LGTM!


1466-1481: LGTM!


171-171: LGTM!

Also applies to: 1707-1707


1873-1947: LGTM!


1949-1961: LGTM!

Makefile (1)

53-54: LGTM!

Also applies to: 56-56, 59-59

src/repo-intel/build.py (1)

4-6: LGTM!

Also applies to: 13-14, 26-45

Comment thread src/repo-intel/gen_techdata.py
Comment thread src/repo-intel/gen_techdata.py Outdated
tyom added 2 commits May 21, 2026 21:50
Folding into a non-colored group (e.g. Checksums) seeded color-less
language entries. Only fold when the group is itself a colored language,
else fall back to the language's own name (guaranteed colored). Also add
a 10s timeout to the Linguist fetch to avoid hangs.
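The fold guard described in this commit message can be sketched as follows. The names `info` and `name_color` come from the review comment above; `effective_language` and the sample data are hypothetical, and the colors are placeholders rather than real Linguist values.

```python
def effective_language(name, info, name_color):
    """Fold `name` into its Linguist group only if the group has a color."""
    group = info.get("group")
    # An uncolored group would create a language entry with no color,
    # so fall back to the language's own name in that case.
    return group if group and group in name_color else name


# Placeholder data for illustration; not taken from languages.yml.
name_color = {
    "TypeScript": "#111111",
    "TSX": "#222222",
    "ExampleChecksum": "#333333",
}

folded = effective_language("TSX", {"group": "TypeScript"}, name_color)
kept = effective_language("ExampleChecksum", {"group": "Checksums"}, name_color)
```

Here `folded` is the group name because the group is a colored language, while `kept` stays as the language's own name because `Checksums` has no entry in `name_color`.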
The GraphQL remote path left the language bar empty since per-file
languages are too expensive to fetch over the network. GitHub already
runs Linguist and exposes repo-wide bytes-per-language at HEAD, so
fetch the aggregate instead: it fills the repo bar (not per-author or
per-commit stats, which still need a clone). Label distinguishes the
byte-snapshot basis ('by code size') from local line churn.
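As an illustration of the aggregate described here, the sketch below shows a GraphQL query against GitHub's `Repository.languages` connection and a helper that turns the response into byte shares. The exact query and field selection used by repo-intel are not shown in this PR, so `LANGUAGES_QUERY` and `language_shares` are assumptions, and the sample response is made up.

```python
# Assumed query shape, based on GitHub's GraphQL schema; the tool's own
# query may differ.
LANGUAGES_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    languages(first: 20, orderBy: {field: SIZE, direction: DESC}) {
      totalSize
      edges { size node { name color } }
    }
  }
}
"""


def language_shares(response):
    """Convert a languages response into [{name, color, share}] by bytes."""
    langs = response["data"]["repository"]["languages"]
    total = langs["totalSize"]
    if not total:
        return []  # empty repository: nothing to show
    return [
        {
            "name": edge["node"]["name"],
            "color": edge["node"]["color"],
            "share": edge["size"] / total,
        }
        for edge in langs["edges"]
    ]


# Made-up response for illustration.
sample = {"data": {"repository": {"languages": {
    "totalSize": 1000,
    "edges": [
        {"size": 750, "node": {"name": "Python", "color": "#111111"}},
        {"size": 250, "node": {"name": "Shell", "color": "#222222"}},
    ],
}}}}
shares = language_shares(sample)
```

Because these shares are byte counts at a single snapshot, they are not comparable with the line-churn figures from the local path, which is presumably why the label distinguishes the two bases.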
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/repo-intel/repo-intel.py`:
- Around line 1577-1580: The code currently treats a present-but-empty remote
key the same as missing data by using truthiness on repo_languages ((extras or
{}).get("repo_languages") or []), which breaks remote-language semantics; change
the logic so you detect presence of the key itself instead of truthiness:
replace the assignment and any later truthy checks with an explicit presence
check (e.g., extras = extras or {}; if "repo_languages" in extras:
repo_languages = extras.get("repo_languages") else: repo_languages = None or []
depending on downstream expectations) and update subsequent conditionals that
currently do truthy checks on repo_languages to instead check for key presence
(e.g., if "repo_languages" in extras) so an empty list from remote is preserved
as a remote-run indicator rather than treated as missing.
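A minimal sketch of the distinction this comment is drawing, assuming `extras` is a dict or None as in the snippet quoted above. `remote_languages` is a hypothetical helper, not a function from repo-intel.py.

```python
def remote_languages(extras):
    """Return (present, languages), keeping an empty remote list distinct.

    `present` is True whenever the key exists, even if its list is empty.
    """
    extras = extras or {}
    if "repo_languages" in extras:
        return True, list(extras["repo_languages"] or [])
    return False, []


def truthy_languages(extras):
    """The pattern being flagged: empty and missing look the same."""
    return (extras or {}).get("repo_languages") or []
```

With the truthiness version, `{"repo_languages": []}` and `{}` both yield `[]`, so later code cannot tell a remote run that found no languages from a run that never fetched them.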

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6c44e833-a18a-41a8-b619-da5327266ef9

📥 Commits

Reviewing files that changed from the base of the PR and between f2aa7b3 and 58ead89.

📒 Files selected for processing (3)
  • src/repo-intel/repo-intel.py
  • src/repo-intel/template.html
  • stow/bin/repo-intel
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (2)
src/repo-intel/repo-intel.py (1)

1215-1250: LGTM!

Also applies to: 1503-1509, 1520-1520

src/repo-intel/template.html (1)

543-550: LGTM!

Comment thread src/repo-intel/repo-intel.py
tyom added 3 commits May 21, 2026 22:11
Add Storybook (canonical scoped adapters), Testing Library, Puppeteer,
TestCafe, Biome, Turborepo, Nx, SWC, Babel, GraphQL, tRPC, Drizzle to the
npm map; add pnpm/Yarn/Bun, GitLab CI, Vercel, Netlify, and GitHub Actions
as Tools-bucket sentinels. Support a directory-prefix sentinel shape
(trailing slash) for .github/workflows/. Regenerate techdata + artifact.
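The directory-prefix sentinel shape might work along these lines. This is a sketch under assumptions: `sentinel_matches`, `detect_tools`, and the sentinel table are hypothetical, and the real entries live in techdata.json.

```python
def sentinel_matches(sentinel, paths):
    """A trailing slash means 'any file under this directory'."""
    if sentinel.endswith("/"):
        return any(path.startswith(sentinel) for path in paths)
    return sentinel in paths  # plain sentinels must match a path exactly


def detect_tools(sentinels, paths):
    """Return the sorted tool names whose sentinel is present in `paths`."""
    return sorted(
        {tool for sentinel, tool in sentinels.items()
         if sentinel_matches(sentinel, paths)}
    )


# Illustrative sentinel table; assumed, not copied from techdata.json.
SENTINELS = {
    ".github/workflows/": "GitHub Actions",
    "pnpm-lock.yaml": "pnpm",
    "netlify.toml": "Netlify",
}

tools = detect_tools(
    SENTINELS,
    {".github/workflows/ci.yml", "pnpm-lock.yaml", "src/main.ts"},
)
```

Keeping the trailing slash in the sentinel itself means a lookalike path such as `.github/workflows-old/x.yml` does not match.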
Forces a bare git clone instead of the GitHub GraphQL API even when a
token is present, unlocking per-author language churn the API can't
provide. Also fixes the previously-broken no-token fallback path:
collect_local crashed on bare repos via rev-parse --show-toplevel, and
expected-failure git probes leaked stderr.

The canonical Linguist owner of .txt is the colorless 'Text' language,
which the table builder drops — letting a niche colored claimant (Adblock
Filter List, type=data) win the extension. LICENSE.txt and friends were
mislabeled on the local/clone path. Add an EXT_EXCLUDE set so generic
'Text' extensions stay unassigned and fall into 'Other', matching GitHub.
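The effect of the exclude set can be sketched as below. `EXT_EXCLUDE` is named in the commit message, but its contents, the `build_ext_table` helper, and the first-claimant-wins rule are assumptions; the sample languages use placeholder colors.

```python
# Assumed contents; the commit only says generic 'Text' extensions go here.
EXT_EXCLUDE = {".txt"}


def build_ext_table(languages):
    """Map extension -> language, skipping colorless languages and excludes."""
    table = {}
    for name, info in languages.items():
        if not info.get("color"):
            continue  # colorless languages are dropped from the table
        for ext in info.get("extensions", []):
            if ext in EXT_EXCLUDE:
                continue  # stays unassigned, so files fall into 'Other'
            table.setdefault(ext, name)
    return table


# Placeholder data shaped like Linguist's languages.yml entries.
sample_languages = {
    "Text": {"extensions": [".txt"]},
    "Adblock Filter List": {"color": "#111111", "extensions": [".txt"]},
    "Python": {"color": "#222222", "extensions": [".py"]},
}
ext_table = build_ext_table(sample_languages)
```

Without the exclude check, the colored claimant would take `.txt` once the colorless owner is dropped, which is the mislabeling this commit fixes.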
@tyom tyom merged commit c87a0d2 into master May 21, 2026
3 checks passed
@tyom tyom deleted the tech-stats branch May 21, 2026 22:01
