Add tech-stats to repo-intel: languages, frameworks, timeline & activity patterns by tyom · Pull Request #14 · tyom/dotfiles

tyom · 2026-05-21T19:48:14Z

Summary

Builds out the repo-intel dashboard with a richer set of repository insights and interactions.

- Languages & frameworks: per-contributor and repo-wide language breakdowns plus detected frameworks (Technologies section), with improved language detection.
- Commit timeline: a 14-year scrollable timeline with a years bar, drag-to-zoom rubber band, pan/inertia, and hover details. Click a commit to open it on GitHub.
- Activity patterns: commit-time (hour of day) and day-of-week histograms per contributor, with weekends marked. Clicking a bar opens a popover listing that bucket's commits, each linking to its GitHub commit page. Buckets are reproduced client-side from the commit's author-local wall-clock so they match the server-side histogram.
- Contributions heatmap: weekends marked.
- Resilience & cleanup: tolerate GitHub history-fetch 502s, avoid collapsing non-retryable HTTP errors, dedupe HEAD tree listing, cache path classification, and unify framework detection.

Test plan

- Generated dashboards from local and remote repos via `make repo-intel-dev` / the bundled `stow/bin/repo-intel`.
- Verified in a headless browser that pattern-chart bar clicks open the popover, popover row counts exactly match each bar's value across all hour/day buckets, rows link to valid `/commit/<sha>` URLs, and the displayed times fall within the clicked bucket.
- Rebuilt the bundled `stow/bin/repo-intel` via `make repo-intel-build`.

Summary by CodeRabbit

New Features
- Dashboard now bundles comprehensive technology detection: per-repo languages, frameworks and per-author language churn; new Technologies sidebar with stacked language bars and framework/tool groups. Timeline and interactions improved (year-band overlay, refined zoom/selection, enriched tooltips and commit-bucket popovers).
Documentation
- README clarifies how detection data is produced, embedded for offline single-file use, and how to refresh it; explains develop-against-source live mode and embedding behaviour.
Chores
- Added a make target to regenerate the detection dataset and updated developer help text.

Surface what kind of work was done and what stack a repo uses:

- Per-commit file types in the timeline tooltip.
- Per-author language bar in the contributor popover.
- A repo-wide "Technologies" section: whole-repo language bar plus frameworks grouped by language. Always shown; each column explains itself when empty (e.g. the remote path's language note).

Detection data is generated, not hand-maintained:

- Languages come from GitHub Linguist (extension/filename -> language, official colors, vendored-path noise filter), with fine-grained languages folded into their group (TSX -> TypeScript) and a small ambiguous-extension override (.md -> Markdown, .h -> C, ...).
- Frameworks are a curated dependency -> framework map. Vercel/Netlify target deploy presets, not the libraries a repo uses, so they were a poor fit and aren't scraped.
- gen_techdata.py (make repo-intel-techdata) regenerates techdata.json from Linguist; the committed JSON is embedded into the artifact by build.py, keeping the shipped tool offline and single-file. Not a build dependency, so normal builds stay offline/reproducible.

Coverage: languages need a clone (numstat), so they're local/bare-clone only; frameworks read dependency manifests, which are cheap to fetch via the REST tree + GraphQL blobs, so they work on the remote path too.

Also enable git rename detection (-M) in the commit walk, so renamed files are counted once instead of as a delete + add. This slightly shifts added/deleted totals on repos where diff.renames was disabled.
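With rename detection on, git's numstat output writes a renamed path either as `old => new` or, when the two paths share a prefix or suffix, in a brace form such as `src/{old => new}/file.py`. The following is a minimal sketch of resolving such a field to its post-rename path. The helper name `resolve_numstat_path` is hypothetical, it does not handle git's quoting of unusual path characters, and it is not taken from the PR's code.

```python
import re

# Brace form used when old and new paths share a prefix and/or suffix,
# e.g. "src/{old => new}/file.py" or "src/{ => sub}/file.py".
_BRACE_RENAME = re.compile(r"\{(?P<old>[^{}]*) => (?P<new>[^{}]*)\}")


def resolve_numstat_path(field: str) -> str:
    """Return the post-rename path for a numstat path field (hypothetical helper)."""
    if " => " not in field:
        return field  # plain path, no rename involved
    match = _BRACE_RENAME.search(field)
    if match:
        # Keep the shared prefix and suffix, substitute the new middle part.
        path = field[: match.start()] + match.group("new") + field[match.end():]
        # An empty new side leaves "//" behind; collapse it.
        return path.replace("//", "/")
    # Whole-path rename: "old/path => new/path".
    return field.split(" => ", 1)[1]
```

Attributing churn to the resolved path is what lets a rename count as one file rather than a delete plus an add.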

- Count only files present at HEAD, so churn against deleted files (e.g. vendored bash_completion.d scripts) no longer inflates "Other"
- Detect extensionless scripts by shebang (bin/* → Shell, repo-intel → Python)
- Surface Docker/Make as detected tools; rename the section to "Frameworks & tools" and give the Tools group a distinct color
- Link each language in the bar to GitHub code search for the repo
- Tighten framework row spacing; move Technologies above Summary
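Shebang-based detection for extensionless scripts can be sketched roughly as follows. The interpreter table `SHEBANG_LANGS` and the function name are assumptions made for illustration; the PR's actual mapping is not shown in this thread.

```python
from typing import Optional

# Hypothetical interpreter -> language table; the real mapping may differ.
SHEBANG_LANGS = {
    "sh": "Shell",
    "bash": "Shell",
    "zsh": "Shell",
    "python": "Python",
    "python3": "Python",
}


def language_from_shebang(first_line: str) -> Optional[str]:
    """Guess a language from a script's first line, or None if undetermined."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    interpreter = parts[0].rsplit("/", 1)[-1]
    if interpreter == "env":
        # "#!/usr/bin/env python3": the interpreter is the first non-flag argument.
        args = [p for p in parts[1:] if not p.startswith("-")]
        if not args:
            return None
        interpreter = args[0]
    return SHEBANG_LANGS.get(interpreter)
```

Only the first line of a file is needed, so the check stays cheap even when applied to every extensionless file at HEAD.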

Overlay a faint orange tint on Sat/Sun cells, matching the weekend bands in the commit timeline.

The local path listed `git ls-tree -r HEAD` twice (head_languages + detect_frameworks) and re-ran the Linguist vendor regex for every numstat row, re-scanning the same paths across thousands of commits.

- List the HEAD tree once in collect_local; pass the path set into both helpers. Rename head_languages -> head_shebangs (no longer owns the tree).
- Cache classify_path per unique numstat field (present/shebang are fixed per run), turning per-row work into per-unique-path work.
- Extract the duplicated fetch_failed bail-out into a bail_partial closure.
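The caching idea can be illustrated with a small sketch: classify each distinct path once and reuse the result for every later numstat row that mentions it. The function `aggregate_churn` and its row shape `(added, deleted, path)` are assumptions for illustration, not the PR's actual code.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def aggregate_churn(
    rows: Iterable[Tuple[int, int, str]],
    classify: Callable[[str], Optional[str]],
) -> Dict[str, int]:
    """Sum added + deleted lines per language, classifying each unique path once."""
    cache: Dict[str, Optional[str]] = {}
    totals: Dict[str, int] = {}
    for added, deleted, path in rows:
        if path not in cache:
            # The expensive step (vendor regex, table lookups) runs once per path.
            cache[path] = classify(path)
        language = cache[path]
        if language is not None:
            totals[language] = totals.get(language, 0) + added + deleted
    return totals
```

This is only safe because the inputs to classification are fixed for the duration of a run, which is the condition the commit message calls out.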

Render a per-year band between the tags strip and the histogram, with centered labels that stay on-screen at any zoom and contrasting alternating fills. Year-boundary verticals now run uninterrupted from the lanes through the tag strip into the bar (inter-row margins removed, lines drawn on each surface). Drop the redundant year suffix from axis labels and the TAGS label, and prefix the tag popover with "TAG".

When the timeline is fully zoomed out (nothing to pan), dragging the main view now draws a selection rectangle and zooms to that time window on release, with a crosshair cursor — mirroring the histogram minimap's select-zoom. Once zoomed in, dragging pans as before. Rebuilds stow/bin/repo-intel from the template.

Replace per-framework pill chips with comma-separated names in divider-separated per-language rows, a calmer layout for the Technologies section. Rebuild the bundled artifact to match.

- Collapse the four near-identical Python/Ruby/Go/Rust framework blocks into one config-driven loop in _frameworks_from_files
- Reuse encodeBranch() in the heatmap instead of inlining the same split
- Cache the .year-label NodeList in rebuildYears rather than re-querying the DOM every scroll frame in positionYearLabels
- Trim two comments that narrated what the code does

Rebuild stow/bin/repo-intel artifact.

Clicking a bar in the commit-time or day-of-week charts opens a popover listing that bucket's commits, each row linking to its GitHub commit page. Buckets are reproduced client-side from the ISO string's author-local wall-clock to match the server-side histogram, rather than new Date().getHours() which would shift to the viewer's timezone.
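The dashboard does this bucketing in client-side JavaScript; the Python sketch below only illustrates the underlying idea, which is to read the hour and weekday from the timestamp as written (with its own UTC offset) rather than converting it to another timezone first. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Tuple


def author_local_bucket(iso: str) -> Tuple[int, int]:
    """Return (hour, weekday) from the author's wall clock; Monday is 0."""
    stamp = datetime.fromisoformat(iso)  # keeps the offset embedded in the string
    return stamp.hour, stamp.weekday()


def utc_bucket(iso: str) -> Tuple[int, int]:
    """Return (hour, weekday) after converting to UTC, shown for contrast only."""
    stamp = datetime.fromisoformat(iso).astimezone(timezone.utc)
    return stamp.hour, stamp.weekday()


# A commit made just after midnight in a +02:00 zone.
example = "2026-05-22T00:30:00+02:00"
local_hour, local_day = author_local_bucket(example)
shifted_hour, shifted_day = utc_bucket(example)
```

For the example timestamp the two functions disagree on both the hour and the day, which is the kind of drift that would make popover rows stop matching the bar that was clicked.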

coderabbitai · 2026-05-21T19:48:26Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bc49d4fe-e219-4f34-96c9-e345b8c2e063

📥 Commits

Reviewing files that changed from the base of the PR and between f62c5d9 and 34d8cb7.

📒 Files selected for processing (4)

src/repo-intel/gen_techdata.py
src/repo-intel/repo-intel.py
src/repo-intel/techdata.json
stow/bin/repo-intel

💤 Files with no reviewable changes (1)

src/repo-intel/techdata.json

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test

🧰 Additional context used

🪛 Ruff (0.15.13)

src/repo-intel/repo-intel.py

[error] 470-470: subprocess call: check for execution of untrusted input

(S603)

[error] 471-471: Starting a process with a partial executable path

(S607)

🔇 Additional comments (6)

src/repo-intel/gen_techdata.py (1)

46-51: LGTM!

Also applies to: 260-261

src/repo-intel/repo-intel.py (5)

8-8: LGTM!

Also applies to: 23-25, 40-40, 52-53, 181-181, 207-210, 266-266

467-475: LGTM!

728-728: LGTM!

774-779: LGTM!

1820-1821: LGTM!

Also applies to: 1828-1835, 1841-1842, 1852-1855, 1870-1870

📝 Walkthrough

Walkthrough

Adds a Linguist-backed techdata generator and committed dataset, embeds techdata into the bundled repo-intel script, integrates per-file language classification and manifest-driven framework detection (local and remote) into data generation, and updates the dashboard template with a Technologies section, timeline year-band, drag-to-zoom selection, and enriched tooltips.

Changes

Language and Framework Detection System

| Layer / File(s) | Summary |
| --- | --- |
| Techdata generation infrastructure `Makefile`, `src/repo-intel/build.py`, `src/repo-intel/gen_techdata.py` | New `repo-intel-techdata` make target regenerates `techdata.json`; `build.py` now requires and embeds both `template.html` and `techdata.json` via distinct placeholders; `gen_techdata.py` fetches Linguist YAML, builds extension/filename/vendor tables, and emits curated framework mappings. |
| Techdata JSON dataset `src/repo-intel/techdata.json` | Committed snapshot containing `_source`, `fw_deps`, `fw_sentinels_js`, `fw_sentinels_other`, `lang.color`, `ext`, `filename`, and `vendor` patterns used for offline detection. |
| Detection core and utilities `src/repo-intel/repo-intel.py` (embedded TECHDATA & utilities) | Loads embedded TECHDATA, compiles vendor regexes, defines shebang mappings, implements rename-aware numstat path resolution and classify_path heuristics, and provides top_languages aggregation. |
| Local repository language aggregation `src/repo-intel/repo-intel.py` (collect_local changes) | collect_local builds HEAD present set and shebang map, classifies numstat paths to produce per-commit `lang_stats` and HEAD-only `frameworks`, and returns `extras` including those values. |
| Remote framework detection via manifests `src/repo-intel/repo-intel.py` (REST/GraphQL helpers) | Adds `gh_rest_get`/`fetch_blob_texts` and remote fetch paths to list repo tree, fetch relevant manifest blobs, detect frameworks without cloning, and fetch repo-wide Linguist breakdown. |
| Consistent partial-fetch bail handling `src/repo-intel/repo-intel.py` (collect_remote) | Introduces `bail_partial(nodes)` to persist a contiguous partial prefix and standardise early-exit caching/exit behaviour for repeated fetch failures. |
| Language and framework data integration `src/repo-intel/repo-intel.py` (build_data/main) | `build_data` accepts `extras`, merges per-commit `lang_stats` into per-author and repo totals, adds per-commit file-type metadata, and includes `repoLanguages`, `repoLanguagesBasis`, and `frameworks` in the final dashboard model; `main()` forwards `extras`. |
| Technologies dashboard section and author language display `src/repo-intel/template.html` | Adds Technologies sidebar link/section, `renderTech()` and `langBarHtml()`, per-author stacked language bars in popovers, and empty-state messaging when per-file data is unavailable. |
| Timeline year-band overlay and rendering `src/repo-intel/template.html` | Adds years-band DOM/CSS, builds/caches year labels, repositions labels per scroll frame, and draws continuous year-boundary verticals on lane and tag canvases. |
| Timeline drag-to-zoom selection interaction `src/repo-intel/template.html` | Implements drag-to-zoom selection when zoomed out (`SEL_THRESHOLD`), draws selection overlay during drag, converts selection to new zoom on release, and updates cursor/interaction modes. |
| Enhanced commit tooltips and pattern-chart popovers `src/repo-intel/template.html` | Commit hover tooltips aggregate per-file-type counts (top-N + remainder); tag tooltips gain a "TAG" kicker; pattern charts add commit-bucket popovers with UTC bucketing and interactive dismissal handlers. |
| Documentation updates `src/repo-intel/README.md` | Clarifies local vs remote detection differences, documents techdata generation from Linguist, adds `make repo-intel-techdata` guidance, and explains how `build.py` embeds `TEMPLATE` and `TECHDATA` placeholders. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

tyom/dotfiles#12: Both PRs modify src/repo-intel/repo-intel.py data-model and build integration points; this PR adds techdata-backed language stats, remote framework detection, and UI rendering for technologies.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 54.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarises the main changes: adding tech-stats (languages, frameworks) and interactive features (timeline, activity patterns) to repo-intel. It is specific, concise, and directly reflects the primary objectives of the pull request. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/repo-intel/gen_techdata.py`:
- Around line 203-224: The code assigns extensions and filenames to the folded
group name stored in eff (from info.get("group") or name) which may not exist in
name_color, creating language entries without colors; update the logic in the
loops that set ext_lang and filename_lang so you only assign ext_lang[key] = eff
and filename_lang.setdefault(fn.lower(), eff) when eff is present in name_color
(or otherwise fall back to name instead of the group), and ensure ext_meta
updates still use the same guarded eff check; adjust the code paths around eff,
ext_lang, ext_meta, filename_lang and name_color to perform this presence check
before writing mappings.
- Around line 141-144: Add a network timeout to the remote fetch to avoid hangs
by updating fetch() to pass a sensible timeout (e.g., timeout=10) into
urllib.request.urlopen(req, timeout=...) and handle exceptions as needed; and
fix group-folding logic by changing the computation of eff (where currently eff
= info.get("group") or name) to only fold to the group when that group exists in
the name_color mapping (e.g., eff = group if group and group in name_color else
name), referencing the eff variable, info.get("group") call, and the name_color
map so invalid group names (like the "Checksums" mismatch) are not used.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b52fb6b6-8fd5-4f5d-94c1-6968729bb6db

📥 Commits

Reviewing files that changed from the base of the PR and between d17302e and 20bbecd.

📒 Files selected for processing (8)

Makefile
src/repo-intel/README.md
src/repo-intel/build.py
src/repo-intel/gen_techdata.py
src/repo-intel/repo-intel.py
src/repo-intel/techdata.json
src/repo-intel/template.html
stow/bin/repo-intel

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test

🧰 Additional context used

🪛 LanguageTool

src/repo-intel/README.md

[grammar] ~151-~151: Make sure that the adjective ‘small’ is correct. Possibly, it should be an adverb (typically ~ly) that modifies ‘curated’. Possibly, it should be the first word in a compound adjective (hyphenated adjective). Possibly, it is correct.
Context: ...pt) and vendor.yml`. Frameworks are a small curated dependency → framework map maintained i...

(ADVERB_OR_HYPHENATED_ADJECTIVE)

🪛 Ruff (0.15.13)

src/repo-intel/build.py

[warning] 34-34: Loop control variable name not used within loop body

Rename unused name to _name

(B007)

src/repo-intel/gen_techdata.py

[error] 142-142: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)

[error] 143-143: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)

src/repo-intel/repo-intel.py

[error] 458-458: subprocess call: check for execution of untrusted input

(S603)

[error] 458-458: Starting a process with a partial executable path

(S607)

[error] 473-473: subprocess call: check for execution of untrusted input

(S603)

[error] 474-474: Starting a process with a partial executable path

(S607)

[error] 1131-1131: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)

🔇 Additional comments (33)

src/repo-intel/README.md (1)

11-20: LGTM!

Also applies to: 138-155, 167-177, 186-204

src/repo-intel/repo-intel.py (16)

279-352: LGTM!

355-369: LGTM!

372-385: LGTM!

388-416: LGTM!

419-454: LGTM!

457-479: LGTM!

482-600: LGTM!

603-622: LGTM!

773-848: LGTM!

1121-1132: LGTM!

1135-1182: LGTM!

1185-1212: LGTM!

1357-1366: LGTM!

1465-1480: LGTM!

1532-1691: LGTM!

1786-1846: LGTM!

src/repo-intel/template.html (14)

137-262: LGTM!

273-292: LGTM!

433-435: LGTM!

502-530: LGTM!

535-561: LGTM!

632-633: LGTM!

774-778: LGTM!

Also applies to: 996-1035

936-951: LGTM!

1098-1123: LGTM!

1515-1598: LGTM!

1466-1481: LGTM!

171-171: LGTM!

Also applies to: 1707-1707

1873-1947: LGTM!

1949-1961: LGTM!

Makefile (1)

53-54: LGTM!

Also applies to: 56-56, 59-59

src/repo-intel/build.py (1)

4-6: LGTM!

Also applies to: 13-14, 26-45

Folding into a non-colored group (e.g. Checksums) seeded color-less language entries. Only fold when the group is itself a colored language, else fall back to the language's own name (guaranteed colored). Also add a 10s timeout to the Linguist fetch to avoid hangs.
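A minimal sketch of both fixes, using the names that appear in the review thread (`eff`, `info`, `name_color`, `fetch`). The surrounding function shapes and the sample data in the usage note are assumptions, not the generator's real code.

```python
import urllib.request
from typing import Any, Dict


def effective_language(name: str, info: Dict[str, Any], name_color: Dict[str, str]) -> str:
    """Fold a language into its group only when that group has a color entry."""
    group = info.get("group")
    # Otherwise keep the language's own name, which is known to be colored.
    return group if group and group in name_color else name


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL with a timeout so a stalled connection cannot hang the run."""
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return response.read()
```

With illustrative data such as `name_color = {"TypeScript": "#000000", "TSX": "#000000"}`, a `TSX` entry whose group is `TypeScript` folds to `TypeScript`, while an entry whose group is missing from `name_color` keeps its own name.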

The GraphQL remote path left the language bar empty since per-file languages are too expensive to fetch over the network. GitHub already runs Linguist and exposes repo-wide bytes-per-language at HEAD, so fetch the aggregate instead: it fills the repo bar (not per-author or per-commit stats, which still need a clone). Label distinguishes the byte-snapshot basis ('by code size') from local line churn.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/repo-intel/repo-intel.py`:
- Around line 1577-1580: The code currently treats a present-but-empty remote
key the same as missing data by using truthiness on repo_languages ((extras or
{}).get("repo_languages") or []), which breaks remote-language semantics; change
the logic so you detect presence of the key itself instead of truthiness:
replace the assignment and any later truthy checks with an explicit presence
check (e.g., extras = extras or {}; if "repo_languages" in extras:
repo_languages = extras.get("repo_languages") else: repo_languages = None or []
depending on downstream expectations) and update subsequent conditionals that
currently do truthy checks on repo_languages to instead check for key presence
(e.g., if "repo_languages" in extras) so an empty list from remote is preserved
as a remote-run indicator rather than treated as missing.
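The distinction the comment draws, between a key that is present but empty and a key that is absent, can be sketched as below. The helper name and return shape are hypothetical; only `extras` and `repo_languages` come from the review text.

```python
from typing import Any, Dict, List, Optional, Tuple


def read_repo_languages(extras: Optional[Dict[str, Any]]) -> Tuple[bool, List[Any]]:
    """Return (key_present, languages) without conflating an empty list with no data."""
    extras = extras or {}
    if "repo_languages" in extras:
        # An empty list still signals that the remote path supplied the value.
        return True, list(extras["repo_languages"])
    return False, []
```

Downstream code can then branch on the first element of the result instead of on the truthiness of the list.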

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6c44e833-a18a-41a8-b619-da5327266ef9

📥 Commits

Reviewing files that changed from the base of the PR and between f2aa7b3 and 58ead89.

📒 Files selected for processing (3)

src/repo-intel/repo-intel.py
src/repo-intel/template.html
stow/bin/repo-intel

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test

🔇 Additional comments (2)

src/repo-intel/repo-intel.py (1)

1215-1250: LGTM!

Also applies to: 1503-1509, 1520-1520

src/repo-intel/template.html (1)

543-550: LGTM!

Add Storybook (canonical scoped adapters), Testing Library, Puppeteer, TestCafe, Biome, Turborepo, Nx, SWC, Babel, GraphQL, tRPC, Drizzle to the npm map; add pnpm/Yarn/Bun, GitLab CI, Vercel, Netlify, and GitHub Actions as Tools-bucket sentinels. Support a directory-prefix sentinel shape (trailing slash) for .github/workflows/. Regenerate techdata + artifact.
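One way to read the directory-prefix sentinel shape: a sentinel ending in a slash matches when any tracked path lives under that directory, while any other sentinel must match a path exactly. The sketch below assumes that reading; the function name and the exact-match rule for plain sentinels are assumptions.

```python
from typing import Iterable


def sentinel_present(sentinel: str, paths: Iterable[str]) -> bool:
    """Check whether a sentinel file or directory prefix occurs in the tree."""
    if sentinel.endswith("/"):
        # Directory-prefix shape: any file beneath the directory counts.
        return any(path.startswith(sentinel) for path in paths)
    return any(path == sentinel for path in paths)
```

Under this reading, a sentinel of `.github/workflows/` matches a repo containing `.github/workflows/ci.yml` but not one that only has `.github/dependabot.yml`.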

Forces a bare git clone instead of the GitHub GraphQL API even when a token is present, unlocking per-author language churn the API can't provide. Also fixes the previously-broken no-token fallback path: collect_local crashed on bare repos via rev-parse --show-toplevel, and expected-failure git probes leaked stderr.

The canonical Linguist owner of .txt is the colorless 'Text' language, which the table builder drops — letting a niche colored claimant (Adblock Filter List, type=data) win the extension. LICENSE.txt and friends were mislabeled on the local/clone path. Add an EXT_EXCLUDE set so generic 'Text' extensions stay unassigned and fall into 'Other', matching GitHub.
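The effect of the exclusion set can be shown with a simplified table builder. Only the name `EXT_EXCLUDE` comes from the commit message; the input shape, the first-claimant-wins rule, and the placeholder color values are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Extensions deliberately left unassigned; ".txt" is the case described above.
EXT_EXCLUDE = {".txt"}


def build_ext_table(
    languages: Iterable[Tuple[str, Optional[str], Iterable[str]]],
) -> Dict[str, str]:
    """Map extension -> language from (name, color, extensions) entries."""
    table: Dict[str, str] = {}
    for name, color, extensions in languages:
        if color is None:
            continue  # colorless languages such as "Text" are dropped
        for ext in extensions:
            if ext in EXT_EXCLUDE:
                continue  # no colored claimant may take a generic extension
            table.setdefault(ext, name)  # first claimant wins (assumption)
    return table


# Placeholder colors; real values come from Linguist.
sample = [
    ("Text", None, [".txt"]),
    ("Adblock Filter List", "#111111", [".txt"]),
    ("Python", "#222222", [".py"]),
]
ext_table = build_ext_table(sample)
```

A lookup such as `ext_table.get(".txt", "Other")` then yields `"Other"` for `LICENSE.txt`, instead of the niche claimant.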

tyom added 9 commits May 21, 2026 19:05

Mark weekends in repo-intel contributions heatmap

f725270

Overlay a faint orange tint on Sat/Sun cells, matching the weekend bands in the commit timeline.

Restyle repo-intel framework list as comma-separated rows

cc3fb02

Replace per-framework pill chips with comma-separated names in divider-separated per-language rows, a calmer layout for the Technologies section. Rebuild the bundled artifact to match.

coderabbitai Bot reviewed May 21, 2026

Comment thread src/repo-intel/gen_techdata.py

Comment thread src/repo-intel/gen_techdata.py Outdated

tyom added 2 commits May 21, 2026 21:50

coderabbitai Bot reviewed May 21, 2026

Comment thread src/repo-intel/repo-intel.py

tyom added 3 commits May 21, 2026 22:11

tyom merged commit c87a0d2 into master May 21, 2026
3 checks passed

tyom deleted the tech-stats branch May 21, 2026 22:01

coderabbitai Bot mentioned this pull request May 21, 2026

Extract repo-intel into standalone repo #15

Merged

tyom merged 14 commits into master from tech-stats
