Add tech-stats to repo-intel: languages, frameworks, timeline & activity patterns #14
Surface what kind of work was done and what stack a repo uses:
- Per-commit file types in the timeline tooltip.
- Per-author language bar in the contributor popover.
- A repo-wide "Technologies" section: whole-repo language bar plus frameworks grouped by language. Always shown; each column explains itself when empty (e.g. the remote path's language note).

Detection data is generated, not hand-maintained:
- Languages come from GitHub Linguist (extension/filename -> language, official colors, vendored-path noise filter), with fine-grained languages folded into their group (TSX -> TypeScript) and a small ambiguous-extension override (.md -> Markdown, .h -> C, ...).
- Frameworks are a curated dependency -> framework map. Vercel/Netlify target deploy presets, not the libraries a repo uses, so they were a poor fit and aren't scraped.
- gen_techdata.py (make repo-intel-techdata) regenerates techdata.json from Linguist; the committed JSON is embedded into the artifact by build.py, keeping the shipped tool offline and single-file. It is not a build dependency, so normal builds stay offline/reproducible.

Coverage: languages need a clone (numstat), so they're local/bare-clone only; frameworks read dependency manifests, which are cheap to fetch via the REST tree + GraphQL blobs, so they work on the remote path too.

Also enable git rename detection (-M) in the commit walk, so renamed files are counted once instead of as a delete + add. This slightly shifts added/deleted totals on repos where diff.renames was disabled.
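To illustrate the rename-detection point: with -M, git's textual numstat output writes a renamed path as `old => new` or in a brace form such as `src/{old => new}/f.py`. The sketch below shows one way to reduce such a field to the post-rename path so the file is counted once. The helper name `new_path` is hypothetical and not taken from repo-intel.py, and it assumes the non `-z` output format.

```python
import re

# Matches the brace form git prints for renames, e.g. "src/{old => new}/f.py".
_BRACE = re.compile(r"\{(.*?) => (.*?)\}")


def new_path(numstat_path: str) -> str:
    """Return the post-rename path for one numstat path field."""
    if " => " not in numstat_path:
        return numstat_path  # plain path, no rename involved
    if _BRACE.search(numstat_path):
        # Keep the right-hand side of the braces; an empty side leaves "//",
        # which is collapsed back to a single slash.
        kept = _BRACE.sub(lambda m: m.group(2), numstat_path)
        return kept.replace("//", "/")
    # Whole-path rename: "old.txt => new.txt".
    return numstat_path.split(" => ", 1)[1]
```

Attributing churn to the returned path keeps a renamed file's history under its current name.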
- Count only files present at HEAD, so churn against deleted files (e.g. vendored bash_completion.d scripts) no longer inflates "Other"
- Detect extensionless scripts by shebang (bin/* → Shell, repo-intel → Python)
- Surface Docker/Make as detected tools; rename the section to "Frameworks & tools" and give the Tools group a distinct color
- Link each language in the bar to GitHub code search for the repo
- Tighten framework row spacing; move Technologies above Summary
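A minimal sketch of shebang-based detection for extensionless scripts follows. The interpreter table and the function name `language_from_shebang` are assumptions for illustration; the actual mapping in repo-intel.py may differ.

```python
import re

# Hypothetical interpreter -> language table; the real tool's mapping may differ.
INTERPRETERS = {
    "sh": "Shell",
    "bash": "Shell",
    "zsh": "Shell",
    "python": "Python",
    "node": "JavaScript",
}


def language_from_shebang(first_line: str):
    """Map a script's first line to a language, or None without a known shebang."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    name = parts[0].rsplit("/", 1)[-1]
    if name == "env":
        # "#!/usr/bin/env python3": the interpreter is the first non-flag argument.
        args = [p for p in parts[1:] if not p.startswith("-")]
        if not args:
            return None
        name = args[0]
    # Strip version suffixes such as "python3" or "python3.12".
    name = re.sub(r"[\d.]+$", "", name) or name
    return INTERPRETERS.get(name)
```

Only the first line of a file is needed, so the check stays cheap even on large trees.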
Overlay a faint orange tint on Sat/Sun cells, matching the weekend bands in the commit timeline.
The local path listed `git ls-tree -r HEAD` twice (head_languages + detect_frameworks) and re-ran the Linguist vendor regex for every numstat row, re-scanning the same paths across thousands of commits.
- List the HEAD tree once in collect_local; pass the path set into both helpers. Rename head_languages -> head_shebangs (no longer owns the tree).
- Cache classify_path per unique numstat field (present/shebang are fixed per run), turning per-row work into per-unique-path work.
- Extract the duplicated fetch_failed bail-out into a bail_partial closure.
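The caching idea can be sketched with `functools.lru_cache`. The classifier body here is a stand-in, not the real Linguist lookup, and `churn_by_language` is a hypothetical caller.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def classify_path(path: str) -> str:
    """Stand-in classifier; the real one consults the Linguist tables."""
    if path.startswith("vendor/"):
        return "vendored"
    return path.rsplit(".", 1)[-1] if "." in path else "Other"


def churn_by_language(rows):
    """rows: (added, deleted, path) tuples gathered from every commit's numstat."""
    totals = {}
    for added, deleted, path in rows:
        lang = classify_path(path)  # computed once per unique path, then cached
        totals[lang] = totals.get(lang, 0) + added + deleted
    return totals
```

Because the inputs that affect classification are fixed for a run, the cache never needs invalidating; `classify_path.cache_info()` reports how many real classifications were performed.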
Render a per-year band between the tags strip and the histogram, with centered labels that stay on-screen at any zoom and contrasting alternating fills. Year-boundary verticals now run uninterrupted from the lanes through the tag strip into the bar (inter-row margins removed, lines drawn on each surface). Drop the redundant year suffix from axis labels and the TAGS label, and prefix the tag popover with "TAG".
When the timeline is fully zoomed out (nothing to pan), dragging the main view now draws a selection rectangle and zooms to that time window on release, with a crosshair cursor — mirroring the histogram minimap's select-zoom. Once zoomed in, dragging pans as before. Rebuilds stow/bin/repo-intel from the template.
Replace per-framework pill chips with comma-separated names in divider-separated per-language rows, a calmer layout for the Technologies section. Rebuild the bundled artifact to match.
- Collapse the four near-identical Python/Ruby/Go/Rust framework blocks into one config-driven loop in _frameworks_from_files
- Reuse encodeBranch() in the heatmap instead of inlining the same split
- Cache the .year-label NodeList in rebuildYears rather than re-querying the DOM every scroll frame in positionYearLabels
- Trim two comments that narrated what the code does

Rebuild stow/bin/repo-intel artifact.
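A config-driven loop of the kind described might look like the sketch below. The `ECOSYSTEMS` table, its regexes, and the dependency names are illustrative assumptions, not the contents of _frameworks_from_files.

```python
import re

# Hypothetical config: manifest -> (language, dependency regex, dep -> framework).
ECOSYSTEMS = {
    "requirements.txt": ("Python", re.compile(r"^([A-Za-z0-9_.-]+)", re.M),
                         {"django": "Django", "flask": "Flask"}),
    "Gemfile": ("Ruby", re.compile(r"^\s*gem\s+['\"]([^'\"]+)['\"]", re.M),
                {"rails": "Rails"}),
    "go.mod": ("Go", re.compile(r"^\s*(?:require\s+)?(\S+)\s+v\d", re.M),
               {"github.com/gin-gonic/gin": "Gin"}),
    "Cargo.toml": ("Rust", re.compile(r"^([A-Za-z0-9_-]+)\s*=", re.M),
                   {"actix-web": "Actix Web"}),
}


def frameworks_from_files(files):
    """files: manifest filename -> text. Returns {language: sorted framework names}."""
    found = {}
    for manifest, (language, pattern, known) in ECOSYSTEMS.items():
        text = files.get(manifest)
        if text is None:
            continue  # this ecosystem's manifest is not in the repo
        for dep in pattern.findall(text):
            name = known.get(dep.lower())
            if name:
                found.setdefault(language, set()).add(name)
    return {lang: sorted(names) for lang, names in found.items()}
```

Adding an ecosystem then means adding one table row rather than another copy of the loop.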
Clicking a bar in the commit-time or day-of-week charts opens a popover listing that bucket's commits, each row linking to its GitHub commit page. Buckets are reproduced client-side from the ISO string's author-local wall-clock to match the server-side histogram, rather than new Date().getHours() which would shift to the viewer's timezone.
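The bucketing rule can be shown in Python (the dashboard itself does this client-side in the template). The sketch reads the hour and weekday exactly as written in the timestamp and assumes the string carries an explicit offset such as `-08:00`. The function name is hypothetical.

```python
from datetime import datetime, timezone


def author_local_bucket(iso: str):
    """Return (hour 0-23, weekday 0=Monday) in the author's own wall-clock time."""
    # fromisoformat keeps the offset attached; .hour is the hour as written,
    # not converted to UTC or to the viewer's timezone.
    dt = datetime.fromisoformat(iso)
    return dt.hour, dt.weekday()


def utc_bucket(iso: str):
    """For contrast: the bucket after converting to UTC, which can shift the day."""
    dt = datetime.fromisoformat(iso).astimezone(timezone.utc)
    return dt.hour, dt.weekday()
```

A commit authored at 23:15 on a Friday at UTC-8 stays in the Friday 23:00 bucket under the first function, while the UTC conversion would move it to Saturday 07:00.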
📝 Walkthrough
Adds a Linguist-backed techdata generator and committed dataset, embeds techdata into the bundled repo-intel script, integrates per-file language classification and manifest-driven framework detection (local and remote) into data generation, and updates the dashboard template with a Technologies section, timeline year-band, drag-to-zoom selection, and enriched tooltips.
Changes
Language and Framework Detection System
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/repo-intel/gen_techdata.py`:
- Around line 203-224: The code assigns extensions and filenames to the folded
group name stored in eff (from info.get("group") or name) which may not exist in
name_color, creating language entries without colors; update the logic in the
loops that set ext_lang and filename_lang so you only assign ext_lang[key] = eff
and filename_lang.setdefault(fn.lower(), eff) when eff is present in name_color
(or otherwise fall back to name instead of the group), and ensure ext_meta
updates still use the same guarded eff check; adjust the code paths around eff,
ext_lang, ext_meta, filename_lang and name_color to perform this presence check
before writing mappings.
- Around line 141-144: Add a network timeout to the remote fetch to avoid hangs
by updating fetch() to pass a sensible timeout (e.g., timeout=10) into
urllib.request.urlopen(req, timeout=...) and handle exceptions as needed; and
fix group-folding logic by changing the computation of eff (where currently eff
= info.get("group") or name) to only fold to the group when that group exists in
the name_color mapping (e.g., eff = group if group and group in name_color else
name), referencing the eff variable, info.get("group") call, and the name_color
map so invalid group names (like the "Checksums" mismatch) are not used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b52fb6b6-8fd5-4f5d-94c1-6968729bb6db
📒 Files selected for processing (8)
Makefile
src/repo-intel/README.md
src/repo-intel/build.py
src/repo-intel/gen_techdata.py
src/repo-intel/repo-intel.py
src/repo-intel/techdata.json
src/repo-intel/template.html
stow/bin/repo-intel
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🧰 Additional context used
🪛 LanguageTool
src/repo-intel/README.md
[grammar] ~151-~151: Make sure that the adjective ‘small’ is correct. Possibly, it should be an adverb (typically ~ly) that modifies ‘curated’. Possibly, it should be the first word in a compound adjective (hyphenated adjective). Possibly, it is correct.
Context: ...pt) and vendor.yml`. Frameworks are a small curated dependency → framework map maintained i...
(ADVERB_OR_HYPHENATED_ADJECTIVE)
🪛 Ruff (0.15.13)
src/repo-intel/build.py
[warning] 34-34: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
src/repo-intel/gen_techdata.py
[error] 142-142: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
[error] 143-143: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
src/repo-intel/repo-intel.py
[error] 458-458: subprocess call: check for execution of untrusted input
(S603)
[error] 458-458: Starting a process with a partial executable path
(S607)
[error] 473-473: subprocess call: check for execution of untrusted input
(S603)
[error] 474-474: Starting a process with a partial executable path
(S607)
[error] 1131-1131: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
🔇 Additional comments (33)
src/repo-intel/README.md (1)
11-20: LGTM! Also applies to: 138-155, 167-177, 186-204
src/repo-intel/repo-intel.py (16)
279-352: LGTM!
355-369: LGTM!
372-385: LGTM!
388-416: LGTM!
419-454: LGTM!
457-479: LGTM!
482-600: LGTM!
603-622: LGTM!
773-848: LGTM!
1121-1132: LGTM!
1135-1182: LGTM!
1185-1212: LGTM!
1357-1366: LGTM!
1465-1480: LGTM!
1532-1691: LGTM!
1786-1846: LGTM!
src/repo-intel/template.html (14)
137-262: LGTM!
273-292: LGTM!
433-435: LGTM!
502-530: LGTM!
535-561: LGTM!
632-633: LGTM!
774-778: LGTM! Also applies to: 996-1035
936-951: LGTM!
1098-1123: LGTM!
1515-1598: LGTM!
1466-1481: LGTM!
171-171: LGTM! Also applies to: 1707-1707
1873-1947: LGTM!
1949-1961: LGTM!
Makefile (1)
53-54: LGTM! Also applies to: 56-56, 59-59
src/repo-intel/build.py (1)
4-6: LGTM! Also applies to: 13-14, 26-45
Folding into a non-colored group (e.g. Checksums) seeded color-less language entries. Only fold when the group is itself a colored language, else fall back to the language's own name (guaranteed colored). Also add a 10s timeout to the Linguist fetch to avoid hangs.
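A sketch of the two fixes in this commit, assuming the names used in the review comment (`info`, `name_color`, `fetch`). The function `effective_language` is a hypothetical wrapper around the `eff` computation, and the surrounding table-building code in gen_techdata.py is not reproduced.

```python
import urllib.request


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download url, giving up after `timeout` seconds instead of hanging."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def effective_language(name: str, info: dict, name_color: dict) -> str:
    """Fold into the group only when the group is itself a colored language."""
    group = info.get("group")
    # A group missing from name_color (e.g. "Checksums") would create a
    # color-less entry, so fall back to the language's own name.
    return group if group and group in name_color else name
```

With this guard, every name written into the extension and filename maps is one that also has a color.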
The GraphQL remote path left the language bar empty since per-file
languages are too expensive to fetch over the network. GitHub already
runs Linguist and exposes repo-wide bytes-per-language at HEAD, so
fetch the aggregate instead: it fills the repo bar (not per-author or
per-commit stats, which still need a clone). Label distinguishes the
byte-snapshot basis ('by code size') from local line churn.
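For reference, the aggregate is available through the `languages` connection on `Repository` in GitHub's GraphQL API. The sketch below shows a query of that shape and a hypothetical helper, `language_shares`, that turns the JSON response into percentage rows. The query text and page size are assumptions, not the exact request repo-intel.py sends.

```python
# GraphQL query against GitHub's Repository.languages connection (sizes in bytes).
LANGUAGES_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    languages(first: 20, orderBy: {field: SIZE, direction: DESC}) {
      totalSize
      edges { size node { name color } }
    }
  }
}
"""


def language_shares(response: dict):
    """Turn the query's JSON response into (name, color, percent) rows."""
    langs = response["data"]["repository"]["languages"]
    total = langs["totalSize"] or 1  # avoid dividing by zero on empty repos
    return [
        (edge["node"]["name"], edge["node"]["color"],
         round(100 * edge["size"] / total, 1))
        for edge in langs["edges"]
    ]
```

These shares are byte counts at HEAD, which is why the bar is labelled 'by code size' and is not comparable to line churn from a clone.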
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/repo-intel/repo-intel.py`:
- Around line 1577-1580: The code currently treats a present-but-empty remote
key the same as missing data by using truthiness on repo_languages ((extras or
{}).get("repo_languages") or []), which breaks remote-language semantics; change
the logic so you detect presence of the key itself instead of truthiness:
replace the assignment and any later truthy checks with an explicit presence
check (e.g., extras = extras or {}; if "repo_languages" in extras:
repo_languages = extras.get("repo_languages") else: repo_languages = None or []
depending on downstream expectations) and update subsequent conditionals that
currently do truthy checks on repo_languages to instead check for key presence
(e.g., if "repo_languages" in extras) so an empty list from remote is preserved
as a remote-run indicator rather than treated as missing.
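The distinction the comment asks for can be sketched as follows. The function name and return shape are hypothetical; only the `extras` dict and the `repo_languages` key come from the review.

```python
def resolve_repo_languages(extras):
    """Return (is_remote_run, languages) without conflating [] with a missing key."""
    extras = extras or {}
    if "repo_languages" in extras:
        # Key present: this was a remote run, even if GitHub reported no languages.
        return True, extras["repo_languages"] or []
    # Key absent: no remote language data was collected at all.
    return False, []
```

Downstream code can then branch on the first element instead of on the truthiness of the list.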
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6c44e833-a18a-41a8-b619-da5327266ef9
📒 Files selected for processing (3)
src/repo-intel/repo-intel.py
src/repo-intel/template.html
stow/bin/repo-intel
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (2)
src/repo-intel/repo-intel.py (1)
1215-1250: LGTM! Also applies to: 1503-1509, 1520-1520
src/repo-intel/template.html (1)
543-550: LGTM!
Add Storybook (canonical scoped adapters), Testing Library, Puppeteer, TestCafe, Biome, Turborepo, Nx, SWC, Babel, GraphQL, tRPC, Drizzle to the npm map; add pnpm/Yarn/Bun, GitLab CI, Vercel, Netlify, and GitHub Actions as Tools-bucket sentinels. Support a directory-prefix sentinel shape (trailing slash) for .github/workflows/. Regenerate techdata + artifact.
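One way to support both sentinel shapes is sketched below. The sentinel table is an assumption for illustration (the commit does not list the exact filenames), and `detect_tools` is a hypothetical name.

```python
# Hypothetical sentinel table: exact file paths, or directory prefixes ending in "/".
SENTINELS = {
    "pnpm-lock.yaml": "pnpm",
    ".gitlab-ci.yml": "GitLab CI",
    "vercel.json": "Vercel",
    ".github/workflows/": "GitHub Actions",
}


def detect_tools(paths):
    """Return the sorted tool names whose sentinel appears in the repo's path set."""
    paths = set(paths)
    found = set()
    for sentinel, tool in SENTINELS.items():
        if sentinel.endswith("/"):
            # Directory-prefix shape: any file under the directory counts.
            hit = any(p.startswith(sentinel) for p in paths)
        else:
            # Exact-file shape: the path itself must exist.
            hit = sentinel in paths
        if hit:
            found.add(tool)
    return sorted(found)
```

The trailing slash keeps `.github/workflows/` from matching unrelated files such as `.github/CODEOWNERS`.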
Forces a bare git clone instead of the GitHub GraphQL API even when a token is present, unlocking per-author language churn the API can't provide. Also fixes the previously-broken no-token fallback path: collect_local crashed on bare repos via rev-parse --show-toplevel, and expected-failure git probes leaked stderr.
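A sketch of a quiet probe and a bare-aware root lookup follows, assuming hypothetical helper names `probe` and `repo_root`. It relies on `git rev-parse --is-bare-repository`, `--absolute-git-dir`, and `--show-toplevel`, the last of which fails outside a work tree.

```python
import subprocess


def probe(cmd):
    """Run cmd; return stripped stdout, or None if it fails or cannot start."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # expected failures stay silent
            text=True,
        )
    except OSError:
        return None  # executable missing or not runnable
    return result.stdout.strip() if result.returncode == 0 else None


def repo_root(repo_dir):
    """Work-tree root for normal clones, the git directory itself for bare ones."""
    git = ["git", "-C", repo_dir, "rev-parse"]
    if probe(git + ["--is-bare-repository"]) == "true":
        return probe(git + ["--absolute-git-dir"])
    return probe(git + ["--show-toplevel"])
```

Checking for a bare repository first avoids calling `--show-toplevel` where it is known to fail.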
The canonical Linguist owner of .txt is the colorless 'Text' language, which the table builder drops — letting a niche colored claimant (Adblock Filter List, type=data) win the extension. LICENSE.txt and friends were mislabeled on the local/clone path. Add an EXT_EXCLUDE set so generic 'Text' extensions stay unassigned and fall into 'Other', matching GitHub.
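The effect of the exclusion set can be sketched with a simplified table builder. The input shape and `build_ext_table` are assumptions; only `EXT_EXCLUDE` and the `.txt` case come from the commit message, and the colors are placeholders.

```python
EXT_EXCLUDE = {".txt"}  # extensions whose canonical owner is the colorless "Text"


def build_ext_table(languages):
    """languages: name -> {"color": str or None, "extensions": [...]}, in priority order."""
    ext_lang = {}
    for name, info in languages.items():
        if not info.get("color"):
            continue  # colorless languages are dropped from the table
        for ext in info.get("extensions", []):
            if ext in EXT_EXCLUDE:
                continue  # leave generic text extensions unassigned -> "Other"
            ext_lang.setdefault(ext, name)  # first claimant keeps the extension
    return ext_lang
```

Without the exclusion, dropping "Text" would hand `.txt` to the next colored claimant, which is how LICENSE.txt ended up mislabeled.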
Summary
Builds out the `repo-intel` dashboard with a richer set of repository insights and interactions.
Test plan
- `make repo-intel-dev` / the bundled `stow/bin/repo-intel`.
- `/commit/<sha>` URLs, and the displayed times fall within the clicked bucket.
- `stow/bin/repo-intel` via `make repo-intel-build`.
Summary by CodeRabbit
New Features
Documentation
Chores