Make repo-intel resilient to GitHub history-fetch 502s by tyom · Pull Request #13 · tyom/dotfiles

tyom · 2026-05-21T12:47:52Z

Problem

repo-intel crashed with HTTP Error 502: Bad Gateway after fetching ~2200 commits — and re-running failed at the exact same place. That determinism is the tell: it's not transient flakiness.

Resolving Commit.history makes GitHub compute per-commit diff stats (additions/deletions). A page holding a few large commits consistently exceeds GitHub's backend timeout and returns 502 at the same cursor every run. The fetch had no retry handling, and the cache was saved only after a fully successful fetch — so the crash discarded all ~2200 fetched commits and every re-run restarted from scratch.

Fix

- Adaptive page size + backoff: history page size is now a $pageSize GraphQL variable. A new fetch_history_page helper retries transient statuses (429/500/502/503/504) on a plan that shrinks the page (100 → 100 → 25 → 25 → 10) with 0/2/4/8/15s backoff. Smaller pages cut the per-request diff-stat work that trips the timeout. Page size resets to 100 for the next page, so only the expensive pages pay the cost.
- Partial progress persists on terminal failure: if retries are exhausted, the walk returns a fetch_failed reason with the contiguous run collected so far, which is written to the cache (complete=False) before exiting non-zero. A re-run resumes from the cache tail via the existing older-fetch path instead of refetching everything.
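
The retry plan described above can be sketched roughly as follows. This is an illustration, not the code from the PR: the names RETRYABLE_STATUS, HISTORY_FETCH_PLAN and fetch_history_page come from the review walkthrough, but pairing each backoff delay with a page size, the injected `send` and `sleep` parameters, and the `demo` helper are assumptions made so the sketch runs on its own.

```python
import io
import sys
import time
import urllib.error

# Statuses treated as transient, as listed in the PR description.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

# (seconds to wait before the attempt, page size for the attempt).
# Pairing the two sequences position by position is an assumption.
HISTORY_FETCH_PLAN = [(0, 100), (2, 100), (4, 25), (8, 25), (15, 10)]


def fetch_history_page(query, variables, token, label, send, sleep=time.sleep):
    """Fetch one page, backing off and shrinking pageSize on transient errors.

    `send(query, variables, token)` is a hypothetical transport callable that
    stands in for the tool's real GraphQL request function.
    """
    last_error = None
    for delay, page_size in HISTORY_FETCH_PLAN:
        if delay:
            sleep(delay)
        try:
            return send(query, {**variables, "pageSize": page_size}, token)
        except urllib.error.HTTPError as exc:
            if exc.code not in RETRYABLE_STATUS:
                raise  # hard failure such as 401/403/404: retrying cannot help
            last_error = exc
        except urllib.error.URLError as exc:
            last_error = exc  # network-level failure: worth another attempt
        print(f"[{label}] page of {page_size} failed: {last_error}", file=sys.stderr)
    raise last_error  # plan exhausted: surface the final error to the caller


def demo():
    """Simulate a page that returns 502 twice, then succeeds at size 25."""
    sizes, sleeps = [], []

    def flaky_send(query, variables, token):
        sizes.append(variables["pageSize"])
        if len(sizes) < 3:
            raise urllib.error.HTTPError(
                "https://api.github.com/graphql", 502, "Bad Gateway", {}, io.BytesIO(b"")
            )
        return {"nodes": []}

    result = fetch_history_page(
        "query", {"cursor": None}, "token", "new", send=flaky_send, sleep=sleeps.append
    )
    return result, sizes, sleeps
```

Because the plan restarts from its first entry on every call, the next page begins at 100 again, which matches the "only the expensive pages pay the cost" behaviour described in the list.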

Verification

- octocat/Hello-World: single page, dashboard generated ✓
- cli/cli --commits 250: multi-page pagination across the new $pageSize/cursor wiring, short-circuits correctly ✓
- Source and rebuilt stow/bin/repo-intel bundle both compile ✓

Both the source (src/repo-intel/repo-intel.py) and the rebuilt bundle are included.

Summary by CodeRabbit

Improvements
- More resilient GitHub commit-history fetching with automatic retries for transient network/API failures.
- Adaptive backoff that reduces page size on repeated failures to improve success chances.
- On fetch failure, saves the contiguous partial commit data to the cache and exits with an error so subsequent runs can resume.
Other
- Page title is now explicitly set for improved browser tab labelling.

Commit.history makes GitHub compute per-commit diff stats, so a page with a few large commits deterministically times out (502) at the same cursor. The fetch had no retries and discarded all progress on crash, so every re-run restarted from scratch.

- Parametrize history page size; retry transient 5xx with backoff and a shrinking page size (100 -> 25 -> 10) to stay under GitHub's timeout
- Persist partial progress on terminal failure so re-runs resume from the cache tail instead of refetching everything

coderabbitai · 2026-05-21T12:48:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 58275618-1c89-4f9b-a1df-25be80538c57

📥 Commits

Reviewing files that changed from the base of the PR and between c3c7aa8 and dd48017.

📒 Files selected for processing (2)

src/repo-intel/repo-intel.py
stow/bin/repo-intel

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test

🔇 Additional comments (7)

src/repo-intel/repo-intel.py (7)

67-67: LGTM!

498-513: LGTM!

516-536: LGTM!

765-774: LGTM!

829-837: LGTM!

839-855: LGTM!

873-889: LGTM!

Also applies to: 932-944

📝 Walkthrough

Adds retry/backoff and shrinking page-size logic for GitHub GraphQL Commit.history page fetches (fetch_history_page + RETRYABLE_STATUS/HISTORY_FETCH_PLAN), makes pagination recognise "fetch_failed" and persist partial contiguous results, and sets document.title in the template.

Changes

GitHub GraphQL history fetch retry logic

Layer / File(s)	Summary
Time import, retry constants and helper `src/repo-intel/repo-intel.py`, `stow/bin/repo-intel`	Adds `import time`. Introduces `RETRYABLE_STATUS` and `HISTORY_FETCH_PLAN`, and `fetch_history_page(query, variables, token, label)` that retries transient HTTP/URL failures by sleeping and shrinking `pageSize` per attempt.
GraphQL $pageSize parameterisation `src/repo-intel/repo-intel.py`, `stow/bin/repo-intel`	Replaces fixed `first: 100` with a `$pageSize` variable in `Commit.history`, and adds `$pageSize` parameters to top and bottom history queries so the helper can control batch size.
Pagination URLError handling `src/repo-intel/repo-intel.py`, `stow/bin/repo-intel`	`_paginate_history` now treats `urllib.error.URLError` during page traversal as reason `"fetch_failed"` and returns the contiguous nodes collected so far instead of raising.
Top/new history fetch integration `src/repo-intel/repo-intel.py`, `stow/bin/repo-intel`	Top fetch path now calls `fetch_history_page(..., label="new")`. On `"fetch_failed"` it caches the contiguous prefix with `complete=False` (when caching) and exits with an abort message.
Bottom/older history fetch integration `src/repo-intel/repo-intel.py`, `stow/bin/repo-intel`	Older fetch path now calls `fetch_history_page(..., label="older")`. On `"fetch_failed"` it caches the combined partial nodes with `complete=False` (when caching) and exits with an error.
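
To make the $pageSize parameterisation row concrete, here is a hedged sketch of what a parameterised history query might look like. The PR only confirms that a fixed `first: 100` became a `$pageSize` variable on `Commit.history`. The surrounding query shape, the selected fields, and the `history_variables` helper are assumptions chosen for illustration.

```python
# Hypothetical query shape. Only the `first: $pageSize` argument on
# Commit.history is confirmed by the PR; the rest is illustrative.
HISTORY_QUERY = """
query($owner: String!, $name: String!, $pageSize: Int!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: $pageSize, after: $cursor) {
            pageInfo { hasNextPage endCursor }
            nodes { oid committedDate additions deletions }
          }
        }
      }
    }
  }
}
"""


def history_variables(owner, name, cursor=None, page_size=100):
    """Build the variables payload; page_size can be lowered per retry attempt."""
    return {"owner": owner, "name": name, "cursor": cursor, "pageSize": page_size}
```

Keeping the page size in the variables rather than in the query text means a retry helper can shrink the batch without rebuilding the query string.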

Sequence Diagram

sequenceDiagram
  participant CollectRemote as collect_remote()
  participant PaginateHistory as _paginate_history()
  participant FetchHistoryPage as fetch_history_page()
  participant GHGraphQL as gh_graphql()
  participant Cache as cache/store

  CollectRemote->>PaginateHistory: initiate history walk (top and/or older)
  PaginateHistory->>FetchHistoryPage: request page (query + $pageSize, label)
  FetchHistoryPage->>GHGraphQL: execute GraphQL query
  GHGraphQL-->>FetchHistoryPage: HTTP error / URLError
  FetchHistoryPage->>FetchHistoryPage: sleep, shrink $pageSize, retry
  FetchHistoryPage->>GHGraphQL: retry GraphQL query
  GHGraphQL-->>FetchHistoryPage: success (nodes) or final exception
  FetchHistoryPage-->>PaginateHistory: nodes or exception
  PaginateHistory-->>CollectRemote: nodes or (nodes, "fetch_failed")
  CollectRemote->>Cache: persist partial nodes with complete=False (on fetch_failed)
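
The last two arrows of the diagram (returning `(nodes, "fetch_failed")` and persisting with complete=False) could look something like the sketch below. It is a simplified stand-in, not the PR's code: `PageFetchFailed`, the dict-based cache, the page dictionary shape and the `"exhausted"` reason are all hypothetical, and the real `_paginate_history` and `collect_remote` take different arguments.

```python
class PageFetchFailed(Exception):
    """Hypothetical error: a page failed even after all retries were used."""


def paginate_history(fetch_page, cursor=None):
    """Walk pages from `cursor`; on failure return the contiguous run so far."""
    nodes = []
    while True:
        try:
            page = fetch_page(cursor)
        except PageFetchFailed:
            return nodes, "fetch_failed"
        nodes.extend(page["nodes"])
        if not page["hasNextPage"]:
            return nodes, "exhausted"
        cursor = page["endCursor"]


def collect_remote(fetch_page, cache):
    """Persist whatever was collected, then exit non-zero if the walk failed."""
    nodes, reason = paginate_history(fetch_page)
    cache["nodes"] = nodes
    cache["complete"] = reason != "fetch_failed"
    if reason == "fetch_failed":
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(
            f"history fetch aborted after {len(nodes)} commits; re-run to resume"
        )
    return nodes
```

Writing the cache before raising is the point of the design: the failed run still leaves a usable, explicitly incomplete cache behind for the next run to extend.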

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

tyom/dotfiles#12: Refactors _paginate_history and collect_remote pagination/remote fetching flows that this retry/backoff work builds upon.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly addresses the main change: adding retry/backoff resilience to GitHub history-fetch operations that were failing with 502 errors.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/repo-intel/repo-intel.py`:
- Around line 765-769: In _paginate_history, don’t treat urllib.error.HTTPError
(a non-retryable subclass) as a generic URLError: catch urllib.error.HTTPError
from the fetch_page(cursor) call (or detect isinstance(exc,
urllib.error.HTTPError)) and re-raise it (or propagate it) instead of converting
it into the "fetch_failed" return value; keep the existing except
urllib.error.URLError handler for retryable network errors so
fetch_page/fetch_history_page non-retryable HTTP failures (401/403/404) are not
collapsed into "fetch_failed".

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3c94ddd9-763b-4d01-9c25-0208a7299176

📥 Commits

Reviewing files that changed from the base of the PR and between 33a4d51 and 8c287c4.

📒 Files selected for processing (2)

src/repo-intel/repo-intel.py
stow/bin/repo-intel

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test

🔇 Additional comments (1)

stow/bin/repo-intel (1)

765-769: Same hard-HTTPError masking as in src/repo-intel/repo-intel.py.

_paginate_history caught urllib.error.URLError, which also catches its HTTPError subclass. A hard 401/403/404 surfaced by fetch_history_page was thus turned into a resumable fetch_failed — saving a bogus partial cache and telling the user to re-run. Propagate non-retryable statuses instead; retryable 5xx and network errors still resume as before.
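
The subclass relationship behind this bug is easy to miss, so here is a minimal sketch of the distinction. `fetch_or_reason` is a hypothetical helper written for illustration, and the real handler in `_paginate_history` may be structured differently. What is certain is that `urllib.error.HTTPError` is a subclass of `urllib.error.URLError`, so a bare `except URLError` also catches 401/403/404 responses.

```python
import urllib.error

# Same transient set the PR describes for retries.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def fetch_or_reason(fetch_page, cursor):
    """Return (page, None) on success or (None, "fetch_failed") if resumable.

    Hard HTTP errors propagate so they are never cached as partial progress.
    """
    try:
        return fetch_page(cursor), None
    except urllib.error.HTTPError as exc:
        # Must be handled before URLError, because HTTPError subclasses it.
        if exc.code not in RETRYABLE_STATUS:
            raise
        return None, "fetch_failed"
    except urllib.error.URLError:
        # Network-level failure (DNS, connection reset): safe to resume later.
        return None, "fetch_failed"
```

Ordering the `except` clauses from most to least specific keeps the resumable path for 5xx and network errors while letting authentication and not-found failures stop the run immediately.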

coderabbitai Bot reviewed May 21, 2026

tyom added 2 commits May 21, 2026 14:07

Include repo name in repo-intel page title

c3c7aa8

Set document.title to "<owner/repo> · Repo Intel" from the injected data so the browser tab identifies which repo the dashboard is for.

tyom merged commit d17302e into master May 21, 2026
3 checks passed

tyom deleted the fix/repo-intel-commits branch May 21, 2026 13:17

tyom merged 3 commits into master from fix/repo-intel-commits
