Make repo-intel resilient to GitHub history-fetch 502s #13
Commit.history makes GitHub compute per-commit diff stats, so a page with a few large commits deterministically times out (502) at the same cursor. The fetch had no retries and discarded all progress on crash, so every re-run restarted from scratch.
- Parametrize history page size; retry transient 5xx with backoff and a shrinking page size (100 -> 25 -> 10) to stay under GitHub's timeout
- Persist partial progress on terminal failure so re-runs resume from the cache tail instead of refetching everything
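The retry behaviour described in this commit message might look roughly like the sketch below. The names RETRYABLE_STATUS, HISTORY_FETCH_PLAN and fetch_history_page appear in the review walkthrough, but their shape here is an assumption, as are the run_query callable and the sleep parameter (added only so the sketch can be exercised without waiting). The page sizes, waits and status codes are the ones quoted in the PR description.

```python
import time
import urllib.error

# Statuses treated as transient, as listed in the PR description.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}
# (page size, seconds to wait before the attempt), per the PR description.
HISTORY_FETCH_PLAN = [(100, 0), (100, 2), (25, 4), (25, 8), (10, 15)]


def fetch_history_page(run_query, cursor, plan=HISTORY_FETCH_PLAN, sleep=time.sleep):
    """Fetch one history page, shrinking the page size on transient failures.

    run_query(cursor, page_size) is a hypothetical callable that performs the
    GraphQL request and raises urllib.error.HTTPError or URLError on failure.
    """
    last_exc = None
    for page_size, delay in plan:
        if delay:
            sleep(delay)
        try:
            return run_query(cursor, page_size)
        except urllib.error.HTTPError as exc:
            if exc.code not in RETRYABLE_STATUS:
                raise  # hard failure (e.g. 401/403/404): retrying cannot help
            last_exc = exc
        except urllib.error.URLError as exc:
            last_exc = exc  # network-level error: treated as transient
    if last_exc is None:
        raise ValueError("retry plan must contain at least one attempt")
    raise last_exc  # plan exhausted: surface the last transient error
```

Because every call starts again at the top of the plan, the next page is requested at the full page size, so only the expensive pages pay for the smaller requests.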
No actionable comments were generated in the recent review.
Walkthrough
Adds retry/backoff and shrinking page-size logic for GitHub GraphQL Commit.history page fetches (fetch_history_page + RETRYABLE_STATUS/HISTORY_FETCH_PLAN), makes pagination recognise "fetch_failed" and persist partial contiguous results, and sets document.title in the template.
Changes
GitHub GraphQL history fetch retry logic
Sequence Diagram
sequenceDiagram
participant CollectRemote as collect_remote()
participant PaginateHistory as _paginate_history()
participant FetchHistoryPage as fetch_history_page()
participant GHGraphQL as gh_graphql()
participant Cache as cache/store
CollectRemote->>PaginateHistory: initiate history walk (top and/or older)
PaginateHistory->>FetchHistoryPage: request page (query + $pageSize, label)
FetchHistoryPage->>GHGraphQL: execute GraphQL query
GHGraphQL-->>FetchHistoryPage: HTTP error / URLError
FetchHistoryPage->>FetchHistoryPage: sleep, shrink $pageSize, retry
FetchHistoryPage->>GHGraphQL: retry GraphQL query
GHGraphQL-->>FetchHistoryPage: success (nodes) or final exception
FetchHistoryPage-->>PaginateHistory: nodes or exception
PaginateHistory-->>CollectRemote: nodes or (nodes, "fetch_failed")
CollectRemote->>Cache: persist partial nodes with complete=False (on fetch_failed)
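The last two steps of the diagram (returning "fetch_failed" with the nodes gathered so far, then persisting them with complete=False) could be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: FetchFailed, the fetch_page callable, the JSON cache layout and the stored cursor are all hypothetical, and the real tool resumes through its existing older-fetch path rather than a saved cursor.

```python
import json
from pathlib import Path


class FetchFailed(Exception):
    """Hypothetical stand-in for 'retries exhausted on a transient error'."""


def paginate_history(fetch_page, cursor=None):
    """Collect pages until the history ends or a page fails for good.

    fetch_page(cursor) -> (nodes, next_cursor or None) is a hypothetical callable.
    Returns (nodes, cursor_to_resume_from, reason); reason is None on success
    or "fetch_failed" when the walk stopped early.
    """
    nodes = []
    while True:
        try:
            page, next_cursor = fetch_page(cursor)
        except FetchFailed:
            # Keep the contiguous run collected so far; cursor still points
            # at the page that failed, so a re-run retries exactly that page.
            return nodes, cursor, "fetch_failed"
        nodes.extend(page)
        if next_cursor is None:
            return nodes, None, None
        cursor = next_cursor


def collect(fetch_page, cache_path):
    """Resume from the cached tail if present, then persist what was gathered."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        cached = json.loads(cache_path.read_text())
    else:
        cached = {"nodes": [], "cursor": None, "complete": False}
    if cached["complete"]:
        return cached
    nodes, cursor, reason = paginate_history(fetch_page, cached["cursor"])
    result = {
        "nodes": cached["nodes"] + nodes,
        "cursor": cursor,
        "complete": reason is None,  # partial runs are saved with complete=False
    }
    cache_path.write_text(json.dumps(result))
    return result
```

A caller would exit non-zero when the returned record has complete set to False, so the user knows to re-run and pick up from the saved tail.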
Estimated code review effort: 3 (Moderate), ~25 minutes
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/repo-intel/repo-intel.py`:
- Around line 765-769: In _paginate_history, don’t treat urllib.error.HTTPError
(a non-retryable subclass) as a generic URLError: catch urllib.error.HTTPError
from the fetch_page(cursor) call (or detect isinstance(exc,
urllib.error.HTTPError)) and re-raise it (or propagate it) instead of converting
it into the "fetch_failed" return value; keep the existing except
urllib.error.URLError handler for retryable network errors so
fetch_page/fetch_history_page non-retryable HTTP failures (401/403/404) are not
collapsed into "fetch_failed".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3c94ddd9-763b-4d01-9c25-0208a7299176
📒 Files selected for processing (2)
- src/repo-intel/repo-intel.py
- stow/bin/repo-intel
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (1)
stow/bin/repo-intel (1)
765-769: Same hard-HTTPError masking as in src/repo-intel/repo-intel.py.
Set document.title to "<owner/repo> · Repo Intel" from the injected data so the browser tab identifies which repo the dashboard is for.
_paginate_history caught urllib.error.URLError, which also catches its HTTPError subclass. A hard 401/403/404 surfaced by fetch_history_page was thus turned into a resumable fetch_failed — saving a bogus partial cache and telling the user to re-run. Propagate non-retryable statuses instead; retryable 5xx and network errors still resume as before.
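The subclass relationship behind this bug, and one way to keep hard statuses from being absorbed, can be shown in a few lines. The classify and fetch_or_mark_failed helpers below are hypothetical names for illustration; only the urllib.error classes and the status list from the PR description are taken as given.

```python
import urllib.error

# Statuses treated as transient, as listed in the PR description.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

# HTTPError derives from URLError, so `except URLError` also catches it.
assert issubclass(urllib.error.HTTPError, urllib.error.URLError)


def classify(exc):
    """Return "retry" for transient failures and "fatal" for hard HTTP statuses."""
    # Test HTTPError first: checking URLError first would match both kinds.
    if isinstance(exc, urllib.error.HTTPError):
        return "retry" if exc.code in RETRYABLE_STATUS else "fatal"
    if isinstance(exc, urllib.error.URLError):
        return "retry"  # DNS failure, connection reset, and similar
    raise TypeError(f"unexpected exception type: {type(exc).__name__}")


def fetch_or_mark_failed(fetch_page, cursor):
    """Return (page, None) on success or (None, "fetch_failed") on a transient error.

    fetch_page(cursor) is a hypothetical callable. Hard statuses such as
    401/403/404 propagate to the caller instead of becoming "fetch_failed".
    """
    try:
        return fetch_page(cursor), None
    except urllib.error.URLError as exc:
        if classify(exc) == "fatal":
            raise
        return None, "fetch_failed"
```

With this split, a bad token or a missing repository fails loudly, while a 502 or a dropped connection still yields a resumable partial result.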
Problem
repo-intel crashed with HTTP Error 502: Bad Gateway after fetching ~2200 commits, and re-running failed at the exact same place. That determinism is the tell: it's not transient flakiness.
Resolving Commit.history makes GitHub compute per-commit diff stats (additions/deletions). A page holding a few large commits consistently exceeds GitHub's backend timeout and returns 502 at the same cursor every run. The fetch had no retry handling, and the cache was saved only after a fully successful fetch, so the crash discarded all ~2200 fetched commits and every re-run restarted from scratch.
Fix
- Adaptive page size + backoff: history page size is now a $pageSize GraphQL variable. A new fetch_history_page helper retries transient statuses (429/500/502/503/504) on a plan that shrinks the page (100 → 100 → 25 → 25 → 10) with 0/2/4/8/15s backoff. Smaller pages cut the per-request diff-stat work that trips the timeout. Page size resets to 100 for the next page, so only the expensive pages pay the cost.
- Partial progress persists on terminal failure: if retries are exhausted, the walk returns a fetch_failed reason with the contiguous run collected so far, which is written to the cache (complete=False) before exiting non-zero. A re-run resumes from the cache tail via the existing older-fetch path instead of refetching everything.
Verification
- octocat/Hello-World: single page, dashboard generated ✓
- cli/cli --commits 250: multi-page pagination across the new $pageSize/cursor wiring, short-circuits correctly ✓
- source and stow/bin/repo-intel bundle both compile ✓
Both the source (src/repo-intel/repo-intel.py) and the rebuilt bundle are included.