
Make repo-intel resilient to GitHub history-fetch 502s #13

Merged: tyom merged 3 commits into master from fix/repo-intel-commits on May 21, 2026

@tyom tyom commented May 21, 2026

Problem

repo-intel crashed with HTTP Error 502: Bad Gateway after fetching ~2200 commits — and re-running failed at the exact same place. That determinism is the tell: it's not transient flakiness.

Resolving Commit.history makes GitHub compute per-commit diff stats (additions/deletions). A page holding a few large commits consistently exceeds GitHub's backend timeout and returns 502 at the same cursor every run. The fetch had no retry handling, and the cache was saved only after a fully successful fetch — so the crash discarded all ~2200 fetched commits and every re-run restarted from scratch.

Fix

  1. Adaptive page size + backoff: history page size is now a $pageSize GraphQL variable. A new fetch_history_page helper retries transient statuses (429 and 500/502/503/504) on a five-attempt plan that shrinks the page (100 → 100 → 25 → 25 → 10) with 0/2/4/8/15s backoff. Smaller pages cut the per-request diff-stat work that trips the timeout. Page size resets to 100 for the next page, so only the expensive pages pay the cost.

  2. Partial progress persists on terminal failure: if retries are exhausted, the walk returns a fetch_failed reason with the contiguous run collected so far, which is written to the cache (complete=False) before exiting non-zero. A re-run resumes from the cache tail via the existing older-fetch path instead of refetching everything.
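
For readers skimming the description, the retry plan in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name fetch_page_with_plan, the injected run_query and sleep callables, and the pairing of each page size with a delay applied before the attempt are all assumptions made so the sketch runs without network access. The status codes and the plan values are taken from the description above.

```python
import urllib.error

# Values taken from the PR description; the real constants may be shaped differently.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}
# (page size, seconds to wait before the attempt) -- pairing is an assumption.
HISTORY_FETCH_PLAN = [(100, 0), (100, 2), (25, 4), (25, 8), (10, 15)]


def fetch_page_with_plan(run_query, variables, sleep):
    """Fetch one history page, shrinking the page size after retryable failures.

    run_query(variables) and sleep(seconds) are hypothetical stand-ins for the
    real GraphQL call and time.sleep.
    """
    last_error = None
    for page_size, delay in HISTORY_FETCH_PLAN:
        if delay:
            sleep(delay)
        try:
            return run_query({**variables, "pageSize": page_size})
        except urllib.error.HTTPError as exc:
            if exc.code not in RETRYABLE_STATUS:
                raise  # hard failures such as 401/403/404 are not retried
            last_error = exc
        except urllib.error.URLError as exc:
            last_error = exc  # network-level failure, treated as transient
    # Every attempt in the plan failed: surface the last error to the caller.
    raise last_error
```

Because the plan is restarted for every page, a page that needed the smallest size does not slow down the pages after it.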

Verification

  • octocat/Hello-World — single page, dashboard generated ✓
  • cli/cli --commits 250 — multi-page pagination across the new $pageSize/cursor wiring, short-circuits correctly ✓
  • Source and rebuilt stow/bin/repo-intel bundle both compile ✓

Both the source (src/repo-intel/repo-intel.py) and the rebuilt bundle are included.

Summary by CodeRabbit

  • Improvements

    • More resilient GitHub commit-history fetching with automatic retries for transient network/API failures.
    • Adaptive backoff that reduces page size on repeated failures to improve success chances.
    • On fetch failure, saves the contiguous partial commit data to the cache before exiting with an error, so subsequent runs can resume.
  • Other

    • Page title is now explicitly set for improved browser tab labelling.


Commit.history makes GitHub compute per-commit diff stats, so a page
with a few large commits deterministically times out (502) at the same
cursor. The fetch had no retries and discarded all progress on crash,
so every re-run restarted from scratch.

- Parametrize history page size; retry transient 5xx with backoff and a
  shrinking page size (100 -> 25 -> 10) to stay under GitHub's timeout
- Persist partial progress on terminal failure so re-runs resume from
  the cache tail instead of refetching everything

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 58275618-1c89-4f9b-a1df-25be80538c57

📥 Commits

Reviewing files that changed from the base of the PR and between c3c7aa8 and dd48017.

📒 Files selected for processing (2)
  • src/repo-intel/repo-intel.py
  • stow/bin/repo-intel
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (7)
src/repo-intel/repo-intel.py (7)

67-67: LGTM!


498-513: LGTM!


516-536: LGTM!


765-774: LGTM!


829-837: LGTM!


839-855: LGTM!


873-889: LGTM!

Also applies to: 932-944


📝 Walkthrough

Adds retry/backoff and shrinking page-size logic for GitHub GraphQL Commit.history page fetches (fetch_history_page + RETRYABLE_STATUS/HISTORY_FETCH_PLAN), makes pagination recognise "fetch_failed" and persist partial contiguous results, and sets document.title in the template.

Changes

GitHub GraphQL history fetch retry logic

| Layer / File(s) | Summary |
| --- | --- |
| Time import, retry constants and helper (src/repo-intel/repo-intel.py, stow/bin/repo-intel) | Adds import time. Introduces RETRYABLE_STATUS and HISTORY_FETCH_PLAN, and fetch_history_page(query, variables, token, label) that retries transient HTTP/URL failures by sleeping and shrinking pageSize per attempt. |
| GraphQL $pageSize parameterisation (src/repo-intel/repo-intel.py, stow/bin/repo-intel) | Replaces fixed first: 100 with a $pageSize variable in Commit.history, and adds $pageSize parameters to top and bottom history queries so the helper can control batch size. |
| Pagination URLError handling (src/repo-intel/repo-intel.py, stow/bin/repo-intel) | _paginate_history now treats urllib.error.URLError during page traversal as reason "fetch_failed" and returns the contiguous nodes collected so far instead of raising. |
| Top/new history fetch integration (src/repo-intel/repo-intel.py, stow/bin/repo-intel) | Top fetch path now calls fetch_history_page(..., label="new"). On "fetch_failed" it caches the contiguous prefix with complete=False (when caching) and exits with an abort message. |
| Bottom/older history fetch integration (src/repo-intel/repo-intel.py, stow/bin/repo-intel) | Older fetch path now calls fetch_history_page(..., label="older"). On "fetch_failed" it caches the combined partial nodes with complete=False (when caching) and exits with an error. |
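
As an illustration of the $pageSize parameterisation summarised above, a request body might be built along these lines. The query text, the selected fields, and the build_payload helper are hypothetical; the tool's actual queries are not shown on this page and may select different fields. The only detail taken from the summary is that first: 100 becomes first: $pageSize.

```python
import json

# Hypothetical query: the field selection is an assumption for illustration.
HISTORY_QUERY = """
query($owner: String!, $name: String!, $pageSize: Int!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: $pageSize, after: $cursor) {
            pageInfo { hasNextPage endCursor }
            nodes { oid committedDate additions deletions }
          }
        }
      }
    }
  }
}
"""


def build_payload(owner, name, page_size=100, cursor=None):
    """Return the JSON request body for one history page (hypothetical helper)."""
    variables = {
        "owner": owner,
        "name": name,
        "pageSize": page_size,  # the retry helper can lower this per attempt
        "cursor": cursor,       # None requests the first page
    }
    return json.dumps({"query": HISTORY_QUERY, "variables": variables})
```

Passing the size as a variable keeps the query string constant, so a retry only has to change one value in the variables object.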

Sequence Diagram

sequenceDiagram
  participant CollectRemote as collect_remote()
  participant PaginateHistory as _paginate_history()
  participant FetchHistoryPage as fetch_history_page()
  participant GHGraphQL as gh_graphql()
  participant Cache as cache/store

  CollectRemote->>PaginateHistory: initiate history walk (top and/or older)
  PaginateHistory->>FetchHistoryPage: request page (query + $pageSize, label)
  FetchHistoryPage->>GHGraphQL: execute GraphQL query
  GHGraphQL-->>FetchHistoryPage: HTTP error / URLError
  FetchHistoryPage->>FetchHistoryPage: sleep, shrink $pageSize, retry
  FetchHistoryPage->>GHGraphQL: retry GraphQL query
  GHGraphQL-->>FetchHistoryPage: success (nodes) or final exception
  FetchHistoryPage-->>PaginateHistory: nodes or exception
  PaginateHistory-->>CollectRemote: nodes or (nodes, "fetch_failed")
  CollectRemote->>Cache: persist partial nodes with complete=False (on fetch_failed)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • tyom/dotfiles#12: Refactors _paginate_history and collect_remote pagination/remote fetching flows that this retry/backoff work builds upon.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly addresses the main change: adding retry/backoff resilience to GitHub history-fetch operations that were failing with 502 errors. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/repo-intel/repo-intel.py`:
- Around line 765-769: In _paginate_history, don’t treat urllib.error.HTTPError
(a non-retryable subclass) as a generic URLError: catch urllib.error.HTTPError
from the fetch_page(cursor) call (or detect isinstance(exc,
urllib.error.HTTPError)) and re-raise it (or propagate it) instead of converting
it into the "fetch_failed" return value; keep the existing except
urllib.error.URLError handler for retryable network errors so
fetch_page/fetch_history_page non-retryable HTTP failures (401/403/404) are not
collapsed into "fetch_failed".
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3c94ddd9-763b-4d01-9c25-0208a7299176

📥 Commits

Reviewing files that changed from the base of the PR and between 33a4d51 and 8c287c4.

📒 Files selected for processing (2)
  • src/repo-intel/repo-intel.py
  • stow/bin/repo-intel
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (1)
stow/bin/repo-intel (1)

765-769: Same hard-HTTPError masking as in src/repo-intel/repo-intel.py.

Comment thread src/repo-intel/repo-intel.py
tyom added 2 commits May 21, 2026 14:07
Set document.title to "<owner/repo> · Repo Intel" from the injected
data so the browser tab identifies which repo the dashboard is for.

_paginate_history caught urllib.error.URLError, which also catches its
HTTPError subclass. A hard 401/403/404 surfaced by fetch_history_page
was thus turned into a resumable fetch_failed — saving a bogus partial
cache and telling the user to re-run. Propagate non-retryable statuses
instead; retryable 5xx and network errors still resume as before.
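
The masking described in this commit message follows from urllib.error.HTTPError being a subclass of urllib.error.URLError. A minimal sketch of a pagination loop that separates the two cases is shown below; the paginate function, the fetch_page(cursor) callable, and the "done" reason are hypothetical names for illustration, and the retryable status set is assumed from the PR description rather than copied from the code.

```python
import urllib.error

RETRYABLE_STATUS = {429, 500, 502, 503, 504}  # assumed from the PR description


def paginate(fetch_page):
    """Walk pages until exhausted and return (nodes, reason).

    fetch_page(cursor) is a hypothetical callable returning
    (nodes, next_cursor), with next_cursor set to None on the last page.
    """
    nodes, cursor = [], None
    while True:
        try:
            page, cursor = fetch_page(cursor)
        except urllib.error.HTTPError as exc:
            # HTTPError subclasses URLError, so this clause must come first.
            if exc.code not in RETRYABLE_STATUS:
                raise  # 401/403/404: re-running will not help, so propagate
            return nodes, "fetch_failed"
        except urllib.error.URLError:
            # Network-level failure: keep the contiguous run collected so far.
            return nodes, "fetch_failed"
        nodes.extend(page)
        if cursor is None:
            return nodes, "done"
```

With this ordering, only failures that a re-run could plausibly recover from produce a resumable partial result.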
@tyom tyom merged commit d17302e into master May 21, 2026
3 checks passed
@tyom tyom deleted the fix/repo-intel-commits branch May 21, 2026 13:17