Skip to content

perf(#2351): batch path-existence checks via Git Trees API#2360

Open
fullsend-ai-coder[bot] wants to merge 1 commit into
mainfrom
agent/2351-batch-path-presence
Open

perf(#2351): batch path-existence checks via Git Trees API#2360
fullsend-ai-coder[bot] wants to merge 1 commit into
mainfrom
agent/2351-batch-path-presence

Conversation

@fullsend-ai-coder

Copy link
Copy Markdown
Contributor

Add forge.Client.ListRepositoryFiles to retrieve all file paths in a repository's default branch with a single Git Trees API call (refs → commit → tree?recursive=1). This replaces the O(N) GetFileContent pattern used by ComparePathPresence, reducing 100+ sequential API calls to 3 fixed calls regardless of path count.

Changes:

  • forge.Client: add ListRepositoryFiles(ctx, owner, repo)
  • github.LiveClient: implement using Git Trees API (reuses the
    same refs/commits/trees pattern as CommitFiles)
  • forge.FakeClient: implement using FileContents map keys
  • scaffold.ComparePathPresence: new batch implementation that
    calls ListRepositoryFiles once and checks membership locally
  • Tests: 6 ComparePathPresence tests including a guard that
    GetFileContent is never called; error injection and thread
    safety coverage for the new forge method

PR #1954 introduces a naive ComparePathPresence in vendormanifest.go that loops GetFileContent per path. When that PR merges, its version should be replaced with this batch implementation.


Closes #2351

Post-script verification

  • Branch is not main/master (agent/2351-batch-path-presence)
  • Secret scan passed (gitleaks — 32f73a4f93301493d2c31be3970aa4c51a26acc7..HEAD)
  • Pre-commit hooks passed (authoritative run on runner)
  • Tests ran inside sandbox

Add forge.Client.ListRepositoryFiles to retrieve all file paths
in a repository's default branch with a single Git Trees API
call (refs → commit → tree?recursive=1). This replaces the O(N)
GetFileContent pattern used by ComparePathPresence, reducing
100+ sequential API calls to 3 fixed calls regardless of path
count.

Changes:
- forge.Client: add ListRepositoryFiles(ctx, owner, repo)
- github.LiveClient: implement using Git Trees API (reuses the
  same refs/commits/trees pattern as CommitFiles)
- forge.FakeClient: implement using FileContents map keys
- scaffold.ComparePathPresence: new batch implementation that
  calls ListRepositoryFiles once and checks membership locally
- Tests: 6 ComparePathPresence tests including a guard that
  GetFileContent is never called; error injection and thread
  safety coverage for the new forge method

PR #1954 introduces a naive ComparePathPresence in
vendormanifest.go that loops GetFileContent per path. When that
PR merges, its version should be replaced with this batch
implementation.

Closes #2351
@github-actions

Copy link
Copy Markdown

E2E tests did not run

E2E tests run automatically for org/repo members and collaborators on pull requests.

For other contributors, a maintainer must add the ok-to-test label after the latest push.

See E2E testing guide for details.

@github-actions

Copy link
Copy Markdown

Site preview

Preview: https://0324b99c-site.fullsend-ai.workers.dev

Commit: dd9fc105a1b9893253fbd5f4feee0f60646d56b6

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 16, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 7:30 PM UTC · Completed 7:41 PM UTC
Commit: dd9fc10 · View workflow run →

@codecov

codecov Bot commented Jun 16, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 33.33333% with 50 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
internal/forge/github/github.go 0.00% 50 Missing ⚠️

📢 Thoughts on this report? Let us know!

@fullsend-ai-review

Copy link
Copy Markdown

Review

Findings

Medium

  • [error-handling] internal/forge/github/github.go:1024 — When the Git Trees API response is truncated (repository too large), the function returns a hard error with no sentinel type, making it impossible for callers to detect and fall back to per-path checks. The interface comment in forge.go does not document this failure mode. If any repository legitimately exceeds the GitHub tree size limit (~100k entries or 7MB response), every call to ComparePathPresence will fail permanently with no recourse.
    Remediation: Add a sentinel error (e.g., ErrTreeTruncated) so callers can detect this specific failure and fall back to per-path checks. Document the truncation risk in the ListRepositoryFiles interface comment. See also: [api-documentation-completeness] finding at internal/forge/forge.go:195.

Low

  • [api-documentation-completeness] internal/forge/forge.go:195 — The interface documentation for ListRepositoryFiles states "Returns ErrNotFound if the repository does not exist" but does not document the truncation behavior when the repository tree is too large. See also: [error-handling] finding at internal/forge/github/github.go:1024.
  • [error-wrapping-inconsistency] internal/forge/github/github.go:968 — Error wrapping uses get repo: %w which is slightly abbreviated compared to some other methods that use more descriptive prefixes like get repo for default branch: %w.
  • [error-handling-inconsistency] internal/forge/github/github.go:970 — The retryOnTransient call uses label get branch ref and the inner function also wraps errors with get branch ref:, creating double-wrapping on exhaustion. This matches the existing pattern in commitFilesTo but is still a pre-existing code smell.
  • [non-deterministic-output] internal/forge/fake.go:411FakeClient.ListRepositoryFiles iterates over a map (f.FileContents), producing non-deterministic ordering of the returned paths. Current callers build a map from the result so order does not matter.
  • [retry-consistency] internal/forge/github/github.go:958 — Only step 2 (get branch ref) is wrapped in retryOnTransient. Steps 1, 3, and 4 are not retried. This is consistent with the existing pattern in commitFilesTo.

Info

  • [documentation-style] internal/scaffold/pathpresence.go:10 — The function comment mentions "Git Trees API call" which leaks GitHub-specific implementation details into a forge-agnostic layer.
  • [scope-alignment] PR scope aligns with issue Vendor analyze: batch path existence checks instead of O(N) GetFileContent calls #2351. No scope creep detected.
  • [architectural-coherence] The new forge.Client interface method correctly follows the forge abstraction pattern (ADR-0005). The method name ListRepositoryFiles is appropriately forge-neutral.
  • [internal-interface-evolution] Additive change to internal interface. No cross-repo contract risk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

requires-manual-review Review requires human judgment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Vendor analyze: batch path existence checks instead of O(N) GetFileContent calls

0 participants