Replace O(n) go-git tree walks with git diff-tree in post-commit hook#594
Replace O(n) go-git tree walks with git diff-tree in post-commit hook#594
Conversation
The post-commit hook was taking ~5.4s on large repos (~58k files) due to two go-git tree comparisons. Replace both with `git diff-tree` CLI calls which do the same work in ~60ms total. - Add `cmd/entire/cli/gitops/` package with DiffTreeFiles/DiffTreeFileList - Replace `filesChangedInCommit` go-git tree walk (703ms → ~25ms) - Replace `getAllChangedFilesBetweenTrees` in attribution (4685ms → ~36ms) - Keep go-git fallback (`getAllChangedFilesBetweenTreesSlow`) for doctor cmd - Remove timing instrumentation added during investigation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: fb2415d1a9a3
PR SummaryMedium Risk Overview Switches post-commit’s Adds comprehensive tests for diff-tree parsing and real-repo scenarios, updates existing attribution and post-commit tests for the new signatures, and removes a now-irrelevant cache-equivalence test. Written by Cursor Bugbot for commit 2fc6033. Configure here. |
|
bugbot run |
There was a problem hiding this comment.
Pull request overview
This PR improves post-commit hook performance by replacing expensive go-git tree walks with a new git diff-tree-based implementation for changed-file detection, while retaining a go-git fallback for doctor/CondenseSessionByID paths.
Changes:
- Add a new
cli/gitopspackage that wrapsgit diff-tree --no-commit-id -r -zand parses its output. - Switch post-commit “files changed in commit” detection to use
git diff-treeinstead of go-git tree diffs. - Switch attribution’s “non-agent changed files” detection to
git diff-treewhen commit hashes are available, keeping a slow go-git fallback and updating tests accordingly.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| cmd/entire/cli/gitops/diff.go | Adds DiffTreeFiles / DiffTreeFileList wrappers and output parser for git diff-tree -z. |
| cmd/entire/cli/gitops/diff_test.go | Adds real-repo tests for diff-tree behavior plus parser unit tests. |
| cmd/entire/cli/strategy/manual_commit_hooks.go | Replaces filesChangedInCommit implementation to use gitops.DiffTreeFiles; threads ctx through. |
| cmd/entire/cli/strategy/phase_postcommit_test.go | Updates post-commit tests for new filesChangedInCommit(ctx, commit) signature and removes prior cache-equivalence coverage. |
| cmd/entire/cli/strategy/manual_commit_attribution.go | Introduces a fast path via DiffTreeFileList for non-agent file detection with a slow go-git fallback. |
| cmd/entire/cli/strategy/manual_commit_attribution_test.go | Updates attribution tests for the new function signature; renames and parallelizes the slow-path test. |
| cmd/entire/cli/strategy/manual_commit_condensation.go | Plumbs commit hashes into attribution options to enable the new fast path when available. |
Comments suppressed due to low confidence (1)
cmd/entire/cli/strategy/manual_commit_hooks.go:2124
- filesChangedInCommit drops all committed file detection if gitops.DiffTreeFiles returns an error (returns an empty map) and the error is silently ignored. That can cause incorrect overlap checks / carry-forward decisions when the git CLI is unavailable or diff-tree fails; consider logging at least a debug/warn and falling back to the prior go-git tree diff (or propagating an error up to PostCommit so the hook can take a safe default).
slog.String("session_id", state.SessionID),
)
}
// Clear turn checkpoint IDs. Do NOT update CheckpointTranscriptStart here — it was
// already set correctly by PostCommit: condenseAndUpdateState sets it to the total
// transcript lines when condensing, and carryForwardToNewShadowBranch resets it to 0
// when carry-forward is active. Overwriting here would break carry-forward by making
// sessionHasNewContent think the transcript is fully consumed (no growth).
The function returns all changed files between base and HEAD — the non-agent filtering happens in the caller. Name the function and local variable for what they actually contain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 859b7462b2b9
|
bugbot run |
…test coverage - Fix shared memory.Storage data race in TestGetAllChangedFilesBetweenTreesSlow by giving each subtest its own storage via newTestTreeBuilder() - Add logging.Warn in filesChangedInCommit when DiffTreeFiles fails - Add logging.Warn in CalculateAttributionWithAccumulated when getAllChangedFiles fails - Add rename, copy, and mixed rename+modify parser tests for parseDiffTreeOutput - Add TestDiffTreeFiles_Rename integration test using git mv Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: ad85320a5783
|
bugbot run |
|
| Status | Paths | Appears without flags? | Parser | Verified |
|---|---|---|---|---|
| A (add) | 1 | Yes | ✅ | unit + integration test |
| D (delete) | 1 | Yes | ✅ | unit + integration test |
| M (modify) | 1 | Yes | ✅ | unit + integration test |
| M (chmod-only) | 1 | Yes (mode fields differ, same status) | ✅ | manual test |
| T (type change, e.g. file→symlink) | 1 | Yes | ✅ | manual test |
| R (rename) | 2 | No (needs -M flag) |
✅ | unit + integration test |
| C (copy) | 2 | No (needs -C flag) |
✅ | unit test |
| U (unmerged) | 1 | No (index-only, not between commits) | ✅ | n/a |
| X (unknown/git bug) | 1 | Theoretically | ✅ | n/a |
Since we don't pass -M or -C, renames always appear as D+A pairs in practice. The R/C handling is defensive.
Summary
cmd/entire/cli/gitops/package withDiffTreeFiles/DiffTreeFileListwrappers aroundgit diff-tree --no-commit-id -r -zfilesChangedInCommitgo-git tree walk withgit diff-tree(703ms → ~25ms on 58k-file repos)getAllChangedFilesBetweenTreesin attribution withgit diff-tree(4685ms → ~36ms), keeping go-git fallback forCondenseSessionByID(doctor command)Before/after (58k-file repo)
Test plan
cmd/entire/cli/gitops/diff_test.go— tests DiffTreeFiles with real git repos (normal commit, initial commit, multi-commit range, deleted files, no changes, subdirectory files)TestGetAllChangedFilesBetweenTreesSlow— preserved coverage for go-git fallback path (nil trees, identical trees, add/delete scenarios)CalculateAttributionWithAccumulatedtests updated and passing (exercise slow fallback via empty commit hashes)TestFilesChangedInCommit/TestFilesChangedInCommit_InitialCommitupdated for new signaturemise run fmt && mise run lint && mise run test:ciall green🤖 Generated with Claude Code