Skip to content

Replace O(n) go-git tree walks with git diff-tree in post-commit hook#594

Open
khaong wants to merge 5 commits intomainfrom
alex/investigate-commit-slowness
Open

Replace O(n) go-git tree walks with git diff-tree in post-commit hook#594
khaong wants to merge 5 commits intomainfrom
alex/investigate-commit-slowness

Conversation

@khaong
Copy link
Contributor

@khaong khaong commented Mar 4, 2026

Summary

  • Add cmd/entire/cli/gitops/ package with DiffTreeFiles/DiffTreeFileList wrappers around git diff-tree --no-commit-id -r -z
  • Replace filesChangedInCommit go-git tree walk with git diff-tree (703ms → ~25ms on 58k-file repos)
  • Replace getAllChangedFilesBetweenTrees in attribution with git diff-tree (4685ms → ~36ms), keeping go-git fallback for CondenseSessionByID (doctor command)
  • Remove timing instrumentation added during investigation

Before/after (58k-file repo)

origin/main:  5.95s
this branch:  0.30s  (20x faster)

Test plan

  • New cmd/entire/cli/gitops/diff_test.go — tests DiffTreeFiles with real git repos (normal commit, initial commit, multi-commit range, deleted files, no changes, subdirectory files)
  • TestGetAllChangedFilesBetweenTreesSlow — preserved coverage for go-git fallback path (nil trees, identical trees, add/delete scenarios)
  • All existing CalculateAttributionWithAccumulated tests updated and passing (exercise slow fallback via empty commit hashes)
  • TestFilesChangedInCommit / TestFilesChangedInCommit_InitialCommit updated for new signature
  • mise run fmt && mise run lint && mise run test:ci all green

🤖 Generated with Claude Code

The post-commit hook was taking ~5.4s on large repos (~58k files) due to
two go-git tree comparisons. Replace both with `git diff-tree` CLI calls
which do the same work in ~60ms total.

- Add `cmd/entire/cli/gitops/` package with DiffTreeFiles/DiffTreeFileList
- Replace `filesChangedInCommit` go-git tree walk (703ms → ~25ms)
- Replace `getAllChangedFilesBetweenTrees` in attribution (4685ms → ~36ms)
- Keep go-git fallback (`getAllChangedFilesBetweenTreesSlow`) for doctor cmd
- Remove timing instrumentation added during investigation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: fb2415d1a9a3
@khaong khaong requested a review from a team as a code owner March 4, 2026 00:51
Copilot AI review requested due to automatic review settings March 4, 2026 00:51
@cursor
Copy link

cursor bot commented Mar 4, 2026

PR Summary

Medium Risk
Medium risk because post-commit condensation/attribution correctness now depends on executing/parsing git diff-tree; failures degrade to warnings/empty results (and attribution may be skipped), potentially affecting which sessions/files are condensed and reported.

Overview
Introduces a new gitops helper (DiffTreeFiles/DiffTreeFileList) that shells out to git diff-tree -r -z and parses its null-delimited output (including rename/copy paths).

Switches post-commit’s filesChangedInCommit and attribution’s “changed files between base and head” enumeration to this fast path when commit hashes are available, threading headCommitHash/attributionBaseCommit through condensation/attribution plumbing; keeps the prior go-git tree-walk logic as an explicit slow fallback for flows without commit hashes (e.g. doctor/force-condense).

Adds comprehensive tests for diff-tree parsing and real-repo scenarios, updates existing attribution and post-commit tests for the new signatures, and removes a now-irrelevant cache-equivalence test.

Written by Cursor Bugbot for commit 2fc6033. Configure here.

@khaong
Copy link
Contributor Author

khaong commented Mar 4, 2026

bugbot run

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves post-commit hook performance by replacing expensive go-git tree walks with a new git diff-tree-based implementation for changed-file detection, while retaining a go-git fallback for doctor/CondenseSessionByID paths.

Changes:

  • Add a new cli/gitops package that wraps git diff-tree --no-commit-id -r -z and parses its output.
  • Switch post-commit “files changed in commit” detection to use git diff-tree instead of go-git tree diffs.
  • Switch attribution’s “non-agent changed files” detection to git diff-tree when commit hashes are available, keeping a slow go-git fallback and updating tests accordingly.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
cmd/entire/cli/gitops/diff.go Adds DiffTreeFiles / DiffTreeFileList wrappers and output parser for git diff-tree -z.
cmd/entire/cli/gitops/diff_test.go Adds real-repo tests for diff-tree behavior plus parser unit tests.
cmd/entire/cli/strategy/manual_commit_hooks.go Replaces filesChangedInCommit implementation to use gitops.DiffTreeFiles; threads ctx through.
cmd/entire/cli/strategy/phase_postcommit_test.go Updates post-commit tests for new filesChangedInCommit(ctx, commit) signature and removes prior cache-equivalence coverage.
cmd/entire/cli/strategy/manual_commit_attribution.go Introduces a fast path via DiffTreeFileList for non-agent file detection with a slow go-git fallback.
cmd/entire/cli/strategy/manual_commit_attribution_test.go Updates attribution tests for the new function signature; renames and parallelizes the slow-path test.
cmd/entire/cli/strategy/manual_commit_condensation.go Plumbs commit hashes into attribution options to enable the new fast path when available.
Comments suppressed due to low confidence (1)

cmd/entire/cli/strategy/manual_commit_hooks.go:2124

  • filesChangedInCommit drops all committed file detection if gitops.DiffTreeFiles returns an error (returns an empty map) and the error is silently ignored. That can cause incorrect overlap checks / carry-forward decisions when the git CLI is unavailable or diff-tree fails; consider logging at least a debug/warn and falling back to the prior go-git tree diff (or propagating an error up to PostCommit so the hook can take a safe default).
			slog.String("session_id", state.SessionID),
		)
	}

	// Clear turn checkpoint IDs. Do NOT update CheckpointTranscriptStart here — it was
	// already set correctly by PostCommit: condenseAndUpdateState sets it to the total
	// transcript lines when condensing, and carryForwardToNewShadowBranch resets it to 0
	// when carry-forward is active. Overwriting here would break carry-forward by making
	// sessionHasNewContent think the transcript is fully consumed (no growth).

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

evisdren
evisdren previously approved these changes Mar 4, 2026
The function returns all changed files between base and HEAD — the
non-agent filtering happens in the caller. Name the function and
local variable for what they actually contain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 859b7462b2b9
@khaong
Copy link
Contributor Author

khaong commented Mar 4, 2026

bugbot run

…test coverage

- Fix shared memory.Storage data race in TestGetAllChangedFilesBetweenTreesSlow
  by giving each subtest its own storage via newTestTreeBuilder()
- Add logging.Warn in filesChangedInCommit when DiffTreeFiles fails
- Add logging.Warn in CalculateAttributionWithAccumulated when getAllChangedFiles fails
- Add rename, copy, and mixed rename+modify parser tests for parseDiffTreeOutput
- Add TestDiffTreeFiles_Rename integration test using git mv

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: ad85320a5783
@khaong khaong enabled auto-merge March 4, 2026 01:37
@khaong
Copy link
Contributor Author

khaong commented Mar 4, 2026

bugbot run

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

@khaong
Copy link
Contributor Author

khaong commented Mar 4, 2026

git diff-tree output coverage verification

Verified the parser handles all possible status codes from git diff-tree -r -z:

Status Paths Appears without flags? Parser Verified
A (add) 1 Yes unit + integration test
D (delete) 1 Yes unit + integration test
M (modify) 1 Yes unit + integration test
M (chmod-only) 1 Yes (mode fields differ, same status) manual test
T (type change, e.g. file→symlink) 1 Yes manual test
R (rename) 2 No (needs -M flag) unit + integration test
C (copy) 2 No (needs -C flag) unit test
U (unmerged) 1 No (index-only, not between commits) n/a
X (unknown/git bug) 1 Theoretically n/a

Since we don't pass -M or -C, renames always appear as D+A pairs in practice. The R/C handling is defensive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants