Copilot AI commented Nov 6, 2025

Profiling showed 99% of clone time spent in GeneratePackIndexV2, with about 93% of that time in FindCompressedLength. The code was decompressing the entire pack file twice: once to generate the index, then again to extract objects.

Changes

New direct parsing path

  • Added PackFile.ParseAllObjectsDirectly() that parses objects in a single pass without generating an index
  • Handles regular objects and deltas (OfsDelta, RefDelta) with correct dependency resolution
  • Maintains O(1) lookups for RefDelta base object resolution using dual dictionaries (by-offset and by-SHA1)
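
The dependency resolution described above can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: `RawEntry`, `ResolvedObject`, `DeltaResolver` and the `applyDelta` callback are hypothetical names, and it assumes each entry has already been decompressed and that delta decoding is supplied by the caller. It only illustrates how a by-offset dictionary (for OfsDelta) and a by-SHA1 dictionary (for RefDelta) let bases be found in O(1), with entries whose base is not yet known deferred to a later pass.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical types for illustration only; not the PR's actual API.
public enum EntryKind { Base, OfsDelta, RefDelta }

public sealed record RawEntry(
    long Offset, EntryKind Kind, string TypeName, byte[] Payload,
    long BaseOffset = -1, string BaseSha1 = "");

public sealed record ResolvedObject(string TypeName, byte[] Content, string Sha1);

public static class DeltaResolver
{
    // Git object id: SHA-1 over "<type> <length>\0" followed by the content.
    public static string ComputeSha1(string typeName, byte[] content)
    {
        var header = Encoding.ASCII.GetBytes($"{typeName} {content.Length}\0");
        var buffer = new byte[header.Length + content.Length];
        header.CopyTo(buffer, 0);
        content.CopyTo(buffer, header.Length);
        return Convert.ToHexString(SHA1.HashData(buffer)).ToLowerInvariant();
    }

    // applyDelta(baseContent, deltaPayload) is assumed to be provided by the caller.
    public static IReadOnlyDictionary<string, ResolvedObject> Resolve(
        IReadOnlyList<RawEntry> entries,
        Func<byte[], byte[], byte[]> applyDelta)
    {
        var byOffset = new Dictionary<long, ResolvedObject>();
        var bySha1 = new Dictionary<string, ResolvedObject>();
        var pending = new List<RawEntry>(entries);

        while (pending.Count > 0)
        {
            var deferred = new List<RawEntry>();

            foreach (var entry in pending)
            {
                ResolvedObject? baseObject = null;
                bool ready = true;

                if (entry.Kind == EntryKind.OfsDelta)
                    ready = byOffset.TryGetValue(entry.BaseOffset, out baseObject);
                else if (entry.Kind == EntryKind.RefDelta)
                    ready = bySha1.TryGetValue(entry.BaseSha1, out baseObject);

                if (!ready)
                {
                    // Base not resolved yet; retry in the next pass.
                    deferred.Add(entry);
                    continue;
                }

                // A delta inherits its type from the base object.
                var typeName = baseObject is null ? entry.TypeName : baseObject.TypeName;
                var content = baseObject is null
                    ? entry.Payload
                    : applyDelta(baseObject.Content, entry.Payload);

                var resolved = new ResolvedObject(typeName, content, ComputeSha1(typeName, content));
                byOffset[entry.Offset] = resolved;
                bySha1[resolved.Sha1] = resolved;
            }

            // No progress in a full pass means some base is missing from the pack.
            if (deferred.Count == pending.Count)
                throw new InvalidOperationException("Unresolvable delta: base object not found.");

            pending = deferred;
        }

        return bySha1;
    }
}
```

In this sketch a RefDelta that appears before its base is simply retried after the base has been registered, which keeps the lookup itself a plain dictionary access.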

Updated clone operations

  • ParsePackFileAndGetCommit() now calls ParseAllObjectsDirectly() instead of GeneratePackIndexV2() → ParsePackIndexV2() → ParseAllObjects()
  • FetchBloblessCloneAsync() uses direct parsing
  • Blob fetching in LoadSubdirectoryContentsWithBloblessCloneAsync() uses direct parsing

Optimization details

  • Reduced allocations in SHA1 calculation using stackalloc for object headers
  • GeneratePackIndexV2() remains unchanged and available when actual .idx/.rev files are needed
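
As an illustration of the stackalloc point, the object header (`<type> <length>\0`) can be built in a stack buffer and fed to an incremental hash, so no temporary header array or concatenated buffer is allocated per object. This is a sketch under assumptions, not the PR's code: `GitObjectHasher` and `HashObject` are hypothetical names, and the 32-byte buffer assumes the short type names Git uses (commit, tree, blob, tag).

```csharp
using System;
using System.Buffers.Text;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration only; not the PR's actual API.
public static class GitObjectHasher
{
    public static string HashObject(string typeName, ReadOnlySpan<byte> content)
    {
        // "<type> <decimal length>\0" fits easily in 32 bytes for Git's type names.
        Span<byte> header = stackalloc byte[32];

        int written = Encoding.ASCII.GetBytes(typeName, header);
        header[written++] = (byte)' ';

        if (!Utf8Formatter.TryFormat(content.Length, header[written..], out int digits))
            throw new InvalidOperationException("Header buffer too small.");
        written += digits;
        header[written++] = 0; // NUL terminator of the header

        // Hash header and content in sequence without concatenating them.
        using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA1);
        hasher.AppendData(header[..written]);
        hasher.AppendData(content);

        Span<byte> digest = stackalloc byte[20]; // SHA-1 digest size
        if (!hasher.TryGetHashAndReset(digest, out _))
            throw new InvalidOperationException("Digest buffer too small.");

        return Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

The hasher object itself is still heap-allocated in this sketch; reusing one instance across objects would be a further step if profiling shows it matters.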

Impact

Before:

var indexResult = PackIndex.GeneratePackIndexV2(packFileData);  // Pass 1: decompress all objects
var indexEntries = PackIndex.ParsePackIndexV2(indexResult.IndexData);
var objects = PackFile.ParseAllObjects(packFileData, indexEntries);  // Pass 2: decompress all objects again

After:

var objects = PackFile.ParseAllObjectsDirectly(packFileData);  // Single pass: decompress once

Expected performance: ~50% reduction in pack file parsing time for clone operations.

Original prompt

We have a performance problem to fix:
When cloning from a large repo, the profiler says we spent 99 percent of the time, more than one minute, in GeneratePackIndexV2.
Also, about 93 percent of that time was spent in FindCompressedLength.

  • Do we even need GeneratePackIndexV2 on clone? Shouldn't we be able to just parse all objects out of the pack more directly? Relatedly, can we consolidate duplicate code in the cloning methods that retrieve objects to add to the repo?
  • Can we improve efficiency of the index derivation, for the cases where it's still needed?


Copilot AI and others added 3 commits November 6, 2025 18:28
…ration

Co-authored-by: Viir <19209696+Viir@users.noreply.github.com>
…ndex-based parsing

Co-authored-by: Viir <19209696+Viir@users.noreply.github.com>
…ocations

Co-authored-by: Viir <19209696+Viir@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix performance issues in cloning from large repositories" to "Eliminate redundant pack file decompression during clone operations" Nov 6, 2025
Copilot AI requested a review from Viir November 6, 2025 18:36
Viir commented Nov 6, 2025

@copilot Describe this image

Viir closed this Nov 6, 2025
Viir deleted the copilot/improve-clone-performance branch November 6, 2025 19:36