
Perf: SIMD first-byte scan, eliminate try/catch and cached span in hot search loop #2

Draft
Copilot wants to merge 2 commits into master from copilot/optimize-memory-search-performance

Copilot AI commented Feb 27, 2026

The byte pattern scanner's hot path had several avoidable costs that compound badly when scanning large (50–100 MB) executables.

Hot-path changes (WitchHunt.cs)

  • Removed try/catch from Match: the exception-handling region can keep the JIT from inlining the method and limits register allocation and loop hoisting. Bounds are already checked explicitly, so the handler was redundant.
  • Added [MethodImpl(MethodImplOptions.AggressiveInlining)] to Match: asks the JIT to inline the call, avoiding call overhead for every byte of the buffer.
  • Made Match static and passed ReadOnlySpan<byte> as a parameter: removes the repeated ReadOnlyMemory<byte>.Span property access, which constructs a new span on every call (no heap allocation, but not free). The span is now cached once in FindSingle as var data = Data.Span.
  • Cached start.ToInt32(): it was re-evaluated on every iteration of the while condition even though start is loop-invariant.
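
The hot-path changes above can be sketched roughly as follows. This is a minimal illustration, not the code from WitchHunt.cs: the PatternScanSketch class, the parameter lists, and the mask layout (0xFF for an exact byte, 0x00 for a wildcard) are assumptions made for the example.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for the scanner; names and signatures are assumptions.
public static class PatternScanSketch
{
    // Static, no try/catch, and the span is supplied by the caller.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool Match(ReadOnlySpan<byte> data, int offset,
                             ReadOnlySpan<byte> pattern, ReadOnlySpan<byte> mask)
    {
        // Explicit bounds check takes the place of the exception handler.
        if (offset < 0 || offset > data.Length - pattern.Length) return false;

        for (int i = 0; i < pattern.Length; i++)
        {
            // Assumed layout: mask[i] == 0xFF is an exact byte, 0x00 a wildcard.
            if ((data[offset + i] & mask[i]) != (pattern[i] & mask[i])) return false;
        }
        return true;
    }

    // Reads ReadOnlyMemory<byte>.Span once, outside the loop.
    public static int FindSingle(ReadOnlyMemory<byte> memory,
                                 ReadOnlySpan<byte> pattern, ReadOnlySpan<byte> mask)
    {
        ReadOnlySpan<byte> data = memory.Span; // cached once
        int last = data.Length - pattern.Length; // loop-invariant, computed once

        for (int i = 0; i <= last; i++)
        {
            if (Match(data, i, pattern, mask)) return i;
        }
        return -1; // not found
    }
}
```

Whether the JIT actually inlines Match depends on the runtime version and method size; the attribute is a request, not a guarantee.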

SIMD first-byte skip (WitchHunt.cs)

When a match attempt fails and the pattern's first byte has no wildcard (mask == 0xFF), the manual byte-by-byte scan for the next occurrence is replaced with MemoryExtensions.IndexOf:

// Before: manual byte-by-byte loop
for (; indexOf < dataBuffer.Length; indexOf++)
{
    if ((dataBuffer[indexOf] & mask) != bmo) continue;
    break;
}

// After: vectorized by the runtime (SSE2/AVX2 where available), comparing 16–32 bytes per instruction
var nextIdx = remaining.IndexOf((byte)bmo);
return nextIdx < 0 ? -dataBuffer.Length : -(nextIdx + 1);
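
For context, a self-contained sketch of the same idea is shown below. It is an assumption-laden illustration: FirstByteSkipSketch and NextCandidate are hypothetical names, and the sketch returns a plain index (or -1) instead of the negative encoding used in the PR's snippet, whose exact meaning is not spelled out here.

```csharp
using System;

// Hypothetical helper; not the code from WitchHunt.cs.
public static class FirstByteSkipSketch
{
    // Returns the index of the next position at or after `from` whose byte
    // matches `firstByte` under `mask`, or -1 if there is none.
    public static int NextCandidate(ReadOnlySpan<byte> data, int from,
                                    byte firstByte, byte mask)
    {
        if (from < 0 || from >= data.Length) return -1;
        ReadOnlySpan<byte> remaining = data.Slice(from);

        if (mask == 0xFF)
        {
            // Exact first byte: let the runtime's IndexOf do the scan,
            // which is vectorized on hardware that supports it.
            int idx = remaining.IndexOf(firstByte);
            return idx < 0 ? -1 : from + idx;
        }

        // Partially wildcarded first byte: fall back to a masked scalar scan.
        for (int i = 0; i < remaining.Length; i++)
        {
            if ((remaining[i] & mask) == (firstByte & mask)) return from + i;
        }
        return -1;
    }
}
```

The fast path only applies when the first byte is fully specified, because IndexOf compares whole bytes and cannot apply a mask.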

Pattern parsing (Utilities.cs)

Replaced Hexchars.Contains(t), a linear scan over a 24-character string for every input character, with direct char range comparisons (>= '0' && <= '9', etc.). Removed the now-unused Hexchars constant.
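
A range-based check of this kind might look like the sketch below. HexSketch and IsHexChar are hypothetical names, and the sketch assumes only hexadecimal digits need to be accepted; the exact character set handled in Utilities.cs is not shown in this description.

```csharp
// Hypothetical helper illustrating the range-comparison approach.
public static class HexSketch
{
    // Three range checks instead of a linear search through a string.
    public static bool IsHexChar(char c) =>
        (c >= '0' && c <= '9') ||
        (c >= 'a' && c <= 'f') ||
        (c >= 'A' && c <= 'F');
}
```

On .NET 7 and later, char.IsAsciiHexDigit performs the same test and could be used instead, if the project targets a recent enough runtime.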



Co-authored-by: nt153133 <52430037+nt153133@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Identify performance optimizations for memory byte pattern search" to "Perf: SIMD first-byte scan, eliminate try/catch and cached span in hot search loop" on Feb 27, 2026