Skip to content

fix(assembly): correct AMD64 AVX2 stack frame offsets#25

Merged
shaia merged 1 commit into
mainfrom
fix/amd64-assembly-offsets
Nov 20, 2025
Merged

fix(assembly): correct AMD64 AVX2 stack frame offsets#25
shaia merged 1 commit into
mainfrom
fix/amd64-assembly-offsets

Conversation

@shaia
Copy link
Copy Markdown
Owner

@shaia shaia commented Nov 20, 2025

Fix critical assembly bugs that caused go vet failures and test failures on Linux and Windows.

Changes:

  • bucket_lookup_avx2_amd64.s: Fix frame size from /bin/zsh-25 to /bin/zsh-33
    • Correct return offset from ret+25(FP) to ret+32(FP)
  • batch_avx2_amd64.s: Fix frame size from -72 to -64
    • Correct results_base offset from +32(FP) to +24(FP)
    • Correct fingerprintBits offset from +56(FP) to +48(FP)
    • Correct numBuckets offset from +64(FP) to +56(FP)

The errors were due to incorrect calculation of Go function argument sizes:

  • []byte = 24 bytes (base + len + cap, each 8 bytes)
  • byte = 8 bytes (aligned)
  • bool = 8 bytes (aligned)

All tests now pass with race detector enabled on all platforms.

Fixes: go vet errors on AMD64
Fixes: TestBucketLookup* failures on Linux and Windows
Fixes: XXHash batch processing on all platforms

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes critical assembly bugs in AMD64 AVX2 implementations that caused go vet failures and test failures on Linux and Windows. The changes correct stack frame sizes and argument offsets in two assembly files.

Key Changes:

  • Corrected return value offsets in bucket_lookup_avx2_amd64.s from +25 to +32
  • Fixed frame size and all argument offsets in batch_avx2_amd64.s from incorrect values to proper 8-byte aligned offsets
  • However, the frame size in bucket_lookup_avx2_amd64.s still appears to be incorrect (should be 40, not 33)

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
internal/lookup/bucket_lookup_avx2_amd64.s Partially corrected frame size to 33 and return offsets to 32; however, frame size should be 40 based on argument layout
internal/hash/xxhash/batch_avx2_amd64.s Correctly fixed frame size to 64 and adjusted all argument offsets (results_base, fingerprintBits, numBuckets) to proper aligned values

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


// func bucketLookupAVX2(fingerprints []byte, target byte) bool
TEXT ·bucketLookupAVX2(SB), NOSPLIT, $0-25
TEXT ·bucketLookupAVX2(SB), NOSPLIT, $0-33
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The frame size should be 40, not 33. The function has:

  • fingerprints []byte = 24 bytes (3 × 8-byte words at offsets 0, 8, 16)
  • target byte = 8 bytes (aligned, at offset 24)
  • return bool = 8 bytes (aligned, at offset 32)

Total = 40 bytes (5 × 8-byte slots)

The correct declaration should be TEXT ·bucketLookupAVX2(SB), NOSPLIT, $0-40

Suggested change
TEXT ·bucketLookupAVX2(SB), NOSPLIT, $0-33
TEXT ·bucketLookupAVX2(SB), NOSPLIT, $0-40

Copilot uses AI. Check for mistakes.
Fix critical assembly bugs that caused go vet failures and test failures
on Linux and Windows.

Changes:
- bucket_lookup_avx2_amd64.s: Fix frame size from /bin/zsh-25 to /bin/zsh-33
  - Correct return offset from ret+25(FP) to ret+32(FP)
- batch_avx2_amd64.s: Fix frame size from -72 to -64
  - Correct results_base offset from +32(FP) to +24(FP)
  - Correct fingerprintBits offset from +56(FP) to +48(FP)
  - Correct numBuckets offset from +64(FP) to +56(FP)

The errors were due to incorrect calculation of Go function argument sizes:
- []byte = 24 bytes (base + len + cap, each 8 bytes)
- byte = 8 bytes (aligned)
- bool = 8 bytes (aligned)

All tests now pass with race detector enabled on all platforms.

Fixes: go vet errors on AMD64
Fixes: TestBucketLookup* failures on Linux and Windows
Fixes: XXHash batch processing on all platforms
@shaia shaia force-pushed the fix/amd64-assembly-offsets branch from c153d1f to 71099ca Compare November 20, 2025 19:49
@shaia shaia merged commit 47e5929 into main Nov 20, 2025
13 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants