Skip to content

perf: replace slice bits by low bits#1840

Merged
gbotrel merged 1 commit into
mainfrom
perf/zkcexec
Jun 5, 2026
Merged

perf: replace slice bits by low bits#1840
gbotrel merged 1 commit into
mainfrom
perf/zkcexec

Conversation

@gbotrel
Copy link
Copy Markdown
Contributor

@gbotrel gbotrel commented Jun 3, 2026

On the hot path of zkc exec , micro bench on this specific function x5 - x10, on some specific zkc exec runs, x2.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the word.Uint.Slice() implementation on the zkc exec hot path by replacing the previous per-bit extraction helper with a lower-level “mask low bits” approach intended to be faster for common widths.

Changes:

  • Replaced Uint.Slice(width) implementation with a new lowBits(width, value) helper.
  • Removed the former readBitSlice loop-based implementation.
  • Added a math/bits-based fast path to avoid touching high words for small widths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +272 to 295
// lowBits returns the low `width` bits of value (i.e. value mod 2^width).
// For example, given value 10111000 and width=4 the result is 1000.
func lowBits(width uint, value big.Int) big.Int {
var slice big.Int
// Fast paths: the result fits within a single uint64.
if width <= 64 {
if value.IsUint64() {
slice.SetUint64(value.Uint64() & mask64(width))
return slice
}
// For a multi-word value, the least-significant machine word already
// holds every bit we keep, so there's no need to touch the high words.
if width <= bits.UintSize {
slice.SetUint64(uint64(value.Bits()[0]) & mask64(width))
return slice
}
}
// General path: mask off everything at or above bit `width`.
mask := new(big.Int).Lsh(&one, width)
mask.Sub(mask, &one)
slice.And(&value, mask)
//
return slice
}
Copy link
Copy Markdown
Contributor

@DavePearce DavePearce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM --- it is definitely an improvement. However, I have to note that on two somewhat realistic benchmarks I am not observing much performance benefit. I guess its not hot. The benchmarks are: 50K block keccak (ZkC) which takes around 58s vs 56s; a run of blake on the RISC-V interpreter (not sure how many rounds, but takes around 4s for both versions).

Signed-off-by: Gautam Botrel <gautam.botrel@gmail.com>
@gbotrel gbotrel merged commit 1411a7d into main Jun 5, 2026
25 checks passed
@gbotrel gbotrel deleted the perf/zkcexec branch June 5, 2026 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants