@benaadams
Member

@benaadams benaadams commented Dec 30, 2025

  • Reorganized UInt256 implementation by splitting into multiple partial class files (core logic, operators, constructors, and conversions)
  • Modified the scalar multiplication implementation (MultiplyScalar) to use a more efficient algorithm
  • Added exception handling for division by zero in modular arithmetic operations
  • Improved performance of MulMod (still needs more work when 512-bit multiplies are done)
| Method | Param | Old | New | Improvement |
|---|---|---|---|---|
| MultiplyMod_UInt256 | 2,1,2 bits | 10.836 ns | 3.559 ns | 3.04x |
| MultiplyMod_UInt256 | 2,64,2 bits | 21.067 ns | 5.194 ns | 4.06x |
| MultiplyMod_UInt256 | 2,128,2 bits | 43.343 ns | 4.245 ns | 10.21x |
| MultiplyMod_UInt256 | 2,192,2 bits | 45.316 ns | 5.094 ns | 8.90x |
| MultiplyMod_UInt256 | 2,256,2 bits | 49.468 ns | 4.512 ns | 10.96x |
| MultiplyMod_UInt256 | 64,1,2 bits | 11.527 ns | 4.405 ns | 2.62x |
| MultiplyMod_UInt256 | 64,64,2 bits | 21.644 ns | 5.346 ns | 4.05x |
| MultiplyMod_UInt256 | 128,1,2 bits | 39.575 ns | 6.071 ns | 6.52x |
| MultiplyMod_UInt256 | 128,64,2 bits | 44.438 ns | 4.602 ns | 9.66x |
| MultiplyMod_UInt256 | 192,1,2 bits | 43.550 ns | 6.915 ns | 6.30x |
| MultiplyMod_UInt256 | 192,64,2 bits | 47.299 ns | 5.752 ns | 8.22x |
| MultiplyMod_UInt256 | 256,1,2 bits | 43.148 ns | 8.713 ns | 4.95x |
| MultiplyMod_UInt256 | 256,64,2 bits | 49.598 ns | 5.332 ns | 9.30x |
| MultiplyMod_UInt256 | 256,128,2 bits | 53.207 ns | 4.592 ns | 11.59x |
| MultiplyMod_UInt256 | 256,192,2 bits | 54.829 ns | 5.543 ns | 9.89x |
| MultiplyMod_UInt256 | 256,256,2 bits | 58.823 ns | 4.860 ns | 12.10x |
| MultiplyMod_UInt256 | 2,1,64 bits | 10.262 ns | 2.444 ns | 4.20x |
| MultiplyMod_UInt256 | 2,64,64 bits | 20.997 ns | 5.588 ns | 3.76x |
| MultiplyMod_UInt256 | 2,128,64 bits | 41.648 ns | 9.767 ns | 4.26x |
| MultiplyMod_UInt256 | 2,192,64 bits | 46.161 ns | 10.232 ns | 4.51x |
| MultiplyMod_UInt256 | 2,256,64 bits | 49.688 ns | 10.083 ns | 4.93x |
| MultiplyMod_UInt256 | 64,1,64 bits | 10.679 ns | 3.667 ns | 2.91x |
| MultiplyMod_UInt256 | 64,64,64 bits | 20.646 ns | 6.916 ns | 2.99x |
| MultiplyMod_UInt256 | 128,1,64 bits | 36.752 ns | 4.957 ns | 7.41x |
| MultiplyMod_UInt256 | 128,64,64 bits | 42.562 ns | 9.604 ns | 4.43x |
| MultiplyMod_UInt256 | 192,1,64 bits | 40.934 ns | 5.890 ns | 6.95x |
| MultiplyMod_UInt256 | 192,64,64 bits | 45.279 ns | 9.750 ns | 4.64x |
| MultiplyMod_UInt256 | 256,1,64 bits | 43.285 ns | 6.702 ns | 6.46x |
| MultiplyMod_UInt256 | 256,64,64 bits | 47.322 ns | 10.014 ns | 4.73x |
| MultiplyMod_UInt256 | 256,128,64 bits | 49.380 ns | 12.720 ns | 3.88x |
| MultiplyMod_UInt256 | 256,192,64 bits | 57.842 ns | 12.516 ns | 4.62x |
| MultiplyMod_UInt256 | 256,256,64 bits | 61.100 ns | 12.524 ns | 4.88x |
| MultiplyMod_UInt256 | 2,1,128 bits | 10.197 ns | 2.482 ns | 4.11x |
| MultiplyMod_UInt256 | 2,64,128 bits | 9.663 ns | 10.860 ns | 0.89x |
| MultiplyMod_UInt256 | 2,128,128 bits | 52.711 ns | 24.210 ns | 2.18x |
| MultiplyMod_UInt256 | 2,192,128 bits | 58.223 ns | 18.225 ns | 3.19x |
| MultiplyMod_UInt256 | 2,256,128 bits | 66.959 ns | 24.787 ns | 2.70x |
| MultiplyMod_UInt256 | 64,1,128 bits | 10.449 ns | 2.959 ns | 3.53x |
| MultiplyMod_UInt256 | 64,64,128 bits | 9.555 ns | 11.452 ns | 0.83x |
| MultiplyMod_UInt256 | 128,1,128 bits | 41.848 ns | 9.580 ns | 4.37x |
| MultiplyMod_UInt256 | 128,64,128 bits | 49.564 ns | 13.409 ns | 3.70x |
| MultiplyMod_UInt256 | 192,1,128 bits | 51.241 ns | 9.964 ns | 5.14x |
| MultiplyMod_UInt256 | 192,64,128 bits | 57.935 ns | 25.689 ns | 2.26x |
| MultiplyMod_UInt256 | 256,1,128 bits | 56.772 ns | 9.485 ns | 5.98x |
| MultiplyMod_UInt256 | 256,64,128 bits | 65.921 ns | 22.711 ns | 2.90x |
| MultiplyMod_UInt256 | 256,128,128 bits | 63.975 ns | 33.877 ns | 1.89x |
| MultiplyMod_UInt256 | 256,192,128 bits | 83.193 ns | 64.623 ns | 1.29x |
| MultiplyMod_UInt256 | 256,256,128 bits | 88.687 ns | 58.266 ns | 1.52x |
| MultiplyMod_UInt256 | 2,1,192 bits | 9.839 ns | 2.526 ns | 3.90x |
| MultiplyMod_UInt256 | 2,64,192 bits | 9.893 ns | 7.513 ns | 1.32x |
| MultiplyMod_UInt256 | 2,128,192 bits | 24.371 ns | 22.502 ns | 1.08x |
| MultiplyMod_UInt256 | 2,192,192 bits | 59.319 ns | 40.578 ns | 1.46x |
| MultiplyMod_UInt256 | 2,256,192 bits | 63.385 ns | 46.876 ns | 1.35x |
| MultiplyMod_UInt256 | 64,1,192 bits | 10.241 ns | 2.690 ns | 3.81x |
| MultiplyMod_UInt256 | 64,64,192 bits | 9.612 ns | 7.527 ns | 1.28x |
| MultiplyMod_UInt256 | 128,1,192 bits | 24.059 ns | 2.741 ns | 8.78x |
| MultiplyMod_UInt256 | 128,64,192 bits | 24.249 ns | 22.895 ns | 1.06x |
| MultiplyMod_UInt256 | 192,1,192 bits | 46.350 ns | 14.956 ns | 3.10x |
| MultiplyMod_UInt256 | 192,64,192 bits | 53.871 ns | 37.197 ns | 1.45x |
| MultiplyMod_UInt256 | 256,1,192 bits | 54.677 ns | 14.854 ns | 3.68x |
| MultiplyMod_UInt256 | 256,64,192 bits | 66.220 ns | 47.914 ns | 1.38x |
| MultiplyMod_UInt256 | 256,128,192 bits | 67.148 ns | 48.439 ns | 1.39x |
| MultiplyMod_UInt256 | 256,192,192 bits | 80.158 ns | 64.521 ns | 1.24x |
| MultiplyMod_UInt256 | 256,256,192 bits | 91.011 ns | 77.917 ns | 1.17x |
| MultiplyMod_UInt256 | 2,1,256 bits | 10.354 ns | 2.479 ns | 4.18x |
| MultiplyMod_UInt256 | 2,64,256 bits | 9.882 ns | 7.492 ns | 1.32x |
| MultiplyMod_UInt256 | 2,128,256 bits | 23.911 ns | 24.109 ns | 0.99x |
| MultiplyMod_UInt256 | 2,192,256 bits | 25.065 ns | 23.036 ns | 1.09x |
| MultiplyMod_UInt256 | 2,256,256 bits | 62.323 ns | 51.123 ns | 1.22x |
| MultiplyMod_UInt256 | 64,1,256 bits | 9.920 ns | 2.470 ns | 4.02x |
| MultiplyMod_UInt256 | 64,64,256 bits | 10.203 ns | 8.293 ns | 1.23x |
| MultiplyMod_UInt256 | 128,1,256 bits | 23.537 ns | 3.056 ns | 7.70x |
| MultiplyMod_UInt256 | 128,64,256 bits | 23.729 ns | 23.212 ns | 1.02x |
| MultiplyMod_UInt256 | 192,1,256 bits | 23.024 ns | 2.321 ns | 9.92x |
| MultiplyMod_UInt256 | 192,64,256 bits | 22.689 ns | 21.476 ns | 1.06x |
| MultiplyMod_UInt256 | 256,1,256 bits | 47.235 ns | 14.885 ns | 3.17x |
| MultiplyMod_UInt256 | 256,64,256 bits | 58.625 ns | 39.194 ns | 1.50x |
| MultiplyMod_UInt256 | 256,128,256 bits | 63.913 ns | 50.311 ns | 1.27x |
| MultiplyMod_UInt256 | 256,192,256 bits | 73.730 ns | 51.345 ns | 1.44x |
| MultiplyMod_UInt256 | 256,256,256 bits | 84.343 ns | 65.590 ns | 1.29x |

Copilot AI review requested due to automatic review settings December 30, 2025 13:33
Contributor

Copilot AI left a comment

Pull request overview

This pull request refactors the UInt256 implementation to improve the performance of modular arithmetic operations, particularly MulMod. The changes reorganize the code into separate partial class files for better maintainability and optimize the underlying implementations.

Key changes:

  • Reorganized UInt256 implementation by splitting into multiple partial class files (core logic, operators, constructors, and conversions)
  • Modified the scalar multiplication implementation (MultiplyScalar) to use a more efficient algorithm
  • Added exception handling for division by zero in modular arithmetic operations
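
As a scaled-down illustration of the zero-modulus handling listed above, here is a minimal sketch on 64-bit operands rather than UInt256. It assumes .NET 7 or later for `System.UInt128`, and `MulMod64` is a hypothetical name, not a method from this pull request.

```csharp
using System;

// Hypothetical 64-bit analogue of a modular multiply; not code from this PR.
static ulong MulMod64(ulong x, ulong y, ulong m)
{
    // A zero modulus has no defined result, so fail loudly.
    if (m == 0) throw new DivideByZeroException();
    // Anything mod 1 is 0, and a zero factor gives a zero product.
    if (m == 1 || x == 0 || y == 0) return 0;
    // Widen to 128 bits so the full product is available before reducing.
    return (ulong)((UInt128)x * y % m);
}

Console.WriteLine(MulMod64(7, 8, 10)); // 6
```

The ordering mirrors the checks quoted later in this review: throw on a zero modulus first, then return early for the trivial cases before doing any wide arithmetic.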

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
|---|---|
| UInt256.cs | Removed large sections of code (moved to new partial files); updated multiplication and carry logic; added constant Len; disabled AVX-512 multiplication path |
| UInt256.Operators.cs | New file containing all operator overloads and type conversion operators previously in main file |
| UInt256.Ctors.cs | New file containing constructors and factory methods previously in main file |
| UInt256.Conversions.cs | New file containing conversion and parsing methods previously in main file |
| UInt256Tests.cs | Updated tests to check for DivideByZeroException and ArgumentException in modular arithmetic operations with zero modulus |

benaadams and others added 4 commits December 30, 2025 13:52
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@LukaszRozmej
Member

Fall back to the old code for 64,64,128 bits?

@benaadams
Member Author

Added specialized path

| Method | Param | Old | New | Improvement |
|---|---|---|---|---|
| MultiplyMod_UInt256 | 64,64,192 bits | 9.555 ns | 7.453 ns | 1.28x |


qhat--;
}
public int CompareTo(object? obj) => obj is not UInt256 int256 ? throw new InvalidOperationException() : CompareTo(int256);
Member

Can it be compared to other number types?

// y != 0
// x > y

if (x.IsUint64)
Member

this check is somewhat redundant with x.IsZero?

Member Author

Not sure what you mean? It will have returned already if x.IsZero, so this is a different check.

Member

So both checks are testing the same thing to some extent.
Could we check IsUint64, IsZero and IsOne only once and save a few instructions?

Member

All three test for 0 on the last three fields and differ only in the test on the first field.
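
A minimal sketch of that idea, assuming the value is held as four 64-bit limbs with `u0` least significant. `SizeClass` and its return codes are hypothetical and only illustrate testing the upper three limbs once; whether this is measurably faster than the separate property checks would need benchmarking.

```csharp
using System;

// Hypothetical classifier: 0 = zero, 1 = one, 2 = fits in 64 bits, 3 = wider.
static int SizeClass(ulong u0, ulong u1, ulong u2, ulong u3)
{
    // Shared test: IsZero, IsOne and IsUint64 all need the upper limbs to be 0.
    if ((u1 | u2 | u3) != 0) return 3;
    // Only the lowest limb distinguishes the remaining cases.
    if (u0 == 0) return 0;
    if (u0 == 1) return 1;
    return 2;
}

Console.WriteLine(SizeClass(1, 0, 0, 0)); // 1
```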

Comment on lines +184 to +185
if (y.IsZero) ThrowDivideByZeroException();
if (x.IsZero || y.IsOne)
Member

y.IsZero and y.IsOne are somewhat redundant?

Member Author

x.IsZero or y.IsOne; different variables

Member

same variables on different lines

Comment on lines +244 to +256
if (m.IsZero) ThrowDivideByZeroException();
if (m.IsOne)
{
// Any value mod 1 is mathematically 0.
res = default;
return;
}

// Compute 257-bit sum S = x + y as 5 limbs (s0..s3, s4=carry)
bool overflow = AddOverflow(in x, in y, out UInt256 sum);
ulong s4 = !overflow ? 0UL : 1UL;

if (m.IsUint64)
Member

m.IsZero, m.IsOne and m.IsUint64 are somewhat redundant? They mostly check the same fields.

Member Author

Using BitLen is slower

uint modBits = (uint)m.BitLen;
uint xBits = (uint)x.BitLen;
uint yBits = (uint)y.BitLen;

Comment on lines +278 to +282
else if (m.u3 != 0)
{
Remainder257By256Bits(in sum, in m, out res);
}
else if (m.u2 != 0)
Member

again comparing m fields

Member Author

Can't really find a better way that shows a measurable difference

Comment on lines +320 to +338
if (m.IsZero) ThrowDivideByZeroException();
if (m.IsOne || x.IsZero || y.IsZero)
{
res = default;
return;
}

// Trivial no-mul cases first.
if (y.IsOne) { Mod(in x, in m, out res); return; }
if (x.IsOne) { Mod(in y, in m, out res); return; }

// Modulus-size dispatch first - keeps all the tiny-mod magic.
if (m.IsUint64)
{
MulModBy64Bits(in x, in y, m.u0, out res);
return;
}

if ((m.u2 | m.u3) == 0)
Member

same redundant compares?

Member Author

@benaadams benaadams Dec 31, 2025

Is > 128bit check?

Member Author

Will measure using leading zeros as a single check 🤔
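
A rough sketch of what a leading-zero based size check could look like, assuming four 64-bit limbs with `u0` least significant. `BitLength` and `LimbCount` are hypothetical helpers, not the PR's code; only `BitOperations.LeadingZeroCount` is a real .NET API. Given the earlier note that using BitLen was slower, this is something to measure rather than a recommendation.

```csharp
using System;
using System.Numerics;

// Hypothetical: number of significant bits (0..256) in a 256-bit value.
static int BitLength(ulong u0, ulong u1, ulong u2, ulong u3)
{
    if (u3 != 0) return 256 - BitOperations.LeadingZeroCount(u3);
    if (u2 != 0) return 192 - BitOperations.LeadingZeroCount(u2);
    if (u1 != 0) return 128 - BitOperations.LeadingZeroCount(u1);
    // LeadingZeroCount(0UL) is 64, so a zero value yields a length of 0.
    return 64 - BitOperations.LeadingZeroCount(u0);
}

// Hypothetical: number of 64-bit limbs in use (0..4), usable as one dispatch key.
static int LimbCount(ulong u0, ulong u1, ulong u2, ulong u3)
    => (BitLength(u0, u1, u2, u3) + 63) >> 6;

Console.WriteLine(LimbCount(0, 1, 0, 0)); // 2
```

A caller could compute `LimbCount` for the modulus once and `switch` on it, instead of testing `IsUint64`, `(u2 | u3) == 0` and `u3 != 0` in sequence.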

@LukaszRozmej
Member

Anything more for?

2,64,128 bits
2,128,256 bits
128,64,256 bits
192,64,256 bits

benaadams and others added 2 commits December 31, 2025 11:24
Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
@benaadams
Member Author

> Anything more for?
>
> 2,64,128 bits
> 2,128,256 bits
> 128,64,256 bits
> 192,64,256 bits

Yes, but the gain wasn't great for the amount of additional code added. There are opportunities there; will revisit.

@benaadams benaadams merged commit f272b7e into main Dec 31, 2025
11 checks passed
@benaadams benaadams deleted the mulmod-perf branch December 31, 2025 12:57
3 participants