Releases: vitaut/zmij

1.0 "exponentially fast"

22 Jan 14:09
e5ddbe9

What’s Changed

This release focuses on correctness and performance. It provides a minimal, safe API for obtaining the shortest correctly rounded decimal representation of a floating-point value, either in exponential format or as a decimal floating-point number.
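
To make "shortest correctly rounded" concrete, here is a brute-force reference sketch. It is not zmij's algorithm, which derives the digits directly in the Schubfach family. The name shortest_exponential is hypothetical. The function tries 1 to 17 significant digits with snprintf and returns the first string that strtod parses back to the same double. It assumes a C library whose conversions are correctly rounded, as glibc's are.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>

// Brute-force reference (hypothetical helper, not part of zmij): format with
// 1, 2, ..., 16 significant digits and return the first exponential-format
// string that parses back to exactly `value`.
std::string shortest_exponential(double value) {
  assert(std::isfinite(value));  // infinities and NaN need separate handling
  char buf[32];
  for (int precision = 0; precision < 16; ++precision) {
    // "%.*e" prints precision + 1 significant digits, correctly rounded.
    std::snprintf(buf, sizeof(buf), "%.*e", precision, value);
    if (std::strtod(buf, nullptr) == value) return buf;
  }
  // 17 significant digits always round-trip for binary64.
  std::snprintf(buf, sizeof(buf), "%.16e", value);
  return buf;
}
```

With glibc, shortest_exponential(0.1) gives "1e-01" and shortest_exponential(1.0 / 3) gives "3.333333333333333e-01". Because printf picks the nearest decimal at each precision, this oracle may, in rare cases near powers of two, return one digit more than the true shortest form, so it suits spot checks rather than serving as a definitive reference.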

Performance & Algorithm Improvements

  • Optimized division, modulo, and logarithm computations
  • Reduced conditional branching compared to Schubfach
  • Simplified decimal significand selection by using a single shorter candidate, based on an idea by Cassio Neri
  • Simplified and optimized modified rounding computation
  • Applied an optimization by Yaoyuan Guo, replacing 2–3 costly 128×64 multiplications with a single multiplication in the common case
  • Reworked digit generation to process eight digits at a time instead of using lookup tables, reducing branching and enabling better compiler optimization for more consistent performance (#4, thanks @TobiSchluter)
  • Switched to BCD encoding to evaluate eight significand digits in parallel (#7, thanks @TobiSchluter and @xjb714)
  • Made exponent handling branch-free (#10, #16, thanks @TobiSchluter)
  • Switched to built-in leading-zero counting with a safe fallback for older compilers (#21, #22, #25, thanks @AlexGuteniev)
  • Applied a collection of improvements from Dougall Johnson (#49):
    • Further optimized division, modulo, and logarithm computations
    • Optimized exponent output logic
    • Generated the powers-of-10 table using constexpr and 192-bit arithmetic
    • Applied faster indexed loads on ARM to improve table access performance
    • Introduced an optional precomputed exp_shift table to speed up decimal scaling
  • Peeled off the rightmost digit to enable cheaper division and remove zero checks, reducing branching and streamlining digit extraction (#72, thanks @TobiSchluter)
  • Replaced abs with a ternary expression, recovering about 10% of lost performance on GCC (#66, thanks @TobiSchluter)
  • Reduced conditional branching on ARM to improve performance (#73, thanks @xjb714)
  • Optimized NEON zero-check logic to speed up digit processing (#74, thanks @xjb714)
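
Optimized logarithm computations in algorithms of this family typically replace calls such as std::log10 with an integer multiply and shift. The sketch below computes the decimal exponent floor(e * log10(2)) for a binary exponent e. It assumes the fixed-point constant 315653 / 2^20, the value used by Dragonbox; these notes do not say which constants zmij uses, and floor_log10_pow2 is a hypothetical name here.

```cpp
#include <cassert>

// floor(e * log10(2)) without floating point: 315653 / 2^20 slightly
// overestimates log10(2), which is exact for the assumed range below.
// Right-shifting a negative int is assumed to be arithmetic (floor), which
// mainstream compilers do and C++20 guarantees.
inline int floor_log10_pow2(int e) {
  assert(e >= -1100 && e <= 1100);  // assumed range; covers all binary64 exponents
  return (e * 315653) >> 20;
}
```

For example, floor_log10_pow2(-1074) is -324 (the smallest subnormal is about 4.9e-324), and floor_log10_pow2(485) is 145, a case where 485 * log10(2) is about 145.9995 and therefore very close to an integer.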

SIMD & Architecture Support

  • Added an optimized write_significand implementation using NEON to accelerate digit extraction on supported platforms (thanks @dougallj)
  • Added SSE support on x86, using 128-bit vector instructions to process digits in parallel (#59, thanks @TobiSchluter)
  • Enabled NEON vectorization on ARM64 MSVC (#55, thanks @AlexGuteniev)
  • Ensured SIMD code is disabled when ZMIJ_USE_SIMD=0 (#75, thanks @TobiSchluter)
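
The vector paths above rely on splitting one number into many digit lanes at once. The following scalar SWAR ("SIMD within a register") sketch illustrates that idea in the spirit of the BCD change (#7), but it is not zmij's code. It converts a value below 10^8 into eight digits using only 64-bit multiplies, shifts, and masks. The name write8digits is hypothetical. The constants rely on v / 100 == (v * 5243) >> 19 for v < 10^4 and w / 10 == (w * 103) >> 10 for w < 100.

```cpp
#include <cassert>
#include <cstdint>

// Writes n (< 10^8) as exactly eight zero-padded ASCII digits (no terminator).
// All lanes of a 64-bit integer are split in parallel: two 4-digit lanes,
// then four 2-digit lanes, then eight 1-digit (BCD) lanes.
inline void write8digits(char* out, uint32_t n) {
  assert(n < 100000000);
  // Two 32-bit lanes: the first four digits in the low lane, the last four above.
  uint64_t x = (n / 10000) | (uint64_t(n % 10000) << 32);
  // Per lane v < 10^4: v / 100 == (v * 5243) >> 19. The mask drops bits
  // that the shift moved down from the neighbouring lane.
  uint64_t hundreds = ((x * 5243) >> 19) & 0x0000007F0000007F;
  x = hundreds | ((x - hundreds * 100) << 16);  // four 16-bit lanes, each < 100
  // Per lane w < 100: w / 10 == (w * 103) >> 10.
  uint64_t tens = ((x * 103) >> 10) & 0x000F000F000F000F;
  x = tens | ((x - tens * 10) << 8);  // eight 8-bit lanes, one digit each
  x += 0x3030303030303030;            // add '0' to every byte
  // Lowest byte holds the first digit; storing byte by byte avoids any
  // dependence on endianness.
  for (int i = 0; i < 8; ++i)
    out[i] = static_cast<char>((x >> (8 * i)) & 0xFF);
}
```

For example, write8digits(buf, 1000) fills buf with "00001000". SSE and NEON code can apply similar multiply-shift steps to wider vectors using lane-wise instructions instead of masks.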

Portability & Toolchain Fixes

  • Fixed MSVC support on x64 and ARM64 by improving code generation, replacing unavailable intrinsics, and resolving related warnings (#8, thanks @mmozeiko)
  • Fixed multiple MSVC issues across 32-bit builds, table generation, warning cleanup, vector type handling, and forced inlining (#30, #31, #34, #42, #44, #48, #50, #65, #69, #71, #76, thanks @AlexGuteniev)
  • Fixed compilation regressions after recent changes (#38, #45, #56, #57, #77, thanks @AlexGuteniev and @TobiSchluter)
  • Fixed GCC compilation issues related to NEON intrinsics (#53)
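
Several of these fixes concern intrinsics that exist on one compiler but not another. One common pattern uses unsigned __int128 on GCC and Clang, MSVC's _umul128 on x64, and a portable 32-bit-limb fallback elsewhere. The sketch below illustrates that pattern using hypothetical names (uint128, umul128, umul128_portable, umul192_upper128). umul192_upper128 shows the kind of 64×128-bit product, keeping the upper 128 bits of a 192-bit result, that Schubfach-style code computes against a 128-bit power-of-10 table entry. It is not zmij's implementation.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER) && defined(_M_X64)
#  include <intrin.h>  // _umul128
#endif

struct uint128 {
  uint64_t hi, lo;
};

// Portable 64x64 -> 128-bit multiplication using 32-bit limbs.
inline uint128 umul128_portable(uint64_t a, uint64_t b) {
  uint64_t a_lo = a & 0xFFFFFFFF, a_hi = a >> 32;
  uint64_t b_lo = b & 0xFFFFFFFF, b_hi = b >> 32;
  uint64_t p0 = a_lo * b_lo, p1 = a_lo * b_hi;
  uint64_t p2 = a_hi * b_lo, p3 = a_hi * b_hi;
  // Middle column plus the carry out of the low word; fits in 34 bits.
  uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFF) + (p2 & 0xFFFFFFFF);
  return {p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32),
          (mid << 32) | (p0 & 0xFFFFFFFF)};
}

// Picks the fastest available 64x64 -> 128-bit multiplication.
inline uint128 umul128(uint64_t a, uint64_t b) {
#if defined(__SIZEOF_INT128__)
  unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
  return {static_cast<uint64_t>(p >> 64), static_cast<uint64_t>(p)};
#elif defined(_MSC_VER) && defined(_M_X64)
  uint64_t hi;
  uint64_t lo = _umul128(a, b, &hi);
  return {hi, lo};
#else
  return umul128_portable(a, b);
#endif
}

// Upper 128 bits of the 192-bit product x * pow: two 64x64 multiplications
// plus one carry from the cross term into the high word.
inline uint128 umul192_upper128(uint64_t x, uint128 pow) {
  uint128 r = umul128(x, pow.hi);
  uint64_t cross = umul128(x, pow.lo).hi;
  r.lo += cross;
  r.hi += (r.lo < cross);  // carry out of the low 64 bits
  return r;
}
```

Keeping the portable path as a separate function lets it be tested on compilers that would otherwise never select it.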

API & Usability

  • Added to_decimal for converting binary floating-point values to decimal (#6)
  • Added float (binary32) support (#1, #15)
  • Made the conversion functions return the size of the resulting representation (#32, thanks @AlexGuteniev)
  • Reduced float_buffer_size from 17 to 16 (#51, #52, thanks @dtolnay)
  • Trimmed leading zeros in float formatting (#27, thanks @dtolnay)
  • Lowered the minimum required C++ standard to C++14 (#61, #62, thanks @AlexGuteniev)
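
A conversion such as to_decimal starts by splitting the binary value into an integer significand and a power-of-two exponent. The following is a minimal C++14-compatible sketch; because std::bit_cast requires C++20, it copies the bits with std::memcpy. The names binary_fp and decode are hypothetical, and only finite positive doubles are handled. Subnormals have no implicit leading bit and share the minimum exponent, so they take a separate branch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical representation: value == significand * 2^exponent.
struct binary_fp {
  uint64_t significand;
  int exponent;
};

// Decodes a finite positive IEEE 754 binary64 value.
inline binary_fp decode(double value) {
  assert(std::isfinite(value) && value > 0);
  uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));  // C++14 stand-in for std::bit_cast
  const int num_sig_bits = 52;
  uint64_t fraction = bits & ((uint64_t(1) << num_sig_bits) - 1);
  int biased_exp = static_cast<int>((bits >> num_sig_bits) & 0x7FF);
  if (biased_exp == 0)  // subnormal: no implicit bit, fixed minimum exponent
    return {fraction, 1 - 1023 - num_sig_bits};
  // Normal: restore the implicit leading 1 bit.
  return {fraction | (uint64_t(1) << num_sig_bits),
          biased_exp - 1023 - num_sig_bits};
}
```

For example, decode(0.75) yields significand 3 * 2^51 and exponent -53, and decode(5e-324) yields significand 1 and exponent -1074.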

Correctness & Safety

  • Added an assertion that countl_zero receives non-zero input (#29, thanks @AlexGuteniev)
  • Completed the fallback implementation of bswap64 (#23, thanks @AlexGuteniev)
  • Avoided subtracting unrelated pointers for small buffers (#36, thanks @AlexGuteniev)
  • Prevented the float exponent path from using the hundreds digit (#43, thanks @AlexGuteniev)
  • Fixed handling of subnormals (#11, #17, #19)
  • Fixed incorrect formatting of 32-bit infinity and NaN values (#70)
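
Leading-zero counting with a builtin and a fallback (#21, #22, #25), guarded by an assertion for non-zero input (#29), might look like the following sketch. The dispatch macros and the binary-search fallback are illustrative assumptions, not zmij's source. The assertion matters because __builtin_clzll has undefined behavior for zero, and _BitScanReverse64 leaves its index output undefined in that case.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_ARM64))
#  include <intrin.h>  // _BitScanReverse64
#endif

// Portable fallback: binary search over the top 32, 16, ..., 1 bits.
inline int countl_zero_portable(uint64_t x) {
  assert(x != 0);
  int n = 0;
  for (int shift = 32; shift > 0; shift >>= 1) {
    if ((x >> (64 - shift)) == 0) {  // the top `shift` bits are all zero
      n += shift;
      x <<= shift;
    }
  }
  return n;
}

// Number of leading zero bits in a non-zero x, using a builtin when available.
inline int countl_zero(uint64_t x) {
  assert(x != 0);  // the builtins below are undefined for zero
#if defined(__GNUC__) || defined(__clang__)
  return __builtin_clzll(x);
#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_ARM64))
  unsigned long index;
  _BitScanReverse64(&index, x);  // index of the highest set bit
  return 63 - static_cast<int>(index);
#else
  return countl_zero_portable(x);
#endif
}
```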

Verification & Tooling

  • Added test coverage for major configurations: default (C++), no SIMD, no builtins, and C
  • Added verification programs
  • Parallelized the verification program to run across multiple threads, dividing the test space by hardware concurrency to speed up exhaustive correctness checking (#26, thanks @dtolnay)
  • Improved verification tooling (thanks @AlexGuteniev):
    • Increased concurrency support (#33)
    • Made failures return non-zero exit codes (#37)
    • Added buffer overrun tests (#46, #54)
  • Fixed CSV writer usage (#60, thanks @TobiSchluter)
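
Parallelizing exhaustive verification (#26) comes down to dividing the input range into one contiguous chunk per hardware thread. The following is a minimal sketch of that idea. The name parallel_verify and its std::function-based interface are hypothetical, not zmij's verification program.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Runs check(i) for every i in [begin, end), one contiguous chunk per hardware
// thread, and returns the number of failing inputs. `check` is called
// concurrently, so it must be thread-safe.
inline uint64_t parallel_verify(uint64_t begin, uint64_t end,
                                const std::function<bool(uint64_t)>& check) {
  assert(begin <= end);
  unsigned num_threads = std::max(1u, std::thread::hardware_concurrency());
  uint64_t chunk = (end - begin + num_threads - 1) / num_threads;
  std::atomic<uint64_t> failures{0};
  std::vector<std::thread> threads;
  for (unsigned t = 0; t < num_threads; ++t) {
    uint64_t lo = begin + t * chunk;
    if (lo >= end) break;  // fewer chunks than threads for small ranges
    uint64_t hi = std::min(end, lo + chunk);
    threads.emplace_back([&check, &failures, lo, hi] {
      uint64_t local = 0;  // count locally to avoid contention on the atomic
      for (uint64_t i = lo; i < hi; ++i)
        if (!check(i)) ++local;
      failures += local;
    });
  }
  for (auto& th : threads) th.join();
  return failures;
}
```

In an exhaustive float check, i would range over all 2^32 bit patterns, and check would format the corresponding value and compare the result against a reference.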

Full Changelog:
https://github.com/vitaut/zmij/commits/v1.0