perf: 1.7-2.4x faster SSIM (experimental, AI-generated) #183
Open
lilith wants to merge 7 commits into kornelski:main from
Apply cargo fmt across all source files.
- Replace the GammaComponent::max_value() method with a COMPONENT_MAX const.
- Replace the legacy std::f64::EPSILON constant with f64::EPSILON.
- Suppress clippy::enum_variant_names on vImage_Flags (macOS FFI).
Rewrite blur as separable horizontal + vertical passes.
- Fuse the double 3-tap blur (H→V→H→V) into a single 5-tap blur (H5→V5), halving memory traffic.
- Add blur_mul, which fuses the element-wise multiply into the horizontal blur pass; used for img_sq_blur and img1_img2_blur.
- Add compare_scale_3ch with a manually unrolled 3-channel SSIM loop, replacing LAB interleaving via multizip.
- Parallelize the 3-channel img1_img2_blur across channels.
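The H5→V5 fusion rests on blurring being linear and separable: running a 3-tap kernel twice along one axis gives the same result as running once with that kernel convolved with itself, and horizontal and vertical passes commute. The sketch below checks this on a single row and also illustrates the idea behind blur_mul (forming the product inside the tap loop so the product image is never written out). The 3-tap weights, the clamp-to-edge border rule, and the helper names `compose`, `blur_1d`, and `blur_mul_1d` are assumptions for illustration, not the PR's code. With clamped borders the two forms differ near the edges, so only interior samples are compared.

```rust
/// Full 1-D convolution of two kernels: len(k1) + len(k2) - 1 taps.
/// Composing a 3-tap kernel with itself yields the equivalent 5-tap kernel.
pub fn compose(k1: &[f32], k2: &[f32]) -> Vec<f32> {
    let mut out = vec![0.0f32; k1.len() + k2.len() - 1];
    for (i, &a) in k1.iter().enumerate() {
        for (j, &b) in k2.iter().enumerate() {
            out[i + j] += a * b;
        }
    }
    out
}

/// Apply a centered, odd-length kernel to one row, clamping indices at the edges.
pub fn blur_1d(src: &[f32], k: &[f32]) -> Vec<f32> {
    blur_mul_1d(src, None, k)
}

/// Blur of `a * b` along one row (or of `a` alone when `b` is None).
/// The product is computed per tap, so no separate product buffer is
/// allocated or written to memory (hypothetical scalar analogue of blur_mul).
pub fn blur_mul_1d(a: &[f32], b: Option<&[f32]>, k: &[f32]) -> Vec<f32> {
    let r = (k.len() / 2) as isize;
    let last = a.len() as isize - 1;
    (0..a.len() as isize)
        .map(|i| {
            k.iter()
                .enumerate()
                .map(|(t, &w)| {
                    let idx = (i + t as isize - r).clamp(0, last) as usize;
                    let v = match b {
                        Some(b) => a[idx] * b[idx],
                        None => a[idx],
                    };
                    w * v
                })
                .sum::<f32>()
        })
        .collect()
}
```

In 2-D the same identity applies per axis, and because the H and V passes commute, H3→V3→H3→V3 can be reordered to H3→H3→V3→V3 and then collapsed to H5→V5.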
Add archmage and magetypes as optional dependencies behind the fma feature.
- SIMD blur: blur_avx2, blur_in_place_avx2, blur_mul_avx2 using f32x8.
- SIMD tolab: vectorized RGB→XYZ matrix multiply + cbrt polynomial.
- SIMD SSIM: compare_3ch_avx2 processes 8 pixels per iteration.
- Runtime dispatch via X64V3Token::summon() with scalar fallback.
- All archmage code is safe: no unsafe blocks in the SIMD modules.
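The per-pixel SSIM step consumes the blurred means together with the blurred squares (img_sq_blur) and blurred cross products (img1_img2_blur). A portable sketch of the standard single-channel SSIM formula, walked in fixed groups of 8 pixels with a scalar tail, mirrors the shape of an f32x8 loop without using archmage or magetypes. The constants C1 = (0.01)² and C2 = (0.03)² assume values in [0, 1] (the usual choice from the original SSIM paper); how dssim combines the three LAB channels is not shown, and `ssim_pixel` / `ssim_map` are hypothetical names.

```rust
// Assumed dynamic range L = 1.0, so C1 = (0.01 * L)^2 and C2 = (0.03 * L)^2.
const C1: f32 = 0.01 * 0.01;
const C2: f32 = 0.03 * 0.03;

/// SSIM for one pixel from blurred means (mu), blurred squares (sq)
/// and the blurred cross term (cross).
#[inline]
pub fn ssim_pixel(mu1: f32, mu2: f32, sq1: f32, sq2: f32, cross: f32) -> f32 {
    let mu1_sq = mu1 * mu1;
    let mu2_sq = mu2 * mu2;
    let mu12 = mu1 * mu2;
    // Variance and covariance via E[x^2] - E[x]^2 and E[xy] - E[x]E[y].
    let sigma1_sq = sq1 - mu1_sq;
    let sigma2_sq = sq2 - mu2_sq;
    let sigma12 = cross - mu12;
    ((2.0 * mu12 + C1) * (2.0 * sigma12 + C2))
        / ((mu1_sq + mu2_sq + C1) * (sigma1_sq + sigma2_sq + C2))
}

/// SSIM map over whole buffers: 8 pixels per outer iteration, then a scalar tail.
/// The fixed 8-wide groups mirror an f32x8 loop; whether the compiler actually
/// vectorizes this scalar version is not guaranteed.
pub fn ssim_map(mu1: &[f32], mu2: &[f32], sq1: &[f32], sq2: &[f32], cross: &[f32]) -> Vec<f32> {
    let n = mu1.len();
    assert!([mu2.len(), sq1.len(), sq2.len(), cross.len()].iter().all(|&l| l == n));
    let mut out = vec![0.0f32; n];
    let body = n - n % 8;
    for base in (0..body).step_by(8) {
        for lane in 0..8 {
            let i = base + lane;
            out[i] = ssim_pixel(mu1[i], mu2[i], sq1[i], sq2[i], cross[i]);
        }
    }
    for i in body..n {
        out[i] = ssim_pixel(mu1[i], mu2[i], sq1[i], sq2[i], cross[i]);
    }
    out
}
```

Identical inputs give an SSIM of 1 everywhere; any difference in mean or structure pulls the value below 1.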
Replace lodepng/load_image/lcms2 C dependencies with pure Rust:
- PNG via `png` crate (8/16-bit, ICC profiles)
- JPEG via `zune-jpeg` (RGB and grayscale, ICC profiles)
- PNM/PAM via `zenbitmaps`
- ICC color management via moxcms (sRGB parametric TRC)
- Opaque alpha stripping, 16-bit big-endian byte-swap
- Remove avif/webp/mozjpeg features (C deps eliminated)
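Two of the post-decode steps listed above are small enough to sketch with the standard library alone. PNG stores 16-bit samples big-endian, so the raw bytes have to be reassembled into u16 values; and an RGBA image whose alpha channel is fully opaque can be reduced to RGB before comparison. The function names and the interleaved RGBA layout are assumptions for illustration, not the PR's code.

```rust
/// Reassemble big-endian 16-bit samples (as PNG stores them) into u16 values.
pub fn be_bytes_to_u16(raw: &[u8]) -> Vec<u16> {
    assert!(raw.len() % 2 == 0, "16-bit sample data must have an even byte count");
    raw.chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect()
}

/// If every alpha sample equals `max` (fully opaque), drop the alpha channel
/// from interleaved RGBA data and return RGB; otherwise return None.
pub fn strip_opaque_alpha<T: Copy + PartialEq>(rgba: &[T], max: T) -> Option<Vec<T>> {
    assert!(rgba.len() % 4 == 0, "expected interleaved RGBA samples");
    if !rgba.chunks_exact(4).all(|px| px[3] == max) {
        return None;
    }
    Some(rgba.chunks_exact(4).flat_map(|px| [px[0], px[1], px[2]]).collect())
}
```

Being generic over the sample type lets the same check serve 8-bit (`max = 255`) and 16-bit (`max = u16::MAX`) images.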
- Edition 2021 → 2024, rust-version 1.72 → 1.85
- #[no_mangle] → #[unsafe(no_mangle)] in c_api.rs
- Add unsafe {} blocks inside unsafe fns (edition 2024 warns on unsafe operations in unsafe fn bodies via unsafe_op_in_unsafe_fn)
- cargo fmt for edition 2024 import sorting rules
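Under edition 2024, an exported C function needs the `unsafe(...)` attribute form, and unsafe operations inside an `unsafe fn` are expected to sit in their own `unsafe {}` block. A minimal sketch of that shape; `dssim_example_sum` is a hypothetical function, not part of the dssim C API.

```rust
/// Hypothetical exported function: sums `len` floats starting at `ptr`.
///
/// # Safety
/// `ptr` must be non-null, aligned, and valid for reads of `len` f32 values.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn dssim_example_sum(ptr: *const f32, len: usize) -> f32 {
    // The raw-pointer read gets its own unsafe block even inside an unsafe fn.
    // SAFETY: the caller upholds the contract documented above.
    let values = unsafe { std::slice::from_raw_parts(ptr, len) };
    values.iter().sum()
}
```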
- Add #![deny(unsafe_code)] crate-level lint to dssim-core
- Replace uninit_f32_vec (unsafe set_len) with zeroed_f32_vec (vec![0.0; n])
- #[allow(unsafe_code)] only on c_api and ffi modules (FFI boundary)
- blur module is now fully safe; no allow needed
- unsafe extern "C" block in ffi.rs (edition 2024 requirement)
- All non-FFI modules are verified unsafe-free (including SIMD via archmage)
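The lint layout above can be sketched at module scope. In a real crate the deny would be `#![deny(unsafe_code)]` at the top of lib.rs; here it is an outer attribute on a module so the example stays self-contained. Unlike `forbid`, `deny` can be overridden by a nested `#[allow(unsafe_code)]`, which is what lets the FFI module opt back in. The module names and the use of the C library's `abs` are illustrative assumptions.

```rust
// Stand-in for a crate-level #![deny(unsafe_code)].
#[deny(unsafe_code)]
pub mod core_like {
    /// Zero-filled buffer: no unsafe set_len, so it compiles under the deny.
    pub fn zeroed_f32_vec(n: usize) -> Vec<f32> {
        vec![0.0; n]
    }

    /// The only place unsafe code is allowed: the FFI boundary.
    #[allow(unsafe_code)]
    pub mod ffi {
        // Edition 2024 requires extern blocks to be written as `unsafe extern`.
        unsafe extern "C" {
            // C standard library `abs`, resolved from the platform's libc.
            fn abs(x: i32) -> i32;
        }

        /// Safe wrapper; C `abs` is undefined for i32::MIN, so that input is rejected.
        pub fn c_abs(x: i32) -> Option<i32> {
            if x == i32::MIN {
                return None;
            }
            // SAFETY: x != i32::MIN, so the C result is well-defined.
            Some(unsafe { abs(x) })
        }
    }
}
```

Adding an `unsafe` block anywhere in `core_like` outside `ffi` turns into a compile error, which is how the deny keeps non-FFI code unsafe-free.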
Point at imazen/zenbitmaps rev 818992c instead of local path.
Summary
This experimental branch shows a 1.7-2.4x speedup (CPU user time) on typical images. It is AI-generated and not intended for merging as-is; it is meant as inspiration for future optimizations.
What's here
- deny(unsafe_code) with targeted allows on FFI only
Benchmarks
CPU user time, averaged over 3-5 runs:
Caveats
- zenbitmaps is a git dependency (not published to crates.io)
- No avif/webp/mozjpeg feature support (C deps removed)