Skip to content

Add ppc64le backend (supports p8 and above architectures)#1648

Open
dannytsen wants to merge 29 commits into
pq-code-package:mainfrom
dannytsen:new_main
Open

Add ppc64le backend (supports p8 and above architectures)#1648
dannytsen wants to merge 29 commits into
pq-code-package:mainfrom
dannytsen:new_main

Conversation

@dannytsen
Copy link
Copy Markdown

Updated patch for ML_KEM for ppc64le supports p8 and above architectures.

  1. Run scripts/autogen and scripts/lint on Mac but not sure if it runs for ppc64le.
  2. Run simpasm on Red Hat Linux.
  3. Added detailed comments on NTT and INTT implementations.
  4. Used C type symbols to improve readability.
  5. Fixed some typos.

Signed-off-by: Danny Tsen dtsen@us.ibm.com

The following tests were run on p10.

[09:28] danny@ltcden12-lp1 new_ppc64le_mlkem % ./scripts/tests func
INFO > Functional Test Compile (native no_opt): make func OPT=0 AUTO=1 -j40
INFO > Functional Test ML-KEM-512 (native no_opt): make run_func_512 -j40
INFO > Functional Test ML-KEM-768 (native no_opt): make run_func_768 -j40
INFO > Functional Test ML-KEM-1024 (native no_opt): make run_func_1024 -j40
INFO > Functional Test Compile (native opt): make func OPT=1 AUTO=1 -j40
INFO > Functional Test ML-KEM-512 (native opt): make run_func_512 -j40
INFO > Functional Test ML-KEM-768 (native opt): make run_func_768 -j40
INFO > Functional Test ML-KEM-1024 (native opt): make run_func_1024 -j40
All good!

[09:28] danny@ltcden12-lp1 new_ppc64le_mlkem % ./scripts/tests bench -c PERF
INFO > Benchmark Compile (native no_opt): make bench OPT=0 AUTO=1 CYCLES=PERF -j40
INFO > Benchmark ML-KEM-512 (native no_opt): make run_bench_512
INFO > Benchmark ML-KEM-512 (native no_opt): test/build/mlkem512/bin/bench_mlkem512
keypair cycles = 66982
encaps cycles = 78820
decaps cycles = 100923

       percentile      1     10     20     30     40     50     60     70     80     90     99

keypair percentiles: 66438 66690 66791 66857 66920 66982 67043 67122 67218 67306 71905
encaps percentiles: 78322 78516 78618 78687 78752 78820 78878 78933 79012 79116 83825
decaps percentiles: 100427 100634 100733 100804 100869 100923 100985 101056 101131 101253 105852

INFO > Benchmark ML-KEM-768 (native no_opt): make run_bench_768
INFO > Benchmark ML-KEM-768 (native no_opt): test/build/mlkem768/bin/bench_mlkem768
keypair cycles = 111380
encaps cycles = 125891
decaps cycles = 154364

       percentile      1     10     20     30     40     50     60     70     80     90     99

keypair percentiles: 110575 110914 111083 111192 111291 111380 111496 111617 111821 112414 116725
encaps percentiles: 125081 125403 125526 125655 125776 125891 125998 126122 126293 126771 131358
decaps percentiles: 153575 153870 154008 154131 154261 154364 154487 154630 154782 155313 159863

INFO > Benchmark ML-KEM-1024 (native no_opt): make run_bench_1024
INFO > Benchmark ML-KEM-1024 (native no_opt): test/build/mlkem1024/bin/bench_mlkem1024
keypair cycles = 166809
encaps cycles = 185315
decaps cycles = 220229

       percentile      1     10     20     30     40     50     60     70     80     90     99

keypair percentiles: 165339 165995 166236 166435 166616 166809 167007 167200 167505 171058 175606
encaps percentiles: 183839 184563 184778 184951 185158 185315 185518 185744 186123 189637 192014
decaps percentiles: 218911 219430 219705 219841 220027 220229 220436 220673 221029 224484 226901

INFO > Benchmark Compile (native opt): make bench OPT=1 AUTO=1 CYCLES=PERF -j40
INFO > Benchmark ML-KEM-512 (native opt): make run_bench_512
INFO > Benchmark ML-KEM-512 (native opt): test/build/mlkem512/bin/bench_mlkem512
keypair cycles = 45750
encaps cycles = 50661
decaps cycles = 63561

       percentile      1     10     20     30     40     50     60     70     80     90     99

keypair percentiles: 45248 45469 45546 45620 45690 45750 45806 45886 45954 46063 50703
encaps percentiles: 50192 50367 50468 50542 50600 50661 50710 50771 50858 50954 55652
decaps percentiles: 63091 63276 63381 63436 63497 63561 63623 63679 63743 63857 68437

INFO > Benchmark ML-KEM-768 (native opt): make run_bench_768
INFO > Benchmark ML-KEM-768 (native opt): test/build/mlkem768/bin/bench_mlkem768
keypair cycles = 79045
encaps cycles = 86455
decaps cycles = 103878

       percentile      1     10     20     30     40     50     60     70     80     90     99

keypair percentiles: 78313 78578 78742 78847 78954 79045 79169 79285 79470 79978 84430
encaps percentiles: 85628 86009 86172 86272 86363 86455 86592 86711 86879 87292 92038
decaps percentiles: 103041 103399 103540 103676 103788 103878 103993 104104 104274 104736 109361

INFO > Benchmark ML-KEM-1024 (native opt): make run_bench_1024
INFO > Benchmark ML-KEM-1024 (native opt): test/build/mlkem1024/bin/bench_mlkem1024
keypair cycles = 124072
encaps cycles = 134500
decaps cycles = 157090

       percentile      1     10     20     30     40     50     60     70     80     90     99

keypair percentiles: 122727 123259 123515 123720 123929 124072 124253 124527 125009 128466 133334
encaps percentiles: 133064 133681 133933 134129 134320 134500 134711 134933 135346 138753 141067
decaps percentiles: 155503 156261 156510 156694 156894 157090 157285 157605 158014 161592 166723

All good!

@dannytsen dannytsen requested a review from a team as a code owner April 6, 2026 22:16
@dannytsen dannytsen mentioned this pull request Apr 6, 2026
5 tasks
@dannytsen
Copy link
Copy Markdown
Author

@bhess @mkannwischer Please review. Thanks.

@mkannwischer
Copy link
Copy Markdown
Contributor

@bhess @mkannwischer Please review. Thanks.

Thanks @dannytsen for rebasing this PR. Could you get it through CI please?

@mkannwischer mkannwischer self-assigned this Apr 7, 2026
Copy link
Copy Markdown
Contributor

@bhess bhess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, @dannytsen.

Minor comments are inline.

supports p8 and above architectures.

I do not think this is correct as written. I see illegal instructions with qemu-ppc64le -cpu power8, while the same code runs with -cpu power9. In particular, lxvx and stxvx do not appear to be supported on p8.

Can you please double-check this and update the requirement to POWER9 if that is indeed the minimum supported target?

Comment thread dev/ppc64le/src/arith_native_ppc64le.h Outdated
Comment thread dev/ppc64le/src/consts.c Outdated
Comment thread dev/ppc64le/src/reduce.S Outdated
Comment thread dev/ppc64le/src/poly_tomont.S Outdated
Comment thread dev/ppc64le/src/reduce.S Outdated
Comment thread mlkem/src/native/ppc64le/meta.h
Comment thread dev/ppc64le/src/poly_tomont.S Outdated
Comment thread dev/ppc64le/src/reduce.S Outdated
@dannytsen
Copy link
Copy Markdown
Author

dannytsen commented Apr 7, 2026

Thanks for the updates, @dannytsen.

Minor comments are inline.

supports p8 and above architectures.

I do not think this is correct as written. I see illegal instructions with qemu-ppc64le -cpu power8, while the same code runs with -cpu power9. In particular, lxvx and stxvx do not appear to be supported on p8.

Can you please double-check this and update the requirement to POWER9 if that is indeed the minimum supported target?

@mkannwischer I tested on p8 system. I have to double check that. Ah, I found the problem, my p8 supports ISA 2.07 which means the qemu supports older one. I'll fix that along with @bhess comments. Thanks.

@dannytsen
Copy link
Copy Markdown
Author

@mkannwischer @bhess Just submitted my changes and stated that the supporting architecture to be p9 and above or Power systems support ISA 2.07 and other fixes. Hope this solve all the issues. Thanks.

@dannytsen
Copy link
Copy Markdown
Author

@mkannwischer @bhess Submitted an updated code after re-run autogen. Please review.

@dannytsen
Copy link
Copy Markdown
Author

@mkannwischer @bhess Not sure what's the errors for 4 CI errors. This was happened after I ran the autogen.

@mkannwischer
Copy link
Copy Markdown
Contributor

@mkannwischer @bhess Not sure what's the errors for 4 CI errors. This was happened after I ran the autogen.

Thanks for the progress.

You can ignore the two OpenTitan tests (those should not be run on a fork because the AWS auth fails) and the markdown lint (that is a spurious failure).
The ppc one needs investigation - you should be able to run the examples locally to reproduce it.

@dannytsen
Copy link
Copy Markdown
Author

@mkannwischer @bhess Not sure what's the errors for 4 CI errors. This was happened after I ran the autogen.

Thanks for the progress.

You can ignore the two OpenTitan tests (those should not be run on a fork because the AWS auth fails) and the markdown lint (that is a spurious failure). The ppc one needs investigation - you should be able to run the examples locally to reproduce it.

OK. I'll check into the ppc one. Thanks.

@bhess
Copy link
Copy Markdown
Contributor

bhess commented Apr 8, 2026

@mkannwischer @bhess Just submitted my changes and stated that the supporting architecture to be p9 and above or Power systems support ISA 2.07 and other fixes. Hope this solve all the issues. Thanks.

Thanks for the updates. I have a few more minor follow-ups inline.

@dannytsen
Copy link
Copy Markdown
Author

@mkannwischer @bhess Just submitted my changes and stated that the supporting architecture to be p9 and above or Power systems support ISA 2.07 and other fixes. Hope this solve all the issues. Thanks.

Thanks for the updates. I have a few more minor follow-ups inline.

@bhess I did not see your follow-ups or I missed it. Can you post it again? Thanks.

Comment thread mlkem/src/native/ppc64le/README.md Outdated
Comment thread dev/ppc64le/README.md Outdated
Comment thread dev/ppc64le/src/poly_tomont.S Outdated
Comment thread dev/ppc64le/src/reduce.S Outdated
#define V_26 2
#define V_MKQ 3

.machine "any"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment here

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this directive will get removed after simpasm.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it still make sense to include it in the dev-folder then, e.g., for documentation purposes?

Copy link
Copy Markdown
Author

@dannytsen dannytsen Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bhess Well, this code still supports p8 but with ISA 2.07. Some older p8s don't support ISA 2.07. Looks like qemu supports older p8 not with ISA 2.07. In general, we support p9 but also p8 with ISA 2.07.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, is there any way to make this distinction with the .machine pseudo-op?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's possible. I'll think about it.

@bhess
Copy link
Copy Markdown
Contributor

bhess commented Apr 9, 2026

@mkannwischer @bhess Just submitted my changes and stated that the supporting architecture to be p9 and above or Power systems support ISA 2.07 and other fixes. Hope this solve all the issues. Thanks.

Thanks for the updates. I have a few more minor follow-ups inline.

@bhess I did not see your follow-ups or I missed it. Can you post it again? Thanks.

Should be visible now.

@mkannwischer
Copy link
Copy Markdown
Contributor

@mkannwischer @bhess Not sure what's the errors for 4 CI errors. This was happened after I ran the autogen.

Thanks for the progress.
You can ignore the two OpenTitan tests (those should not be run on a fork because the AWS auth fails) and the markdown lint (that is a spurious failure). The ppc one needs investigation - you should be able to run the examples locally to reproduce it.

OK. I'll check into the ppc one. Thanks.

If you rebase on top of main, the OpenTitan tests should be properly skipped (fixed in #1649)

@dannytsen
Copy link
Copy Markdown
Author

@mkannwischer @bhess I manually modified mlkem/mlkem_native_asm.S and mlkem/mlkem_native.c to add ppc64le include so to pass the monolithic_build_multilevel_native CI tests because autogen can't handle the ppc64le platform. So I am not sure how to fix the test unless autogen get fixed.

@bhess
Copy link
Copy Markdown
Contributor

bhess commented Apr 9, 2026

@mkannwischer @bhess I manually modified mlkem/mlkem_native_asm.S and mlkem/mlkem_native.c to add ppc64le include so to pass the monolithic_build_multilevel_native CI tests because autogen can't handle the ppc64le platform. So I am not sure how to fix the test unless autogen get fixed.

I’ve opened a PR against your branch that updates autogen.
Running the updated autogen should produce identical results to your current manually modified code. PR: dannytsen#1

@dannytsen
Copy link
Copy Markdown
Author

@mkannwischer @bhess I manually modified mlkem/mlkem_native_asm.S and mlkem/mlkem_native.c to add ppc64le include so to pass the monolithic_build_multilevel_native CI tests because autogen can't handle the ppc64le platform. So I am not sure how to fix the test unless autogen get fixed.

I’ve opened a PR against your branch that updates autogen. Running the updated autogen should produce identical results to your current manually modified code. PR: dannytsen#1

@bhess Thanks.

@dannytsen
Copy link
Copy Markdown
Author

@mkannwischer @bhess I manually modified mlkem/mlkem_native_asm.S and mlkem/mlkem_native.c to add ppc64le include so to pass the monolithic_build_multilevel_native CI tests because autogen can't handle the ppc64le platform. So I am not sure how to fix the test unless autogen get fixed.

I’ve opened a PR against your branch that updates autogen. Running the updated autogen should produce identical results to your current manually modified code. PR: dannytsen#1

@bhess Thanks.

@bhess I merged the autogen and rerun the autogen. Hope this commit fix the problem. Thanks.

@dannytsen
Copy link
Copy Markdown
Author

@bhess I just added a capability check of PPC_FEATURE2_ARCH_2_07. It will fall back to the original implementation if the architecture doesn't support ISA 2.07 and above which include some p8 and p9 above. Please review. Thanks.

@dannytsen
Copy link
Copy Markdown
Author

@mkannwischer @bhess Any feedback on the recent commit? Thanks.

@dannytsen
Copy link
Copy Markdown
Author

@mkannwischer In qemu, power8e supports ISA 2.07 which is a variant of p8 with DD2.

Copy link
Copy Markdown
Contributor

@bhess bhess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, @dannytsen.
The runtime capability check goes in a slightly different direction than what I had in mind. My suggestion was about whether the .machine pseudo-op can express something like "POWER8 with ISA 2.07 support". If there's no way to express this with .machine, I'd suggest just removing it from the assembly and documenting the ISA 2.07 requirement in the project docs / README instead (instead of machine "any").

For the dynamic getauxval approach: I wonder if runtime feature detection might be better left to downstream consumers of mlkem-native (e.g., liboqs) rather than adding it here. There's also the consideration that getauxval/<sys/auxv.h> is not portable AFAIK. Any thoughts @mkannwischer ?

Comment thread mlkem/src/native/ppc64le/meta.h Outdated
@dannytsen
Copy link
Copy Markdown
Author

Thanks for the update, @dannytsen. The runtime capability check goes in a slightly different direction than what I had in mind. My suggestion was about whether the .machine pseudo-op can express something like "POWER8 with ISA 2.07 support". If there's no way to express this with .machine, I'd suggest just removing it from the assembly and documenting the ISA 2.07 requirement in the project docs / README instead (instead of machine "any").

For the dynamic getauxval approach: I wonder if runtime feature detection might be better left to upstream consumers of mlkem-native (e.g., liboqs) rather than adding it here. There's also the consideration that getauxval/<sys/auxv.h> is not portable AFAIK. Any thoughts @mkannwischer ?

@bhess ".machine any" is already a psudo-op. I can remove that in dev/ppc64le/src. The README is already had the update.

For getauxval approach, it's to prevent the code run on a older Power systems that did not support proper instaructions.

Copy link
Copy Markdown
Contributor

@bhess bhess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming back to this after a closer look at runtime CPU checks in mlkem-native:
my remaining concern is the runtime capability check in the PR. mlkem-native already has a designed-in mechanism for exactly this case, and I think the PR should use it rather than doing its own in the native backend.

Concretely, this would mean adding MLK_SYS_CAP_PPC_ISA_2_07 to the enum,
replacing the four mlkem_ppc_check_cap() calls with
mlk_sys_check_capability(...), and shipping the auxv code as a new
test/configs/custom_native_capability_config_PPC_AUXV.h. Specifics are inline.

@mkannwischer: please correct me if that's not the intended pattern, and feel free to share any additional review.

Comment thread dev/ppc64le/meta.h Outdated
#include "../api.h"
#include "src/arith_native_ppc64le.h"

#include <sys/auxv.h>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this header is pulled in by everyone that uses the native backend (via src/native/meta.h), this include makes the whole native build glibc-dependent.

The existing AVX2 and NEON backends don't query the OS, they assume the ISA the build was configured for, and runtime dispatch is delegated to the consumer through MLK_CONFIG_CUSTOM_CAPABILITY_FUNC (see test/configs/custom_native_capability_config_CPUID_AVX2.h and ..._ID_AA64PFR1_EL1.h).
I'd advise to use the same pattern here: drop this include, and move the auxv-based detection into a new
test/configs/custom_native_capability_config_PPC_AUXV.h example.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is auto generated.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, anyway I think this is moot and not a blocker if the checks are removed for now (see the other thread). I can handle the autogen in a followup.

Comment thread dev/ppc64le/meta.h
}

MLK_MUST_CHECK_RETURN_VALUE
static MLK_INLINE int mlk_ntt_native(int16_t data[MLKEM_N])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested replacement (matching with AVX2/NEON, should be applied to the other functions in this file as well):

/* in mlkem/src/sys.h: add MLK_SYS_CAP_PPC_ISA_2_07 to mlk_sys_cap */

MLK_MUST_CHECK_RETURN_VALUE
static MLK_INLINE int mlk_ntt_native(int16_t data[MLKEM_N])
{
  if (!mlk_sys_check_capability(MLK_SYS_CAP_PPC_ISA_2_07))
    return MLK_NATIVE_FUNC_FALLBACK;
  mlk_ntt_ppc(data, mlk_ppc_qdata);
  return MLK_NATIVE_FUNC_SUCCESS;
}

The default mlk_sys_check_capability returns 1 -> the native path is always taken when the backend is selected, matching the build-time default used elsewhere. Consumers that want runtime dispatch supply their own; the auxv code from this PR can ship as a test/configs/custom_native_capability_config_PPC_AUXV.h example.

Once the capability check is centralised on mlk_sys_check_capability(MLK_SYS_CAP_PPC_ISA_2_07), all four wrappers stay structurally identical but no longer reference a private static (ppc_inited).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bhess I disagreed with you. The purpose of that I want it to check the capability is not just checking the ISA 2.07 but also to be able to added different architecture support in the future like p10 specific or p11 etc. So, I'll be able to re-assign function pointer to a different entry point if needed. So, mlk_ntt_ppc will be a function pointer to a different entry depending on the architecture. That's for future arch support. I can remove this check now. I just want to get the implementation merged quickly because we have been over this implementation for months if not 6 months.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sounds good, I don't think runtime checks are a blocker for this PR, so removing them seems fine and can be iterated later.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want a function pointer based dispatch; unless there's a good reason to do it otherwise, dispatch should follow the pattern of checking architecture flags and calling the correct function directly.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want a function pointer based dispatch; unless there's a good reason to do it otherwise, dispatch should follow the pattern of checking architecture flags and calling the correct function directly.

I agree. For this PR, I’m in favor of removing the checks. We can defer runtime capability checks to a later iteration if we need to support multiple architectures/ISA levels in one build.

Copy link
Copy Markdown
Contributor

@bhess bhess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @dannytsen. I re-reviewed this together with the earlier discussion in #1184. The removal of the runtime capability check resolves my previous comment, and the main readability points raised on the earlier PR also seem addressed for this iteration. I’m not the maintainer, but from my side this iteration now looks good and I’d support merging it.

bhess added 2 commits May 19, 2026 04:31
Signed-off-by: Danny Tsen <dtsen@us.ibm.com>
… twisted zetas.

Signed-off-by: Danny Tsen <dtsen@us.ibm.com>
@dannytsen
Copy link
Copy Markdown
Author

@bhess @mkannwischer merged all @bhess 's new autogen, simpasm and Use Barrett for NTT twiddle products.

Copy link
Copy Markdown
Contributor

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! I left another round of comments.
Could you kindly also rebase the changes against main and make sure the diff is as you expect? Right now this PR contains a lot of unrelated changes making it hard to review.

I'm still testing the autogeneration and will get back on that shortly.

Comment thread integration/liboqs/config_ppc64le.h Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the liboqs integration from this PR. @bhess will add it in a follow-up once it's confirmed to work.

Comment thread test/mk/components.mk Outdated
@@ -9,6 +9,7 @@ endif
SOURCES += $(wildcard mlkem/src/*.c)
ifeq ($(OPT),1)
SOURCES += $(wildcard mlkem/src/native/aarch64/src/*.[csS]) $(wildcard mlkem/src/native/x86_64/src/*.[csS]) $(wildcard mlkem/src/native/riscv64/src/*.[csS])
SOURCES += $(wildcard mlkem/src/native/ppc64le/src/*.[csS])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation here is still off.

Comment thread mlkem/src/native/ppc64le/src/consts.c Outdated
#include "../../../common.h"

#if defined(MLK_ARITH_BACKEND_PPC64LE_DEFAULT) && \
!defined(MLK_CONFIG_MULTILEVEL_NO_SHARED)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the POWER8_VECTOR guard here.

Comment thread mlkem/src/native/ppc64le/src/consts.c Outdated

#include "consts.h"

MLK_ALIGN const int16_t mlk_ppc_qdata[] = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add MLK_INTERNAL_DATA_DEFINITION here (allowing to mark this as static - see #1658).

Comment thread mlkem/src/native/ppc64le/src/consts.h Outdated

#ifndef __ASSEMBLER__
#define mlk_ppc_qdata MLK_NAMESPACE(ppc_qdata)
extern const int16_t mlk_ppc_qdata[];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
extern const int16_t mlk_ppc_qdata[];
MLK_INTERNAL_DATA_DECLARATION const int16_t mlk_ppc_qdata[];

Comment thread flake.nix
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are autogenerating ppc asm now, the autogeneration CI needs to be adjusted.
Can you extend the cross-autogen devShell please:

          devShells.cross-autogen = util.mkShell {
            packages = builtins.attrValues { inherit (config.packages) linters toolchain_ppc64le; inherit (pkgs) gcc-arm-embedded; }
              ++ pkgs.lib.optionals pkgs.stdenv.hostPlatform.isx86_64 [ config.packages.toolchain_aarch64 ]
              ++ pkgs.lib.optionals pkgs.stdenv.hostPlatform.isAarch64 [ config.packages.toolchain_x86_64 ];
          };

Comment thread dev/ppc64le/src/ntt_ppc_asm.S Outdated
Comment on lines +273 to +274
* Yields the signed canonical representative of (b*z) mod q,
* bounded by q/2.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true. Barrett multiplication doesn't yield results bounded by q/2.
@hanno-becker, could you possibly take a look at this and suggest how to phrase this?

Comment on lines +1 to +9
/*
* Copyright (c) The mlkem-native project authors
* SPDX-License-Identifier: Apache-2.0 OR ISC OR MIT
*
* Copyright IBM Corp. 2025, 2026
*
* ===================================================================================
* Written by Danny Tsen <dtsen@us.ibm.com>
*/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/*
* Copyright (c) The mlkem-native project authors
* SPDX-License-Identifier: Apache-2.0 OR ISC OR MIT
*
* Copyright IBM Corp. 2025, 2026
*
* ===================================================================================
* Written by Danny Tsen <dtsen@us.ibm.com>
*/
* Copyright(c) The mlkem-native project authors
* Copyright (c) IBM Corp. 2025, 2026
* SPDX-License-Identifier: Apache-2.0 OR ISC OR MIT
*
* Written by Danny Tsen <dtsen@us.ibm.com>
*/

Comment thread .github/workflows/ci.yml
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment https://github.com/pq-code-package/mlkem-native/pull/1648/changes#r3187159006 regarding the CI wasn't addressed.

Could you please extend the cross ppc64le CI, so that is exercises the new backend.
Also to test the guard, we should also add a test for a platform not supporting the power 8 vector instructions.

This patch should do the job:

ppc64le-power7-ci.patch

Copy link
Copy Markdown
Contributor

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I run

nix develop .#cross-ppc64le
autogen

I get a diff in the following files:

  • mlkem/src/native/ppc64le/src/intt_ppc_asm.S
  • mlkem/src/native/ppc64le/src/ntt_ppc_asm.S
  • mlkem/src/native/ppc64le/src/poly_tomont_ppc_asm.S
  • mlkem/src/native/ppc64le/src/reduce_ppc_asm.S

Please commit the autogen'd files. Once the cross-autogen shell has been updated this should also surface in CI.

Signed-off-by: Basil Hess <bhe@zurich.ibm.com>
@bhess
Copy link
Copy Markdown
Contributor

bhess commented May 20, 2026

When I run

nix develop .#cross-ppc64le
autogen

I get a diff in the following files:

  • mlkem/src/native/ppc64le/src/intt_ppc_asm.S
  • mlkem/src/native/ppc64le/src/ntt_ppc_asm.S
  • mlkem/src/native/ppc64le/src/poly_tomont_ppc_asm.S
  • mlkem/src/native/ppc64le/src/reduce_ppc_asm.S

Please commit the autogen'd files. Once the cross-autogen shell has been updated this should also surface in CI.

Thanks for the review and for checking this @mkannwischer.
I re-ran autogen in the nix cross-ppc64le environment, so this should now match what you see locally (previously ran it on macOS where the toolchain appears to produce slightly different assembly). I also addressed the other review comments in a PR against @dannytsen’s branch: dannytsen#4.

The only item I have not updated yet is the Barrett-bounds wording in the comment, since I did not want to guess the preferred phrasing.

@mkannwischer
Copy link
Copy Markdown
Contributor

When I run

nix develop .#cross-ppc64le
autogen

I get a diff in the following files:

  • mlkem/src/native/ppc64le/src/intt_ppc_asm.S
  • mlkem/src/native/ppc64le/src/ntt_ppc_asm.S
  • mlkem/src/native/ppc64le/src/poly_tomont_ppc_asm.S
  • mlkem/src/native/ppc64le/src/reduce_ppc_asm.S

Please commit the autogen'd files. Once the cross-autogen shell has been updated this should also surface in CI.

Thanks for the review and for checking this @mkannwischer. I re-ran autogen in the nix cross-ppc64le environment, so this should now match what you see locally (previously ran it on macOS where the toolchain appears to produce slightly different assembly). I also addressed the other review comments in a PR against @dannytsen’s branch: dannytsen#4.

Thanks for the quick turnaround and all the changes! That autogen issue sounds annoying. Was that both within nix?

The only item I have not updated yet is the Barrett-bounds wording in the comment, since I did not want to guess the preferred phrasing.

Okay, got it! I'm planning to do one more pass on the documentation once everything works.

@bhess
Copy link
Copy Markdown
Contributor

bhess commented May 20, 2026

Thanks for the quick turnaround and all the changes! That autogen issue sounds annoying. Was that both within nix?

Not initially. The earlier macOS run used the Homebrew toolchain directly, not the nix cross-ppc64le environment. Now I'm using a dockerized nix which seems the most convenient way to use it on macOS.

@mkannwischer
Copy link
Copy Markdown
Contributor

Thanks for the quick turnaround and all the changes! That autogen issue sounds annoying. Was that both within nix?

Not initially. The earlier macOS run used the Homebrew toolchain directly, not the nix cross-ppc64le environment. Now I'm using a dockerized nix which seems the most convenient way to use it on macOS.

Okay, then I am less surprised that it produced different asm.
Why can you not use the cross-ppc64le nix shell on MacOS directly?

@bhess
Copy link
Copy Markdown
Contributor

bhess commented May 20, 2026

Why can you not use the cross-ppc64le nix shell on MacOS directly?

I could, but on macOS the nix setup was much slower, likely because fewer prebuilt packages available than in the dockerized Linux environment. Setting it up in docker was much faster. I may be missing a simpler way to do this on macOS though, since I am not very familiar with nix.

@mkannwischer
Copy link
Copy Markdown
Contributor

Why can you not use the cross-ppc64le nix shell on MacOS directly?

I could, but on macOS the nix setup was much slower, likely because fewer prebuilt packages available than in the dockerized Linux environment. Setting it up in docker was much faster. I may be missing a simpler way to do this on macOS though, since I am not very familiar with nix.

Sounds plausible that nix develop .#cross-ppc64le is much slower on MacOS than on Linux. I'm also not entirely sure why that is the case and if there is anything we can do about it. It's a one-time cost though.
Anyway, glad you got it to work. Thanks!

@hanno-becker
Copy link
Copy Markdown
Contributor

I agree the cross shells are extremely slow to build (hours for me), but as @mkannwischer said, it's a one-off.

dannytsen and others added 3 commits May 20, 2026 07:00
ppc64le: address review comments from mkannwischer
Signed-off-by: Basil Hess <bhe@zurich.ibm.com>
Signed-off-by: Basil Hess <bhe@zurich.ibm.com>
@bhess
Copy link
Copy Markdown
Contributor

bhess commented May 20, 2026

Could you kindly also rebase the changes against main and make sure the diff is as you expect? Right now this PR contains a lot of unrelated changes making it hard to review.

I pushed an update merging main into this branch, so the diff should look clean now.

@mkannwischer
Copy link
Copy Markdown
Contributor

Could you kindly also rebase the changes against main and make sure the diff is as you expect? Right now this PR contains a lot of unrelated changes making it hard to review.

I pushed an update merging main into this branch, so the diff should look clean now.

Thanks! The diff looks much easier on the eyes now. At first glance the changes look great - thank you!
Python linting and magic-constant linting still fails. Could you take a look at that, please?

Once CI works here, I'll update the other PR to run the complete CI and then do another review. I expect that all that's left is aligning documentation. Thanks for the patience!

- ruff format scripts/autogen (formatting fix)
- Add check-magic annotation for array size 2072 in consts.c and consts.h
  (7 groups of 8 base constants + 4 twiddle tables * 63 rows * 8 values)

Signed-off-by: Basil Hess <bhe@zurich.ibm.com>
@bhess
Copy link
Copy Markdown
Contributor

bhess commented May 21, 2026

Thanks @mkannwischer. I pushed an update that fixes the linting (linting now passes locally).

@mkannwischer
Copy link
Copy Markdown
Contributor

Thanks @mkannwischer. I pushed an update that fixes the linting (linting now passes locally).

Thanks! There are some issues in the POWER-7 tests: https://github.com/pq-code-package/mlkem-native/actions/runs/26212206823/job/77130946223?pr=1648. Could you take a look please?

@bhess
Copy link
Copy Markdown
Contributor

bhess commented May 21, 2026

Thanks! There are some issues in the POWER-7 tests: https://github.com/pq-code-package/mlkem-native/actions/runs/26212206823/job/77130946223?pr=1648. Could you take a look please?

Yes just noticed it, will take a look.

@mkannwischer
Copy link
Copy Markdown
Contributor

For the HOL-Light failures, a workaround would be to add toolchain_ppc64le to the hol_light-cross, hol_light-cross-aarch64, hol_light-cross-x86_64 shells. That's what we did for gcc-arm-embedded, too. That is fine for now until @hanno-becker or I can take a look on how to make --force-cross a little more selective.

bhess added 2 commits May 21, 2026 13:27
…power8 (nix libc is compiled for power8 and otherwise causes illegal instructions.

Avoids unused data parameter errors in the fallback code path.

Signed-off-by: Basil Hess <bhe@zurich.ibm.com>
Signed-off-by: Basil Hess <bhe@zurich.ibm.com>
@bhess
Copy link
Copy Markdown
Contributor

bhess commented May 21, 2026

Thanks! There are some issues in the POWER-7 tests: https://github.com/pq-code-package/mlkem-native/actions/runs/26212206823/job/77130946223?pr=1648. Could you take a look please?

The Illegal instruction issue doen't appear to come from the mlkem code itself, but from the nix toolchain that uses a glibc compiled for power8, so binaries fault when run under qemu-ppc64le -cpu power7.
As a workaround, I changed the CI to still compile with -mcpu=power7 to enable the fallback, but to run the resulting binary with qemu-ppc64le -cpu power8 so that the binary runs properly. I think a cleaner long-term fix would be a separate ppc64le toolchain built for power7.

I also reproduced the fallback path locally. While doing that I hit an issue in the fallback-path (unused variable) fixed in commit 3442786.

For the HOL-Light failures, a workaround would be to add toolchain_ppc64le to the hol_light-cross, hol_light-cross-aarch64, hol_light-cross-x86_64 shells. That's what we did for gcc-arm-embedded, too. That is fine for now until @hanno-becker or I can take a look on how to make --force-cross a little more selective.

Done in the latest commit.

@mkannwischer
Copy link
Copy Markdown
Contributor

Thanks! There are some issues in the POWER-7 tests: https://github.com/pq-code-package/mlkem-native/actions/runs/26212206823/job/77130946223?pr=1648. Could you take a look please?

The Illegal instruction issue doen't appear to come from the mlkem code itself, but from the nix toolchain that uses a glibc compiled for power8, so binaries fault when run under qemu-ppc64le -cpu power7. As a workaround, I changed the CI to still compile with -mcpu=power7 to enable the fallback, but to run the resulting binary with qemu-ppc64le -cpu power8 so that the binary runs properly. I think a cleaner long-term fix would be a separate ppc64le toolchain built for power7.

I also reproduced the fallback path locally. While doing that I hit an issue in the fallback-path (unused variable) fixed in commit 3442786.

Thanks for investigating. That's annoying. I think this solution is acceptable for now and I'm glad that it at least revealed some problems we would have otherwise missed.

I have updated #1677 so we can run the full CI and benchmarks there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

benchmark this PR should be benchmarked in CI ppc64le

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants