x86: Add VMOVMSKPS, VPMOVZXBD, VZEROUPPER instruction models by jakemas · Pull Request #387 · awslabs/s2n-bignum

jakemas · 2026-04-08T03:52:51Z

Summary

Add semantic models for three x86 AVX instructions needed by mldsa-native's rej_uniform formal verification (pq-code-package/mldsa-native#1014):

VMOVMSKPS: Extract sign bits (bit 31) from each 32-bit lane of a YMM register into a GPR. Used to build a comparison mask after VPSUBD rejection testing.
VPMOVZXBD: Zero-extend bytes to dwords (8→32 bit). Used to expand table lookup indices from the VPERMD compaction table.
VZEROUPPER: Clear upper 128 bits of all YMM registers. Modeled as a no-op (like ENDBR64) since the proof framework tracks YMM registers as full 256-bit values and the instruction only affects performance, not correctness.

Includes decoder entries, instruction type constructors, semantic definitions, simulator test cases, and execution dispatch.

Having some issues with the Sematest CI:

list-x86-insns.sh fix: The sematest script tools/list-x86-insns.sh generates
memory-operand test cases by disassembling all .S files, extracting instructions
with [...] operands, and replacing them with [rsp+32]. This breaks for rep movs
which has two implicit memory operands (es:[rdi], ds:[rsi]) — replacing [rdi] with
[rsp+32] produces an invalid instruction. This is a pre-existing issue (reproduces
on main) but only surfaces in CI due to runner/assembler version differences. Fixed
by filtering out movs from the memory instruction list, matching the existing
filters for rsp/rip/lea/push/pop/ret/call/j*.

Add semantic models for three x86 AVX instructions needed by mldsa-native's rej_uniform formal verification: - VMOVMSKPS: Extract sign bits (bit 31) from each 32-bit lane of a YMM/XMM register into a GPR. Used to build a comparison mask after VPSUBD rejection testing. - VPMOVZXBD: Zero-extend bytes to dwords (8->32 bit). Used to expand table lookup indices from the VPERMD compaction table. - VZEROUPPER: Clear upper 128 bits of all YMM registers. Modeled as a no-op (like ENDBR64) since the proof framework tracks YMM registers as full 256-bit values and the instruction only affects performance, not correctness. Includes decoder entries, instruction type constructors, semantic definitions, simulator test cases, and execution dispatch. Signed-off-by: jakemas <jakemas@amazon.com>

VZEROUPPER zeros bits 128-511 of ZMM0-ZMM15 while preserving the lower 128 bits (XMM values). Model this by writing each XMM register's current value back through the zero-extending XMM component path, which automatically zeros the upper bits of the containing ZMM register.

The XMM-based model (XMM := read XMM s) chains through two zerotop wrappers (zerotop_128 then zerotop_256), creating deeply nested word_zx terms that the sematest cosimulation tactics cannot simplify. Write to YMM directly with explicit word_subword/word_zx instead, which only goes through one zerotop layer (zerotop_256 to ZMM). Semantically identical: preserves lower 128 bits, zeros upper bits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- nix/s2n_bignum/default.nix: bump pin to awslabs/s2n-bignum#387 head (adds VMOVMSKPS, VPMOVZXBD, and VZEROUPPER instruction models required by rej_uniform). - proofs/hol_light/x86_64/proofs/mldsa_rej_uniform.ml: - Fix namespaced imports ('s2n_bignum/', 'mldsa_native/') to pass scripts/check-hol-light-imports lint. - Replace `(*)` multiplication-operator term with `( * )` in vpermd_factor_for so the OCaml lexer doesn't read `(*` as a comment-open when the file is processed through inline_load. - mldsa/src/native/x86_64/src/rej_uniform_avx2.S: autogen-applied assembly normalization (tabs -> spaces, ELF note footer). With these fixes the x86_64 HOL-Light bytecode dump, lint, and autogen dry-run all succeed under `nix develop .#hol_light`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Jake Massimo <jakemas@amazon.com>

jakemas · 2026-05-07T22:27:10Z

Note: It looks like we're pulling vzeroupper out of the code (CR comments), but we can keep it modeled here if we like, I was able to get the proof working using it.

jargh · 2026-05-15T00:09:57Z

Looks great to me except for one thing which it might be nice to make cleaner. The decoding of VMOVMSKPS accepts both XMM and YMM forms but the eventual instruction dispatch always treats the operand as OPERAND256. So for example we'd see a simulator failure in
[0xc5; 0xf8; 0x50; 0xc1] (* vmovmskps eax, xmm1 -- VEX.L=0, 128-bit form *)
It would be nice either to make the decoder return NONE in case L is false (an easy fix) or actually handle both cases in the dispatch (maybe a bit more work for no immediate application, since the core instruction semantics x86_VMOVMSKPS only handles the 256-bit case too).

The instruction semantics only handles the 256-bit (VEX.L=1) case. Reject the 128-bit form in the decoder rather than silently producing an incorrect operand size from a 128-bit encoding. Signed-off-by: Jake Massimo <jakemas@amazon.com> Signed-off-by: Ubuntu <ubuntu@ip-172-31-31-118.us-west-2.compute.internal>

jargh

Thanks, this all looks great and all tests are successful. One minor thing I'd be inclined to add is some simple memory tests in the simulator's simple_memory_iclasses list for the vpmovzxbd instruction. I added the following and they all worked fine with a further make sematest run, which was a nice additional sanity check:

  [0xc4; 0xe2; 0x79; 0x31; 0x0c; 0x24];        (* vpmovzxbd xmm1,DWORD PTR [rsp] *)
  [0xc4; 0x62; 0x79; 0x31; 0x0c; 0x24];        (* vpmovzxbd xmm9,DWORD PTR [rsp] *)
  [0xc4; 0xe2; 0x7d; 0x31; 0x54; 0x24; 0x40];  (* vpmovzxbd ymm2,QWORD PTR [rsp+0x40] *)
  [0xc4; 0x62; 0x7d; 0x31; 0x7c; 0x24; 0x40];  (*  vpmovzxbd ymm15,QWORD PTR [rsp+0x40] *)

However I don't see this as a reason not to merge (since those did all work!) so I'll approve :-)

jakemas force-pushed the jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper branch 7 times, most recently from 6e7ea14 to eadf4ac Compare April 8, 2026 19:02

jakemas force-pushed the jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper branch from eadf4ac to 2b23121 Compare April 8, 2026 19:40

jakemas and others added 7 commits April 12, 2026 23:58

Merge branch 'main' into jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper

97f9789

Merge branch 'main' into jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper

1e19874

Merge branch 'main' into jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper

02c0dad

Merge branch 'main' into jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper

95b1901

Merge branch 'main' into jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper

4c4fe1d

jakemas mentioned this pull request May 5, 2026

x86_64 + HOL-Light: Replace rej_uniform intrinsics with assembly and HOL-Light CORRECT and MEMSAFE proofs pq-code-package/mldsa-native#1014

Open

jakemas mentioned this pull request May 6, 2026

ML-DSA x86 AVX2 rej_uniform CORRECT proof #401

Open

Merge branch 'main' into jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper

9eed4aa

jakemas force-pushed the jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper branch from f2dc13d to 3f84379 Compare May 15, 2026 01:38

Merge branch 'main' into jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper

5943221

jargh self-requested a review May 17, 2026 04:24

jargh approved these changes May 17, 2026

View reviewed changes

jargh merged commit 333cdfc into awslabs:main May 17, 2026
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

x86: Add VMOVMSKPS, VPMOVZXBD, VZEROUPPER instruction models#387

x86: Add VMOVMSKPS, VPMOVZXBD, VZEROUPPER instruction models#387
jargh merged 11 commits into
awslabs:mainfrom
jakemas:jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper

jakemas commented Apr 8, 2026 •

edited

Loading

Uh oh!

jakemas commented May 7, 2026

Uh oh!

jargh commented May 15, 2026

Uh oh!

jargh left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jakemas commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

jakemas commented May 7, 2026

Uh oh!

jargh commented May 15, 2026

Uh oh!

jargh left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jakemas commented Apr 8, 2026 •

edited

Loading