Skip to content

x86: Add VMOVMSKPS, VPMOVZXBD, VZEROUPPER instruction models#387

Merged
jargh merged 11 commits into
awslabs:mainfrom
jakemas:jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper
May 17, 2026
Merged

x86: Add VMOVMSKPS, VPMOVZXBD, VZEROUPPER instruction models#387
jargh merged 11 commits into
awslabs:mainfrom
jakemas:jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper

Conversation

@jakemas
Copy link
Copy Markdown
Contributor

@jakemas jakemas commented Apr 8, 2026

Summary

Add semantic models for three x86 AVX instructions needed by mldsa-native's rej_uniform formal verification (pq-code-package/mldsa-native#1014):

  • VMOVMSKPS: Extract sign bits (bit 31) from each 32-bit lane of a YMM register into a GPR. Used to build a comparison mask after VPSUBD rejection testing.
  • VPMOVZXBD: Zero-extend bytes to dwords (8→32 bit). Used to expand table lookup indices from the VPERMD compaction table.
  • VZEROUPPER: Clear upper 128 bits of all YMM registers. Modeled as a no-op (like ENDBR64) since the proof framework tracks YMM registers as full 256-bit values and the instruction only affects performance, not correctness.

Includes decoder entries, instruction type constructors, semantic definitions, simulator test cases, and execution dispatch.

Having some issues with the Sematest CI:

list-x86-insns.sh fix: The sematest script tools/list-x86-insns.sh generates
memory-operand test cases by disassembling all .S files, extracting instructions
with [...] operands, and replacing them with [rsp+32]. This breaks for rep movs
which has two implicit memory operands (es:[rdi], ds:[rsi]) — replacing [rdi] with
[rsp+32] produces an invalid instruction. This is a pre-existing issue (reproduces
on main) but only surfaces in CI due to runner/assembler version differences. Fixed
by filtering out movs from the memory instruction list, matching the existing
filters for rsp/rip/lea/push/pop/ret/call/j*.

@jakemas jakemas force-pushed the jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper branch 7 times, most recently from 6e7ea14 to eadf4ac Compare April 8, 2026 19:02
Add semantic models for three x86 AVX instructions needed by
mldsa-native's rej_uniform formal verification:

- VMOVMSKPS: Extract sign bits (bit 31) from each 32-bit lane of a
  YMM/XMM register into a GPR. Used to build a comparison mask after
  VPSUBD rejection testing.

- VPMOVZXBD: Zero-extend bytes to dwords (8->32 bit). Used to expand
  table lookup indices from the VPERMD compaction table.

- VZEROUPPER: Clear upper 128 bits of all YMM registers. Modeled as
  a no-op (like ENDBR64) since the proof framework tracks YMM
  registers as full 256-bit values and the instruction only affects
  performance, not correctness.

Includes decoder entries, instruction type constructors, semantic
definitions, simulator test cases, and execution dispatch.

Signed-off-by: jakemas <jakemas@amazon.com>
@jakemas jakemas force-pushed the jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper branch from eadf4ac to 2b23121 Compare April 8, 2026 19:40
jakemas and others added 7 commits April 12, 2026 23:58
VZEROUPPER zeros bits 128-511 of ZMM0-ZMM15 while preserving the
lower 128 bits (XMM values). Model this by writing each XMM register's
current value back through the zero-extending XMM component path,
which automatically zeros the upper bits of the containing ZMM register.
The XMM-based model (XMM := read XMM s) chains through two zerotop
wrappers (zerotop_128 then zerotop_256), creating deeply nested
word_zx terms that the sematest cosimulation tactics cannot simplify.

Write to YMM directly with explicit word_subword/word_zx instead,
which only goes through one zerotop layer (zerotop_256 to ZMM).
Semantically identical: preserves lower 128 bits, zeros upper bits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jakemas added a commit to pq-code-package/mldsa-native that referenced this pull request May 6, 2026
- nix/s2n_bignum/default.nix: bump pin to
  awslabs/s2n-bignum#387 head (adds VMOVMSKPS, VPMOVZXBD, and
  VZEROUPPER instruction models required by rej_uniform).
- proofs/hol_light/x86_64/proofs/mldsa_rej_uniform.ml:
  - Fix namespaced imports ('s2n_bignum/', 'mldsa_native/') to pass
    scripts/check-hol-light-imports lint.
  - Replace `(*)` multiplication-operator term with `( * )` in
    vpermd_factor_for so the OCaml lexer doesn't read `(*` as a
    comment-open when the file is processed through inline_load.
- mldsa/src/native/x86_64/src/rej_uniform_avx2.S: autogen-applied
  assembly normalization (tabs -> spaces, ELF note footer).

With these fixes the x86_64 HOL-Light bytecode dump, lint, and
autogen dry-run all succeed under `nix develop .#hol_light`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Jake Massimo <jakemas@amazon.com>
@jakemas
Copy link
Copy Markdown
Contributor Author

jakemas commented May 7, 2026

Note: It looks like we're pulling vzeroupper out of the code (CR comments), but we can keep it modeled here if we like, I was able to get the proof working using it.

@jargh
Copy link
Copy Markdown
Contributor

jargh commented May 15, 2026

Looks great to me except for one thing which it might be nice to make cleaner. The decoding of VMOVMSKPS accepts both XMM and YMM forms but the eventual instruction dispatch always treats the operand as OPERAND256. So for example we'd see a simulator failure in
[0xc5; 0xf8; 0x50; 0xc1] (* vmovmskps eax, xmm1 -- VEX.L=0, 128-bit form *)
It would be nice either to make the decoder return NONE in case L is false (an easy fix) or actually handle both cases in the dispatch (maybe a bit more work for no immediate application, since the core instruction semantics x86_VMOVMSKPS only handles the 256-bit case too).

The instruction semantics only handles the 256-bit (VEX.L=1) case.
Reject the 128-bit form in the decoder rather than silently producing
an incorrect operand size from a 128-bit encoding.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-31-118.us-west-2.compute.internal>
@jakemas jakemas force-pushed the jakemas/add-x86-vmovmskps-vpmovzxbd-vzeroupper branch from f2dc13d to 3f84379 Compare May 15, 2026 01:38
@jargh jargh self-requested a review May 17, 2026 04:24
Copy link
Copy Markdown
Contributor

@jargh jargh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this all looks great and all tests are successful. One minor thing I'd be inclined to add is some simple memory tests in the simulator's simple_memory_iclasses list for the vpmovzxbd instruction. I added the following and they all worked fine with a further make sematest run, which was a nice additional sanity check:

  [0xc4; 0xe2; 0x79; 0x31; 0x0c; 0x24];        (* vpmovzxbd xmm1,DWORD PTR [rsp] *)
  [0xc4; 0x62; 0x79; 0x31; 0x0c; 0x24];        (* vpmovzxbd xmm9,DWORD PTR [rsp] *)
  [0xc4; 0xe2; 0x7d; 0x31; 0x54; 0x24; 0x40];  (* vpmovzxbd ymm2,QWORD PTR [rsp+0x40] *)
  [0xc4; 0x62; 0x7d; 0x31; 0x7c; 0x24; 0x40];  (*  vpmovzxbd ymm15,QWORD PTR [rsp+0x40] *)

However I don't see this as a reason not to merge (since those did all work!) so I'll approve :-)

@jargh jargh merged commit 333cdfc into awslabs:main May 17, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants