x86: Add VMOVMSKPS, VPMOVZXBD, VZEROUPPER instruction models #387
6e7ea14 to eadf4ac
Add semantic models for three x86 AVX instructions needed by mldsa-native's rej_uniform formal verification:

- VMOVMSKPS: Extract sign bits (bit 31) from each 32-bit lane of a YMM/XMM register into a GPR. Used to build a comparison mask after VPSUBD rejection testing.
- VPMOVZXBD: Zero-extend bytes to dwords (8->32 bit). Used to expand table lookup indices from the VPERMD compaction table.
- VZEROUPPER: Clear upper 128 bits of all YMM registers. Modeled as a no-op (like ENDBR64) since the proof framework tracks YMM registers as full 256-bit values and the instruction only affects performance, not correctness.

Includes decoder entries, instruction type constructors, semantic definitions, simulator test cases, and execution dispatch.

Signed-off-by: jakemas <jakemas@amazon.com>
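The lane-level behaviour described for VMOVMSKPS and VPMOVZXBD can be sketched as a small reference model in plain OCaml. This is only an illustration and not the HOL Light definitions added by the PR: the function names and the data representation (lanes as OCaml ints, a packed source as an `int64`) are assumptions made for the example, and it assumes a 64-bit OCaml where `int` holds 32-bit lane values.

```ocaml
(* Illustrative reference model; lane 0 is the least significant lane. *)

(* VMOVMSKPS: copy bit 31 of 32-bit lane i into bit i of the result. *)
let vmovmskps (lanes : int array) : int =
  let mask = ref 0 in
  Array.iteri
    (fun i lane ->
      if (lane lsr 31) land 1 = 1 then mask := !mask lor (1 lsl i))
    lanes;
  !mask

(* VPMOVZXBD: zero-extend the low n bytes of src to n 32-bit lanes.
   n = 4 for the XMM form (dword source), n = 8 for the YMM form (qword source). *)
let vpmovzxbd (n : int) (src : int64) : int array =
  Array.init n (fun i ->
      Int64.to_int (Int64.logand (Int64.shift_right_logical src (8 * i)) 0xffL))

let () =
  let ymm =
    [| 0x80000000; 0x7fffffff; 0xffffffff; 0; 0x80000001; 1; 2; 0xdeadbeef |]
  in
  Printf.printf "vmovmskps mask = 0x%02x\n" (vmovmskps ymm);
  Array.iter (Printf.printf "0x%08x ") (vpmovzxbd 8 0x00ff7f0102030480L);
  print_newline ()
```

Note that a source byte such as 0xff becomes the lane 0x000000ff, not 0xffffffff, which is the difference between zero extension and sign extension.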
eadf4ac to 2b23121
VZEROUPPER zeros bits 128-511 of ZMM0-ZMM15 while preserving the lower 128 bits (XMM values). Model this by writing each XMM register's current value back through the zero-extending XMM component path, which automatically zeros the upper bits of the containing ZMM register.
The XMM-based model (XMM := read XMM s) chains through two zerotop wrappers (zerotop_128 then zerotop_256), creating deeply nested word_zx terms that the sematest cosimulation tactics cannot simplify. Write to YMM directly with explicit word_subword/word_zx instead, which only goes through one zerotop layer (zerotop_256 to ZMM). Semantically identical: preserves lower 128 bits, zeros upper bits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
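To see why the two formulations agree, here is a minimal sketch in plain OCaml. It is not the HOL Light model: it represents one ZMM register as eight 64-bit lanes and assumes, as the commit messages describe, that a write through the XMM or YMM view zeros every bit above the written width. All names are hypothetical.

```ocaml
(* One ZMM register as eight 64-bit lanes, lane 0 least significant. *)
type zmm = int64 array

(* Assumed write behaviour: writing the XMM or YMM view zeros the bits above it. *)
let write_xmm (v : int64 array) : zmm =
  Array.init 8 (fun i -> if i < 2 then v.(i) else 0L)

let write_ymm (v : int64 array) : zmm =
  Array.init 8 (fun i -> if i < 4 then v.(i) else 0L)

let read_xmm (z : zmm) : int64 array = Array.sub z 0 2

(* Formulation 1: write the XMM value back through the XMM view. *)
let vzeroupper_via_xmm (z : zmm) : zmm = write_xmm (read_xmm z)

(* Formulation 2: take the low 128 bits, zero-extend to 256, write the YMM view. *)
let vzeroupper_via_ymm (z : zmm) : zmm =
  let low128_zx = Array.init 4 (fun i -> if i < 2 then z.(i) else 0L) in
  write_ymm low128_zx

let () =
  let z = Array.init 8 (fun i -> Int64.of_int (0x1111 * (i + 1))) in
  let a = vzeroupper_via_xmm z and b = vzeroupper_via_ymm z in
  assert (a = b);
  Array.iteri (fun i v -> Printf.printf "lane %d = 0x%Lx\n" i v) a
```

Both functions keep lanes 0 and 1 and zero lanes 2 to 7; the difference in the PR is only in how many zero-extension wrappers the proof tactics have to simplify.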
- nix/s2n_bignum/default.nix: bump pin to awslabs/s2n-bignum#387 head (adds VMOVMSKPS, VPMOVZXBD, and VZEROUPPER instruction models required by rej_uniform).
- proofs/hol_light/x86_64/proofs/mldsa_rej_uniform.ml:
  - Fix namespaced imports ('s2n_bignum/', 'mldsa_native/') to pass scripts/check-hol-light-imports lint.
  - Replace `(*)` multiplication-operator term with `( * )` in vpermd_factor_for so the OCaml lexer doesn't read `(*` as a comment-open when the file is processed through inline_load.
- mldsa/src/native/x86_64/src/rej_uniform_avx2.S: autogen-applied assembly normalization (tabs -> spaces, ELF note footer).

With these fixes the x86_64 HOL-Light bytecode dump, lint, and autogen dry-run all succeed under `nix develop .#hol_light`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Jake Massimo <jakemas@amazon.com>
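The lexer issue behind the `( * )` change is ordinary OCaml behaviour: the character sequence `(*` always opens a comment, so the multiplication operator has to be written with spaces when used as a function. A minimal plain-OCaml illustration follows (the `product` helper is hypothetical and unrelated to vpermd_factor_for):

```ocaml
(* With surrounding spaces, the parenthesised operator is multiplication
   used as an ordinary two-argument function. *)
let product (xs : int list) : int = List.fold_left ( * ) 1 xs

let () = Printf.printf "%d\n" (product [2; 3; 7])
```

Writing the operator without the spaces in the same position would make the rest of the line part of a comment, which is why the fix matters once the file passes through the OCaml lexer.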
Note: It looks like we're pulling
Looks great to me except for one thing which it might be nice to make cleaner. The decoding of
The instruction semantics only handles the 256-bit (VEX.L=1) case. Reject the 128-bit form in the decoder rather than silently producing an incorrect operand size from a 128-bit encoding.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-31-118.us-west-2.compute.internal>
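The idea of rejecting an unsupported vector length at decode time can be sketched generically. The types and function names below are hypothetical and are not the s2n-bignum decoder; the only encoding fact relied on is that VEX.L is bit 2 of the last VEX prefix byte (the byte immediately before the opcode), in both the two-byte and three-byte VEX forms.

```ocaml
(* Hypothetical vector-length type for the illustration. *)
type veclen = V128 | V256

(* VEX.L is bit 2 of the final VEX prefix byte. *)
let vex_l (last_vex_byte : int) : veclen =
  if (last_vex_byte lsr 2) land 1 = 1 then V256 else V128

(* Decode guard: accept the 256-bit form and fail on the 128-bit form,
   instead of letting a 128-bit encoding through with the wrong size. *)
let decode_256_only (last_vex_byte : int) : veclen option =
  match vex_l last_vex_byte with
  | V256 -> Some V256
  | V128 -> None

let () =
  List.iter
    (fun b ->
      Printf.printf "0x%02x -> %s\n" b
        (match decode_256_only b with
         | Some _ -> "accepted (256-bit)"
         | None -> "rejected (128-bit)"))
    [ 0xfc; 0xf8; 0x7d; 0x79 ]
```

Returning `None` makes the unsupported encoding fail to decode at all, so no semantics are ever attached to it.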
f2dc13d to 3f84379
jargh left a comment
Thanks, this all looks great and all tests are successful. One minor thing I'd be inclined to add is some simple memory tests in the simulator's simple_memory_iclasses list for the vpmovzxbd instruction. I added the following and they all worked fine with a further make sematest run, which was a nice additional sanity check:
[0xc4; 0xe2; 0x79; 0x31; 0x0c; 0x24]; (* vpmovzxbd xmm1,DWORD PTR [rsp] *)
[0xc4; 0x62; 0x79; 0x31; 0x0c; 0x24]; (* vpmovzxbd xmm9,DWORD PTR [rsp] *)
[0xc4; 0xe2; 0x7d; 0x31; 0x54; 0x24; 0x40]; (* vpmovzxbd ymm2,QWORD PTR [rsp+0x40] *)
[0xc4; 0x62; 0x7d; 0x31; 0x7c; 0x24; 0x40]; (* vpmovzxbd ymm15,QWORD PTR [rsp+0x40] *)
However I don't see this as a reason not to merge (since those did all work!) so I'll approve :-)
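As a cross-check of the comments on the four byte sequences above, the following standalone OCaml sketch decodes just enough of the encoding to recover the register numbers and displacement. It is an assumption-laden illustration, not the project's decoder: it handles only the three-byte VEX prefix, opcode 0x31 in the 0F38 map with the 66 prefix, and an SIB-addressed memory operand with no index register, and it does not check VEX.W or VEX.vvvv.

```ocaml
type decoded = {
  is_ymm : bool;  (* VEX.L = 1 selects the 256-bit form *)
  reg : int;      (* destination vector register number *)
  base : int;     (* base general-purpose register number; 4 is rsp *)
  disp : int;     (* signed displacement *)
}

let decode_vpmovzxbd_mem (bytes : int list) : decoded option =
  match bytes with
  | 0xc4 :: p1 :: p2 :: 0x31 :: modrm :: sib :: rest ->
      (* VEX.R, VEX.X and VEX.B are stored inverted. *)
      let r = 1 - ((p1 lsr 7) land 1) in
      let x = 1 - ((p1 lsr 6) land 1) in
      let b = 1 - ((p1 lsr 5) land 1) in
      let map = p1 land 0x1f in            (* 2 selects the 0F38 map *)
      let pp = p2 land 3 in                (* 1 selects the 66 prefix *)
      let is_ymm = (p2 lsr 2) land 1 = 1 in
      let reg = (r lsl 3) lor ((modrm lsr 3) land 7) in
      let base = (b lsl 3) lor (sib land 7) in
      let no_index = (x = 0) && ((sib lsr 3) land 7 = 4) in
      if map <> 2 || pp <> 1 || modrm land 7 <> 4 || not no_index then None
      else begin
        match modrm lsr 6, rest with
        | 0, [] when sib land 7 <> 5 -> Some { is_ymm; reg; base; disp = 0 }
        | 1, [d] ->
            (* disp8 is sign-extended *)
            Some { is_ymm; reg; base; disp = (if d >= 0x80 then d - 0x100 else d) }
        | _ -> None
      end
  | _ -> None

let () =
  List.iter
    (fun bytes ->
      match decode_vpmovzxbd_mem bytes with
      | Some d ->
          Printf.printf "%s%d, [r%d%+d]\n"
            (if d.is_ymm then "ymm" else "xmm") d.reg d.base d.disp
      | None -> print_endline "not decoded")
    [ [0xc4; 0xe2; 0x79; 0x31; 0x0c; 0x24];
      [0xc4; 0x62; 0x79; 0x31; 0x0c; 0x24];
      [0xc4; 0xe2; 0x7d; 0x31; 0x54; 0x24; 0x40];
      [0xc4; 0x62; 0x7d; 0x31; 0x7c; 0x24; 0x40] ]
```

Register number 4 as the base corresponds to rsp, so the decoded fields line up with the xmm1, xmm9, ymm2 and ymm15 operands named in the comments.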
Summary
Add semantic models for three x86 AVX instructions needed by mldsa-native's rej_uniform formal verification (pq-code-package/mldsa-native#1014): VMOVMSKPS, VPMOVZXBD, and VZEROUPPER.
Includes decoder entries, instruction type constructors, semantic definitions, simulator test cases, and execution dispatch.
Having some issues with the Sematest CI: