fix(dronecan): correct DSDL bit-serialization — flight-critical conformance bug (pydronecan reference vectors)#232
Open
avrabe wants to merge 1 commit into
Open
fix(dronecan): correct DSDL bit-serialization — flight-critical conformance bug (pydronecan reference vectors)#232avrabe wants to merge 1 commit into
avrabe wants to merge 1 commit into
Conversation
…t by pydronecan reference vectors A flight-critical wire-conformance bug in the shipped v1.92-94 decoders, found by validating against pydronecan (the reference impl PX4/ArduPilot's libcanard conforms to) — exactly the external oracle a self-constructed round-trip test cannot be. THE BUG: the hand-rolled bit-field decoders used LSB-first packing, but DroneCAN v0 DSDL uses be_from_le_bits (8-bit-chunk reversal: byte-aligned fields are little-endian — which is why the float16/u32 decodes were already correct — but sub-byte/multi-byte bit fields are MSB-first-per-stream-byte). Against the canonical reference: - NodeStatus health/mode/sub_mode: read 1/6/4 instead of 2/3/1. - esc.RawCommand int14 (FLIGHT-CRITICAL): [8191,0,-8192,4096] read as [-1241,899,0,16] — falcon would have sent garbage throttles to a real ESC. - esc.Status rpm/power_rating/esc_index: wrong bit fields. - MagneticFieldStrength: a phantom ahrs_id (that field is in MagneticFieldStrength2/1003), shifting the field by a byte. WHY EVERYTHING MISSED IT: 40 tests, 9 Kani harnesses, clean-room 10/10 all passed — they proved robustness + self-consistency (no panic, bounded, encode∘decode = identity), all true and all blind to a spec mismatch, because every test used the same wrong convention for both encode and decode. The textbook limit of self-referential testing. THE FIX: - crate::dsdl — a faithful port of be_from_le_bits / le_from_be_bits (the DSDL bit codec), bounds-guarded + total. Kani DC-K08 verify_dsdl_read_bounded. - decode_node_status / encode+decode_raw_command (msg.rs), decode_esc_status + decode_mag (sensors.rs) rewritten to use it. Byte-aligned float16/u32 paths unchanged (already correct). The buggy LSB-first read_bits removed. - CONFORMANCE tests: canonical pydronecan payloads (0403020199efbe NodeStatus, ff7c0000080010 RawCommand, etc.) decoded + asserted to their reference field values; encoders asserted to reproduce the canonical bytes. scripts/gen- dronecan-vectors.py regenerates them. 42 tests (was 36 + the conformance vectors), 9/9 Kani SUCCESSFUL, clippy clean, no external breakage (relay-dronecan is a leaf). rivet FV-FALCON-DC-CONFORMANCE verifies DC-P02/P03 on the conformance axis; validate PASS, 0 gaps. NOTE: encode_node_status (in the un-merged v1.95) needs the same dsdl::write_uint treatment when v1.95 lands — tracked. Falsification: this fix is wrong if a relay-dronecan decoder disagrees with a pydronecan-encoded canonical frame for any message in the conformance test set. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
A flight-critical wire-conformance bug in shipped v1.92–94, found by external reference vectors
Validating the DroneCAN decoders against pydronecan (the reference impl PX4/ArduPilot's libcanard conforms to) revealed the hand-rolled bit-field decoders use the wrong DSDL bit-serialization — LSB-first instead of DroneCAN v0's
be_from_le_bits(8-bit-chunk reversal).Against the canonical reference:
NodeStatushealth/mode/sub_modeesc.RawCommand(ESC throttles)[8191,0,-8192,4096][-1241,899,0,16]esc.Statusrpm/power/indexMagneticFieldStrengthfield_gafrom byte 0ahrs_id(that's Mag2/1003), field shiftedThe RawCommand one is the dangerous one: falcon would have sent garbage throttles to a real ESC. Byte-aligned fields (float16/u32 at byte boundaries) were always correct — only sub-byte/bit-misaligned fields were wrong.
Why 40 tests + 9 Kani harnesses + clean-room 10/10 all passed anyway
They verified the wrong axis. Kani/proptest/round-trips prove robustness + self-consistency (no panic, bounded,
encode∘decode = identity) — all true, all blind to a spec mismatch, because every test used the same wrong convention for encode and decode. This is the textbook limit of self-referential testing; only an external oracle could catch it. (Captured as a durable lesson.)The fix
crate::dsdl— a faithful port ofbe_from_le_bits/le_from_be_bits, bounds-guarded + total. Kani DC-K08verify_dsdl_read_bounded(the codec is total + result< 2^w).decode_node_status,encode+decode_raw_command,decode_esc_status,decode_magrewritten to use it; the byte-aligned paths unchanged; the buggy LSB-firstread_bitsremoved.ff7c0000080010RawCommand, etc.) decoded + asserted to reference field values; encoders asserted to reproduce the bytes.scripts/gen-dronecan-vectors.pyregenerates them.42 tests (incl. the conformance vectors), 9/9 Kani SUCCESSFUL, clippy clean, no external breakage. rivet
FV-FALCON-DC-CONFORMANCEverifies DC-P02/P03 on the conformance axis.Falsification
Wrong if a relay-dronecan decoder disagrees with a pydronecan-encoded canonical frame for any message in the conformance test set.
🤖 Generated with Claude Code