Skip to content

fix(dronecan): correct DSDL bit-serialization — flight-critical conformance bug (pydronecan reference vectors)#232

Open
avrabe wants to merge 1 commit into
mainfrom
fix/dronecan-dsdl-conformance
Open

fix(dronecan): correct DSDL bit-serialization — flight-critical conformance bug (pydronecan reference vectors)#232
avrabe wants to merge 1 commit into
mainfrom
fix/dronecan-dsdl-conformance

Conversation

@avrabe

@avrabe avrabe commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

A flight-critical wire-conformance bug in shipped v1.92–94, found by external reference vectors

Validating the DroneCAN decoders against pydronecan (the reference impl PX4/ArduPilot's libcanard conforms to) revealed the hand-rolled bit-field decoders use the wrong DSDL bit-serialization — LSB-first instead of DroneCAN v0's be_from_le_bits (8-bit-chunk reversal).

Against the canonical reference:

Message Truth Old (buggy) decode
NodeStatus health/mode/sub_mode 2 / 3 / 1 1 / 6 / 4
esc.RawCommand (ESC throttles) [8191,0,-8192,4096] [-1241,899,0,16]
esc.Status rpm/power/index bit-packed wrong
MagneticFieldStrength field_ga from byte 0 phantom ahrs_id (that's Mag2/1003), field shifted

The RawCommand one is the dangerous one: falcon would have sent garbage throttles to a real ESC. Byte-aligned fields (float16/u32 at byte boundaries) were always correct — only sub-byte/bit-misaligned fields were wrong.

Why 40 tests + 9 Kani harnesses + clean-room 10/10 all passed anyway

They verified the wrong axis. Kani/proptest/round-trips prove robustness + self-consistency (no panic, bounded, encode∘decode = identity) — all true, all blind to a spec mismatch, because every test used the same wrong convention for encode and decode. This is the textbook limit of self-referential testing; only an external oracle could catch it. (Captured as a durable lesson.)

The fix

  • crate::dsdl — a faithful port of be_from_le_bits/le_from_be_bits, bounds-guarded + total. Kani DC-K08 verify_dsdl_read_bounded (the codec is total + result < 2^w).
  • decode_node_status, encode+decode_raw_command, decode_esc_status, decode_mag rewritten to use it; the byte-aligned paths unchanged; the buggy LSB-first read_bits removed.
  • CONFORMANCE tests: canonical pydronecan payloads (ff7c0000080010 RawCommand, etc.) decoded + asserted to reference field values; encoders asserted to reproduce the bytes. scripts/gen-dronecan-vectors.py regenerates them.

42 tests (incl. the conformance vectors), 9/9 Kani SUCCESSFUL, clippy clean, no external breakage. rivet FV-FALCON-DC-CONFORMANCE verifies DC-P02/P03 on the conformance axis.

NOTE: encode_node_status (in the un-merged v1.95) needs the same dsdl::write_uint treatment when v1.95 lands — tracked in the commit.

Falsification

Wrong if a relay-dronecan decoder disagrees with a pydronecan-encoded canonical frame for any message in the conformance test set.

🤖 Generated with Claude Code

…t by pydronecan reference vectors

A flight-critical wire-conformance bug in the shipped v1.92-94 decoders, found by
validating against pydronecan (the reference impl PX4/ArduPilot's libcanard
conforms to) — exactly the external oracle a self-constructed round-trip test
cannot be.

THE BUG: the hand-rolled bit-field decoders used LSB-first packing, but DroneCAN
v0 DSDL uses be_from_le_bits (8-bit-chunk reversal: byte-aligned fields are
little-endian — which is why the float16/u32 decodes were already correct — but
sub-byte/multi-byte bit fields are MSB-first-per-stream-byte). Against the
canonical reference:
- NodeStatus health/mode/sub_mode: read 1/6/4 instead of 2/3/1.
- esc.RawCommand int14 (FLIGHT-CRITICAL): [8191,0,-8192,4096] read as
  [-1241,899,0,16] — falcon would have sent garbage throttles to a real ESC.
- esc.Status rpm/power_rating/esc_index: wrong bit fields.
- MagneticFieldStrength: a phantom ahrs_id (that field is in
  MagneticFieldStrength2/1003), shifting the field by a byte.

WHY EVERYTHING MISSED IT: 40 tests, 9 Kani harnesses, clean-room 10/10 all passed
— they proved robustness + self-consistency (no panic, bounded, encode∘decode =
identity), all true and all blind to a spec mismatch, because every test used the
same wrong convention for both encode and decode. The textbook limit of
self-referential testing.

THE FIX:
- crate::dsdl — a faithful port of be_from_le_bits / le_from_be_bits (the DSDL
  bit codec), bounds-guarded + total. Kani DC-K08 verify_dsdl_read_bounded.
- decode_node_status / encode+decode_raw_command (msg.rs), decode_esc_status +
  decode_mag (sensors.rs) rewritten to use it. Byte-aligned float16/u32 paths
  unchanged (already correct). The buggy LSB-first read_bits removed.
- CONFORMANCE tests: canonical pydronecan payloads (0403020199efbe NodeStatus,
  ff7c0000080010 RawCommand, etc.) decoded + asserted to their reference field
  values; encoders asserted to reproduce the canonical bytes. scripts/gen-
  dronecan-vectors.py regenerates them.

42 tests (was 36 + the conformance vectors), 9/9 Kani SUCCESSFUL, clippy clean,
no external breakage (relay-dronecan is a leaf). rivet FV-FALCON-DC-CONFORMANCE
verifies DC-P02/P03 on the conformance axis; validate PASS, 0 gaps.

NOTE: encode_node_status (in the un-merged v1.95) needs the same dsdl::write_uint
treatment when v1.95 lands — tracked.

Falsification: this fix is wrong if a relay-dronecan decoder disagrees with a
pydronecan-encoded canonical frame for any message in the conformance test set.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant