htlcswitch: add FSM fuzz harness for channelLink commit protocol#1
htlcswitch: add FSM fuzz harness for channelLink commit protocol#1
Conversation
Coverage Report for CI Build 24286915830Coverage decreased (-10.1%) to 52.197%Details
Uncovered Changes
Coverage Regressions30606 previously-covered lines in 468 files lost coverage.
Coverage Stats
💛 - Coveralls |
001781c to
1400d9a
Compare
1b219d1 to
ca4be0a
Compare
fcb4bb0 to
6e4610c
Compare
Expose the `invoiceRegistry` field in `singleLinkTestHarness` so tests can register and look up invoices directly. Add `generateSingleHopHtlc`, a test helper that builds a single-hop `UpdateAddHTLC` with a random preimage, intended for use in unit and fuzz tests.
Add a no-op MailBox implementation and a no-op ticker for use in the channelLink FSM fuzz harness.
Replace createChannelLinkWithPeer (which required a Switch and spawned the htlcManager goroutine) with newFuzzLink, a minimal link factory that: - accepts dependencies directly (registry, preimage cache, circuit map, bestHeight) instead of a mockServer, so no Switch or background goroutines are created at all - sets link.upstream directly to a buffered channel controlled by the caller, bypassing the mailbox entirely - attaches a mockMailBox so mailBox.ResetPackets() in resumeLink succeeds
|
@Crypt-iQ when you have time, could you take a look? |
Sure I will take a look |
b28e3d8 to
5f570d5
Compare
Introduce `fuzz_link_test.go` with a model-based fuzzer that drives the Alice-Bob channel link through arbitrary sequences of protocol events and checks key invariants after each step. fuzz_link_test
Introduce fuzzSigner and fuzzSigVerifier in the fuzz harness, along with the SigVerifier hook in LightningChannel (WithSigVerifier, verifySig) and a matching SigPool extension (VerifyFunc field) so the harness can bypass secp256k1 verification end-to-end. Also refactors createTestChannel to accept functional options (testChannelOpt) so the signer and channel options can be injected from tests.
Introduce CommitKeyDeriverFunc and WithCommitKeyDeriver to allow LightningChannel to bypass the secp256k1-based DeriveCommitmentKeys on every commit round. All internal call sites are migrated to lc.deriveCommitmentKeys. The fuzz harness injects fuzzCommitKeyDeriver, a trivial identity deriver that avoids scalar-multiplication overhead.
createTestChannel started alicePool and bobPool but never stopped them. During fuzzing this caused goroutines to leak per. Register t.Cleanup handlers to call Stop() on both pools so all workers are torn down when the test ends.
newMockRegistry started an InvoiceRegistry but never stopped it. InvoiceRegistry internally starts two background goroutines — invoiceEventLoop and the InvoiceExpiryWatcher mainLoop — that run for the lifetime of the registry. Without a matching Stop() call both goroutines leaked for every test that called newMockRegistry, accumulating thousands of goroutines during fuzzing. Register a t.Cleanup to call registry.Stop() so both loops are torn down when the test ends.
Crypt-iQ
left a comment
There was a problem hiding this comment.
Ok, I took a (quick) look at the fuzz test because I've lost context and am short on time so I can't give a detailed line-by-line code-level review that I'd like to give. I skipped over the commits that stub out the signing stuff, seems good because signing could be a bottleneck. Using a RAM disk is a good idea since you want to avoid disk i/o. My main comment is that the fuzz harness could use more of the fuzz input. For example, when a new fee is being sent, the fee is calculated by newFee := len(f.aliceLink.channel.ActiveHtlcs())*100 + 1000. I think most of the messages should instead be constructed from the fuzz input (besides things like signatures that would cause obvious invalidity but even those you could sometimes make invalid). So instead of the fuzz input being a sequence of events, it would be a sequence of events + some byte slice that can be parsed as message fields. You could also just accept one byte blob as you do here, but parse it into events + message fields. Also, if possible, I think it'd be good to see if the link can start up after being stopped. There have been several bugs over the years where the link can't start up properly due to reestablishment (and I think the other work-in-progress link harness found an issue just like this). Finally, it would be a good idea to measure the coverage and see if there are any obvious blind spots for this fuzzer and then improve on those by adding extra events.
|
|
||
| // Generate the ChannelReestablish messages that each side needs to | ||
| // receive in order to complete the sync handshake. | ||
| aliceSyncMsg, err := alice.channel.State().ChanSyncMsg() |
| } | ||
|
|
||
| // Check total balances. | ||
| var aliceHtlcAmt, bobHtlcAmt lnwire.MilliSatoshi |
There was a problem hiding this comment.
is it possible to assert that alice or bob's balance is a certain expected value? this works, but I'm wondering if there's any way to detect funds loss
| } | ||
|
|
||
| var preimage lntypes.Preimage | ||
| r, err := generateRandomBytes(sha256.Size) |
There was a problem hiding this comment.
It might be good to use a corpus input with HTLC ID and sender ID to deterministically generate the preimage.
| } | ||
|
|
||
| // Pick the oldest preimage Alice tracks and settle it on her | ||
| // link. |
There was a problem hiding this comment.
It might be even better to use own input to decide which HTLC to settle.
|
|
||
| // Guard against excessively long inputs that would make the | ||
| // test run too long. | ||
| if len(data) > 250 { |
There was a problem hiding this comment.
Bump up? Sometimes very large inputs are interesting
| // applyEvent dispatches a single fuzz-generated event to the FSM for either | ||
| // Alice or Bob. Events that cannot be applied in the current state are silently | ||
| // skipped so the fuzzer can keep making progress without failing the test. | ||
| func (f *fuzzFSM) applyEvent(e Event) { |
There was a problem hiding this comment.
Sometimes, it may be worth it to send messages where no pre-checks are done. So send an HTLC where Bob hasn't created the Hold invoice, sending a settle for an HTLC that doesn't exist, etc.
There was a problem hiding this comment.
Yes, I’ll look into more of those unexpected events.
|
|
||
| type Event uint8 | ||
|
|
||
| const ( |
There was a problem hiding this comment.
If possible, would be good to have an event where the links restart and assert that they can still sync?
There was a problem hiding this comment.
Yes, it’s part of the follow-up work.
|
Thank you @Crypt-iQ for your time. I’ll be addressing the comments above. |
The channelLink commit protocol — the sequence of CommitSig / RevokeAndAck exchanges that advance commitment heights on both sides of a channel — is one of the most critical and subtle state machines in lnd. Despite extensive unit tests, the ordering of these messages is highly concurrent and easy to get wrong. A single missed revocation or out-of-order commit can corrupt channel state irreparably.
This PR adds a coverage-guided fuzz harness that exercises the full commit protocol FSM by randomly interleaving HTLC additions, commits, revocations, settlements, and failures from both Alice and Bob. The fuzzer checks structural invariants (monotonic commit heights, mirror symmetry between peers) after every event, catching protocol violations that deterministic tests cannot anticipate.
Testing
go test ./htlcswitch/ -run TestChannelLinkFSMScenarios -v
go test ./htlcswitch -run=^$ -fuzz=FuzzChannelLinkFSM -fuzztime=1m