htlcswitch: add FSM fuzz harness for channelLink commit protocol by MPins · Pull Request #1 · MPins/lnd

MPins · 2026-03-12T12:33:39Z

The channelLink commit protocol — the sequence of CommitSig / RevokeAndAck exchanges that advance commitment heights on both sides of a channel — is one of the most critical and subtle state machines in lnd. Despite extensive unit tests, the ordering of these messages is highly concurrent and easy to get wrong. A single missed revocation or out-of-order commit can corrupt channel state irreparably.

This PR adds a coverage-guided fuzz harness that exercises the full commit protocol FSM by randomly interleaving HTLC additions, commits, revocations, settlements, and failures from both Alice and Bob. The fuzzer checks structural invariants (monotonic commit heights, mirror symmetry between peers) after every event, catching protocol violations that deterministic tests cannot anticipate.
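The two structural invariants named above could be checked along these lines. This is a minimal sketch, not the harness's actual code: the `heights` type and both check functions are hypothetical, and the mirror check assumes it runs only when no message is in flight between the peers.

```go
package main

import "fmt"

// heights is a hypothetical snapshot of one peer's commitment heights.
type heights struct {
	local  uint64 // height of the peer's own latest commitment
	remote uint64 // height the peer has recorded for the other side
}

// checkMonotonic returns an error if either height moved backwards
// between two consecutive snapshots of the same peer.
func checkMonotonic(prev, cur heights) error {
	if cur.local < prev.local || cur.remote < prev.remote {
		return fmt.Errorf("commit height regressed: %+v -> %+v", prev, cur)
	}
	return nil
}

// checkMirror returns an error if the two peers disagree about each
// other's heights. Assumption: no message is in flight when it runs.
func checkMirror(alice, bob heights) error {
	if alice.local != bob.remote || bob.local != alice.remote {
		return fmt.Errorf("asymmetric heights: alice=%+v bob=%+v", alice, bob)
	}
	return nil
}

func main() {
	prev := heights{local: 3, remote: 2}
	cur := heights{local: 4, remote: 2}
	fmt.Println(checkMonotonic(prev, cur))
	fmt.Println(checkMirror(cur, heights{local: 2, remote: 4}))
}
```

Running both checks after every applied event keeps a failing input short, since the fuzzer stops at the first event that breaks an invariant.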

Testing

go test ./htlcswitch/ -run TestChannelLinkFSMScenarios -v
go test ./htlcswitch -run=^$ -fuzz=FuzzChannelLinkFSM -fuzztime=1m

coveralls · 2026-03-12T13:50:29Z

Coverage Report for CI Build 24286915830

Coverage decreased (-10.1%) to 52.197%

Details

Coverage decreased (-10.1%) from the base build.
Patch coverage: 175 uncovered changes across 5 files (22 of 197 lines covered, 11.17%).
30606 coverage regressions across 468 files.

Uncovered Changes

File	Changed	Covered	%
htlcswitch/test_utils.go	96	0	0.0%
htlcswitch/mock.go	55	0	0.0%
lnwallet/channel.go	31	19	61.29%
lnwallet/mock.go	8	0	0.0%
lnwallet/sigpool.go	7	3	42.86%

Coverage Regressions

30606 previously-covered lines in 468 files lost coverage.

Top 10 Files by Coverage Loss	Lines Losing Coverage	Coverage
lnwire/test_message.go	1469	0.0%
lnwallet/channel.go	1088	68.48%
invoices/sql_store.go	1066	0.0%
htlcswitch/test_utils.go	814	0.0%
htlcswitch/mock.go	531	0.0%
peer/test_utils.go	529	0.0%
autopilot/agent.go	395	0.0%
channeldb/migration30/migration.go	367	0.0%
lnwallet/test_utils.go	364	0.0%
contractcourt/utxonursery.go	342	38.01%

Coverage Stats


Relevant Lines:	193501
Covered Lines:	101002
Line Coverage:	52.2%
Coverage Strength:	1.62 hits per line

💛 - Coveralls

Expose the `invoiceRegistry` field in `singleLinkTestHarness` so tests can register and look up invoices directly. Add `generateSingleHopHtlc`, a test helper that builds a single-hop `UpdateAddHTLC` with a random preimage, intended for use in unit and fuzz tests.

Add a no-op MailBox implementation and a no-op ticker for use in the channelLink FSM fuzz harness.

Replace createChannelLinkWithPeer (which required a Switch and spawned the htlcManager goroutine) with newFuzzLink, a minimal link factory that:
- accepts dependencies directly (registry, preimage cache, circuit map, bestHeight) instead of a mockServer, so no Switch or background goroutines are created at all
- sets link.upstream directly to a buffered channel controlled by the caller, bypassing the mailbox entirely
- attaches a mockMailBox so mailBox.ResetPackets() in resumeLink succeeds

MPins · 2026-04-09T00:25:01Z

@Crypt-iQ when you have time, could you take a look?

Crypt-iQ · 2026-04-09T13:30:00Z

> @Crypt-iQ when you have time, could you take a look?

Sure I will take a look

Introduce `fuzz_link_test.go` with a model-based fuzzer that drives the Alice-Bob channel link through arbitrary sequences of protocol events and checks key invariants after each step.

Introduce fuzzSigner and fuzzSigVerifier in the fuzz harness, along with the SigVerifier hook in LightningChannel (WithSigVerifier, verifySig) and a matching SigPool extension (VerifyFunc field) so the harness can bypass secp256k1 verification end-to-end. Also refactors createTestChannel to accept functional options (testChannelOpt) so the signer and channel options can be injected from tests.

Introduce CommitKeyDeriverFunc and WithCommitKeyDeriver to allow LightningChannel to bypass the secp256k1-based DeriveCommitmentKeys on every commit round. All internal call sites are migrated to lc.deriveCommitmentKeys. The fuzz harness injects fuzzCommitKeyDeriver, a trivial identity deriver that avoids scalar-multiplication overhead.

createTestChannel started alicePool and bobPool but never stopped them. During fuzzing this caused goroutines to leak. Register t.Cleanup handlers to call Stop() on both pools so all workers are torn down when the test ends.

newMockRegistry started an InvoiceRegistry but never stopped it. InvoiceRegistry internally starts two background goroutines — invoiceEventLoop and the InvoiceExpiryWatcher mainLoop — that run for the lifetime of the registry. Without a matching Stop() call both goroutines leaked for every test that called newMockRegistry, accumulating thousands of goroutines during fuzzing. Register a t.Cleanup to call registry.Stop() so both loops are torn down when the test ends.

Crypt-iQ

OK, I took a quick look at the fuzz test. I've lost context and am short on time, so I can't give the detailed line-by-line review I'd like to. I skipped over the commits that stub out the signing; that seems good, because signing could be a bottleneck. Using a RAM disk is a good idea since you want to avoid disk I/O.

My main comment is that the fuzz harness could use more of the fuzz input. For example, when a new fee is being sent, the fee is calculated by newFee := len(f.aliceLink.channel.ActiveHtlcs())*100 + 1000. I think most of the messages should instead be constructed from the fuzz input (besides things like signatures that would cause obvious invalidity, but even those you could sometimes make invalid). So instead of the fuzz input being a sequence of events, it would be a sequence of events + some byte slice that can be parsed as message fields. You could also just accept one byte blob as you do here, but parse it into events + message fields.

Also, if possible, I think it'd be good to see if the link can start up after being stopped. There have been several bugs over the years where the link can't start up properly due to reestablishment (and I think the other work-in-progress link harness found an issue just like this).

Finally, it would be a good idea to measure the coverage, see if there are any obvious blind spots for this fuzzer, and then improve on those by adding extra events.
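One way to read the "single byte blob parsed into events + message fields" suggestion is a fixed-size record format. The sketch below is an assumption about how that could look, not what the harness does: the event names, the `step` type, and the 5-byte record layout are all hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Event mirrors the harness's uint8 event type; the concrete values
// below are hypothetical placeholders.
type Event uint8

const (
	EventAddHTLC Event = iota
	EventUpdateFee
	EventCommitSig
	numEvents // sentinel, not a real event
)

// step pairs an event with a field taken from the fuzz input, for
// example an HTLC amount or a fee rate.
type step struct {
	Event Event
	Arg   uint32
}

// recordSize is one event byte followed by a 4-byte big-endian argument.
const recordSize = 5

// parseSteps splits the fuzz blob into fixed-size records. Trailing
// bytes that do not fill a whole record are dropped.
func parseSteps(data []byte) []step {
	steps := make([]step, 0, len(data)/recordSize)
	for len(data) >= recordSize {
		steps = append(steps, step{
			Event: Event(data[0] % uint8(numEvents)),
			Arg:   binary.BigEndian.Uint32(data[1:recordSize]),
		})
		data = data[recordSize:]
	}
	return steps
}

func main() {
	blob := []byte{1, 0, 0, 3, 232, 5, 0, 0, 0, 7, 9}
	for _, s := range parseSteps(blob) {
		fmt.Printf("event=%d arg=%d\n", s.Event, s.Arg)
	}
}
```

With this layout a fee update would take its rate from `Arg` rather than from a formula over the current HTLC count, so the fuzzer can mutate the fee directly.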

Crypt-iQ · 2026-04-13T14:25:07Z

+
+	// Generate the ChannelReestablish messages that each side needs to
+	// receive in order to complete the sync handshake.
+	aliceSyncMsg, err := alice.channel.State().ChanSyncMsg()


could fuzz these

Crypt-iQ · 2026-04-13T14:27:23Z

+	}
+
+	// Check total balances.
+	var aliceHtlcAmt, bobHtlcAmt lnwire.MilliSatoshi


is it possible to assert that alice or bob's balance is a certain expected value? this works, but I'm wondering if there's any way to detect funds loss
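A possible way to assert exact expected balances is a shadow ledger that the harness updates alongside the real channel and compares after each event. The sketch below is only an illustration under simplifying assumptions: `balanceModel` and its methods are hypothetical, only Alice offers HTLCs, and commitment fees are ignored.

```go
package main

import (
	"errors"
	"fmt"
)

// balanceModel is a hypothetical shadow ledger kept next to the real
// channel. Amounts are in millisatoshis; commitment fees are ignored.
type balanceModel struct {
	alice, bob uint64
	pending    map[uint64]uint64 // HTLC ID -> amount offered by Alice
}

func newBalanceModel(alice, bob uint64) *balanceModel {
	return &balanceModel{alice: alice, bob: bob, pending: make(map[uint64]uint64)}
}

// add moves amt out of Alice's balance into a pending HTLC.
func (m *balanceModel) add(id, amt uint64) error {
	if _, ok := m.pending[id]; ok {
		return errors.New("duplicate HTLC ID")
	}
	if amt > m.alice {
		return errors.New("insufficient balance")
	}
	m.alice -= amt
	m.pending[id] = amt
	return nil
}

// settle credits a pending HTLC to Bob.
func (m *balanceModel) settle(id uint64) error {
	amt, ok := m.pending[id]
	if !ok {
		return errors.New("unknown HTLC")
	}
	delete(m.pending, id)
	m.bob += amt
	return nil
}

// fail returns a pending HTLC to Alice.
func (m *balanceModel) fail(id uint64) error {
	amt, ok := m.pending[id]
	if !ok {
		return errors.New("unknown HTLC")
	}
	delete(m.pending, id)
	m.alice += amt
	return nil
}

// total is the sum of both balances and all pending HTLCs. It should
// never change, whatever sequence of events is applied.
func (m *balanceModel) total() uint64 {
	sum := m.alice + m.bob
	for _, amt := range m.pending {
		sum += amt
	}
	return sum
}

func main() {
	m := newBalanceModel(1000, 500)
	_ = m.add(0, 100)
	_ = m.settle(0)
	fmt.Println(m.alice, m.bob, m.total())
}
```

Comparing `m.alice` and `m.bob` against the balances reported by each link would flag a loss of funds on one side even when the channel total still adds up.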

Crypt-iQ · 2026-04-13T14:28:42Z

+	}
+
+	var preimage lntypes.Preimage
+	r, err := generateRandomBytes(sha256.Size)


non-deterministic?

It might be good to use a corpus input with HTLC ID and sender ID to deterministically generate the preimage.
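A deterministic derivation along those lines could hash the sender ID and HTLC ID together. This is a minimal sketch; `derivePreimage` and the 9-byte input layout are assumptions for illustration, not code from the PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// derivePreimage returns a 32-byte preimage that depends only on the
// sender and the HTLC ID, so replaying a corpus entry always produces
// the same preimage. The name and input layout are hypothetical.
func derivePreimage(sender uint8, htlcID uint64) [32]byte {
	var buf [9]byte
	buf[0] = sender
	binary.BigEndian.PutUint64(buf[1:], htlcID)
	return sha256.Sum256(buf[:])
}

func main() {
	preimage := derivePreimage(0, 42)
	// The payment hash is the SHA-256 of the preimage.
	hash := sha256.Sum256(preimage[:])
	fmt.Printf("preimage=%x\nhash=%x\n", preimage, hash)
}
```

Because the output is a plain `[32]byte`, it should be straightforward to convert into whatever preimage type the harness uses.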

Crypt-iQ · 2026-04-13T14:32:02Z

+		}
+
+		// Pick the oldest preimage Alice tracks and settle it on her
+		// link.


Unnecessary - can choose randomly?

It might be even better to use the fuzz input itself to decide which HTLC to settle.
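Letting the input choose the HTLC can be as small as reducing one input byte modulo the number of pending HTLCs. The helper below is a hypothetical sketch; `pickHTLC` and the slice of pending IDs are not names from the harness.

```go
package main

import "fmt"

// pickHTLC maps one fuzz-input byte onto one of the pending HTLC IDs.
// It returns false when there is nothing to pick from.
func pickHTLC(selector byte, pending []uint64) (uint64, bool) {
	if len(pending) == 0 {
		return 0, false
	}
	return pending[int(selector)%len(pending)], true
}

func main() {
	pending := []uint64{10, 11, 12}
	id, ok := pickHTLC(4, pending)
	fmt.Println(id, ok)
}
```

Unlike always settling the oldest HTLC, this lets the fuzzer reach out-of-order settlements while staying reproducible for a given input.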

Crypt-iQ · 2026-04-13T14:34:14Z

+
+		// Guard against excessively long inputs that would make the
+		// test run too long.
+		if len(data) > 250 {


Bump up? Sometimes very large inputs are interesting

Crypt-iQ · 2026-04-13T14:35:50Z

+// applyEvent dispatches a single fuzz-generated event to the FSM for either
+// Alice or Bob. Events that cannot be applied in the current state are silently
+// skipped so the fuzzer can keep making progress without failing the test.
+func (f *fuzzFSM) applyEvent(e Event) {


Sometimes, it may be worth it to send messages where no pre-checks are done. So send an HTLC where Bob hasn't created the Hold invoice, sending a settle for an HTLC that doesn't exist, etc.

Yes, I’ll look into more of those unexpected events.

Crypt-iQ · 2026-04-13T14:36:56Z

+
+type Event uint8
+
+const (


If possible, would be good to have an event where the links restart and assert that they can still sync?

Yes, it’s part of the follow-up work.

MPins · 2026-04-13T16:06:02Z

Thank you @Crypt-iQ for your time. I’ll be addressing the comments above.

brunoerg reviewed Mar 12, 2026

Comment thread htlcswitch/fuzz_link_test.go Outdated

brunoerg reviewed Mar 12, 2026

Comment thread htlcswitch/link_isolated_test.go Outdated

MPins force-pushed the link_fsm_fuzz branch 3 times, most recently from 001781c to 1400d9a Compare March 13, 2026 21:50

MPins mentioned this pull request Mar 18, 2026

htlcswitch: Add HTLC state machine fuzz tests NishantBansal2003/lnd#7

Open

MPins force-pushed the link_fsm_fuzz branch 4 times, most recently from 1b219d1 to ca4be0a Compare March 28, 2026 00:16

doc: create a diagram representing the link machine state

a4a3279

MPins force-pushed the link_fsm_fuzz branch 10 times, most recently from fcb4bb0 to 6e4610c Compare April 1, 2026 12:59

MPins force-pushed the link_fsm_fuzz branch from 6e4610c to 78af141 Compare April 8, 2026 19:34

MPins added 3 commits April 8, 2026 16:40

htlcswitch: add mockMailBox and noopTicker test helpers

ee70f80

Add a no-op MailBox implementation and a no-op ticker for use in the channelLink FSM fuzz harness.

MPins force-pushed the link_fsm_fuzz branch from 78af141 to 7a5aa99 Compare April 8, 2026 19:40

MPins force-pushed the link_fsm_fuzz branch from 1b6213a to 14126bd Compare April 10, 2026 19:00

MPins force-pushed the link_fsm_fuzz branch 3 times, most recently from b28e3d8 to 5f570d5 Compare April 11, 2026 01:34

MPins added 5 commits April 11, 2026 13:36

htlcswitch: add FSM fuzz harness for channelLink commit protocol

6ddb676

Introduce `fuzz_link_test.go` with a model-based fuzzer that drives the Alice-Bob channel link through arbitrary sequences of protocol events and checks key invariants after each step.

htlcswitch: stop SigPools in createTestChannel cleanup

31fba27

createTestChannel started alicePool and bobPool but never stopped them. During fuzzing this caused goroutines to leak. Register t.Cleanup handlers to call Stop() on both pools so all workers are torn down when the test ends.

MPins force-pushed the link_fsm_fuzz branch from 5f570d5 to bea62c8 Compare April 11, 2026 16:42

Crypt-iQ reviewed Apr 13, 2026

4 participants
