Non-atomic fs::write in rgb_utils causes node crash and data corruption #120

@free-free-6

Summary

write_rgb_channel_info in rgb_utils/mod.rs uses fs::write() to update RGB channel info files. fs::write is not atomic: internally it performs two separate steps, each a distinct syscall:

  1. open(O_TRUNC) — truncates the file to 0 bytes
  2. write(data) — writes new content

This causes two problems:

Problem 1 — Runtime crash (race condition): If another thread reads the file between syscall 1 and 2, it reads an empty file, serde_json fails with EOF, and the thread panics. The panic poisons the shared Mutex, cascading to all other threads and crashing the node.

Problem 2 — Persistent data corruption: If the process is killed (SIGKILL / OOM / docker stop / power loss) between syscall 1 and 2, the file is permanently left as 0 bytes. On restart, any code path that reads this file panics, making the node unable to start.
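The window both problems depend on can be shown deterministically with the standard library alone. The sketch below is illustrative only: `observe_truncation_window` is a hypothetical helper, not code from rgb_utils, and it assumes that opening with `truncate(true)` followed by `write_all` mirrors the two steps listed above.

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::path::Path;

/// Hypothetical helper: performs the truncate-then-write sequence by hand
/// and returns what a reader would see in the gap between the two steps.
fn observe_truncation_window(path: &Path, new_data: &[u8]) -> std::io::Result<Vec<u8>> {
    // Step 1: open with truncation. The old content is gone from here on.
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;

    // A concurrent reader, or a restart after the process is killed here,
    // sees this state of the file.
    let seen_by_reader = fs::read(path)?;

    // Step 2: the new content only arrives now.
    file.write_all(new_data)?;
    Ok(seen_by_reader)
}
```

Calling this on a file that already holds valid JSON returns an empty buffer, which is the input that makes the JSON parser report EOF at line 1, column 0.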

Affected code

rust-lightning/lightning/src/rgb_utils/mod.rs:

  • write_rgb_channel_info
  • write_rgb_payment_info_file
  • fs::write calls in color_commitment

All of these write to files in the .ldk/ directory without any synchronization or atomic write pattern.
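One common atomic write pattern is to write to a temporary file in the same directory, flush it to disk, and then rename it over the target. The sketch below is a possible shape for such a helper, not a proposed patch: `atomic_write` and the `.tmp` suffix are assumptions, and it assumes only one writer updates a given file at a time (concurrent writers would need unique temp names or a lock).

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper (not part of rgb_utils): replaces `path` so that
/// readers see either the complete old content or the complete new content.
fn atomic_write(path: &Path, data: &[u8]) -> io::Result<()> {
    // Keep the temp file next to the target so the rename stays on one filesystem.
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = path.with_file_name(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(data)?;
    // Flush the data to disk before the rename, so a crash cannot leave
    // a renamed but empty file behind.
    tmp.sync_all()?;
    drop(tmp);

    // On POSIX systems rename replaces the target atomically.
    fs::rename(&tmp_path, path)?;

    // On Unix, also sync the parent directory so the rename itself is durable.
    if cfg!(unix) {
        let dir = match path.parent() {
            Some(p) if !p.as_os_str().is_empty() => p,
            _ => Path::new("."),
        };
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}
```

With a helper like this, a reader never observes a 0-byte file, and a kill at any point leaves either the previous version or the new one in place. Atomicity guarantees for rename on non-POSIX platforms differ and would need to be checked separately.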

How to reproduce

Attached test file: channel_info_file_race.txt (rename to .rs, place in src/test/)

The test opens an RGB channel, then fires rapid concurrent payments while 5 background tasks continuously call /listchannels. The payments trigger write_rgb_channel_info (via PaymentSent events), while /listchannels calls parse_rgb_channel_info on the same file. Within ~20-30 rounds the race condition is hit:

parse_rgb_channel_info thread_id: ThreadId(3)   ← reading
write_rgb_channel_info thread_id: ThreadId(5)   ← writing (487µs)

panicked at rgb_utils/mod.rs:586:
valid rgb info file: Error("EOF while parsing a value", line: 1, column: 0)

panicked at channelmanager.rs:13758: PoisonError { .. }
panicked at channelmanager.rs:4260:  PoisonError { .. }
...cascade...

Register the test in src/test/mod.rs:

mod channel_info_file_race;

Run:

cargo test channel_info_file_race -- --test-threads=1 --nocapture

Note: the test uses worker_threads = 4 internally to enable true thread concurrency.
