Summary
On a host running Coder workspaces as unprivileged Docker-in-Docker via Sysbox, sysbox-fs got stuck in fuse_flush (kernel hung_task), wedging the whole host: load average ~50 while the CPU was ~100% idle and RAM was free. About 50 processes were stuck in uninterruptible sleep (D state) and could not be killed (not even kill -9). systemctl restart docker also hung in D. Only a full host reboot recovered it — and the ordered shutdown itself took ~6 minutes fighting the D-state tasks before forcing.
Environment
- sysbox-ce / sysbox-runc 0.7.0 (commit a4dd414f7b9b7455c0fbf0d5e5db7bcfe30645bc, built 2026-03-03)
- OS: Debian GNU/Linux 13 (trixie)
- Kernel: 6.12.90+deb13.1-amd64 (idmapped mounts, no shiftfs)
- Docker 29.4.2, runc 1.3.5; sysbox-runc registered as a runtime in daemon.json
- VM, 8 vCPU / ~23 GiB RAM, dedicated to Coder workspaces (DinD, unprivileged)
I'm aware Debian 13 + kernel 6.12 is outside the officially supported matrix — flagging in case it's relevant to the FUSE path.
Symptom — kernel hung_task in fuse_flush
sysbox-fs (tgid 886, the daemon) and then every runc:[INIT|PARENT|CHILD] trying to start/operate workspaces blocked for >120s, all parked on fuse_flush:
INFO: task sysbox-fs:<pid> blocked for more than 120 seconds.
task:sysbox-fs state:D stack:0 pid:<pid> tgid:886 ppid:1 flags:0x00000002
Call Trace:
__schedule+0x505/0xc00
schedule+0x27/0xf0
fuse_flush+0xe8/0x1e0
? __lruvec_stat_mod_folio+0x83/0xd0
? __folio_mod_stat+0x26/0x80
INFO: task runc:[0:PARENT]:<pid> blocked for more than 120 seconds.
Call Trace:
__schedule+0x505/0xc00
schedule+0x27/0xf0
fuse_flush+0xe8/0x1e0
...
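The same set was re-reported at 120 / 241 / 362s until the kernel logged "Future hung task reports are suppressed". Note the fuse_flush frame sits on top of folio memory-accounting frames (__lruvec_stat_mod_folio / __folio_mod_stat), although both are marked "?" in the trace, meaning the unwinder could not confirm them as real call frames.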
Impact
- Host effectively unusable: ~50 tasks in D, mostly runc:[0:PARENT] / runc:[1:CHILD] (spawned by containerd to start/exec workspaces) plus coder stat disk.
- CPU ~100% idle, RAM free, iowait 0 — the load (~50) was entirely D-state tasks, so it was invisible to CPU/memory monitoring.
- systemd restarted the sysbox-fs daemon, but that did not free the already-stuck tasks (they stayed attached to the wedged FUSE mount).
- systemctl restart docker / service docker restart also hung in D.
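Because the ~50 load came entirely from D-state tasks, CPU, memory and iowait monitoring showed nothing. A check along the lines of the minimal Python sketch below would have surfaced it. It is not part of Sysbox or Coder, and all names in it are hypothetical. It walks every thread under /proc/<pid>/task/<tid> (load average counts threads, and the hung_task trace reports a tgid separately from the pid), keeps those in state D, and prints their wchan, the kernel function they are sleeping in (presumably fuse_flush on this host). It deliberately avoids reading cmdline (what the ps args column uses), since that has to access the target's memory and can itself block on a wedged host. Run it as root; otherwise the kernel may report wchan as 0.

```python
import os
from dataclasses import dataclass


@dataclass
class StuckTask:
    tid: int     # thread id (the "pid" field of the per-thread stat file)
    tgid: int    # owning process, taken from the /proc/<tgid> directory name
    comm: str
    wchan: str


def parse_stat(line: str) -> tuple[int, str, str]:
    """Return (tid, comm, state) from one /proc/.../stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')'
    (e.g. "runc:[0:PARENT]"), so split on the *last* ')'.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    tid = int(line[:open_paren])
    comm = line[open_paren + 1 : close_paren]
    state = line[close_paren + 1 :].split()[0]
    return tid, comm, state


def read_first_line(path: str) -> str:
    """Read a small procfs file; return "" if the task vanished or access is denied."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return ""


def find_d_state_tasks(proc_root: str = "/proc") -> list[StuckTask]:
    """List all threads in uninterruptible sleep (state D) with their wait channel."""
    if not os.path.isdir(proc_root):  # no procfs, e.g. not a Linux host
        return []
    stuck = []
    for pid_entry in os.listdir(proc_root):
        if not pid_entry.isdigit():  # skip non-process entries such as "self", "sys"
            continue
        task_dir = os.path.join(proc_root, pid_entry, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:  # process exited while we were scanning
            continue
        for tid_entry in tids:
            base = os.path.join(task_dir, tid_entry)
            stat = read_first_line(os.path.join(base, "stat"))
            if not stat:
                continue
            tid, comm, state = parse_stat(stat)
            if state == "D":
                # Without root this may read "0" instead of a symbol name.
                wchan = read_first_line(os.path.join(base, "wchan")) or "?"
                stuck.append(StuckTask(tid, int(pid_entry), comm, wchan))
    return stuck


if __name__ == "__main__":
    tasks = find_d_state_tasks()
    print(f"{len(tasks)} thread(s) in D state")
    for t in tasks:
        print(f"{t.tid:>7} {t.tgid:>7}  {t.wchan:<24} {t.comm}")
```

Run periodically (e.g. from the monitoring agent), a count that stays high across runs, with wchan pointing at a FUSE function, is the signal that was missing here.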
Recovery
Only a full host reboot cleared the D-state processes.
Reproducibility
No deterministic repro — it happened under normal workspace usage (starting/operating a DinD workspace) after ~14 days of uptime. It has not recurred since the reboot.
Questions
- Is this a known sysbox-fs deadlock in fuse_flush on kernel 6.12 / Debian 13 (or generally outside the supported matrix)?
- Any recommended mitigation (specific kernel version, sysbox-fs mount/config option) short of changing the host distro/kernel?
- Anything specific worth capturing if it recurs (it's intermittent)?
Happy to provide more detail — full dmesg, ps -eo pid,stat,wchan,args of the stuck tasks, docker info, etc.