Phase 3 Plan D: bufcache + block driver + FS read path #16
Merged
Implementation plan for Phase 3.D landing the entire FS read path on top of Plan 3.C's process lifecycle: kernel-side PLIC + block drivers, the buffer cache, the FS layer (layout/balloc/inode/dir/path), file table, per-process cwd + ofile, FS-aware exec, 7 syscalls (openat/close/read/lseek/fstat/chdir/getcwd), mkfs host tool, FS-mode kmain that loads init from disk, and the e2e-fs acceptance gate. Targets the post-restructure layout (src/kernel/, src/emulator/, programs/, tests/e2e/).
- Add std.debug.assert(b.refs > 0) at start of brelse to catch premature releases
- Document why Pass 1 sleep needs no re-validation (refs bumped before sleep prevents eviction)
- Document why wakeup-then-busy=false ordering is safe (scheduler defers woken process)

These defensive improvements make subtle synchronization invariants more discoverable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
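The first bullet can be illustrated with a minimal sketch. It assumes a pared-down `Buf` with only a reference count and a busy flag; the real struct in fs/bufcache.zig is not shown here, and `bpin` is a hypothetical stand-in for the reference bump that bget performs.

```zig
const std = @import("std");

// Hypothetical, pared-down buffer header (assumption: the real Buf in
// fs/bufcache.zig carries more fields, such as the block number and data).
const Buf = struct {
    refs: u32 = 0,
    busy: bool = false,
};

// Hypothetical helper standing in for the reference bump done by bget.
fn bpin(b: *Buf) void {
    b.refs += 1;
}

// Release one reference. The assert states the invariant up front:
// a caller may only release a buffer it currently holds.
fn brelse(b: *Buf) void {
    std.debug.assert(b.refs > 0);
    b.refs -= 1;
}

test "brelse balances a prior pin" {
    var b = Buf{};
    bpin(&b);
    brelse(&b);
    try std.testing.expect(b.refs == 0);
}
```

In safety-checked builds the unsigned decrement would already trap on underflow, so the assert mainly serves to name the invariant at the top of the function where a reader will see it.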
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Seven bugs discovered while smoke-testing kernel-fs.elf:
1. kmain (FS_DEMO): cpu.sched_context.{ra,sp} was set AFTER proc.exec,
so proc.sleep inside exec saw sched_context.ra==0 and returned
without yielding — the SIE=0 spin then deadlocked because the block
IRQ couldn't fire. Move the setup before exec.
2. kmain (FS_DEMO): after exec returns, init_p.context still pointed
inside proc.sleep with a stale stack. Re-arm to forkret + a fresh
kstack frame so the next scheduler swtch enters via the trap-frame
sret path, not by re-running sleep's epilogue. forkret promoted to
pub export.
3. sched.scheduler: replaced the busy-spin (when nothing is Runnable
but something is alive) with a one-instruction SIE window so a
pending PLIC IRQ or timer SSI can actually be delivered.
4. trap.s_trap_dispatch (S-from-S): clear sstatus.SPIE so sret restores
SIE=0 — otherwise the csrc that closes the SIE window never executes
(a perpetual timer SSI re-traps before each fetch).
5. trap.s_trap_dispatch (timer SSI in scheduler): when cpu.cur is null
the trap fired inside the SIE window, not inside a process. Skip
the yield, which would otherwise swtch into sched_context and re-
enter scheduler() from its top, wiping its loop state.
6. vm.mapKernelAndMmio: add the block device (0x10001000) and PLIC
(0x0C000000, 0x0C002000, 0x0C201000) MMIO pages — without these,
the kernel page-faults on its own driver registers as soon as
paging is on.
7. mkfs: inode array was indexed by `inum - 1`, putting root at slot 0
in the on-disk image while the kernel reads inum 1 from byte offset
64 (= slot 1). Switched to `inodes[inum]` with slot 0 reserved as
Free, and pre-fill root inline (createDir would seed `..` with the
bogus parent_inum=0 placeholder).
Smoke test now reaches the syscall-return path inside fs_init's first
open("/etc/motd"); a remaining StorePageFault on the syscall return
write to tf->a0 is still under investigation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
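The off-by-one in item 7 is easiest to see in the offset arithmetic. The following is a minimal sketch, assuming 64-byte on-disk inodes (inferred from "inum 1 from byte offset 64") and offsets measured from the start of the inode table. The function names are hypothetical and are not taken from mkfs.zig or fs/inode.zig.

```zig
const std = @import("std");

// Assumption: on-disk inode size, inferred from the commit message.
const INODE_SIZE: u32 = 64;

// Kernel convention: slot index == inum, slot 0 reserved as Free.
fn kernelInodeOffset(inum: u32) u32 {
    return inum * INODE_SIZE;
}

// Old mkfs convention: slot index == inum - 1 (requires inum >= 1).
fn oldMkfsInodeOffset(inum: u32) u32 {
    return (inum - 1) * INODE_SIZE;
}

test "root inode lands in different slots under the two conventions" {
    const root: u32 = 1;
    try std.testing.expect(kernelInodeOffset(root) == 64);
    try std.testing.expect(oldMkfsInodeOffset(root) == 0);
}
```

With the old indexing, mkfs wrote root at offset 0 while the kernel looked for it at offset 64, so the two sides disagreed by exactly one slot for every inode.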
…ndow

The user vector (s_trap_entry) assumes the trap came from U-mode: it
csrrw-swaps sp with sscratch (which holds the current proc), saves all
GPRs into the proc's tf at offset 0, and jumps to s_trap_dispatch on the
proc's kstack. That sequence is hostile in two ways when a trap fires
from S-mode inside the scheduler's SIE window:

1. The proc whose &p sits in sscratch is *sleeping* with frames on its
kstack. Switching sp to its kstack_top and writing handler frames from
there clobbers the in-flight syscall stack, causing a wild pointer
(0xaaaaaab6 etc.) on the next post-swtch load.
2. Every GPR (including ra/sp/s0..s11) gets stamped into init_p.tf,
wiping the saved user state. The next sret-to-U then jumps off into
garbage.

Fix: install s_kernel_trap_entry as stvec only while the SIE window is
open. The kernel vector saves caller-saved regs to the *current* (i.e.
scheduler) stack, runs s_kernel_trap_dispatch, restores, and srets back
to the csrc that closes the window. s_kernel_trap_dispatch only handles
the two interrupts the window can deliver (IRQ_BLOCK and timer SSI) and
forces SPIE=0 so the post-sret csrc executes without re-trapping.

Also panic on S-from-S in the user vector: that should never happen once
the kernel vector is installed, and catching it surfaces future stvec
bugs immediately rather than silently corrupting state.

The pre-existing SSI-in-scheduler check (cpu.cur==null skip yield) and
the user-vector SPIE clear remain as belt-and-suspenders for the kernel
vector path's narrow scope.

Smoke test of kernel-fs.elf now reaches the second user syscall (read
after openat) before hitting an unrelated stack-corruption symptom on
the post-swtch path; that's the next debug target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
file.read declared `var kbuf: [READ_CHUNK=4096]u8 = undefined` on its
stack, ballooning its frame to ~4.2 KB. Combined with the upstream
trap-handler + syscall-dispatch + sysRead frames and the downstream
inode.readi + bread + bget + sleep frames, the syscall reached almost
6 KB of stack, well beyond the 4 KB per-process kernel stack.

Pages allocated by page_alloc are consecutive: cpu.sched_stack_top is
adjacent (in PA) to init_p.kstack. So when init_p's read syscall
overflowed its kstack, it wrote into the next page (the scheduler
stack), and the array's `undefined` initializer (Zig's debug 0xaa fill)
clobbered the scheduler's locals. The next post-swtch lw read back
0xaaaaaaaa as &cpu, faulting on the cpu.cur=null write.

Fix: hoist kbuf to a static module var. Single-threaded kernel ⇒ safe.

Smoke test of kernel-fs.elf now passes end-to-end:

    hello from phase 3
    ticks observed: 4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
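The shape of the fix can be sketched as follows. This is an illustration under stated assumptions rather than the code from file.zig: it assumes a recent Zig (0.11 or later, for the two-argument `@memcpy`), `readChunk` is a hypothetical stand-in for the copy loop inside file.read, and `KSTACK_SIZE` is taken from the "4 KB per-process kernel stack" figure above.

```zig
const std = @import("std");

const READ_CHUNK: usize = 4096;
// Assumption: per-process kernel stack size, from the commit message.
const KSTACK_SIZE: usize = 4096;

// Module-level bounce buffer: it lives in static storage, so it adds
// nothing to the stack frame of any function that uses it. Sharing it
// is only safe because the kernel is single-threaded.
var kbuf: [READ_CHUNK]u8 = undefined;

comptime {
    // A stack-local [READ_CHUNK]u8 alone would consume the whole kstack.
    std.debug.assert(READ_CHUNK >= KSTACK_SIZE);
}

// Hypothetical stand-in for one iteration of file.read's copy loop:
// stage up to READ_CHUNK bytes in kbuf, then copy them to the caller.
fn readChunk(src: []const u8, dst: []u8) usize {
    const n = @min(src.len, @min(dst.len, READ_CHUNK));
    @memcpy(kbuf[0..n], src[0..n]);
    @memcpy(dst[0..n], kbuf[0..n]);
    return n;
}

test "readChunk copies through the static buffer" {
    var dst: [8]u8 = undefined;
    const n = readChunk("hello", &dst);
    try std.testing.expect(n == 5);
    try std.testing.expect(std.mem.eql(u8, dst[0..n], "hello"));
}
```

The trade-off is that a shared static buffer is not reentrant, so this approach would need revisiting if two kernel paths could ever be inside file.read at the same time.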
Spawns ccc --disk fs.img kernel-fs.elf, asserts:
- exit 0
- stdout contains "hello from phase 3\n"
- stdout has the canonical "ticks observed: N\n" PID 1 trailer

Mirrors tests/e2e/fork.zig structure.

All prior e2e suites still pass (test, e2e, e2e-mul, e2e-trap,
e2e-hello-elf, e2e-kernel, e2e-multiproc-stub, e2e-fork, e2e-plic-block,
e2e-snake).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final task in Plan 3.D. Headline status flips from "Plan C done" to
"Plan D done". Layout grows new entries for plic.zig, block.zig,
file.zig, fs/{layout,bufcache,balloc,inode,dir,path}.zig, mkfs.zig,
user/fs_init.zig, userland/fs/etc/motd, and tests/e2e/fs.zig. Building
table grows kernel-fs / kernel-fs-init / mkfs / fs-img / e2e-fs.
Final regression sweep (clean): test, e2e, e2e-mul, e2e-trap,
e2e-hello-elf, e2e-kernel, e2e-multiproc-stub, e2e-fork, e2e-plic-block,
e2e-snake, e2e-fs, riscv-tests, wasm — all PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…om disk)
Three new chapters slot between the 3.C lifecycle slide and the
epilogue:
32. block + bufcache — first IRQ-driven kernel sleep, single-
outstanding submit, NBUF=16 LRU buffer cache, the new
s_kernel_trap_entry vector.
33. FS read path — on-disk layout, the read stack
(read → file.read → readi → bmap+bread → block.read), nameix.
34. init from disk — exec via namei+readi, mkfs build pipeline,
fs_init.zig, e2e-fs milestone, the kbuf-overflow war story.
TOC, intro caption, epilogue prose, and check-list grow a 3.D row;
"Next" panel flips from 3.D (filesystem + shell) to 3.E (write side
+ console + shell).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- … fs/{layout,bufcache,balloc,inode,dir,path}.zig), file table (NFILE=64) with per-process ofile[16] + cwd, 7 new syscalls (getcwd, chdir, openat, close, lseek, read, fstat), and a host-side mkfs tool that builds a 4 MB fs.img.
- proc.exec now resolves the path via namei + readi into a 64 KB kernel scratch buffer (FS-mode) instead of looking up an embedded blob; /bin/init is loaded from disk for the first time.
- … s_kernel_trap_entry vector so the scheduler's SIE-window IRQs (the new block.zig waiter wakes through this path) don't clobber the sleeping proc's tf/kstack.

Milestone: e2e-fs runs kernel-fs.elf against fs.img; the on-disk /bin/init opens /etc/motd, reads it, writes the contents to fd 1, and exits 0.

34 commits: 1 plan + 25 task implementations + 4 debug fixes (sched_context order, kernel-vec separation, kbuf static, e2e harness) + 2 docs (README + deck).
Test plan
- zig build test: all unit tests pass
- zig build e2e: RV32I hello world
- zig build e2e-mul: RV32IMA demo
- zig build e2e-trap: privilege/trap demo
- zig build e2e-hello-elf: Phase 1 §Definition of done
- zig build e2e-kernel: Phase 2 §Definition of done
- zig build e2e-multiproc-stub: Plan 3.B PID 1 + PID 2
- zig build e2e-fork: Plan 3.C fork/exec/wait/exit
- zig build e2e-plic-block: Plan 3.A IRQ + block round-trip
- zig build e2e-snake: snake demo deterministic input
- zig build e2e-fs: Plan 3.D milestone (this PR's headline gate)
- zig build riscv-tests: rv32ui/um/ua/mi/si conformance
- zig build wasm: wasm cross-build (deck demo)

🤖 Generated with Claude Code