Fixed all BLOCKER, SERIOUS, and MINOR issues from code review v0.9.0#63
Closed
byteshiftlabs wants to merge 28 commits intomainfrom
Closed
Fixed all BLOCKER, SERIOUS, and MINOR issues from code review v0.9.0#63byteshiftlabs wants to merge 28 commits intomainfrom
byteshiftlabs wants to merge 28 commits intomainfrom
Conversation
… table error paths B1 (BLOCKER): free_page_table_recursive skipped all leaf PTEs, leaking every user-space physical page when a process exited. Restructured the function to free leaf pages marked with PTE_U via pmm_free_page while still leaving kernel identity-mapped pages (no PTE_U) untouched. B5 (SERIOUS): create_user_page_table error paths called kfree(user_pt) on a page obtained from pmm_alloc_page. kfree expects a kmalloc header and would corrupt the heap. Changed all three occurrences to free_page_table(user_pt) so intermediate page tables created by earlier map_page calls are also properly released. Updated free_page_table docstring to accurately describe the new behavior.
…signal B2 (BLOCKER): cond_wait restored interrupts BEFORE calling wait_queue_sleep, opening a window where a timer preemption could let another process signal the CV before the waiter was on the queue — losing the wakeup forever. Inlined the wait queue enqueue logic so that under a single interrupt-disable region we: (1) add the process to the CV wait queue, (2) unlock the mutex and wake mutex waiters, (3) mark SLEEPING and dequeue from scheduler. Only then are interrupts restored and schedule() called. This eliminates the race. Also fixed cond_signal to call wait_queue_wake_one instead of wait_queue_wake. Signal should wake exactly one waiter; broadcast (which already called wait_queue_wake) wakes all.
Added proper signal context preservation to prevent user register corruption when delivering signals to user-defined handlers: - Saved full trap frame in kernel memory before redirecting execution to signal handler (signal.c) - Written a sigreturn trampoline (li a7,SYS_SIGRETURN; ecall) on the user stack so handler 'ret' triggers context restoration - Added SYS_SIGRETURN handling in syscall_handler_with_frame that restores the saved pre-signal trap frame (syscall.c) - Prevented trap handler from overwriting restored a0/sepc after sigreturn completes (trap.c) - Added saved_signal_context field to process struct (process.h) - Initialized saved_signal_context to NULL in signal_init_process Nested signals are dropped while a handler is executing (the saved context would be overwritten). This is a deliberate simplification; full nesting support requires a signal frame stack.
The previous implementation called kmalloc() directly, which prepends a header before the returned pointer. The returned address was page_base + HEADER_SIZE — not page-aligned despite the claim. Changes: - Over-allocated to find an aligned boundary with room for the header - Placed the header immediately before the aligned return address - Added base_page field to kmalloc_header so kfree can locate the real PMM allocation start (header may no longer sit at page base) - Regular kmalloc sets base_page = page_addr for backward compat - Alignments > PAGE_SIZE now work instead of returning NULL
S6: sys_getppid now returns parent->pid instead of hardcoded 0.
The process struct already had a parent pointer — it just was
never consulted.
S8: signal_default_action now returns SIG_DFL for SIGUSR1/SIGUSR2.
POSIX mandates terminate, not ignore. The handler in
signal_handle already terminated correctly — the lookup
function disagreed.
S13: cond_signal now calls wait_queue_wake_one instead of
wait_queue_wake, preventing thundering herd on condition
variable signal.
S14: sys_uname now reports version 0.10.0 Synchronization instead
of stale 0.7.0 Virtual Terminals.
wait_queue_sleep now yields the CPU and returns -1 when the wait queue entry cannot be allocated, preventing callers (mutex_lock, pipe_read, pipe_write, rwlock, condvar) from spinning in a tight loop burning CPU. The yield ensures forward progress for other processes that may free memory. Return type changed from void to int so callers can detect the failure if needed.
pipe_read and pipe_write now disable interrupts around the circular buffer copy and metadata updates (read_pos, write_pos, data_size). Without this, a timer interrupt mid-operation could schedule another process that accesses the same pipe, corrupting the buffer state. The wait_queue_sleep calls remain outside the critical section since they handle their own interrupt state.
Parent pointer is now read under process_lock and validated as not PROC_UNUSED before saving. Previously, parent was captured before acquiring the lock — if the parent exited concurrently and its slot was reused, SIGCHLD would be sent to an unrelated process.
Both functions now acquire process_lock before scanning the process table, preventing races where a slot's state or PID changes mid-lookup. All callers verified: none hold process_lock when calling these functions.
pmm_alloc_page, pmm_alloc_pages, and pmm_free_page now disable interrupts around the bitmap read-modify-write sequences. Without this, a timer interrupt between bitmap_test and bitmap_set could schedule another process whose allocation returns the same page, causing silent memory corruption.
sys_mutex_create, sys_mutex_destroy, sys_cond_create, sys_cond_destroy, sys_rwlock_create, and sys_rwlock_destroy now disable interrupts around the in_use array scan and update. Without this, two concurrent creates could be preempted between slot to both callers.
sys_waitpid now properly sets PROC_SLEEPING and dequeues the process from the scheduler before yielding, rather than immediately toggling state back to PROC_RUNNING. The SIGCHLD sent by process_exit already calls process_wakeup on sleeping parents, which re-enqueues and sets PROC_READY — this path was always there but the old code never gave it a chance to fire because the parent was only sleeping for one scheduler round.
Added sleep_until_tick field to process struct. process_sleep now sets the deadline, marks PROC_SLEEPING, and dequeues from scheduler. The timer interrupt handler calls process_check_sleepers on every tick to wake processes whose deadline has passed. sys_sleep no longer manually toggles sstatus.SIE, saves and restores sscratch, or WFI-loops counting ticks. It simply computes the tick count and calls process_sleep, freeing the CPU for other processes during the sleep interval.
M1: Updated shell version string from v0.4.0 to v0.10.0
M2: Removed duplicate break statement in SYS_FORK case (dead code)
M3: Removed misleading (void) casts on queue_count/queue_tail in
scheduler_enqueue — both variables are actively used
M4: Factored duplicated process init code (~25 lines x3) into
process_init_common() helper, called from process_create,
process_create_user, and process_create_elf
M5: Removed dead second kernel_panic in user_mode_entry_wrapper
M6: Removed redundant #define INITIAL_STACK_PAGES inside function body
(already defined in constants.h)
M7: Removed redundant #define MAX_EXEC_ARGS / MAX_ARG_LEN inside
elf_exec_replace (already defined in constants.h, already included)
M8: Removed redundant extern for vterm_set_active_fg_pid inside
SYS_SETFGPID case block (already declared in drivers/vterm.h,
already included)
M9: Skipped — already fixed by S4 on bugfix/serious-quick-fixes
M10: Replaced magic number 'priority = 10' in 3 locations with
DEFAULT_PROCESS_PRIORITY constant in constants.h
Also removed redundant 'extern void signal_init_process' inside
process_create (already declared in kernel/signal.h, already included).
All files compile clean with -Wall -Wextra.
Older binutils versions don't recognise 'menvcfg' by name (added in privilege spec 1.12). Use the numeric CSR address 0x30A directly in the r_menvcfg/w_menvcfg macros so the build works across all supported toolchain versions.
- Extracted userland/ history using git subtree split - Replaced userland/ source with submodule pointer - Made userland build optional: skips gracefully if submodule not initialized - Added run_os.sh for easy QEMU invocation - Updated build_os.sh to suggest run_os.sh after build
…sly-dead test functions into test runner - Added tests/unit/test_pmm.c: 9 tests for PMM alloc/free/page-accounting - Added tests/unit/test_kmalloc.c: 6 tests for kmalloc_aligned correctness (B4 regression) - Added tests/unit/test_errno.c: 10 tests for errno set/get/clear/strerror - Fixed test_vterm.c: updated extern declarations to match actual vterm API - Wired run_memory_isolation_tests() and test_v070_features() into main.c (were declared but never called) - Added kstrcmp() to kstring.h and kstring.c (used by test_errno) - Deleted CODE_REVIEW_v0.9.0.md (all issues verified resolved) - Deleted NETWORK_TESTING.md (belongs to feature/networking branch)
- Added tests/scripts/test_helpers.sh: shared gtest_run/gtest_run_absent/ gtest_suite_begin/gtest_suite_end/gtest_summary functions; eliminates copy-pasted print_pass/print_fail/print_header in every script - Suppressed raw QEMU uart output from stdout; captured to file only; shown on failure via tail context - Each test check now prints [ RUN ] / [ OK ] / [ FAILED ] lines - Failures show the expected pattern and last 8 lines of captured output - Added --skip-build flag to test_kernel.sh, test_integration.sh, test_boot.sh - run_all_tests.sh builds kernel ONCE and passes --skip-build to sub-scripts; eliminates redundant double make clean+make when running full suite - Integration test now uses integration_fs.img instead of clobbering test_fs.img
- Updated version numbers from 0.1.0/0.4.0/0.7.0 to 0.9.0 across all docs - Removed networking (VirtIO-net) from active status in index.rst - Marked v0.11.0 Networking as deferred in ROADMAP.md - Removed broken FREEZE.md reference from CONTRIBUTING.md - Added userland submodule note to README.md - Updated documentation metrics (45 RST files, 26K+ lines) - Updated test README with correct check counts - Updated copyright year to 2025-2026 - Updated README next milestone to code quality hardening
…errno - Replaced busy-spin polling with WFI in VirtIO block driver - Documented higher-half kernel mapping as known limitation - Added mode parameter to vfs_open for O_CREAT file permissions - Rejected unsupported mmap features with proper errno codes - Applied per-segment ELF permissions (R/W/X) instead of blanket RWX - Updated VirtIO IRQ handler comment to explain WFI wake mechanism - Fixed stale process_sleep TODO comment - Re-pend nested signals instead of silently dropping them
- Replaced magic number 512 with SECTOR_SIZE in ext2_write.c write_block - Added clear_errno() to ext2_write_file success path - Added errno propagation comments to remove_dir_entry, is_dir_empty, ext2_remove_file - Added -Werror to CFLAGS in Makefile - Added tests/unit/test_ext2_vfs.c with 6 VFS/ext2 tests (stat, read, create, write, delete) - Registered test_vfs_all() in kernel/main.c (runs post filesystem init)
- Fixed vfs_close memory leak for vfs_node_t and ext2_inode_t (B1) - Added is_valid_user_pointer validation to sys_chmod and sys_chown (B2) - Added is_valid_user_pointer validation to sys_waitpid wstatus pointer (B3) - Added interrupt_save_disable to signal_send to fix pending_signals race (B4) - Added bounds check to vterm escape_buf before write (B5) - Replaced shell_strcat with bounded version taking dest_size parameter (B6) - Added set_errno to wait_queue_sleep error paths (S1) - Added set_errno to sys_close, sys_lseek, sys_kill, sys_settty (S2-S4, S14) - Added full path re-validation after length measurement in syscalls (S5) - Reconciled SYSCALL_MAX_PATH to 256 to match VFS_MAX_PATH (S6-S7) - Added kernel address validation to sys_munmap (S8) - Added clear_errno to 42 syscall success return paths (S9) - Added clear_errno to ext2_vfs_lookup and vfs_resolve_path success paths (S10-S11) - Changed signal_handle_with_frame kmalloc failure to re-pend signal (S12) - Fixed TEST_MODE build failure by wrapping launch_shell in ifndef guard - Removed userland submodule
- Added clear_errno to ext2_vfs write/create/mkdir/unlink/rmdir success paths (M1) - Fixed misleading cleanup_pages comment in paging.c (M2) - Documented sys_dup2 errno delegation to vfs_dup2 (M3) - Added clear_errno to vfs_dup2 oldfd==newfd success path (M4) - Added clear_errno to vfs_get_file success path (M5) - Added clear_errno to wait_queue_wake and wake_one success paths (M6) - Added rows==0 guard to vterm_scroll_up to prevent unsigned underflow (M8) - Added set_errno(ENOMEM) to signal_handle_with_frame kmalloc failure (M9) - Fixed virtio_blk_do_request alloc failure to use RETURN_ERRNO (M10)
- Enforced the supported QEMU 10.1.2 workflow with a host guard and Docker-backed runner - Fixed the remaining waitpid and timed-sleeper audit issues and tightened submodule enforcement in CI and tests - Added the final audit ledger update, refreshed documentation, and pinned the cleaned userland submodule commit
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Full code review of ThunderOS v0.9.0, followed by fixes for all identified issues. Full audit documented in
CODE_REVIEW_v0.9.0.md.BLOCKER fixes (5)
free_page_table_recursivenow frees leaf pages markedPTE_U; error paths usepmm_free_pageinstead ofkfreecond_wait— atomic unlock-enqueue-sleep under interrupt disableSYS_SIGRETURN(syscall 23)kmalloc_alignedreturning unaligned memory — over-allocates for alignment, storesbase_pagein header for correctkfreeSERIOUS fixes (12 fixed, 2 dismissed as false positives)
sys_waitpidproperly sleeps instead of busy-waitingsys_sleepuses timed sleep viasleep_until_tick+process_check_sleepersin timer ISRprocess_exitreads parent under lock, validates notPROC_UNUSEDgetppidreturnsparent->pidinstead of own pidSIG_DFLinstead of being silently ignoredprocess_get/process_get_by_indexholdprocess_lockwait_queue_sleepyields and returns -1 onkmallocfailurecond_signalwakes one waiter instead of allunamereturns version 0.10.0MINOR fixes (9 fixed, 1 skipped — duplicate of S4)
breakinSYS_FORKcase(void)casts on actively-used variables inscheduler_enqueueprocess_init_common()kernel_panicinuser_mode_entry_wrapper#define INITIAL_STACK_PAGESinside function body (already in constants.h)MAX_EXEC_ARGS/MAX_ARG_LENinelf_loader.c(already in constants.h)externdeclarations inside function/case bodiespriority = 10in 3 locations withDEFAULT_PROCESS_PRIORITY