feat: dynamic numtide catalog (88+ agents), CI smoke tests, catalog defensive fixes #2

Merged
HashWarlock merged 140 commits into master from pi-sandbox-refactor
Apr 10, 2026

Conversation

@HashWarlock HashWarlock commented Apr 9, 2026

Summary

  • Dynamic catalog: replaced hardcoded 25-name whitelist in nix/catalog.nix with a full llm-agents.nix passthrough — 88+ agents now exposed automatically, zero maintenance on upstream bumps
  • CI smoke tests: added openclaw, hermes-agent (binary: hermes), and jules to the agent matrix; added catalog count assertion (>50) with presence check for new agents
  • Catalog defensive fixes: query_catalog in nix.rs now filters non-derivation attrs via filterDrvs before accessing .meta.description; catalog.nix emits a builtins.trace warning when llm-agents-pkgs is unexpectedly empty
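The defensive filtering described in the last bullet can be illustrated with a small sketch. This is not the actual `filterDrvs` from the PR (that lives in Nix); it is a Python model that assumes the evaluated attribute set arrives as a dict and relies only on the Nix convention that derivations carry `type = "derivation"`. The names `filter_drvs`, `describe`, and `sample` are hypothetical.

```python
from typing import Any


def filter_drvs(attrs: dict[str, Any]) -> dict[str, dict]:
    """Keep only values that look like derivations (type == "derivation")."""
    return {
        name: value
        for name, value in attrs.items()
        if isinstance(value, dict) and value.get("type") == "derivation"
    }


def describe(attrs: dict[str, Any]) -> dict[str, str]:
    """Map package name to meta.description, touching derivations only."""
    return {
        name: drv.get("meta", {}).get("description", "")
        for name, drv in filter_drvs(attrs).items()
    }


# Hypothetical sample: one derivation plus two helper attributes that
# would break a naive `.meta.description` lookup.
sample = {
    "jules": {"type": "derivation", "meta": {"description": "example agent"}},
    "override": "<function>",
    "recurseForDerivations": True,
}
```

Filtering before the lookup means helper attributes in a passthrough package set are skipped rather than causing an evaluation error.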

Test Plan

  • All 20 Rust unit tests pass (cargo test)
  • nix eval .#catalog.agents returns 88+ package names
  • nixosandbox catalog --json shows 88 agents, 24 tools
  • CI agent smoke tests pass for all 9 agents (claude-code, codex, opencode, amp, droid, pi, openclaw, hermes-agent, jules)
  • CI catalog count step asserts >50 agents and confirms openclaw/hermes-agent/jules present
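The catalog count step in the last bullet could look roughly like the sketch below. It assumes the JSON output has a top-level `agents` list of package names; the real shape of `nixosandbox catalog --json` is not shown in this PR, and `check_catalog` is a hypothetical helper name.

```python
import json


def check_catalog(raw: str, required: list[str], minimum: int = 50) -> int:
    """Assert the catalog has more than `minimum` agents and all `required` ones."""
    d = json.loads(raw)
    agents = d["agents"]  # assumed shape: {"agents": ["claude-code", ...]}
    names = set(agents)
    if len(names) <= minimum:
        raise AssertionError(f"expected >{minimum} agents, got {len(names)}")
    missing = sorted(set(required) - names)
    if missing:
        raise AssertionError(f"missing agents: {missing}")
    return len(names)
```

In CI this would be fed the captured output of the catalog command, with `required` set to `["openclaw", "hermes-agent", "jules"]`.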

🤖 Generated with Claude Code

HashWarlock and others added 30 commits April 3, 2026 18:24
Complete design specification for the Pi + nixosandbox refactor covering:
- Repository structure (two-package flat layout)
- NDJSON protocol contract with truthfulness invariants
- Runtime client and crash synthesis
- Session manager, profiles, runtime bases, and reconciler
- Pi extension tools (sandbox_run, read/write/list files, session info)
- Rust runtime stub and 6 canonical protocol tests
- Migration phases and definition of done

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements Tasks 9-12: full plan validation with error/warning codes, network
observer stub with would-have-blocked computation, process supervisor with
atomic sequence numbers and cancel-channel SIGTERM support, and a wired main.rs
pipeline (parse -> validate -> supervise -> result).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
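As an illustration of the sequence-number idea in the supervisor commit above, here is a minimal sketch of an NDJSON emitter whose events carry strictly increasing `seq` values even when several threads emit at once. The real supervisor is Rust; the class name, the field names `seq` and `type`, and the choice to start at 1 are all assumptions.

```python
import itertools
import json
import threading


class EventEmitter:
    """Collects NDJSON lines, each stamped with a monotonically increasing seq."""

    def __init__(self) -> None:
        self._seq = itertools.count(1)  # assumption: numbering starts at 1
        self._lock = threading.Lock()
        self.lines: list[str] = []

    def emit(self, kind: str, **fields) -> str:
        # Allocate the number and append under one lock so that the order
        # of lines always matches the order of sequence numbers.
        with self._lock:
            event = {"seq": next(self._seq), "type": kind, **fields}
            line = json.dumps(event, separators=(",", ":"))
            self.lines.append(line)
            return line
```

Holding the lock across both steps is the design point: allocating the number atomically but writing outside the lock could still interleave lines out of order.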
Implements Tasks 13-18: version mismatch rejection, validation failure
(RW_TARGET_NOT_ALLOWED), successful echo run with sequence checking,
cancel flow observing cancel_requested lifecycle, crash synthesis TS-only
unit tests (both with/without validation state), and degraded allowlist
mode verification (ALLOWLIST_NOT_ENFORCED warning, effective mode=full).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
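Crash synthesis and sequence checking, as exercised by the tests in the commit above, might be sketched like this on the client side. The actual client is TypeScript; the event types (`validation`, `result`), the field names, and the `crashed` status are hypothetical stand-ins for the real protocol contract.

```python
import json
from typing import Iterable


def read_run(lines: Iterable[str], exit_code: int) -> dict:
    """Return the runtime's result event, or synthesize one if it never arrived."""
    last_seq = 0
    validated = False
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            break  # a torn final line is treated as the end of the stream
        if event["seq"] <= last_seq:
            raise ValueError(f"sequence went backwards at seq={event['seq']}")
        last_seq = event["seq"]
        if event["type"] == "validation":
            validated = True
        elif event["type"] == "result":
            return event  # the runtime reported its own outcome
    # No terminal event: report a crash, clearly marked as client-synthesized.
    return {
        "type": "result",
        "status": "crashed",
        "synthesized": True,
        "validated": validated,
        "exitCode": exit_code,
        "lastSeq": last_seq,
    }
```

Marking the result as synthesized keeps the client from presenting a guess as something the runtime actually said, and recording whether validation was observed distinguishes the two crash cases the tests cover.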
… states)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… extension tools (Phase 6-7)

Implements Tasks 19-24: hardcoded execution profiles, host-derived mount
resolution with sha256 fingerprinting, UUID-based session directory management,
orphan-PID reconciliation, 5 sandbox tool definitions, and the Pi extension
entry point with session_start/session_shutdown lifecycle hooks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
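One way to picture the sha256 fingerprinting of resolved mounts mentioned above: serialize the mount list canonically, then hash it. The mount record fields (`source`, `target`, `mode`) and the function name are assumptions made for this sketch, not the project's actual types.

```python
import hashlib
import json


def mount_fingerprint(mounts: list[dict]) -> str:
    """Hash a mount list so that equal sets of mounts yield equal fingerprints."""
    # Sort the mounts and their keys so ordering differences do not change the hash.
    ordered = sorted(mounts, key=lambda m: (m["target"], m["source"], m["mode"]))
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A fingerprint like this lets a session detect that the host-derived mounts have changed since it was created without having to compare the lists field by field.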
…bservation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pure function that converts a PlanPayload + EffectiveState into a Vec<String>
of bwrap arguments, covering mounts, devices, proc, namespaces, env, cwd, and command.
Includes 13 unit tests covering all argument categories.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
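To make the plan-builder commit above concrete, here is a rough Python analogue of a pure function that turns a plan into bwrap arguments. The plan's dict shape and the fixed `/dev` and `/proc` targets are assumptions; the flags themselves (`--bind`, `--ro-bind`, `--dev`, `--proc`, `--unshare-*`, `--clearenv`, `--setenv`, `--chdir`) are standard bubblewrap options.

```python
NAMESPACE_FLAGS = {
    "pid": "--unshare-pid",
    "net": "--unshare-net",
    "ipc": "--unshare-ipc",
    "uts": "--unshare-uts",
    "user": "--unshare-user",
}


def build_bwrap_args(plan: dict) -> list[str]:
    """Pure function: same plan in, same argument list out."""
    args: list[str] = []
    for mount in plan["mounts"]:
        flag = "--bind" if mount["mode"] == "rw" else "--ro-bind"
        args += [flag, mount["source"], mount["target"]]
    args += ["--dev", "/dev", "--proc", "/proc"]
    for ns in plan["namespaces"]:
        args.append(NAMESPACE_FLAGS[ns])
    args.append("--clearenv")  # start from an empty environment
    for key, value in sorted(plan["env"].items()):
        args += ["--setenv", key, value]
    args += ["--chdir", plan["cwd"]]
    args += plan["command"]  # the command and its arguments come last
    return args
```

Keeping the builder free of I/O is what makes it cheap to cover every argument category with unit tests.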
…D warnings

Validator now accepts BwrapAvailability to determine which namespaces can
be applied, emitting NAMESPACE_DEGRADED warnings when bwrap is unavailable.
Also adds env_allowlist filtering and namespacesApplied/envApplied fields
to EffectiveState. main.rs wired to detect bwrap and pass it to validator.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
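The validator behaviour described in the commit above can be approximated as follows. Only `NAMESPACE_DEGRADED`, `namespacesApplied`, and `envApplied` come from the commit message; the function signature, the one-warning-per-namespace choice, and the warning record shape are assumptions.

```python
def compute_effective_state(
    requested_namespaces: list[str],
    env: dict[str, str],
    env_allowlist: list[str],
    bwrap_available: bool,
) -> dict:
    """Report what will actually be applied, not what was requested."""
    warnings = []
    if bwrap_available:
        namespaces_applied = list(requested_namespaces)
    else:
        namespaces_applied = []
        for ns in requested_namespaces:
            # Assumption: one warning per namespace that cannot be applied.
            warnings.append({"code": "NAMESPACE_DEGRADED", "namespace": ns})
    allowed = set(env_allowlist)
    env_applied = {k: v for k, v in env.items() if k in allowed}
    return {
        "namespacesApplied": namespaces_applied,
        "envApplied": env_applied,
        "warnings": warnings,
    }
```

Reporting the applied state separately from the request is what lets a caller see degradation instead of assuming isolation that never happened.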
…back

Supervisor now accepts BwrapAvailability and dispatches to bwrap argv
construction via plan_builder when available, or falls back to direct
Command execution on platforms where bwrap is unavailable (e.g., macOS).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rser

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…wlist)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HashWarlock and others added 25 commits April 9, 2026 17:17
- Add @types/node to devDependencies (fixes CI typecheck failure)
- Add matrix-based agent smoke tests: create sandbox with each agent
  (claude-code, codex, opencode, amp, droid, pi), verify binary launches
- Tests run in parallel via strategy.matrix with fail-fast: false
- Use NIXOSANDBOX_BWRAP_PATH to point to Nix-built bwrap 0.11.0
  (system bwrap from apt was found first via PATH, lacking --pivot-root)
- Update actions/checkout v4→v6, actions/cache v4→v5,
  actions/setup-node v4→v6 to eliminate Node.js 20 deprecation warnings
…rrectly

Three bugs fixed:
- bwrap spawn ignored the path from detect(), always using bare "bwrap"
  which fails when NIXOSANDBOX_BWRAP_PATH points outside PATH
- cmd_exec called load_profile() on custom:* sessions, printing a
  spurious warning since those aren't real profile files
- docker ensure_image() leaked stderr to terminal during tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
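The first bug in the list above (spawning bare "bwrap" instead of the detected path) is easy to show in miniature. In this sketch `detect_bwrap` and `spawn_argv` are hypothetical names, and the rule that an unusable `NIXOSANDBOX_BWRAP_PATH` yields no bwrap at all (rather than falling back to `PATH`) is an assumption.

```python
import os
import shutil
from typing import Mapping, Optional


def detect_bwrap(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the bwrap executable to use, preferring the explicit override."""
    environ = os.environ if environ is None else environ
    override = environ.get("NIXOSANDBOX_BWRAP_PATH")
    if override:
        return override if os.access(override, os.X_OK) else None
    return shutil.which("bwrap", path=environ.get("PATH"))


def spawn_argv(bwrap_path: Optional[str], sandbox_flags: list[str],
               command: list[str]) -> list[str]:
    """Build the argv to execute, using the detected path rather than bare 'bwrap'."""
    if bwrap_path is None:
        return list(command)  # direct-execution fallback, no sandbox
    return [bwrap_path, *sandbox_flags, *command]
```

Threading the detected path through to the spawn site is the whole fix: detection and execution must agree on which binary they mean.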
--pivot-root is not a bubblewrap CLI option (it's the syscall bwrap
uses internally). The correct approach is --ro-bind <rootfs> / which
tells bwrap to mount the rootfs read-only at / and internally perform
the pivot_root.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lags

- Switch CI from Nix-built bwrap (non-setuid, blocked by AppArmor on
  Ubuntu 24.04) to apt-installed bwrap (setuid, handles namespaces)
- Add inline verification that bwrap can actually create sandboxes
- Add AppArmor sysctl fallback for containers/bubblewrap#632
- Add --die-with-parent and --new-session per bwrap docs for lifecycle safety

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Nix rootfs contains absolute symlinks pointing into /nix/store.
Without binding /nix/store into the sandbox, all binaries and
libraries are dangling symlinks, causing "execvp: No such file or
directory" for every command.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bwrap needs /nix/store to exist as a directory inside the read-only
rootfs so it can bind-mount the host Nix store there. Without it,
bwrap fails with "Can't mkdir parents for /nix/store: Read-only file
system" since it can't create directories on a ro-bind mount.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
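The last few commits combine into a small set of rootfs flags: `--ro-bind <rootfs> /`, a bind of the host `/nix/store`, and the lifecycle flags `--die-with-parent` and `--new-session`. The sketch below assembles them and checks the mount-point precondition up front. The function name is hypothetical, and binding the store read-only is an assumption, since the commits only say it is bound.

```python
import os


def rootfs_bwrap_flags(rootfs: str, store: str = "/nix/store") -> list[str]:
    """Flags for running on a Nix-built rootfs whose symlinks point into the store."""
    mount_point = os.path.join(rootfs, store.lstrip("/"))
    # bwrap cannot create the mount point on a read-only bind, so the
    # directory has to exist inside the rootfs already.
    if not os.path.isdir(mount_point):
        raise FileNotFoundError(f"rootfs is missing mount point {mount_point}")
    return [
        "--die-with-parent",
        "--new-session",
        "--ro-bind", rootfs, "/",
        "--ro-bind", store, store,  # assumption: the store is bound read-only
    ]
```

Checking for the directory before spawning turns the "Can't mkdir parents" failure into an error that names the missing path.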
The old README described a REST API server that no longer exists.
Rewritten to document the current architecture: Rust CLI, Nix catalog
with 25+ agents from llm-agents.nix, bubblewrap isolation, session
management, built-in profiles, and custom composition via --with.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Build/test commands, architecture overview, key design decisions,
module responsibilities, and CI-specific notes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Guard all tool execute() args destructuring with (args ?? {}) to
  handle Pi passing undefined for tools called with no arguments
- Patch TypeBox schemas to always include required[] array, which Pi
  expects but TypeBox omits for all-optional parameter schemas
- Add Pi extension setup instructions to README
- Add .pi/ to gitignore (local extension wrappers are user-specific)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
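The two Pi compatibility fixes above are written in TypeScript in the project. The same ideas in Python, as a sketch: default missing arguments to an empty mapping (the equivalent of `args ?? {}`), and make sure every parameter schema carries a `required` array. The handler name, its `path` parameter, and the schema contents are hypothetical.

```python
from typing import Optional


def ensure_required(schema: dict) -> dict:
    """Return a copy of the schema that always has a `required` list."""
    patched = dict(schema)
    patched.setdefault("required", [])
    return patched


def execute_list_files(args: Optional[dict] = None) -> dict:
    """Hypothetical tool handler that tolerates being called with no arguments."""
    args = args if args is not None else {}  # Python analogue of (args ?? {})
    return {"path": args.get("path", ".")}
```

Patching a copy rather than the original keeps the shared schema object unchanged for any other consumer.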
…llm-agents-pkgs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@HashWarlock HashWarlock changed the title from "feat: Pi sandbox runtime + llm-agents.nix integration" to "feat: dynamic numtide catalog (88+ agents), CI smoke tests, catalog defensive fixes" Apr 10, 2026
…l string

d[\"agents\"] inside a single-quoted shell heredoc passes literal backslashes
to Python, causing: SyntaxError: unexpected character after line continuation
character. Extract to a variable (agents = d["agents"]) instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
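The failure described above can be reproduced without a shell by compiling the two source strings directly. `BROKEN` holds the text Python receives when the backslashes survive quoting, and `FIXED` is the corrected line quoted in the commit message. The helper name `syntax_error_of` is made up for this sketch.

```python
# What Python sees when the heredoc passes the backslashes through literally.
BROKEN = r'agents = d[\"agents\"]'
# The corrected line from the commit message.
FIXED = 'agents = d["agents"]'


def syntax_error_of(source: str):
    """Return the SyntaxError message for `source`, or None if it compiles."""
    try:
        compile(source, "<ci-step>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None
```

Outside a string literal, a backslash is only valid as a line continuation, so the escaped quotes that were meant for the shell become a syntax error once they reach Python unchanged.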
@HashWarlock HashWarlock mentioned this pull request Apr 10, 2026
@HashWarlock HashWarlock merged commit 8a508be into master Apr 10, 2026
26 checks passed
@HashWarlock HashWarlock deleted the pi-sandbox-refactor branch April 11, 2026 01:37