feat: nested-Docker sandbox — --no-bwrap, greywall-netns-helper, --netns, --bridge-port#88
Draft
tito wants to merge 3 commits into
Bubblewrap cannot create user namespaces inside Docker Desktop's VM when running as a non-root user (uid_map write is blocked regardless of cap_add SYS_ADMIN + seccomp/apparmor unconfined).

This commit adds a --no-bwrap flag that skips bubblewrap entirely and enforces the sandbox using primitives that work unprivileged:

- Landlock (already applied via the internal --landlock-apply wrapper)
- seccomp-bpf (loaded directly via prctl + SECCOMP_SET_MODE_FILTER)
- env-var-based SOCKS5/HTTP proxy (HTTP libraries honor ALL_PROXY)

Layout:

- internal/sandbox/linux_seccomp.go: factored out generateBPFInstructions() so the BPF program can be produced in memory without writing a file.
- internal/sandbox/linux_seccomp_apply.go (NEW): ApplySeccompFilter loads the filter directly into the current process via the seccomp() syscall. Idempotent wrt PR_SET_NO_NEW_PRIVS (Landlock's Apply already sets it).
- internal/sandbox/linux_nobwrap.go (NEW): WrapCommandLinuxNoBwrap emits a short shell script that exports GREYWALL_CONFIG_JSON + proxy env vars, then execs `greywall --landlock-apply --seccomp -- bash -c ...`.
- cmd/greywall/main.go: added --no-bwrap flag; extended runLandlockWrapper to accept --seccomp which triggers ApplySeccompFilter after Landlock.
- internal/sandbox/manager.go: Manager.noBwrap + SetNoBwrap; dispatches to WrapCommandLinuxNoBwrap in WrapCommand; skips proxy/DNS bridge initialization in no-bwrap mode (no Unix-socket bind target anyway).

What this path gives up vs. full bwrap:

- mount namespace / FS view isolation (Landlock denies, doesn't hide)
- PID namespace
- transparent tun2socks capture (needs a netns; out of scope here, follow-up work will add a --netns flag + a privileged netns helper)

What it keeps:

- Landlock filesystem access control
- seccomp syscall denial (ptrace, mount, kexec, TIOCSTI, ...)
- env-based proxy routing
- zero privileges required on the wrapper or wrapped process
Adds Stage B of --no-bwrap: a separate setcap'd helper binary that
builds a persistent network namespace with tun2socks, paired with a
new --netns flag on greywall that routes the sandbox through it.
Together with --no-bwrap this restores kernel-enforced egress
capture (all traffic from the sandbox → tun0 → tun2socks → SOCKS5
proxy) without requiring the sandboxed command to hold any
privileges and without relying on the process to honor proxy env
vars.
greywall-netns-helper subcommands:

  create --proxy URL [--tun2socks PATH]
      unshare CLONE_NEWNET, bring up tun0 at 198.18.0.1/15, add
      default route via tun0, launch tun2socks inside the netns,
      pin at /run/greywall/ns-<uuid> via bind-mount, print pin path.
      Needs CAP_NET_ADMIN + CAP_SYS_ADMIN (via file caps, not root).
      Ambient-caps raise CAP_NET_ADMIN so ip/tun2socks children
      inherit it.

  exec --netns PATH -- CMD [ARGS...]
      setns into the pinned netns, clear all cap sets (effective,
      permitted, inheritable, ambient) + bounding set, then
      syscall.Exec CMD. Strictly rejects netns paths outside
      /run/greywall to prevent abuse of the file caps for arbitrary
      namespace entry.

  destroy PATH
      SIGTERM the recorded tun2socks pid, unmount and unlink the pin
      and its .pid sidecar.
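The path check that `exec` performs on `--netns` might be sketched as follows. The function name `validatePinPath` and the `ns-` basename check are assumptions drawn from the pin naming above, not the helper's actual code. The check is purely lexical; a real helper would likely also confirm that the target is a mounted namespace file.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// pinDir is the only directory the helper accepts pins from.
const pinDir = "/run/greywall"

// validatePinPath (hypothetical name) rejects any path that does not
// normalize to /run/greywall/ns-<something>. It does not resolve symlinks.
func validatePinPath(p string) error {
	if !filepath.IsAbs(p) {
		return errors.New("netns path must be absolute")
	}
	clean := filepath.Clean(p) // collapses "..", "." and duplicate slashes
	if filepath.Dir(clean) != pinDir || !strings.HasPrefix(filepath.Base(clean), "ns-") {
		return fmt.Errorf("netns path %q is not a pin under %s", p, pinDir)
	}
	return nil
}

func main() {
	for _, p := range []string{
		"/run/greywall/ns-1234",
		"/run/greywall/../../proc/1/ns/net",
		"/proc/self/ns/net",
	} {
		fmt.Printf("%s: %v\n", p, validatePinPath(p))
	}
}
```

Comparing the cleaned parent directory, instead of doing a plain prefix match, keeps `..` segments and sibling directories such as `/run/greywall-other` from passing.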
Greywall CLI:

  --netns <path>          Requires --no-bwrap. Inserts
                          `greywall-netns-helper exec --netns <path> --`
                          in front of the landlock-apply wrapper chain.
  --netns-helper <path>   Override helper location (default: PATH).
When --netns is set, the env-var SOCKS5 proxy injection in
WrapCommandLinuxNoBwrap is skipped: traffic is already captured by
tun0 inside the netns, and ALL_PROXY would just double-proxy to an
unreachable localhost port.
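How the helper prefix and the proxy-env decision fit together can be sketched like this. `wrapArgv` and `needsProxyEnv` are hypothetical names, and the inner chain is taken from the description of the no-bwrap wrapper; the real dispatch in the Manager may be structured differently.

```go
package main

import "fmt"

// wrapArgv (hypothetical) builds the command chain for --no-bwrap mode.
// With a netns pin, the helper's exec subcommand is placed in front so the
// process enters the netns before Landlock and seccomp are applied.
func wrapArgv(netnsPath, helperPath, userCmd string) []string {
	chain := []string{"greywall", "--landlock-apply", "--seccomp", "--", "bash", "-c", userCmd}
	if netnsPath == "" {
		return chain
	}
	if helperPath == "" {
		helperPath = "greywall-netns-helper" // default: resolved via PATH
	}
	prefix := []string{helperPath, "exec", "--netns", netnsPath, "--"}
	return append(prefix, chain...)
}

// needsProxyEnv reports whether ALL_PROXY-style variables should be exported.
// Inside a pinned netns, tun0 already captures traffic, so they are skipped.
func needsProxyEnv(netnsPath string) bool {
	return netnsPath == ""
}

func main() {
	fmt.Printf("%q\n", wrapArgv("/run/greywall/ns-1234", "", "curl example.com"))
	fmt.Println("proxy env needed:", needsProxyEnv("/run/greywall/ns-1234"))
}
```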
Installation requires:

  setcap cap_net_admin,cap_sys_admin+ep /usr/local/bin/greywall-netns-helper

and a writable /run/greywall (deployment-side; the helper itself
does not attempt to chown /run).
Verified in Docker (non-root agent user, no bubblewrap):

  * helper create → pin created, tun2socks running in netns
  * helper exec   → child runs with CapEff=0 inside netns
  * full chain    → greywall --no-bwrap --netns <path> yields
                    uid=1000, CapEff=0, NoNewPrivs=1, Seccomp=2,
                    Seccomp_filters=1, isolated netns (tun0 + lo,
                    no eth0). Parent shell's netns unaffected.
…socket

Adds an optional `--bridge-port N` flag to `create` that lets a host-netns client reach a TCP listener inside the pinned netns.

Mechanism:

1. Before unshare(CLONE_NEWNET), `create` spawns a sibling via an internal `_bridge-host` subcommand that stays in the host netns. It waits for the shared Unix socket to appear, drops all caps, then execs socat TCP4-LISTEN:N,bind=127.0.0.1 -> UNIX-CONNECT:<pin>.sock.
2. After entering the new netns, `create` also spawns socat UNIX-LISTEN:<pin>.sock -> TCP:127.0.0.1:N inside the netns, so incoming connections to the host port land on the in-netns TCP port.

`destroy` now reads a multi-line pid sidecar and SIGTERMs every recorded pid (host-bridge, tun2socks, inside-bridge), then removes the pin, pidfile and .sock.

`_bridge-host` validates that the socket lives alongside a valid pin path so the setcap'd helper can't be leveraged to proxy arbitrary Unix sockets.

Use case: an orchestrator that drives an HTTP/RPC server inside the pinned sandbox netns from a host-netns control process. Without the bridge, dearmail-style designs (the sandboxed process exposes an API that the trusted orchestrator consumes) couldn't survive the netns isolation.
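Reading the multi-line pid sidecar in `destroy` might look like the sketch below. The one-pid-per-line format and the name `parsePIDSidecar` are assumptions; the commit only states that the sidecar is multi-line. Rejecting values of 1 or below is a defensive choice in this sketch, since signalling pid 0 or a negative pid targets whole process groups.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parsePIDSidecar (hypothetical) parses sidecar content with one pid per
// line, skipping blank lines and rejecting anything that is not a pid > 1.
func parsePIDSidecar(content string) ([]int, error) {
	var pids []int
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		pid, err := strconv.Atoi(line)
		if err != nil || pid <= 1 {
			return nil, fmt.Errorf("invalid pid entry %q", line)
		}
		pids = append(pids, pid)
	}
	return pids, sc.Err()
}

func main() {
	// Sample content standing in for <pin>.pid: host bridge, tun2socks, inside bridge.
	pids, err := parsePIDSidecar("4101\n4102\n4103\n")
	fmt.Println(pids, err)
	// A destroy step would then send SIGTERM to each pid, for example with
	// syscall.Kill(pid, syscall.SIGTERM), before removing the pin files.
}
```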
Draft. Three stacked commits that together make greywall usable in environments where bubblewrap cannot create user namespaces, most notably inside Docker (including Docker Desktop) running as a non-root user, where `uid_map` writes are blocked by the host kernel regardless of `cap_add SYS_ADMIN + seccomp=unconfined + apparmor=unconfined`. See containers/bubblewrap#505 for the underlying limitation.

Each commit is a self-contained feature; they are filed together because the later ones depend on the first. Happy to split if that makes review easier.
Commits
1.
feat: --no-bwrap mode for nested-Docker / rootless environments (7c10e28)

Adds a `--no-bwrap` flag that skips bubblewrap entirely and enforces the sandbox using primitives that work for an unprivileged process:

- Landlock, already applied via the internal `--landlock-apply` wrapper.
- seccomp-bpf, loaded directly via `prctl(PR_SET_NO_NEW_PRIVS) + seccomp(SECCOMP_SET_MODE_FILTER)` in a new `internal/sandbox/linux_seccomp_apply.go`. The existing `generateBPFInstructions()` was factored out of `writeBPFProgram` so both the file-writing path (bwrap `--seccomp 3`) and the direct-load path can share it.
- Env-var proxy: `WrapCommandLinuxNoBwrap` (new file) emits the proxy URL via `ALL_PROXY`/`HTTPS_PROXY` instead of bwrap's tun2socks. Fine for well-behaved HTTP clients; raw-socket bypass is possible at this layer (commit 2 addresses that).

Files:

- `internal/sandbox/linux_seccomp.go`
- `internal/sandbox/linux_seccomp_apply.go` (new)
- `internal/sandbox/linux_nobwrap.go` (new)
- `cmd/greywall/main.go`
- `internal/sandbox/manager.go`

Behaviour when `--no-bwrap` is unset: unchanged.

2.
feat: greywall-netns-helper + --netns flag for transparent capture (5de0788)

Adds a separate helper binary `greywall-netns-helper` (installed with `setcap cap_net_admin,cap_sys_admin+ep`, never root) that:

- `create --proxy URL`: `unshare(CLONE_NEWNET)`, bring up `tun0` at 198.18.0.1/15, set default route via tun0, launch the existing embedded `tun2socks` inside the netns pointed at the SOCKS5 proxy, bind-mount `/proc/self/ns/net` to `/run/greywall/ns-<uuid>` to pin it, print the pin path, exit.
- `exec --netns PATH -- CMD`: `setns` into the pin, drop all caps (effective + permitted + inheritable + ambient + bounding set), `syscall.Exec` the user command. Rejects paths outside `/run/greywall` so the file caps can't be leveraged to enter arbitrary netns.
- `destroy PATH`: SIGTERM the recorded tun2socks pid, unmount and remove the pin.

The main greywall binary gets a paired `--netns <path>` flag that, when combined with `--no-bwrap`, prefixes the command chain with `greywall-netns-helper exec --netns <path> --` so the wrapped process enters the prepared netns before Landlock/seccomp are applied.

Result: kernel-enforced egress capture (every TCP/UDP packet goes through tun0 → tun2socks → SOCKS5) without requiring the wrapped process to hold any capabilities and without relying on it honoring proxy env vars. Raw sockets are no longer a bypass.

Ambient-caps dance for `CAP_NET_ADMIN` is used so `ip`/`tun2socks` children of the helper inherit the capability without the helper having to reimplement the netlink calls in Go.

3.
feat(netns-helper): --bridge-port (e735ca7)

Optional `--bridge-port N` flag on `create` that makes a TCP server inside the netns reachable from the host netns (typical use: a sandboxed server whose API is consumed by a trusted host-netns orchestrator). Mechanism:

- Inside the netns: socat `UNIX-LISTEN:<pin>.sock → TCP:127.0.0.1:N`
- In the host netns (spawned before `unshare`): socat `TCP4-LISTEN:N,bind=127.0.0.1 → UNIX-CONNECT:<pin>.sock`

`destroy` now reads a multi-line PID sidecar (`<pin>.pid` carrying tun2socks + both socat pids) and SIGTERMs every recorded pid. The `_bridge-host` internal subcommand validates that the socket path sits alongside a valid pin so the file caps can't be abused to proxy arbitrary Unix sockets.

Security notes
- File capabilities are granted only to the `greywall-netns-helper` binary.
- `greywall-netns-helper exec` validates its `--netns` path is under `/run/greywall/` and drops every cap set (effective, permitted, inheritable, ambient, bounding) before `syscall.Exec`.
- `greywall-netns-helper _bridge-host` (internal) validates its `--socket` path similarly.
- The file capabilities (`cap_net_admin,cap_sys_admin+ep`) are the least-privilege set for the helper: `CAP_SYS_ADMIN` is required to `unshare(CLONE_NEWNET)` and `setns`; `CAP_NET_ADMIN` for `ip tuntap/addr/link/route`. Nothing else.

Install delta
- Requires `iproute2` (the `ip` tool) and `libcap2-bin` (at install time, for `setcap`).
- The `tun2socks` binary is already embedded in the greywall source tree for amd64/arm64.
- A `/run/greywall/` directory owned by the invoking user must exist at runtime (helper does not attempt to chown). Suggested deployment: `mkdir -p /run/greywall && chown $USER:$USER /run/greywall`, or a systemd-tmpfiles drop-in.

Testing
All three commits verified end-to-end inside a plain `python:3.12-slim` Docker container running as a non-root user with the usual `cap_add SYS_ADMIN NET_ADMIN` + `security_opt seccomp=unconfined apparmor=unconfined`:

- `--no-bwrap` alone: wrapped process has `uid=1000, CapEff=0, NoNewPrivs=1, Seccomp=2, Seccomp_filters=1`, Landlock rules enforced (write to `/etc` → EACCES), write to cwd allowed.
- `--no-bwrap --netns <pin>`: same sandbox state plus the process netns differs from the host netns; `ip -br link` inside shows only `lo + tun0`, `eth0` not visible.
- `--bridge-port 4096`: a host-netns client connects to `127.0.0.1:4096` and talks to an HTTP server bound at `127.0.0.1:4096` inside the netns; round-trip confirmed with a standard health-check response.
- `destroy`: all three pids (tun2socks + 2 socat) are gone, pin + .sock + .pid files unlinked.

Builds clean on `go build ./...` and `GOOS=linux GOARCH=arm64 go build ./...`.

Test plan

- `go build ./...` on darwin/arm64
- `GOOS=linux GOARCH=arm64 go build ./...`
- `--no-bwrap -- <cmd>` as non-root user in Docker → Landlock + seccomp applied
- `greywall-netns-helper create` → netns pinned, tun2socks up, helper exits cleanly
- `greywall --no-bwrap --netns <pin> -- <cmd>` → child runs in isolated netns with no caps
- `greywall-netns-helper destroy` → all pids SIGTERM'd, files cleaned up
- `--bridge-port N` → bidirectional TCP reachable from host netns
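The sandbox-state checks listed under Testing (CapEff, NoNewPrivs, Seccomp) read fields from `/proc/<pid>/status`. A minimal sketch of automating that check is below; `parseStatus` and `looksSandboxed` are hypothetical names, the sample text is illustrative and not captured output, and a real check would read `/proc/self/status` from inside the wrapped process.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseStatus turns the "Key:\tvalue" lines of /proc/<pid>/status into a map.
func parseStatus(status string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), ":")
		if ok {
			fields[key] = strings.TrimSpace(value)
		}
	}
	return fields
}

// looksSandboxed (hypothetical) checks the state reported in the test notes:
// no effective capabilities, no_new_privs set, seccomp in filter mode (2).
func looksSandboxed(f map[string]string) bool {
	capEff, present := f["CapEff"]
	noCaps := present && capEff != "" && strings.TrimLeft(capEff, "0") == ""
	return noCaps && f["NoNewPrivs"] == "1" && f["Seccomp"] == "2"
}

func main() {
	// Illustrative sample only.
	sample := "Name:\tbash\nCapEff:\t0000000000000000\nNoNewPrivs:\t1\nSeccomp:\t2\nSeccomp_filters:\t1\n"
	fmt.Println("sandboxed:", looksSandboxed(parseStatus(sample)))
}
```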