pycrate

A minimal container runtime written in Python for educational purposes only. It demonstrates the three Linux features that make containers work: namespaces, cgroups, and chroot.

The implementation almost certainly contains security holes, so don't use it for anything you care about.

Usage

pycrate must run on Linux as root.

cd /pycrate
python3 pycrate.py run <image.tar.gz> <command> [options]

Example

# Run a shell inside a container with resource limits
sudo ./pycrate.py run ubuntu.tar.gz /bin/bash --hostname mycontainer --memory 64 --cpu 50

Options

| Flag | Description | Example |
| --- | --- | --- |
| `--hostname` | Set the container's hostname | `mybox` |
| `--memory` | Memory limit in MB | `64` |
| `--cpu` | CPU limit as a percentage | `50` |
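
A command line of this shape could be parsed with argparse roughly as follows. This is only a sketch: pycrate.py's actual parser is not shown in this README, so build_parser and the attribute names (image, command, and so on) are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser for: pycrate.py run <image.tar.gz> <command> [options]."""
    parser = argparse.ArgumentParser(prog="pycrate.py")
    sub = parser.add_subparsers(dest="action", required=True)

    run = sub.add_parser("run", help="run a command inside a new container")
    run.add_argument("image", help="path to the image tarball")
    run.add_argument("command", help="program to execute inside the container")
    run.add_argument("--hostname", default=None, help="container hostname")
    run.add_argument("--memory", type=int, default=None, help="memory limit in MB")
    run.add_argument("--cpu", type=int, default=None, help="CPU limit as a percentage")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Leaving the limits at None when a flag is absent makes it easy to skip the cgroup step entirely, which matches that step being optional.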

How it works

When you run pycrate.py run, it goes through four steps that mirror what Docker does with docker run:

Step 1: Extract the image

ubuntu.tar.gz  ->  /tmp/container-<id>/
                      |-- bin/
                      |-- etc/
                      |-- lib/
                      |-- proc/
                      |-- ...

The image tarball is extracted into a temporary directory, which becomes the container's root filesystem. A real container runtime like Docker uses a layered filesystem (overlayfs), but for our purposes a flat tarball achieves the same result: the container gets a populated filesystem to run in.
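
The extraction step can be sketched with the standard library alone. The function name extract_image and the "container-" prefix are assumptions chosen to match the diagram above; the real pycrate.py may do this differently.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_image(image_path):
    """Unpack an image tarball into a fresh temporary directory and return its path."""
    # mkdtemp creates a uniquely named directory, e.g. /tmp/container-ab12cd34
    rootfs = Path(tempfile.mkdtemp(prefix="container-"))
    with tarfile.open(image_path, "r:*") as tar:  # "r:*" auto-detects compression
        if hasattr(tarfile, "fully_trusted_filter"):
            # Newer Pythons support extraction filters. A root filesystem needs
            # its symlinks and mode bits preserved, so no filtering is applied.
            # Only do this with images you trust.
            tar.extractall(rootfs, filter="fully_trusted")
        else:
            tar.extractall(rootfs)
    return rootfs
```

The returned path is what later gets passed to chroot, and what the parent deletes during cleanup.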

Step 2: Set up cgroups (optional)

cgroups (control groups) are the kernel mechanism for limiting resources. They work through a virtual filesystem at /sys/fs/cgroup. To limit a process:

Create a directory under /sys/fs/cgroup/ (this is the cgroup)
Write limits to files in that directory (memory.max, cpu.max)
Write the process's PID to cgroup.procs

The kernel then enforces those limits. If the processes in a cgroup exceed the memory limit and the kernel cannot reclaim enough memory, the OOM killer terminates one of them. CPU limits use a quota/period model: for example, 50% CPU means the process gets 50ms of CPU time per 100ms window.

The current process is added to the cgroup before namespaces are created, so the forked child inherits membership automatically.
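
The three cgroup steps, and the quota/period arithmetic, can be sketched as below. This is a minimal illustration under stated assumptions: a cgroup v2 hierarchy, "MB" interpreted as MiB, a 100ms period, and hypothetical helper names (cpu_max_value, memory_max_value, create_cgroup). The base parameter exists only so the logic can be tried against an ordinary directory without root.

```python
from pathlib import Path

CPU_PERIOD_US = 100_000  # assumed period: 100 ms, expressed in microseconds


def cpu_max_value(percent, period_us=CPU_PERIOD_US):
    """Return the 'quota period' string for cpu.max, e.g. 50% -> '50000 100000'."""
    quota_us = period_us * percent // 100
    return f"{quota_us} {period_us}"


def memory_max_value(megabytes):
    """Return the byte count for memory.max (assumes 1 MB = 1024 * 1024 bytes)."""
    return str(megabytes * 1024 * 1024)


def create_cgroup(name, memory_mb=None, cpu_percent=None, pid=None,
                  base="/sys/fs/cgroup"):
    """Create a cgroup directory, write the requested limits, and add a process."""
    cgroup = Path(base) / name
    cgroup.mkdir()  # step 1: the directory is the cgroup
    if memory_mb is not None:  # step 2: write limits
        (cgroup / "memory.max").write_text(memory_max_value(memory_mb))
    if cpu_percent is not None:
        (cgroup / "cpu.max").write_text(cpu_max_value(cpu_percent))
    if pid is not None:  # step 3: move the process into the cgroup
        (cgroup / "cgroup.procs").write_text(str(pid))
    return cgroup
```

On a real system this needs root, and the memory and cpu controllers must be enabled for the parent cgroup, otherwise memory.max and cpu.max will not exist in the new directory.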

Step 3: Create namespaces

os.unshare(CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWNET | CLONE_NEWIPC)

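unshare() tells the kernel to give this process isolated versions of system resources, which its children then inherit. The PID namespace is the exception: the calling process stays in its original PID namespace, and only children created afterwards are placed in the new one, which is why the next step forks. Each namespace type isolates something different: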

| Namespace | Flag | What it isolates |
| --- | --- | --- |
| PID | `CLONE_NEWPID` | Process IDs (container sees only its own processes) |
| Mount | `CLONE_NEWNS` | Mount table (container's mounts don't affect host) |
| UTS | `CLONE_NEWUTS` | Hostname (container gets its own hostname) |
| Network | `CLONE_NEWNET` | Network stack (container gets its own interfaces) |
| IPC | `CLONE_NEWIPC` | Inter-process communication (shared memory, etc.) |
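
For reference, here is one way the flags could be combined and passed to the kernel. os.unshare only exists in Python 3.12 and later, so this sketch falls back to calling libc through ctypes on older versions. The wrapper name unshare and the CONTAINER_FLAGS constant are assumptions for illustration; the numeric values are the ones defined in the Linux headers.

```python
import ctypes
import os

# Flag values from <linux/sched.h>
CLONE_NEWNS = 0x00020000   # mount namespace
CLONE_NEWUTS = 0x04000000  # hostname and domain name
CLONE_NEWIPC = 0x08000000  # System V IPC, POSIX message queues
CLONE_NEWPID = 0x20000000  # process IDs (applies to children only)
CLONE_NEWNET = 0x40000000  # network stack

CONTAINER_FLAGS = (
    CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWNET | CLONE_NEWIPC
)


def unshare(flags):
    """Detach the calling process from the namespaces named in `flags`."""
    if hasattr(os, "unshare"):  # Python 3.12+
        os.unshare(flags)
        return
    # Older Pythons: call the libc wrapper directly (Linux only).
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Calling unshare(CONTAINER_FLAGS) requires root (or the relevant capabilities) and raises OSError otherwise.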

Step 4: Fork and set up the filesystem

After creating namespaces, the process forks:

          os.fork()
          /         \
  Child (pid=0)    Parent
      |               |
sethostname()     waitpid(child)
      |               |
setup_filesystem()  cleanup()
      |
os.execvp(command)

The child becomes the container:

Sets the hostname inside the UTS namespace using socket.sethostname()
Mounts /proc inside the new root - a virtual filesystem that exposes process info. Without it, ps and top won't work.
Calls chroot(rootfs) - this tells the kernel "for this process, / now means rootfs." The container can only see files inside the extracted image.
Calls os.execvp(command) - this replaces the child process with the requested command (e.g. /bin/bash). The container is now "running."

The parent manages the lifecycle:

Waits for the child to exit
Moves itself out of the cgroup and removes it
Deletes the temporary rootfs directory
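
The fork, exec, and wait pattern described above can be sketched without any container-specific setup. The helper name run_child and its setup hook are hypothetical; in a container runtime, setup would be the place for the hostname, /proc mount, and chroot calls, all of which need root. os.waitstatus_to_exitcode requires Python 3.9 or later.

```python
import os


def run_child(argv, setup=None):
    """Fork, run `setup` and exec `argv` in the child, and return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Any failure here must end the child
        # process rather than fall through into the parent's code path.
        try:
            if setup is not None:
                setup()  # e.g. set hostname, mount /proc, chroot
            os.execvp(argv[0], argv)  # on success this never returns
        except BaseException:
            os._exit(127)
    # Parent: block until the child exits, then decode the wait status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Cleanup (leaving and removing the cgroup, deleting the rootfs directory) would follow the waitpid call in the parent, so it runs however the child exits.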

Known Limitations

This is an educational tool and I am no expert in low-level kernel interfaces. Some differences that I know about vs something like Docker:

chroot instead of pivot_root - chroot only changes how this process resolves paths, and a root process can escape it. Docker uses pivot_root, which actually swaps the mount namespace root, making escape much harder.
No overlayfs - Docker uses layered filesystems so multiple containers can share base image layers. We extract a flat tarball each time.
No networking - We create a network namespace but don't set up virtual interfaces, so the container has no network access.
No cgroup namespace - We skip CLONE_NEWCGROUP because our single-process architecture (unlike Docker's daemon model) means the parent needs access to /sys/fs/cgroup for cleanup after the container exits.
Must run as root - Docker's daemon handles privilege separation. We require root directly.
