kmorph is a runtime management system for multikernel architectures, providing automated health monitoring, failure recovery, and live kernel upgrades with minimal downtime (<10ms). It consists of two main components:
- kmorphd: kmorph daemon providing continuous runtime orchestration
- kmorphctl: Command-line interface for runtime operations
kmorph complements kerf:
- kerf: Configuration and resource allocation (device tree management, instance creation)
- kmorph: Runtime management and orchestration (monitoring, healing, morphing/upgrades)
Both projects share the underlying libkerf library for device tree operations and state management.
Continuous heartbeat-based health monitoring of kernel instances with configurable policies:
- Heartbeat protocol over vsock transport
- Configurable timeouts (development: 60s, production: 1s)
- Health metrics collection (CPU, memory, I/O responsiveness)
- Failure detection with exponential backoff for transient issues
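The timeout and backoff bullets above can be sketched as follows. This is a minimal illustration rather than kmorphd's actual detector: the class name `HeartbeatTracker`, the `max_misses` threshold, and the doubling factor are assumptions; only the idea of a configurable base timeout with exponential backoff comes from the list.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatTracker:
    """Tracks missed heartbeats for one instance (hypothetical helper)."""
    timeout_s: float             # base heartbeat timeout, e.g. 1.0 in production
    max_misses: int = 3          # assumed: misses tolerated before declaring failure
    backoff_factor: float = 2.0  # assumed: wait doubles after each miss
    misses: int = 0

    def current_timeout(self) -> float:
        # Wait longer after each consecutive miss so transient stalls can clear.
        return self.timeout_s * (self.backoff_factor ** self.misses)

    def on_response(self) -> None:
        # Any successful heartbeat resets the backoff.
        self.misses = 0

    def on_timeout(self) -> bool:
        """Record a miss; return True once the failure is confirmed."""
        self.misses += 1
        return self.misses >= self.max_misses


tracker = HeartbeatTracker(timeout_s=1.0)
waits = []
failed = False
while not failed:
    waits.append(tracker.current_timeout())
    failed = tracker.on_timeout()
print(waits)  # [1.0, 2.0, 4.0]
```

With these assumed defaults, an instance on a 1s timeout is declared failed after three consecutive misses spread over 7s in total.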
Automated failure detection and resource recovery:
- Automatic failover when kernel instance becomes unresponsive
- Resource reclamation from crashed kernels
- Graceful degradation with notification to operators
- vmcore capture during recovery for post-mortem analysis
Interactive kernel morph/upgrade protocol with minimal downtime (<10ms):
- State negotiation between old and new kernels
- Capability validation before commit
- Atomic handover of resources and workloads
- Rollback support if upgrade fails validation
Replaces traditional kdump with multikernel-aware crash capture:
- Parallel capture by surviving kernel instances
- No system reboot required
- Preserved system state across other instances
- Standard vmcore format compatible with crash analysis tools
┌─────────────────────────────────────────────────────────────┐
│                         User Space                          │
├─────────────────────────────────────────────────────────────┤
│  kmorphctl CLI                                              │
│  ├─ Commands: spawn, monitor, morph, recover                │
│  └─ Communicates with daemon via Unix socket                │
├─────────────────────────────────────────────────────────────┤
│  kmorphd Daemon                                             │
│  ├─ Health Monitor: Heartbeat protocol, timeout detection   │
│  ├─ Recovery Manager: Auto-heal, vmcore capture             │
│  ├─ Morphing Orchestrator: State transfer, validation       │
│  └─ vsock Protocol Handler: KMP implementation              │
├─────────────────────────────────────────────────────────────┤
│  libkerf (Shared Library)                                   │
│  ├─ Device tree parsing and manipulation                    │
│  ├─ Instance state management                               │
│  └─ Resource allocation APIs                                │
└─────────────────────────────────────────────────────────────┘
                               │
                               │ vsock (port 9000)
                               │
┌─────────────────────────────────────────────────────────────┐
│                        Kernel Space                         │
├─────────────────────────────────────────────────────────────┤
│  kerf-vsock.ko                                              │
│  ├─ VSOCK_TRANSPORT_F_LOCAL transport                       │
│  ├─ CID mapping: instance_id + 2                            │
│  ├─ Hybrid IPI/shared-memory backend                        │
│  └─ Fast path (<256 bytes) via IPI                          │
├─────────────────────────────────────────────────────────────┤
│  Kernel Instance (Running)                                  │
│  ├─ Heartbeat responder                                     │
│  ├─ State serialization hooks                               │
│  └─ Resource handover support                               │
└─────────────────────────────────────────────────────────────┘
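The "CID mapping: instance_id + 2" rule and the port in the diagram can be sketched as below. The function names are hypothetical, and the assumption that instance IDs are non-negative integers is mine. `open_kmp_socket` uses Python's standard `socket.AF_VSOCK` family, which is Linux-only, so it is defined here but not called.

```python
import socket

KMP_PORT = 9000   # vsock port shown in the diagram
CID_OFFSET = 2    # "CID mapping: instance_id + 2"


def instance_to_cid(instance_id: int) -> int:
    """Map a kernel instance ID to its vsock context ID (CID)."""
    if instance_id < 0:
        raise ValueError("instance_id must be non-negative")
    return instance_id + CID_OFFSET


def cid_to_instance(cid: int) -> int:
    """Inverse mapping, for interpreting the peer CID of a connection."""
    if cid < CID_OFFSET:
        raise ValueError("CID below the instance range")
    return cid - CID_OFFSET


def open_kmp_socket(instance_id: int) -> socket.socket:
    """Connect to an instance's KMP endpoint (needs Linux vsock support)."""
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    sock.connect((instance_to_cid(instance_id), KMP_PORT))
    return sock


print(instance_to_cid(1))  # 3
```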
kmorphd (kmorph Daemon)
├─ Main Event Loop
│  ├─ vsock connection manager
│  ├─ Unix socket server (CLI communication)
│  └─ Timer management (heartbeats, timeouts)
│
├─ Health Monitor Module
│  ├─ Heartbeat sender
│  ├─ Timeout detector
│  ├─ Health metric aggregator
│  └─ Policy engine (per-instance policies)
│
├─ Recovery Manager Module
│  ├─ Failure detector
│  ├─ Resource reclaim coordinator
│  ├─ Vmcore capture controller
│  └─ Auto-heal state machine
│
├─ Morphing Orchestrator Module
│  ├─ State transfer negotiator
│  ├─ Validation coordinator
│  ├─ Atomic handover controller
│  └─ Rollback handler
│
└─ Protocol Handler (KMP)
   ├─ Message serialization/deserialization
   ├─ Connection management
   ├─ Retry and timeout logic
   └─ Protocol state machines
The Kernel Morphing Protocol (KMP) runs over vsock on port 9000, providing reliable message exchange between kmorphd and kernel instances.
┌────────────────────────────────────────────────────────┐
│               Message Header (32 bytes)                │
├────────────────────────────────────────────────────────┤
│ magic       : u32 (e.g., 0x4B4D5000 = "KMP\0")         │
│ version     : u16 (protocol version)                   │
│ msg_type    : u16 (message type enum)                  │
│ sequence    : u32 (for request/response matching)      │
│ payload_len : u32 (length of payload in bytes)         │
│ timestamp   : u64 (nanoseconds since boot)             │
│ crc32       : u32 (checksum of header + payload)       │
│ reserved    : u32 (for future use)                     │
├────────────────────────────────────────────────────────┤
│               Payload (variable length)                │
└────────────────────────────────────────────────────────┘
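The header layout above can be exercised with Python's `struct` module. This is a sketch under stated assumptions, not the daemon's wire code: fields are packed in diagram order without padding, byte order is big-endian (so the magic appears on the wire as `KMP\0`), and the CRC is computed over header plus payload with the `crc32` field set to zero. The function names are hypothetical.

```python
import struct
import zlib

# magic:u32 version:u16 msg_type:u16 sequence:u32 payload_len:u32
# timestamp:u64 crc32:u32 reserved:u32  (assumed big-endian, no padding)
HEADER = struct.Struct(">IHHIIQII")
KMP_MAGIC = 0x4B4D5000  # "KMP\0"
assert HEADER.size == 32


def pack_message(msg_type, sequence, timestamp_ns, payload=b"", version=1):
    """Build header + payload, filling in the CRC last."""
    fields = [KMP_MAGIC, version, msg_type, sequence, len(payload), timestamp_ns, 0, 0]
    crc = zlib.crc32(HEADER.pack(*fields) + payload)  # CRC field zeroed (assumption)
    fields[6] = crc
    return HEADER.pack(*fields) + payload


def unpack_message(data):
    """Validate magic, length, and CRC; return (header fields, payload)."""
    if len(data) < HEADER.size:
        raise ValueError("message shorter than header")
    magic, version, msg_type, sequence, payload_len, ts, crc, reserved = HEADER.unpack_from(data)
    if magic != KMP_MAGIC:
        raise ValueError("bad magic")
    payload = data[HEADER.size:HEADER.size + payload_len]
    if len(payload) != payload_len:
        raise ValueError("truncated payload")
    zeroed = HEADER.pack(magic, version, msg_type, sequence, payload_len, ts, 0, reserved)
    if zlib.crc32(zeroed + payload) != crc:
        raise ValueError("CRC mismatch")
    fields = {"version": version, "msg_type": msg_type,
              "sequence": sequence, "timestamp": ts}
    return fields, payload


msg = pack_message(msg_type=1, sequence=7, timestamp_ns=123, payload=b"ping")
fields, payload = unpack_message(msg)
print(len(msg), fields["sequence"], payload)  # 36 7 b'ping'
```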
Health Monitoring:
- KMP_HEARTBEAT_REQ: Heartbeat request from daemon
- KMP_HEARTBEAT_RSP: Heartbeat response with health metrics
- KMP_HEALTH_STATUS: Unsolicited health status report
Auto-Healing:
- KMP_RECOVER_PREPARE: Prepare for resource reclamation
- KMP_RECOVER_RECLAIM: Reclaim resources from failed instance
- KMP_RECOVER_COMPLETE: Recovery completed
- KMP_VMCORE_START: Begin vmcore capture
- KMP_VMCORE_DATA: vmcore data chunk
- KMP_VMCORE_COMPLETE: vmcore capture complete
Kernel Morphing (Live Upgrade):
- KMP_MORPH_PREPARE: Prepare for morph/upgrade
- KMP_MORPH_NEGOTIATE: Negotiate capabilities
- KMP_MORPH_TRANSFER_STATE: Transfer runtime state
- KMP_MORPH_VALIDATE: Validate new kernel readiness
- KMP_MORPH_COMMIT: Commit to new kernel (atomic handover)
- KMP_MORPH_ROLLBACK: Abort morph, revert to old kernel
Error Handling:
- KMP_ERROR: Error response with error code and message
- KMP_RETRY: Request retry with exponential backoff
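One way to represent the message types listed above is an enum that fits the header's 16-bit `msg_type` field. The numeric values and the high-byte grouping below are purely illustrative assumptions; the document does not specify the actual encoding.

```python
from enum import IntEnum


class KmpMsgType(IntEnum):
    # Numeric values are hypothetical; high byte encodes the category.
    KMP_HEARTBEAT_REQ = 0x0101
    KMP_HEARTBEAT_RSP = 0x0102
    KMP_HEALTH_STATUS = 0x0103
    KMP_RECOVER_PREPARE = 0x0201
    KMP_RECOVER_RECLAIM = 0x0202
    KMP_RECOVER_COMPLETE = 0x0203
    KMP_VMCORE_START = 0x0204
    KMP_VMCORE_DATA = 0x0205
    KMP_VMCORE_COMPLETE = 0x0206
    KMP_MORPH_PREPARE = 0x0301
    KMP_MORPH_NEGOTIATE = 0x0302
    KMP_MORPH_TRANSFER_STATE = 0x0303
    KMP_MORPH_VALIDATE = 0x0304
    KMP_MORPH_COMMIT = 0x0305
    KMP_MORPH_ROLLBACK = 0x0306
    KMP_ERROR = 0xFF01
    KMP_RETRY = 0xFF02


CATEGORIES = {0x01: "health", 0x02: "recovery", 0x03: "morph", 0xFF: "error"}


def category(msg_type: KmpMsgType) -> str:
    """Return the message group, derived from the high byte."""
    return CATEGORIES[msg_type >> 8]


print(category(KmpMsgType.KMP_MORPH_COMMIT))  # morph
```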
MONITORING
    │
    ├─ Heartbeat timeout detected
    │
    ▼
DETECTING_FAILURE
    │
    ├─ Confirmed failure (multiple timeouts)
    │
    ▼
PREPARING_RECOVERY
    │
    ├─ Send KMP_RECOVER_PREPARE to surviving kernels
    │
    ▼
CAPTURING_VMCORE (optional)
    │
    ├─ Parallel vmcore capture by another instance
    │
    ▼
RECLAIMING_RESOURCES
    │
    ├─ Send KMP_RECOVER_RECLAIM
    ├─ Update device tree
    ├─ Broadcast new topology
    │
    ▼
RECOVERY_COMPLETE
    │
    └─ Return to MONITORING
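The recovery flow above maps naturally onto a small transition function. The sketch below is illustrative only: the state names come from the diagram, while the `failure_confirmed` and `capture_vmcore` flags, and the assumption that an unconfirmed failure returns to MONITORING, are mine.

```python
from enum import Enum


class RecoveryState(Enum):
    MONITORING = 1
    DETECTING_FAILURE = 2
    PREPARING_RECOVERY = 3
    CAPTURING_VMCORE = 4
    RECLAIMING_RESOURCES = 5
    RECOVERY_COMPLETE = 6


def next_state(state, *, failure_confirmed=True, capture_vmcore=True):
    """Return the state that follows `state` in the recovery flow."""
    s = RecoveryState
    if state is s.MONITORING:            # heartbeat timeout detected
        return s.DETECTING_FAILURE
    if state is s.DETECTING_FAILURE:     # assumed: transient timeouts go back
        return s.PREPARING_RECOVERY if failure_confirmed else s.MONITORING
    if state is s.PREPARING_RECOVERY:    # vmcore capture is optional
        return s.CAPTURING_VMCORE if capture_vmcore else s.RECLAIMING_RESOURCES
    if state is s.CAPTURING_VMCORE:
        return s.RECLAIMING_RESOURCES
    if state is s.RECLAIMING_RESOURCES:
        return s.RECOVERY_COMPLETE
    return s.MONITORING                  # RECOVERY_COMPLETE returns to monitoring


def walk(capture_vmcore):
    """Follow one confirmed failure from MONITORING back to MONITORING."""
    path = [RecoveryState.MONITORING]
    state = next_state(RecoveryState.MONITORING)
    while state is not RecoveryState.MONITORING:
        path.append(state)
        state = next_state(state, capture_vmcore=capture_vmcore)
    path.append(state)
    return [s.name for s in path]


print(walk(capture_vmcore=False))
```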
RUNNING
    │
    ├─ User initiates morph (kmorphctl morph)
    │
    ▼
MORPH_PREPARING
    │
    ├─ Spawn new kernel instance
    ├─ Send KMP_MORPH_PREPARE to old kernel
    │
    ▼
MORPH_NEGOTIATING
    │
    ├─ Exchange capabilities (KMP_MORPH_NEGOTIATE)
    ├─ Verify compatibility
    │
    ▼
MORPH_TRANSFERRING
    │
    ├─ Transfer runtime state (KMP_MORPH_TRANSFER_STATE)
    ├─ Map state: memory regions, file descriptors, network connections
    │
    ▼
MORPH_VALIDATING
    │
    ├─ New kernel validates received state (KMP_MORPH_VALIDATE)
    ├─ Run health checks
    ├─ Decision: COMMIT or ROLLBACK
    │
    ├──────────────┐
    │              │
    ▼              ▼
COMMITTING     ROLLING_BACK
    │              │
    ├─ Atomic      ├─ Terminate new kernel
    │  handover    ├─ Return resources
    │              └─ Resume old kernel
    │
    ▼
RUNNING (new kernel)
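The morph flow above can be sketched as a driver that walks the states in order and branches on the validation result. This is a simplified illustration: `run_morph`, the `send` and `validate` callbacks, and the "RUNNING (old kernel)" label are hypothetical, and the sketch only models the COMMIT/ROLLBACK decision that the diagram shows after validation.

```python
from typing import Callable, List, Tuple

# (state, KMP message sent while in that state), in diagram order.
MORPH_STEPS: List[Tuple[str, str]] = [
    ("MORPH_PREPARING", "KMP_MORPH_PREPARE"),
    ("MORPH_NEGOTIATING", "KMP_MORPH_NEGOTIATE"),
    ("MORPH_TRANSFERRING", "KMP_MORPH_TRANSFER_STATE"),
    ("MORPH_VALIDATING", "KMP_MORPH_VALIDATE"),
]


def run_morph(send: Callable[[str], None], validate: Callable[[], bool]) -> List[str]:
    """Drive one morph attempt and return the states visited, in order."""
    visited = ["RUNNING"]
    for state, message in MORPH_STEPS:
        visited.append(state)
        send(message)
    if validate():
        visited.append("COMMITTING")
        send("KMP_MORPH_COMMIT")         # atomic handover to the new kernel
        visited.append("RUNNING (new kernel)")
    else:
        visited.append("ROLLING_BACK")
        send("KMP_MORPH_ROLLBACK")       # terminate new kernel, resume old one
        visited.append("RUNNING (old kernel)")
    return visited


sent: List[str] = []
ok_path = run_morph(sent.append, validate=lambda: True)
failed_path = run_morph(lambda message: None, validate=lambda: False)
print(ok_path[-2:])      # ['COMMITTING', 'RUNNING (new kernel)']
print(failed_path[-2:])  # ['ROLLING_BACK', 'RUNNING (old kernel)']
```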
The vsock transport uses a two-path approach for optimal performance:
Fast path (IPI):
- Message size: < 256 bytes
- Latency: 1-2μs
- Use cases: Heartbeat, quick status, error notifications
- Implementation: Direct IPI ring buffer
Shared-memory path:
- Message size: ≥ 256 bytes
- Latency: 200-500μs for 210KB
- Use cases: State transfer, vmcore data, capability negotiation
- Implementation: Shared memory with IPI notification
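The size-based split can be expressed as a single threshold check. The sketch below assumes the 256-byte limit applies to the whole KMP message (32-byte header plus payload); the document does not say whether the header counts, and the function and constant names are hypothetical.

```python
FAST_PATH_LIMIT = 256   # bytes; messages below this use the IPI ring buffer
KMP_HEADER_SIZE = 32    # from the message header layout


def select_path(message_len: int) -> str:
    """Pick the transport path for a message of the given total size."""
    if message_len < 0:
        raise ValueError("message_len must be non-negative")
    return "ipi" if message_len < FAST_PATH_LIMIT else "shared_memory"


def select_path_for_payload(payload_len: int) -> str:
    """Same decision, counting the KMP header (assumed to be included)."""
    return select_path(KMP_HEADER_SIZE + payload_len)


print(select_path_for_payload(0))     # ipi (bare heartbeat header)
print(select_path_for_payload(4096))  # shared_memory (state transfer chunk)
```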
# Start the multikernel daemon
kmorphctl daemon start [--config=/etc/kmorph/kmorphd.conf]
# Stop the daemon
kmorphctl daemon stop [--graceful]
# Check daemon status
kmorphctl daemon status
# Reload configuration without restart
kmorphctl daemon reload

# Spawn a kernel instance (requires kerf configuration)
kmorphctl spawn <instance-name> --kernel=<vmlinuz-path> [options]
Options:
--kernel-cmdline="..." # Override kernel command line
--initramfs=<path> # Specify initramfs
--monitor # Enable auto-healing (default: true)
--heartbeat-timeout=<sec> # Custom heartbeat timeout
# List running instances
kmorphctl list [--verbose]
# Show instance status
kmorphctl status <instance-name>
# Terminate instance
kmorphctl terminate <instance-name> [--force]

# Show real-time monitoring dashboard
kmorphctl monitor [--instance=<name>]
# Display heartbeat status
kmorphctl heartbeat <instance-name>
# Show health metrics
kmorphctl health <instance-name> [--metrics=cpu,memory,io]
# Test heartbeat manually
kmorphctl ping <instance-name>

# Trigger manual recovery
kmorphctl recover <instance-name>
# Capture vmcore from failed instance
kmorphctl vmcore capture <instance-name> --output=<path>
# Analyze captured vmcore
kmorphctl vmcore analyze <vmcore-file>
# Show recovery history
kmorphctl recovery-log [--instance=<name>] [--since=<time>]

# Prepare morph (spawn new kernel, no traffic switch)
kmorphctl morph prepare <instance-name> --kernel=<new-vmlinuz>
# Validate new kernel (run health checks)
kmorphctl morph validate <instance-name>
# Commit morph (atomic handover)
kmorphctl morph commit <instance-name>
# Rollback if validation fails
kmorphctl morph rollback <instance-name>
# Full morph in one command (prepare → validate → commit)
kmorphctl morph <instance-name> --kernel=<new-vmlinuz> [--auto-commit]

# Show current policies
kmorphctl policy show [--instance=<name>]
# Set heartbeat timeout
kmorphctl policy set heartbeat-timeout <instance-name> <seconds>
# Set auto-healing behavior
kmorphctl policy set auto-heal <instance-name> <enabled|disabled>
# Set morph validation timeout
kmorphctl policy set morph-timeout <instance-name> <seconds>
# Export policy configuration
kmorphctl policy export [--output=<file>]
# Import policy configuration
kmorphctl policy import <file>

# 1. Configure resources with kerf
kerf create web-server --cpus=4-15 --memory=16GB
# 2. Start kmorphd daemon
kmorphctl daemon start
# 3. Spawn instance with auto-healing
kmorphctl spawn web-server --kernel=/boot/vmlinuz-6.8.0 --heartbeat-timeout=1 --monitor
# 4. Monitor health
kmorphctl monitor --instance=web-server
# 5. Automatic recovery happens if instance fails
# Check recovery log
kmorphctl recovery-log --instance=web-server

# 1. Prepare morph (spawn new kernel)
kmorphctl morph prepare web-server --kernel=/boot/vmlinuz-6.9.0
# 2. Validate new kernel
kmorphctl morph validate web-server
# Output: Validation successful (health checks passed)
# 3. Commit morph (atomic handover, <10ms downtime)
kmorphctl morph commit web-server
# 4. Verify new kernel running
kmorphctl status web-server
# Output: Running kernel: 6.9.0

# 1. Instance becomes unresponsive
kmorphctl status database-server
# Output: Status: UNRESPONSIVE (last heartbeat: 62s ago)
# 2. Capture vmcore for analysis
kmorphctl vmcore capture database-server --output=/var/crash/database-vmcore-$(date +%s)
# 3. Trigger recovery
kmorphctl recover database-server
# 4. Analyze vmcore later
kmorphctl vmcore analyze /var/crash/database-vmcore-*

kerf (Configuration):
- Device tree creation and validation
- Resource allocation planning
- Instance definition (CPU, memory, I/O)
- Static configuration management
kmorph (Runtime):
- Instance execution and monitoring
- Health checking and auto-healing
- Live morphing and state management
- Runtime policy enforcement
# Step 1: Configure with kerf
kerf init --baseline=/boot/system.dts
kerf create web-server --cpus=4-15 --memory=16GB
kerf validate
# Step 2: Execute with kmorph
kmorphctl spawn web-server --kernel=/boot/vmlinuz
kmorphctl monitor
# Step 3: Morph with kmorph (updates kerf config)
kmorphctl morph web-server --kernel=/boot/vmlinuz-new
# kmorph automatically updates device tree via libkerf

Both kerf and kmorphd use libkerf for state management:
# libkerf Python API (used by both kerf and kmorphd)
from libkerf import DeviceTree, Instance
# Read instance configuration (used by both)
dt = DeviceTree.load("/var/lib/kerf/instances/web-server.dts")
instance = Instance.from_device_tree(dt)
# kerf: Create and validate
instance.validate_resources()
instance.save()
# kmorphd: Read and execute
instance = Instance.load("web-server")
instance.spawn(kernel="/boot/vmlinuz")
instance.monitor(heartbeat_timeout=5)

- Daemon (kmorphd): Rust
  - Async runtime: Tokio
  - vsock: Custom implementation using VSOCK_TRANSPORT_F_LOCAL
  - Serialization: bincode for protocol, serde for config
  - Logging: tracing + tracing-subscriber
  - Metrics: Prometheus exporter
- CLI (kmorphctl): Rust
  - CLI framework: clap
  - Unix socket client: tokio::net::UnixStream
  - Output formatting: prettytable-rs, serde_json
- Kernel Module (kerf-vsock.ko): C
  - vsock transport implementation
  - Integration with existing IPI infrastructure
  - Shared memory management for bulk transfers
- Shared Library (libkerf): Python + Rust
  - Python: High-level API for scripting
  - Rust: Core device tree parsing (dtc integration)
  - FFI: PyO3 for Python bindings
Daemon:
- tokio (async runtime)
- vsock (custom implementation)
- serde, serde_yaml (configuration)
- tracing (logging)
- prometheus (metrics export)
CLI:
- clap (command-line parsing)
- tokio (async Unix socket client)
- prettytable-rs (table formatting)
- indicatif (progress bars)
Kernel Module:
- Linux kernel headers
- vsock kernel APIs
- IPI infrastructure from multikernel patches
- Multi-instance coordination
  - Coordinated morphs across related instances
  - Load balancing during morphs
  - Cascading failure detection
- State transfer optimization
  - Incremental state transfer (delta updates)
  - Compression for large state
  - Background pre-warming of new kernels
- Advanced monitoring
  - Per-syscall latency tracking
  - Resource utilization trending
  - Anomaly detection via ML
- Integration ecosystem
  - Kubernetes operator for multikernel pods
  - Terraform provider for infrastructure-as-code
  - Grafana dashboard templates
- Enhanced recovery
  - Partial recovery (reclaim only failed subsystems)
  - Checkpoint/restore integration
  - Live migration between physical hosts
- kerf Project - Resource configuration framework
- Multikernel Architecture Design
- vsock Protocol
- Live Update Orchestrator (LUO) - Comparison and design decisions
