Hardware-aware AI workload scheduler with kernel bypass networking.
GPU topology discovery, NVLink-aware placement, DPDK zero-copy transfers,
and predictive autoscaling for production GPU clusters.
Features · Architecture · Quick Start · Benchmarks · Contributing
CORTEX - GPU cluster dashboard with topology view, scheduling timeline, and memory allocation
- Features
- Architecture Overview
- GPU Topology Awareness
- Memory Management: Buddy Allocator
- Kernel Bypass Networking
- Components
- Quick Start
- Benchmarks
- Configuration
- Contributing
- License
## Features

- NVML Topology Discovery -- Reads the physical GPU interconnect hierarchy at runtime
- NVLink-Aware Bin-Packing -- Places workloads on GPUs connected via high-bandwidth NVLink
- NUMA Affinity Scoring -- Weighted affinity model across NVLink, PCIe, NUMA, and socket boundaries
- DPDK Kernel Bypass -- Eliminates kernel overhead for tensor transfers (~5us vs ~100us)
- io_uring Zero-Copy -- Asynchronous I/O with zero-copy semantics for data plane operations
- Buddy Allocator -- Efficient GPU memory management with splitting, coalescing, and defragmentation
- Prophet + LSTM Autoscaling -- Ensemble forecasting for predictive cluster scaling
- Energy & Carbon Tracking -- NVML power monitoring, dynamic frequency scaling, carbon-aware scheduling
- Kubernetes Native -- Scheduler extender, device plugin, and custom metrics server
## Architecture Overview

```
┌───────────────────────────────────────────────────────────────────┐
│                       CORTEX Control Plane                        │
│                                                                   │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐  │
│  │  Scheduler  │ │ Autoscaler  │ │   Energy    │ │  Dashboard  │  │
│  │   (Rust)    │ │  (Python)   │ │   (Rust)    │ │   (React)   │  │
│  │             │ │             │ │             │ │             │  │
│  │ - Topology  │ │ - Prophet   │ │ - NVML      │ │ - Topology  │  │
│  │ - Affinity  │ │ - LSTM      │ │ - Clock     │ │ - Heatmap   │  │
│  │ - Placement │ │ - K8s HPA   │ │ - Carbon    │ │ - Latency   │  │
│  └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘  │
│         │               │               │               │         │
│  ┌──────┴───────────────┴───────────────┴───────────────┴──────┐  │
│  │                    Kubernetes Integration                    │  │
│  │  ┌──────────────┐  ┌───────────────┐  ┌──────────────────┐  │  │
│  │  │  Scheduler   │  │ Device Plugin │  │  Custom Metrics  │  │  │
│  │  │ Extender(Go) │  │     (Go)      │  │   Server (Go)    │  │  │
│  │  └──────────────┘  └───────────────┘  └──────────────────┘  │  │
│  └──────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌───────────────────────────────────────────────────────────────────┐
│                             Data Plane                            │
│                                                                   │
│  ┌─────────────────┐      ┌─────────────────────────────────┐     │
│  │  Network Layer  │      │          Memory Manager         │     │
│  │     (Rust)      │      │             (Rust)              │     │
│  │                 │      │                                 │     │
│  │ - DPDK Bypass   │      │ - Buddy Allocator               │     │
│  │ - io_uring      │─────▶│ - Defragmenter                  │     │
│  │ - Tensor Xfer   │      │ - Pattern Monitor               │     │
│  │ - Zero-copy     │      │ - Compactor                     │     │
│  └─────────────────┘      └─────────────────────────────────┘     │
└───────────────────────────────────────────────────────────────────┘
```
## GPU Topology Awareness

CORTEX reads the physical GPU topology via NVML and makes placement decisions that respect the hardware interconnect hierarchy:
```
            ┌─────────┐        QPI/UPI        ┌─────────┐
            │  CPU 0  │◀─────────────────────▶│  CPU 1  │
            │ (NUMA 0)│                       │ (NUMA 1)│
            └────┬────┘                       └────┬────┘
                 │                                 │
        ┌────────┴────────┐                        │
        │                 │                        │
   ┌────┴────┐       ┌────┴────┐              ┌────┴────┐
   │  PCIe   │       │  PCIe   │              │  PCIe   │
   │ Switch 0│       │ Switch 1│              │  Root   │
   └─┬─────┬─┘       └─┬─────┬─┘              │ Complex │
     │     │           │     │                └─┬─────┬─┘
     │     │           │     │                  │     │
 ┌───┴──┐┌─┴────┐  ┌───┴──┐┌─┴────┐         ┌───┴──┐┌─┴────┐
 │GPU 0 ││GPU 1 │  │GPU 2 ││GPU 3 │         │GPU 4 ││GPU 5 │
 └───┬──┘└──┬───┘  └───┬──┘└──┬───┘         └───┬──┘└──┬───┘
     └NVLink┘          └NVLink┘                 └NVLink┘
```
Affinity Score Calculation:

| Relationship | Score |
|---|---|
| Same GPU | 1.0 |
| NVLink Peer | 0.9 |
| Same PCIe Switch | 0.7 |
| Same NUMA Node | 0.5 |
| Cross NUMA (QPI) | 0.2 |
| Cross Socket | 0.1 |
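As a sketch of how these scores might drive a placement decision, here is an illustrative pure-Python model (the real scheduler is Rust; `pair_score`, `best_gpu_pair`, and the topology dict are hypothetical stand-ins for what NVML discovery would report, not CORTEX's API):

```python
# Scores from the affinity table; keys name the closest boundary
# shared by a pair of GPUs.
AFFINITY = {
    "same_gpu": 1.0,
    "nvlink": 0.9,
    "pcie_switch": 0.7,
    "numa_node": 0.5,
    "cross_numa": 0.2,
    "cross_socket": 0.1,
}

def pair_score(topology: dict, a: int, b: int) -> float:
    """Score a GPU pair by the closest interconnect they share."""
    if a == b:
        return AFFINITY["same_gpu"]
    link = topology.get((min(a, b), max(a, b)), "cross_socket")
    return AFFINITY[link]

def best_gpu_pair(topology: dict, gpus: list) -> tuple:
    """Greedy placement: pick the distinct pair with the highest affinity."""
    pairs = [(a, b) for a in gpus for b in gpus if a < b]
    return max(pairs, key=lambda p: pair_score(topology, *p))

# Toy topology: GPUs 0-1 are NVLink peers, 1-2 share a PCIe switch,
# and 0-2 only share a NUMA node.
topo = {(0, 1): "nvlink", (1, 2): "pcie_switch", (0, 2): "numa_node"}
print(best_gpu_pair(topo, [0, 1, 2]))  # (0, 1)
```

A two-GPU workload would thus land on the NVLink-connected pair before falling back to PCIe or NUMA neighbors.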
## Memory Management: Buddy Allocator

```
GPU Memory Space (e.g., 16 GB)
┌───────────────────────────────────────────────────────┐
│                    Order 10 (16 GB)                   │
├───────────────────────────┬───────────────────────────┤
│      Order 9 (8 GB)       │      Order 9 (8 GB)       │
├─────────────┬─────────────┼─────────────┬─────────────┤
│  O8 (4GB)   │  O8 (4GB)   │  O8 (4GB)   │  O8 (4GB)   │
├──────┬──────┼──────┬──────┼──────┬──────┼──────┬──────┤
│O7 2G │O7 2G │O7 2G │O7 2G │O7 2G │O7 2G │O7 2G │O7 2G │
└──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┘
```

- **Allocation** -- split larger blocks until one of the right size is found
- **Deallocation** -- merge free buddy pairs back into larger blocks
- **Defragmentation** -- compact scattered allocations
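The split-on-allocate / merge-on-free cycle can be sketched with a minimal free-list buddy allocator. This is an illustrative Python model, not CORTEX's Rust allocator: sizes are abstract units of `2**order`, and the defragmentation/compaction passes are omitted:

```python
class BuddyAllocator:
    """Minimal buddy allocator over a 2**max_order sized space."""

    def __init__(self, max_order: int):
        self.max_order = max_order
        # free[k] holds offsets of free blocks of size 2**k
        self.free = {k: set() for k in range(max_order + 1)}
        self.free[max_order].add(0)  # start with one maximal block

    def alloc(self, order: int) -> int:
        # Find the smallest free block that fits, splitting on the way down.
        k = order
        while k <= self.max_order and not self.free[k]:
            k += 1
        if k > self.max_order:
            raise MemoryError("out of memory")
        offset = self.free[k].pop()
        while k > order:  # split: keep the left half, free the right buddy
            k -= 1
            self.free[k].add(offset + (1 << k))
        return offset

    def free_block(self, offset: int, order: int) -> None:
        # Merge with the buddy for as long as the buddy is also free.
        while order < self.max_order:
            buddy = offset ^ (1 << order)
            if buddy not in self.free[order]:
                break
            self.free[order].remove(buddy)
            offset = min(offset, buddy)
            order += 1
        self.free[order].add(offset)

a = BuddyAllocator(max_order=4)  # 16-unit space
x = a.alloc(2)                   # 4-unit block -> offset 0 (splits 16 -> 8 -> 4)
y = a.alloc(2)                   # second 4-unit block -> offset 4
a.free_block(x, 2)
a.free_block(y, 2)               # buddies merge back...
assert 0 in a.free[4]            # ...into the single 16-unit block
```

The buddy of a block is found by XOR-ing its offset with the block size, which is what makes coalescing O(1) per level.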
## Kernel Bypass Networking

```
Traditional Path:            CORTEX Path:

┌──────────┐                 ┌──────────┐
│   App    │                 │   App    │
├──────────┤                 ├──────────┤
│   gRPC   │                 │  Tensor  │
├──────────┤                 │ Protocol │
│  HTTP/2  │                 ├──────────┤
├──────────┤       vs.       │ io_uring │
│   TCP    │                 │ zero-copy│
├──────────┤                 ├──────────┤
│  Kernel  │                 │   DPDK   │
│  Network │                 │ (bypass) │
│  Stack   │                 └────┬─────┘
├──────────┤                      │
│   NIC    │                 ┌────┴─────┐
└──────────┘                 │   NIC    │
                             └──────────┘
Latency: ~100us              Latency: ~5us
```
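DPDK and io_uring operate below the language runtime, but the core zero-copy idea (handing out views into one registered buffer instead of copying payload bytes) can be illustrated with Python's `memoryview`. `chunk_tensor` is a hypothetical helper, not part of CORTEX; the 4 MB chunk size mirrors `tensor_chunk_size` in the sample configuration:

```python
CHUNK = 4 * 1024 * 1024  # 4 MB, mirroring tensor_chunk_size in the sample config

def chunk_tensor(buf: bytearray, chunk_size: int = CHUNK):
    """Yield zero-copy views into buf; no payload bytes are duplicated."""
    view = memoryview(buf)
    for off in range(0, len(view), chunk_size):
        # Slicing a memoryview yields another view over the same memory.
        yield view[off:off + chunk_size]

tensor = bytearray(10 * 1024 * 1024)   # stand-in for a 10 MB tensor buffer
chunks = list(chunk_tensor(tensor))
print([len(c) for c in chunks])        # [4194304, 4194304, 2097152]
chunks[0][0] = 0xFF                    # writing through a view mutates the
assert tensor[0] == 0xFF               # underlying buffer: zero copies made
```

The real data plane does the analogous thing with NIC-registered DMA buffers, so a tensor crosses the wire without ever being duplicated in host memory.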
## Components

| Component | Language | Description |
|---|---|---|
| `scheduler/` | Rust | GPU topology discovery, NVLink-aware bin-packing, NUMA affinity |
| `network/` | Rust | DPDK kernel bypass, io_uring zero-copy, tensor transfer protocol |
| `memory/` | Rust | Buddy allocator for GPU memory, defragmentation, compaction |
| `autoscaler/` | Python | Prophet + LSTM ensemble forecasting, K8s HPA integration |
| `energy/` | Rust | NVML power monitoring, dynamic frequency scaling, carbon tracking |
| `kubernetes/` | Go | Scheduler extender, device plugin, custom metrics server |
| `dashboard/` | TypeScript/React | Real-time topology, memory heatmap, latency, energy dashboards |
## Quick Start

### Prerequisites

- Rust 1.75+ with nightly toolchain (for io_uring)
- Go 1.21+
- Node.js 20+ and pnpm
- Python 3.11+ with Poetry
- NVIDIA drivers with NVML support
- Linux kernel 5.19+ (for io_uring features)
- Kubernetes 1.28+ (for cluster deployment)
```bash
# Build all Rust components
make build-rust

# Build Go components
make build-go

# Build dashboard
make build-dashboard

# Install Python autoscaler
make install-autoscaler

# Build everything
make all
```

```bash
# Start the scheduler with mock GPU topology
CORTEX_MOCK_TOPOLOGY=1 cargo run --bin cortex-scheduler

# Start the dashboard
cd dashboard && pnpm dev

# Start the autoscaler
cd autoscaler && poetry run cortex-autoscaler --dev
```

```bash
# Apply CRDs and RBAC
kubectl apply -f deploy/k8s/

# Deploy CORTEX components
make deploy

# Verify
kubectl get pods -n cortex-system
```

## Benchmarks

```bash
# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench --bench scheduler_bench
cargo bench --bench memory_bench
```

## Configuration

CORTEX is configured via environment variables and a TOML config file:
```toml
[scheduler]
topology_refresh_interval_secs = 30
placement_strategy = "bin-pack" # bin-pack | spread | affinity-first
nvlink_weight = 0.4
numa_weight = 0.3
pcie_weight = 0.2
memory_weight = 0.1

[network]
transport = "io-uring" # io-uring | dpdk | tcp
zero_copy = true
ring_size = 256
tensor_chunk_size = "4MB"

[memory]
min_block_order = 12 # 4 KB minimum
max_block_order = 34 # 16 GB maximum
defrag_threshold = 0.3
compaction_interval_secs = 60

[energy]
power_cap_watts = 300
carbon_region = "us-west-2"
frequency_scaling = true

[autoscaler]
forecast_horizon_minutes = 30
lstm_sequence_length = 60
prophet_changepoint_prior = 0.05
ensemble_weight_lstm = 0.6
ensemble_weight_prophet = 0.4
```

## Contributing

We welcome contributions of all kinds. Please read our Contributing Guide to get started.
Before contributing, please also review our Code of Conduct.
## License

MIT License -- See LICENSE for details.