Layer-by-layer model training and modification framework for 80B+ parameter MoE models on consumer GPUs.
Built because nothing else could do it.
justcalljon.pro · Features · Architecture · Install · Usage
You want to modify an 80B parameter MoE model. Maybe you need to abliterate refusal vectors, extend context to 512K, or merge a LoRA adapter into the base weights. Cool.
Except:
- AirLLM streams layers for inference, but can't modify weights or write them back. It's read-only by design. (see issue)
- Heretic does abliteration, but loads the entire model into memory. 80B at BF16 = 160GB minimum. Good luck fitting that on a consumer GPU. (see issue)
- Unsloth/PEFT train LoRA adapters but don't merge them layer-by-layer. You're stuck if you can't fit the full model in VRAM.
So we built one that actually works.
AEGIS Trainer streams model layers from NVMe through RAM to VRAM one at a time, applies operations to each layer, and writes the modified weights back. Your 80B MoE model never fully loads into memory. 11GB VRAM is enough.
```
NVMe (safetensors) --> 120GB RAM --> 11GB VRAM --> modify --> write back
                          one layer at a time
```
- Abliteration -- Directional ablation to remove refusal vectors. Targets all 512 experts per layer, not just the dense projection. Adapted from Heretic's approach but runs layer-by-layer instead of loading 160GB into memory.
- LongRoPE -- Context window extension from 256K to 512K tokens. Only touches the 12 full-attention layers (DeltaNet layers don't use RoPE).
- LoRA Merge -- Merge PEFT/LoRA adapters directly into base weights. Per layer: `W_merged = W_original + (B @ A) * scaling`.
- Expert Pruning -- Zero out or scale specific MoE experts. Useful for removing underperforming routed experts.
- Weight Inspection -- Non-destructive per-tensor statistics: min, max, mean, std, sparsity. Generates layer-by-layer reports.
- Quantization -- Per-layer dtype casting for compression.
- QQMS Queue -- Priority queue with age-boosted scheduling, dead letter queue, and overflow management. Resource-aware throttling prevents OOM.
- Checkpoint Manager -- Atomic JSON persistence. Resume interrupted runs from the exact layer that failed.
- Resource Monitor -- Real-time RAM, VRAM, and CPU tracking. Intel Arc B580 safe (uses `torch.xpu` and sysfs, never `torch.cuda`).
- Layer IO -- Composition-based safetensors load/save following AirLLM naming conventions.
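The LoRA merge arithmetic is simple per tensor. A minimal sketch — `merge_lora_tensor` is an illustrative name, not the project's API, and the standard PEFT shapes and alpha/rank scaling convention are assumed:

```python
import torch

def merge_lora_tensor(W, A, B, alpha=16.0, rank=8):
    """W_merged = W_original + (B @ A) * scaling, with scaling = alpha / rank.

    A: (rank, in_features), B: (out_features, rank) -- the usual PEFT
    layout, so B @ A has the same shape as W.
    """
    return W + (B @ A) * (alpha / rank)
```

Because the merge is independent per tensor, it slots naturally into the layer-by-layer stream: each layer's adapter matrices are folded in while that layer is resident in VRAM.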
Full Textual-based terminal interface with 7 tabs:
| Tab | Key | Purpose |
|---|---|---|
| Dashboard | 1 | Resource monitoring, progress, queue summary |
| Models | 2 | Browse and inspect available models (HF + GGUF) |
| Build | 3 | Step-by-step operation configuration wizard |
| Inspect | 4 | Per-layer weight inspector with arrow-key navigation |
| Queue | 5 | Queue management -- reorder, cancel, pause/resume |
| Logs | 6 | Scrollable, filterable log viewer |
| Visualizer | 7 | Live weight visualization during operations |
The Visualizer tab renders live weight distributions using Unicode braille scatter plots (2x4 dot resolution per character cell) with per-tensor-type color coding:
- Cyan -- Attention projections (q/k/v/o)
- Yellow -- MoE expert weights
- Orange -- Shared expert weights
- Green -- Layer norms
- Purple -- Gating network
Includes before/after overlay during modifications and a Heretic-style step-by-step progress display.
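The braille trick works because each character in the Unicode braille block (U+2800–U+28FF) encodes eight independent dots on a 2-wide, 4-tall grid. A minimal sketch of the dot packing — illustrative only, not the BrailleCanvas widget itself:

```python
# Bit assignment per (row, col) within a cell, following the Unicode
# braille pattern encoding (dots 1-8).
_DOT_BITS = {
    (0, 0): 0x01, (1, 0): 0x02, (2, 0): 0x04, (3, 0): 0x40,
    (0, 1): 0x08, (1, 1): 0x10, (2, 1): 0x20, (3, 1): 0x80,
}

def braille_cell(dots):
    """Combine a set of (row, col) dots into one braille character."""
    bits = 0
    for rc in dots:
        bits |= _DOT_BITS[rc]
    return chr(0x2800 + bits)
```

A scatter plot then maps each data point to a character cell plus a dot within it, giving 8x the resolution of one glyph per point.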
```
aegis-trainer run \
  --model /path/to/model \
  --output /path/to/output \
  --op abliterate \
  --op longrope:target_length=524288

aegis-trainer inspect --model /path/to/model
aegis-trainer queue --status
aegis-trainer tui
```

```
aegis_trainer/
  __init__.py            # Public API exports
  layer_trainer.py       # Core orchestrator (878 lines)
  layer_context.py       # Frozen dataclass per-layer metadata
  cli.py                 # Click CLI (673 lines)
  ops/
    base.py              # LayerOperation ABC
    abliteration.py      # Directional ablation (all 512 experts)
    longrope.py          # RoPE extension (12/48 layers)
    lora_merge.py        # LoRA adapter merge
    expert_prune.py      # MoE expert pruning
    weight_inspect.py    # Non-destructive stats
    quantize.py          # Dtype casting
  queue/
    qqms.py              # Resource-aware execution engine
    queue_item.py        # Age-boosted priority scheduling
    dlq.py               # Dead letter queue
    overflow.py          # DROP_LOWEST / BLOCK / SPILL_DISK
  utils/
    resource_monitor.py  # RAM/VRAM/CPU (Intel Arc safe)
    layer_io.py          # Safetensors load/save
    checkpoint.py        # Atomic JSON persistence
    profiler.py          # Layer/operation timing + ETA
  tui/
    app.py               # 7-tab Textual application
    theme.py             # AEGIS color scheme
    screens/             # Dashboard, Models, Build, Inspect, Queue, Logs, Visualizer
    widgets/             # ResourceBar, ProgressPanel, LayerMap, BrailleCanvas, WeightAtlas, WeightHistogram, HereticProgress
```
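The checkpoint manager's atomic persistence is the classic write-to-temp-then-rename pattern. A stdlib-only sketch of the idea — not the project's actual `checkpoint.py`:

```python
import json
import os
import tempfile

def save_checkpoint(state: dict, path: str) -> None:
    """Persist state so a crash mid-write never corrupts the checkpoint."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())   # data hits disk before the rename
        os.replace(tmp, path)      # atomic: readers see old or new, never partial
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

`os.replace` is atomic on POSIX filesystems, which is what lets a resumed run trust the checkpoint and pick up from the exact layer that failed.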
First-class support for the Qwen3-Next hybrid DeltaNet + Attention architecture:
- 48 layers: 36 DeltaNet (linear attention) + 12 full attention
- Layer pattern: 3x DeltaNet, 1x Attention, repeated 12 times
- 512 MoE experts, 10 active per token
- Per-layer type awareness: operations like LongRoPE automatically skip DeltaNet layers (they don't use RoPE)
- Color-coded UI: DeltaNet layers in yellow, Attention layers in cyan throughout the TUI
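Under that pattern the full-attention layers fall at every fourth index. A quick sketch, assuming the attention layer closes each group of four (which is how the 3x DeltaNet / 1x Attention repeat works out):

```python
def layer_type(idx: int) -> str:
    """Qwen3-Next repeats [DeltaNet, DeltaNet, DeltaNet, Attention]."""
    return "attention" if idx % 4 == 3 else "deltanet"

types = [layer_type(i) for i in range(48)]
```

`types.count("attention")` gives the 12 layers LongRoPE touches; the other 36 DeltaNet layers are skipped automatically.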
```
git clone https://github.com/jonhardwick-spec/aegis-trainer.git
cd aegis-trainer
pip install -e .
```

Requirements:

- Python 3.10+
- PyTorch (with Intel XPU support for Arc GPUs, or standard CUDA)
- textual >= 7.0 (for TUI)
- safetensors, transformers, click, rich
```
aegis-trainer tui
```

```
aegis-trainer run \
  --model /path/to/Qwen3-Next-80B \
  --output /path/to/output \
  --op abliterate:strength=1.0
```

```
aegis-trainer inspect --model /path/to/model --format table
```

```python
from aegis_trainer import LayerTrainer, ResourceLimits
from aegis_trainer.ops import AbliterationOp, LongRoPEOp

trainer = LayerTrainer(
    model_path="/path/to/model",
    operations=[
        AbliterationOp(strength=1.0),
        LongRoPEOp(target_length=524288),
    ],
    output_path="/path/to/output",
    resource_limits=ResourceLimits(
        max_ram_gb=120,
        max_vram_gb=11,
        max_cpu_cores=8,
    ),
)
result = trainer.run()
print(result.summary())
```

| Component | Spec |
|---|---|
| CPU | AMD Ryzen Threadripper PRO 5945WX (12-core) |
| RAM | 128GB DDR4 (120GB allocated) |
| GPU | Intel Arc B580 12GB (Battlemage) |
| Storage | 2x 477GB NVMe + 466GB SSD |
| OS | Ubuntu 25.10, Kernel 6.17 |
The entire 80B MoE model streams through this setup at ~2 layers/min during abliteration. MoE expert weights sit in RAM; attention kernels run on the GPU.
Pull requests are welcome and encouraged. This project exists because existing tools couldn't handle 80B MoE modifications on consumer hardware, and there's a lot more ground to cover.
Things that need work:
- CUDA backend testing (built on Intel Arc, needs validation on NVIDIA)
- AMD ROCm support in the resource monitor
- Additional operations (knowledge distillation, GPTQ, AWQ layer-level quant)
- More model architectures (Llama, Mistral, DeepSeek-V3)
- Safetensors index parsing for sharded models
- Performance profiling and optimization on different hardware configs
How to contribute:
- Fork the repo
- Create a feature branch (`git checkout -b feature/your-thing`)
- Make your changes, add tests if applicable
- Open a PR with a clear description of what you changed and why
If you're working on something big, open an issue first so we don't duplicate effort.
No CLA, no bureaucracy. If your code works and makes the project better, it gets merged.
SSPL-1.0 (Server Side Public License)
If you offer this as a service, you must open-source your entire stack. Otherwise, use it however you want.
Hardwick Software Services justcalljon.pro · GitHub