Sentinel is a neuro-symbolic, co-evolutionary framework for safe, autonomous DDoS mitigation in carrier-grade networks. It bridges the gap between the expressive power of deep reinforcement learning and the rigorous safety requirements of network management.
By combining a Multi-head Proximal Policy Optimization (PPO) agent with a deterministic Symbolic Safety Shield, Sentinel actively repairs unsafe neural proposals before execution. The framework employs a Hall-of-Fame (HoF) co-evolutionary curriculum against an adaptive PPO-driven attacker, which is intended to improve generalization to previously unseen (zero-day) attack vectors and to guard against catastrophic forgetting.
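To make the idea of "repairing" a neural proposal concrete, here is a minimal sketch of a deterministic shield. It is not the code in agents/shield.py: the `Action` type, the `repair` function, the bounds, and the protocol names are all hypothetical and chosen only for illustration. The sketch projects an out-of-range threshold back into a safe interval and replaces a disallowed protocol focus with a fallback.

```python
import math
from dataclasses import dataclass
from typing import Tuple

# Hypothetical limits for illustration only; the project's real constraints
# live in agents/shield.py and are not reproduced here.
MIN_THRESHOLD = 0.05
MAX_THRESHOLD = 0.95
SAFE_DEFAULT_THRESHOLD = 0.5
ALLOWED_PROTOCOLS = ("tcp", "udp", "icmp")


@dataclass(frozen=True)
class Action:
    threshold: float  # continuous mitigation threshold proposed by the policy
    protocol: str     # discrete protocol focus proposed by the policy


def repair(proposal: Action, fallback_protocol: str = "tcp") -> Tuple[Action, bool]:
    """Return a safe action plus a flag saying whether a repair was needed."""
    if math.isnan(proposal.threshold):
        # NaN cannot be clamped meaningfully, so fall back to a fixed value.
        threshold = SAFE_DEFAULT_THRESHOLD
    else:
        # Clamp the proposal into the permitted interval.
        threshold = min(max(proposal.threshold, MIN_THRESHOLD), MAX_THRESHOLD)

    if proposal.protocol in ALLOWED_PROTOCOLS:
        protocol = proposal.protocol
    else:
        protocol = fallback_protocol

    safe = Action(threshold, protocol)
    return safe, safe != proposal
```

Because the repair is a pure function of the proposal, the same unsafe input always yields the same safe output, which is the property a deterministic shield needs. The returned flag could be logged to count how often the policy is overridden.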
- Neuro-Symbolic Defender: A hybrid architecture where a multi-head PPO agent computes probability distributions over continuous mitigation thresholds and discrete protocol focuses, strictly bounded by a Symbolic Safety Shield.
- Adaptive PPO Attacker: An independent reinforcement learning agent trained to bypass defenses by dynamically modulating attack rates and protocol mixes.
- Hall-of-Fame (HoF) Co-Evolution: An archival curriculum sampling mechanism that preserves historical attack vectors, enforcing monotonic robustness during adversarial training.
- Rule Crystallization: Extraction of human-readable decision trees (surrogate models) from the trained neural policy, achieving >97% held-out fidelity for operator trust and low-latency fallback.
- POMDP Formulation: The defender observes the environment only through deployable aggregate features (e.g., link utilization, TCP/UDP ratios); privileged ground-truth labels, which would be unrealistic in deployment, are omitted.
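The Hall-of-Fame idea in the list above can be sketched as a bounded archive of past opponents from which each training episode draws either the newest attacker or a historical one. This is an illustrative sketch, not the logic in training/train_ppo.py: the `HallOfFame` class, its `capacity` and `p_latest` parameters, and the random-eviction rule are all assumptions.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class HallOfFame(Generic[T]):
    """Bounded archive of past opponents, e.g. frozen attacker checkpoints."""

    def __init__(self, capacity: int = 20, p_latest: float = 0.5,
                 seed: Optional[int] = None) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.p_latest = p_latest  # probability of facing the newest opponent
        self._archive: List[T] = []
        self._rng = random.Random(seed)

    def add(self, opponent: T) -> None:
        """Archive a new opponent, evicting a random older one when full."""
        self._archive.append(opponent)
        if len(self._archive) > self.capacity:
            # Evict among the older entries only, so the newest always survives.
            del self._archive[self._rng.randrange(len(self._archive) - 1)]

    def sample(self) -> T:
        """Pick the opponent for the next training episode."""
        if not self._archive:
            raise LookupError("Hall of Fame is empty")
        if len(self._archive) == 1 or self._rng.random() < self.p_latest:
            return self._archive[-1]
        return self._rng.choice(self._archive[:-1])
```

Mixing historical opponents into training keeps the defender exposed to attack strategies it has already beaten, which is how such a curriculum is meant to counter forgetting. Random eviction (rather than dropping the oldest entry) is one possible choice that keeps the archive spread across generations.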
├── agents/ # Core RL and Symbolic agents
│ ├── ppo.py # Multi-head PyTorch PPO Defender
│ ├── rl_attacker.py # PyTorch PPO Attacker
│ ├── shield.py # Symbolic Safety Shield constraints
│ ├── sentinel.py # Integrated Neuro-Symbolic Wrapper
│ └── baselines.py # Static, Random, and Adaptive baselines
├── simulation/ # Environment definitions
│ ├── env.py # Continuous two-player wargame environment
│ └── gym_wrapper.py # Single-agent Gymnasium interface
├── training/ # Training pipelines
│ └── train_ppo.py # PPO Co-evolutionary training routine
├── analysis/ # Post-training evaluation & plotting
│ ├── evaluate_system.py # Multi-scenario benchmarking
│ ├── safety_analysis.py # Violation tier assessment
│ ├── compare_hof.py # HoF vs. No-HoF ablation testing
│ └── visualize.py # Generation of manuscript figures
├── tests/ # Integrity verification suite
└── run_paper_pipeline.sh # End-to-end reproducibility script
Sentinel requires a Python 3.10+ virtual environment.
# Clone the repository
git clone https://github.com/your-org/sentinel.git
cd sentinel
# Create and activate a virtual environment
python3 -m venv conda_venv
source conda_venv/bin/activate
# Install requirements
pip install -r requirements.txt
(Core dependencies include torch, gymnasium, pandas, numpy, scikit-learn, matplotlib, and seaborn.)
Before executing large-scale experiments, verify environment and agent integrity:
python3 -m unittest discover tests
The repository includes an end-to-end bash script designed to reproduce the core manuscript results, including the baseline HoF and No-HoF ablations. This executes the 10-seed, 2,000-generation paper-mode pipeline and runs all subsequent evaluations.
bash run_paper_pipeline.sh
The train_ppo.py script supports varied scaling modes for local development and resource management:
- Smoke Test (Verification):
python3 training/train_ppo.py --mode smoke
- Development (3 seeds, 100 generations):
python3 training/train_ppo.py --mode dev
- Full Scale (10 seeds, 2,000 generations):
python3 training/train_ppo.py --mode paper
All outputs, models, and artifacts are saved to isolated directories under outputs/run_<timestamp>_<mode>/. Evaluation scripts within analysis/ accept these directories as arguments to extract data, generate CSVs, and render high-fidelity manuscript figures.
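When several runs accumulate, a small helper can locate the most recent run directory for a given mode. The sketch below is not part of the repository: `latest_run` is a hypothetical name, and because the exact timestamp format is not documented here, it ranks directories by modification time instead of parsing the name.

```python
from pathlib import Path
from typing import Optional


def latest_run(outputs_root: Path, mode: str) -> Optional[Path]:
    """Return the most recently modified run_<timestamp>_<mode> directory, or None."""
    candidates = [p for p in outputs_root.glob(f"run_*_{mode}") if p.is_dir()]
    # Rank by modification time; the timestamp embedded in the name is not parsed.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    run_dir = latest_run(Path("outputs"), "dev")
    print(run_dir if run_dir is not None else "no dev runs found")
```

The returned path can then be handed to one of the scripts in analysis/, which accept run directories as arguments.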