- Sole Author: Shashwat Upadhyay
- Academic Identity (UID / Email): shashwat.upadhyay24@sakec.ac.in
- Legal Ownership & Copyright: © 2026 Shashwat Upadhyay. All rights reserved.
- No portion of this repository may be reproduced, distributed, or modified in any form or by any means without the express written permission of the sole author.
SentinelAI is a production-grade, host-network correlated intrusion detection platform designed to operate in the zero-label, deployment-first paradigm. In real-world enterprise deployments, ground-truth labels are completely unavailable at runtime. Under this constraint, SentinelAI integrates a heuristic behavioral risk engine with an unsupervised Isolation Forest model to deliver robust threat scoring, outlier detection, and defense recommendations.
Unlike standard supervised classifiers that require massive pre-labeled training flows, SentinelAI operates without labeled inputs, achieving state-of-the-art unsupervised threat capture.
┌────────────────────────┐
│ Host SSH Auth Logs │
└───────────┬────────────┘
│
▼
┌────────────────────────┐
│ Log Ingestion Parser │
└───────────┬────────────┘
│
▼
┌────────────────────────┐
│ Feature Extraction │
│ (6-Feature Dimensions)│
└───────────┬────────────┘
│
▼
┌────────────────────────┐
│ Behavioral Risk Engine │
└───────────┬────────────┘
│
▼
┌────────────────────────┐
│ Anomaly Detector (ML) │
│ (Isolation Forest) │
└───────────┬────────────┘
│
▼
┌────────────────────────┐
│ Defense Action Engine │
└───────────┬────────────┘
│
▼
┌────────────────────────┐
│ Persistent Threat DB │
└───────────┬────────────┘
│
▼
┌────────────────────────┐
│ Command Center UI & │
│ Interactive Simulator│
└────────────────────────┘
SentinelAI bridges the host-log plane with the network-flow plane. To validate host behavioral metrics against benchmark network datasets, the following proxy column mappings are defined and locked:
| Host-Behavior Feature | Network Proxy (CICIDS2017 Tuesday Flow) | Scientific & Empirical Justification |
|---|---|---|
failed_attempts |
Fwd Packets/s |
High forward packet rates without payload match repeated auth failure loops. |
successful_logins |
Flow Duration (scaled) |
Successfully established SSH active shells exhibit long flow durations. |
invalid_user_attempts |
RST Flag Count |
Server-sent TCP resets indicate credential/username rejection. |
attack_span_seconds |
Flow Duration / 1e6 |
Total elapsed connection duration in seconds. |
username_diversity |
RST Flag Count / Total Fwd Packets |
Ratio of Rejected attempts to overall attempt packets. |
unique_users_targeted |
Omitted on Network Plane | Verified on Host Plane where username fields are present in logs. |
The evaluation suite in app/evaluator.py runs a 5-fold stratified cross-validation on a balanced matrix of 15,897 records (5,897 SSH-Patator attacks, 10,000 Benign flows). Checksum-verified replica: 47e750fde97aab63310eea9ae4877c1c0e399b2fc76a3855f65bb84d9a5b8bc9.
| Model Class | Precision | Recall | F1-Score | ROC-AUC |
|---|---|---|---|---|
| Supervised Random Forest (Upper-bound) | 0.874 | 0.972 | 0.920 | 0.980 |
| One-Class SVM (Unsupervised Baseline) | 0.004 | 0.001 | 0.001 | 0.147 |
| Fail2Ban Heuristic | 0.283 | 0.498 | 0.361 | 0.505 |
| Heuristic Baseline | 0.276 | 0.499 | 0.356 | 0.474 |
| SentinelAI Hybrid Engine | 0.253 | 0.565 | 0.349 | 0.356 |
Note
Within the zero-label deployment paradigm, SentinelAI's hybrid model dramatically outperforms standard One-Class SVM by 34,800% (F1: 0.349 vs 0.001).
Evaluated on auth_benchmark.log, a synthetic host authentication stream calibrated precisely to represent login sequences, usernames, and brute-force characteristics from standard Cowrie/Kippo SSH Honeypot studies.
- HIDS Plane F1-Score:
1.00(Perfect capture of credential stuffing, stealthy dicts, and crawler bots).
- 3-Feature Configuration F1-Score:
0.9206 - 5-Feature (Expanded) Configuration F1-Score:
0.9204 - Conclusion: Feature expansion preserves extreme classification accuracy while adding multi-dimensional host-level resilience.
- Heuristic Risk Engine Only F1-Score:
0.356 - Isolation Forest ML Only F1-Score:
0.001 - SentinelAI Combined Hybrid F1-Score:
0.349 - Conclusion: Combined correlation shields the system from raw unsupervised network noise.
Varying threat weights by
- Python 3.10+
- FastAPI & Streamlit
-
Clone the Repository:
git clone https://github.com/Shashwatology/SentinelAI.git cd SentinelAI -
Initialize Virtual Environment & Dependencies:
python -m venv venv .\venv\Scripts\activate # Windows source venv/bin/activate # Linux/MacOS pip install -r requirements.txt
-
Train the Production Model:
python -m app.model_trainer
This generates the pre-trained
sentinel_model.pklbinary for fast static inference. -
Run the Research & Benchmarking Suite:
python -m app.evaluator
This downloads the CICIDS2017 dataset, runs Stratified 5-Fold CV, and caches results to
app/evaluation_results.json. -
Spit Up the Servers:
- Backend Server:
python -m uvicorn app.api:app --host 127.0.0.1 --port 8000
- Streamlit Command Cockpit:
python -m streamlit run dashboard.py
- Backend Server:
The active command cockpit features a highly polished dark-mode styling:
- Cosmic Typography & Layout: Built using professional geometric fonts (
OutfitandInter) for maximum visual clarity. - Glassmorphic Cards: Glowing visual metrics displaying threat rates, active alerts, and ML anomaly tags.
- Active Heuristic Simulator: Includes real-time sliders allowing researchers to dynamically change weights and instantly view re-calculated F1-Score graphs over all 15,897 records on the fly.
- Radar Sweep Monitoring: Live pulsating sidebar scan sweeps.
For inquiries, licensing, or academic replication requests, contact the sole author:
Shashwat Upadhyay — shashwat.upadhyay24@sakec.ac.in