The project ships with an enhanced CLI that behaves like a mini SOC assistant on the command line.
It supports:
- NLPTriage ASCII banner
- Text cleaning preview
- Uncertainty-aware predictions (`uncertain` fallback)
- Difficulty modes (`--difficulty default|soc-medium|soc-hard`)
- MITRE ATT&CK technique mapping panel
- Top‑k probability table (Rich table)
- Bulk file processing (`--input-file incidents.txt`)
- Batch summary report + recommendations
- Optional JSON output for automation
- Progress bar for all operations
- Interactive SOC‑style loop
!!! tip "Install first"
    Make sure you’ve run:

    ```bash
    pip install -e ".[dev]"
    ```

    from the project root so the package and entry points are available.

From the repo root:

```bash
nlp-triage "EDR detected powershell.exe spawning from Outlook on LAPTOP-093 and reaching out to 185.22.11.4 over port 443."
```

```bash
nlp-triage
```

Provide a plain‑text file where each line is an incident description:

```bash
nlp-triage --input-file incidents.txt
```

At the end of processing, NLPTriage prints a batch summary including:
- Per‑class distribution
- Most frequent MITRE techniques observed
- Uncertain predictions count
- Recommendations (e.g., raise threshold, retrain, add rules)
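
A summary of this kind can be derived from the per-incident results with a couple of counters. The sketch below is illustrative and is not the project's implementation: `summarize_batch` is a hypothetical helper, and the record shape (a `final_label` string plus an optional `mitre_techniques` list) is an assumption.

```python
from collections import Counter


def summarize_batch(predictions, top_n=3):
    """Summarize a list of prediction dicts (hypothetical helper).

    Assumes each dict has a "final_label" string and an optional
    "mitre_techniques" list; this record shape is an assumption.
    """
    labels = Counter(p["final_label"] for p in predictions)
    techniques = Counter(
        t for p in predictions for t in p.get("mitre_techniques", [])
    )
    return {
        "total": len(predictions),
        "class_distribution": dict(labels),
        "top_techniques": techniques.most_common(top_n),
        "uncertain_count": labels.get("uncertain", 0),
    }


sample = [
    {"final_label": "phishing", "mitre_techniques": ["T1566"]},
    {"final_label": "phishing", "mitre_techniques": ["T1566"]},
    {"final_label": "uncertain", "mitre_techniques": []},
]
summary = summarize_batch(sample)
print(summary["class_distribution"])
print(summary["uncertain_count"])
```

A high `uncertain_count` relative to `total` is the kind of signal that could drive a recommendation such as retraining or adjusting the threshold.
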
Each input is:
- Cleaned using training‑consistent normalization
- Vectorized using the TF‑IDF model
- Classified with uncertainty handling and difficulty rules
- Mapped to MITRE ATT&CK via keyword + heuristic extraction
- Displayed using Rich panels for quick SOC triage

- `predict_with_uncertainty` compares the maximum class probability against `--threshold` (default 0.50).
- If the max probability is below the threshold, the CLI returns `final_label = "uncertain"` while still showing the `base_label`.
- A Rich panel labels each prediction as `low`, `medium`, or `high` certainty (color coded red/yellow/green) so analysts can gauge trust quickly.
- Use `--threshold 0.7` (or higher) when you only want confident predictions in automation workflows.

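
The thresholding rule described above can be sketched in a few lines. This is a minimal illustration, not the project's `predict_with_uncertainty` itself: the helper name `apply_threshold` and its return shape are assumptions, and the sketch leaves out difficulty modes and the low/medium/high certainty bands.

```python
def apply_threshold(probs, threshold=0.5):
    """Return (base_label, final_label, max_prob) for one prediction.

    `probs` maps class names to probabilities. The function name and
    return shape are illustrative assumptions, not the project's API.
    """
    base_label = max(probs, key=probs.get)
    max_prob = probs[base_label]
    # Below the threshold, fall back to "uncertain" but keep the base label.
    final_label = base_label if max_prob >= threshold else "uncertain"
    return base_label, final_label, max_prob


probs = {"phishing": 0.46, "malware": 0.34, "benign_activity": 0.20}
print(apply_threshold(probs))                 # default threshold of 0.5
print(apply_threshold(probs, threshold=0.4))  # lowered threshold
```

With the default threshold the top class (0.46) falls short and the final label becomes `uncertain`; lowering the threshold to 0.4 lets the base label through unchanged.
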
You can control how strictly MITRE matching and uncertainty logic behave:

```bash
nlp-triage --difficulty soc-medium "Suspicious PowerShell execution on host."
```

Modes:

- `default`: balanced mode (recommended)
- `soc-medium`: medium‑strict SOC mode (higher uncertainty marking)
- `soc-hard`: strict SOC mode (very conservative, many predictions become `uncertain`)

`--json` skips all Rich formatting and prints a JSON object with the following shape:

```json
{
  "raw_text": "...",
  "cleaned": "...",
  "base_label": "phishing",
  "final_label": "phishing",
  "max_prob": 0.83,
  "threshold": 0.5,
  "uncertainty_level": "medium",
  "difficulty": "default",
  "mitre_techniques": ["T1566.002"],
  "probs_sorted": [...]
}
```

This payload is what `tests/test_cli.py` asserts against, so you can rely on the keys staying stable even if the formatting evolves.

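
For automation, a payload of this shape can be parsed with the standard library and routed on `final_label` and `max_prob`. The sketch below assumes the JSON text has already been captured (for example from the CLI's standard output); the `route` helper, its `min_confidence` parameter, and the routing categories are hypothetical and not part of the project.

```python
import json


def route(payload_text, min_confidence=0.7):
    """Decide what to do with one --json payload (hypothetical helper)."""
    result = json.loads(payload_text)
    if result["final_label"] == "uncertain":
        return "analyst_review"
    if result["final_label"] == "benign_activity":
        return "close"
    # Only auto-escalate when the model is confident enough.
    if result["max_prob"] >= min_confidence:
        return "escalate"
    return "analyst_review"


example = json.dumps({
    "base_label": "phishing",
    "final_label": "phishing",
    "max_prob": 0.83,
    "threshold": 0.5,
    "mitre_techniques": ["T1566.002"],
})
print(route(example))  # escalate
```

Keeping the confidence cut-off in the consumer rather than only in `--threshold` lets different downstream workflows apply different levels of strictness to the same output.
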
The CLI enriches each prediction with a lightweight MITRE ATT&CK® mapping based on the predicted event_type.
This mapping is illustrative, not exhaustive, and is meant to provide quick pivot points for further analysis.

| event_type | ATT&CK technique IDs |
|---|---|
| `phishing` | T1566 |
| `malware` | T1204, T1059, T1486 |
| `web_attack` | T1190, T1110 |
| `access_abuse` | T1078, T1110 |
| `data_exfiltration` | T1041, T1567 |
| `policy_violation` | T1052 |
| `benign_activity` | None (non-security / operational) |
| `uncertain` | None (requires analyst review) |

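
The table above translates naturally into a lookup. The snippet below is a sketch of one way to hold it in code; the names `MITRE_BY_EVENT_TYPE` and `techniques_for` are illustrative assumptions, and the project may store the mapping differently.

```python
# Mapping copied from the table above; empty lists stand in for "None".
MITRE_BY_EVENT_TYPE = {
    "phishing": ["T1566"],
    "malware": ["T1204", "T1059", "T1486"],
    "web_attack": ["T1190", "T1110"],
    "access_abuse": ["T1078", "T1110"],
    "data_exfiltration": ["T1041", "T1567"],
    "policy_violation": ["T1052"],
    "benign_activity": [],
    "uncertain": [],
}


def techniques_for(event_type):
    """Return a copy of the technique IDs for a label ([] if unknown)."""
    return list(MITRE_BY_EVENT_TYPE.get(event_type, []))


print(techniques_for("malware"))
print(techniques_for("uncertain"))
```

Returning a copy keeps callers from mutating the shared mapping by accident.
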
In the CLI:

- The “Top Class Probabilities” table shows a `MITRE Techniques` column populated from this mapping.
- JSON / JSONL output includes `probs_sorted[*].mitre_techniques` and `final_label_mitre_techniques`, so downstream tools can pivot directly to ATT&CK documentation.

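
As an example of such a pivot, a technique ID can be turned into its page on attack.mitre.org. The helper name `attack_url` is hypothetical; the URL layout follows the public ATT&CK site, where sub-techniques such as `T1566.002` sit under their parent technique.

```python
def attack_url(technique_id):
    """Build the attack.mitre.org URL for a technique or sub-technique ID."""
    # "T1566.002" becomes the path segment "T1566/002".
    path = technique_id.strip().upper().replace(".", "/")
    return f"https://attack.mitre.org/techniques/{path}/"


for tid in ["T1190", "T1566.002"]:
    print(tid, "->", attack_url(tid))
```
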
If `pip install -e .` is not available (e.g., in a quick experiment), you can invoke the CLI module directly after exporting `PYTHONPATH`:

```bash
PYTHONPATH=src python -m triage.cli "Short incident description here."
```

Dependencies (joblib, rich, scikit-learn, etc.) still need to be available in the environment.

```bash
nlp-triage -h
```

```text
usage: nlp-triage [-h] [--json] [--threshold THRESHOLD] [--max-classes MAX_CLASSES] [--difficulty {default,soc-medium,soc-hard}] [--input-file INPUT_FILE] [--output-file OUTPUT_FILE]
                  [text]

Cybersecurity Incident NLP Triage CLI

positional arguments:
  text                  Incident description

options:
  -h, --help            show this help message and exit
  --json                Return raw JSON output instead of formatted text
  --threshold THRESHOLD
                        Uncertainty threshold (default=0.5)
  --max-classes MAX_CLASSES
                        Maximum number of classes to display in the probability table
  --difficulty {default,soc-medium,soc-hard}
                        Difficulty / strictness mode for uncertainty handling. Use 'soc-hard' to mark more cases as 'uncertain'.
  --input-file INPUT_FILE
                        Optional path to a text file for bulk mode; each non-empty line is treated as an incident description.
  --output-file OUTPUT_FILE
                        Optional path to write JSONL predictions for bulk mode. Each line will contain one JSON object.
```
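
If you want to mirror this interface in a wrapper script or a test double, the help text is enough to reconstruct an equivalent `argparse` parser. The following is a reconstruction from the help output only, not the project's source: the argument types, the `default` value for `--difficulty`, and the `build_parser` name are assumptions.

```python
import argparse


def build_parser():
    """Rebuild a parser matching the documented help output (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="nlp-triage",
        description="Cybersecurity Incident NLP Triage CLI",
    )
    parser.add_argument("text", nargs="?", help="Incident description")
    parser.add_argument("--json", action="store_true",
                        help="Return raw JSON output instead of formatted text")
    # float/int types are assumed; the help text does not state them.
    parser.add_argument("--threshold", type=float, default=0.5,
                        help="Uncertainty threshold (default=0.5)")
    parser.add_argument("--max-classes", type=int,
                        help="Maximum number of classes to display")
    # The default of "default" is an assumption.
    parser.add_argument("--difficulty", default="default",
                        choices=["default", "soc-medium", "soc-hard"],
                        help="Difficulty / strictness mode")
    parser.add_argument("--input-file", help="Text file for bulk mode")
    parser.add_argument("--output-file", help="JSONL output path for bulk mode")
    return parser


args = build_parser().parse_args(
    ["--difficulty", "soc-hard", "--threshold", "0.7", "Suspicious login"]
)
print(args.difficulty, args.threshold, args.text)
```
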


