Generate per–Network Function (per-NF) anomalous log batches for Open5GS v2.7.2 using only official log statements from the 2.7.2 codebase. This project builds a template catalog directly from the Open5GS v2.7.2 tag (via Git submodule or tarball), then composes plausible but wrong logs from clean traces.
- Authentic templates: All message strings come from official Open5GS v2.7.2 sources (kept as a Git submodule under data/open5gs_source_code).
- Per-NF generation: AMF, SMF, UPF, NRF, AUSF, UDM, PCF, BSF, SCP, NSSF.
- Strict realism: No fabricated strings; anomalies are realistic mutations of true logs.
- One-command pipeline: Build catalog + generate anomalies (both clean-driven and source-only).
- Rich metadata: Every anomaly batch is logged in JSON/CSV for downstream analysis.
- Order gap – reorder or skip mandatory steps.
- Missing step – drop key exchanges.
- Time skew – timestamps shifted or inconsistent.
- Conflict IDs – mismatched IMSI, TEID, session IDs.
- Storm – bursts of error/warning logs.
- Schema drift – key order changes, dropped/duplicated fields, separator variations.
- Truncate tail – prematurely cut log flows.
- Rate spike – unrealistic event frequency.
- Level flip – flip severity (INFO→ERROR, etc.).
- Range fuzz – fuzz numeric/enumeration values.
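As a rough illustration of two of the families above (level flip and time skew), here is a minimal sketch. It is not the generator's own code. It assumes log lines shaped like `MM/DD HH:MM:SS.mmm: [domain] LEVEL: message`; the helper names `level_flip` and `time_skew` and the sample line are hypothetical, and the sample message is not taken from the Open5GS sources.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout: "MM/DD HH:MM:SS.mmm: [domain] LEVEL: message".
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[(?P<domain>\w+)\] (?P<level>[A-Z]+): (?P<msg>.*)$"
)
# The timestamps carry no year, so a fixed leap year is prepended for parsing.
PARSE_FORMAT = "%Y %m/%d %H:%M:%S.%f"
PRINT_FORMAT = "%m/%d %H:%M:%S.%f"


def level_flip(line: str, new_level: str = "ERROR") -> str:
    """Replace the severity; timestamp, domain and message stay untouched."""
    m = LINE_RE.match(line)
    if m is None:
        return line  # leave lines we cannot parse as they are
    return f"{m['ts']}: [{m['domain']}] {new_level}: {m['msg']}"


def time_skew(line: str, offset_seconds: float) -> str:
    """Shift the timestamp by offset_seconds (negative values move it back)."""
    m = LINE_RE.match(line)
    if m is None:
        return line
    parsed = datetime.strptime(f"2024 {m['ts']}", PARSE_FORMAT)
    shifted = parsed + timedelta(seconds=offset_seconds)
    ts = shifted.strftime(PRINT_FORMAT)[:-3]  # microseconds -> milliseconds
    return f"{ts}: [{m['domain']}] {m['level']}: {m['msg']}"


# Illustrative line only; the message text is made up for this example.
sample = "03/14 10:20:30.123: [amf] INFO: example message"
flipped = level_flip(sample)
skewed = time_skew(sample, -5)
```

Both helpers only touch the prefix of the line, which matches the idea that the message text itself is never fabricated.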
- pfcp_timer_edge (SMF/UPF): Association/session timeout & retry sequences.
- sbi_http_anomaly (any): 4xx/5xx SBI/HTTP sequences.
- association_flap (NRF/SMF/UPF/AMF): rapid register ↔ deregister.
- heartbeat_drop (NRF/PFCP): missing or delayed heartbeats.
- policy_churn (PCF): quick install/update/delete cycles.
- nas_security_mismatch (AMF): inconsistencies across Authentication/Security flows.
- duplicate_pdr_far (UPF): duplicate PDR/FAR/QER installs.
- zero_qer_bitrate (UPF): forces numeric params like bitrate to 0.
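The numeric-forcing idea behind zero_qer_bitrate can be sketched with a small regex substitution. This is an assumption-laden illustration, not the tool's implementation: the function name `zero_numeric_field`, the `field=value` layout, and the sample line are all hypothetical and do not come from the Open5GS sources.

```python
import re


def zero_numeric_field(line: str, field: str) -> str:
    """Set the integer that follows `field` (after '=', ':' or '[') to 0."""
    pattern = re.compile(rf"(\b{re.escape(field)}\s*[=:\[]\s*)\d+")
    # \g<1> keeps the field name and separator; only the digits are replaced.
    return pattern.sub(r"\g<1>0", line)


# Hypothetical line; real UPF messages may format bitrates differently.
sample = "qer mbr_ul=1000000 mbr_dl=2000000"
zeroed = zero_numeric_field(sample, "mbr_ul")
```

Only the named field changes, so the rest of the line stays byte-for-byte identical to the template-derived original.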
git clone https://github.com/UmakantKulkarni/open5gs_anomaly
cd open5gs_anomaly
# initialize Open5GS v2.7.2 submodule under data/open5gs_source_code
git submodule update --init --recursive
cd data/open5gs_source_code
git checkout v2.7.2
cd ../..
Install dependencies:
python -m pip install -r requirements.txt
Run everything (catalog + anomalies, both clean-driven & source-only):
python3 -m src.open5gs_anomaly_gen --open5gs-src data/open5gs_source_code --clean-root data/logs --out-root anomalous_logs --batches 14 --all --build-catalog --anomalous-file-rate 1.0 --per-file-anomaly-line-budget-min 5 --per-file-anomaly-line-budget-max 10 --anomaly-types-per-line-min 1 --anomaly-types-per-line-max 2 --seed 42 --interleave
This produces:
anomalous_logs/
  amf/batch_001_amf/amf_<original>.log
  amf/src_batch_001_amf/amf_src_batch_001.log
  smf/...
  metadata.json
  metadata.csv
Skip clean logs and generate anomalies purely from Open5GS templates:
python3 -m src.open5gs_anomaly_gen --open5gs-src data/open5gs_source_code --out-root ./anomalous_src --source-only --all --build-catalog
Use --interleave for a sparse injection strategy. Clean lines are copied
verbatim and only a handful of anomalous lines are inserted at phase-aware
anchors. All message text still originates from the v2.7.2 templates.
python3 -m src.open5gs_anomaly_gen --open5gs-src data/open5gs_source_code --clean-root data/logs --out-root ./interleaved --interleave --interleave-profile medium
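The sparse injection idea can be pictured with the sketch below. It is a simplified stand-in rather than the generator's algorithm: insertion points are chosen at random instead of at phase-aware anchors, and the function name `interleave` and its return shape are assumptions made for illustration.

```python
import random


def interleave(clean_lines, anomalous_lines, seed=42):
    """Insert anomalous lines between clean lines, which stay verbatim and in order.

    Returns the merged lines plus the output indices of the injected lines.
    Random slots stand in for the phase-aware anchors used by the real tool.
    """
    rng = random.Random(seed)
    # One slot per anomalous line; slot k means "before clean line k".
    slots = sorted(rng.randint(0, len(clean_lines)) for _ in anomalous_lines)
    pending = list(zip(slots, anomalous_lines))
    merged, injected_at = [], []
    idx = 0
    for pos in range(len(clean_lines) + 1):
        while idx < len(pending) and pending[idx][0] == pos:
            injected_at.append(len(merged))
            merged.append(pending[idx][1])
            idx += 1
        if pos < len(clean_lines):
            merged.append(clean_lines[pos])
    return merged, injected_at


clean = [f"clean {i}" for i in range(10)]
bad = ["anomalous A", "anomalous B"]
merged, injected_at = interleave(clean, bad)
```

Removing the lines at `injected_at` from `merged` gives back the clean input unchanged, which is the property the interleave mode is described as preserving.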
By default the generator learns phase terminology from your clean logs and
reuses it for future runs. To disable this behaviour use
--no-learn-phases. Pass --rebuild-phases to force a fresh mining pass and
overwrite existing *_learned_phases.json files.
The phase miner now consumes tokens from a rich regex tagger covering PFCP/NAS/NGAP messages, identifiers (IMSI, TEID, PDU Session ID, etc.) and HTTP metadata. These tags expand the learned phase vocabulary and enable more targeted anomaly injection. To stress-test the generator with a heavier profile after cloning the Open5GS submodule, run:
python3 -m src.open5gs_anomaly_gen --open5gs-src data/open5gs_source_code --clean-root data/logs --out-root ./interleaved --interleave --interleave-profile high
Dry run without writing files:
python3 -m src.open5gs_anomaly_gen --interleave --interleave-dry-run --clean-root data/logs
Verify an existing output folder:
python3 -m src.open5gs_anomaly_gen --verify-interleaved --out-root ./interleaved
Each run produces:
- metadata.json – config, parameters, anomaly family list, and full row records.
- metadata.csv – row-per-anomaly with fields:
anomaly_id,batch,nf,source,modes,file,notes
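For downstream analysis, the CSV can be summarised with the standard library alone. The sketch below relies only on the column names listed above; it assumes that the modes cell holds comma-separated family names, which may not match the actual file, and the function name `summarise_metadata` and the sample rows are hypothetical.

```python
import csv
from collections import Counter


def summarise_metadata(lines):
    """Count anomaly rows per NF and per mode from metadata.csv content.

    `lines` is any iterable of text lines, for example an open file handle.
    """
    per_nf, per_mode = Counter(), Counter()
    for row in csv.DictReader(lines):
        per_nf[row["nf"]] += 1
        # Assumption: several modes in one cell are separated by commas.
        for mode in row["modes"].split(","):
            if mode.strip():
                per_mode[mode.strip()] += 1
    return per_nf, per_mode


# Made-up rows that follow the documented header, for demonstration only.
demo = [
    "anomaly_id,batch,nf,source,modes,file,notes",
    '1,batch_001_amf,amf,clean,"level_flip,time_skew",amf_a.log,',
    "2,src_batch_001_upf,upf,source,storm,upf_src_batch_001.log,",
    "3,batch_002_amf,amf,clean,storm,amf_b.log,",
]
per_nf, per_mode = summarise_metadata(demo)
```

To run it against a real output folder, pass an open handle, for example `open("anomalous_logs/metadata.csv", newline="")`.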
Inspect the learned phases and metadata to confirm anomalies were inserted:
# peek at mined phase terminology for AMF
cat amf_learned_phases.json | head
# summarise injection counts
python - <<'PY'
import json
meta = json.load(open('anomalous_logs/metadata.json'))
# print file name and injection/mutation counts for the first five rows
for row in meta.get('rows', [])[:5]:
    print(row['file'], row['injected_count'], row['mutated_count'])
PY
- Batch directories: batch_###_<NF> (for clean-driven) and src_batch_###_<NF> (for source-only).
- Log files: Always prefixed with NF name, e.g. amf_<original>.log, smf_src_batch_001.log.
- Metadata files: One unified JSON + CSV at the root.
This ensures NF identity is preserved everywhere.
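Because the directory names follow a fixed pattern, they are easy to parse when post-processing an output folder. The following is a small sketch based only on the naming scheme above; the `BatchDir` type and `parse_batch_dir` function are hypothetical helpers, not part of the tool.

```python
import re
from typing import NamedTuple, Optional


class BatchDir(NamedTuple):
    source_only: bool  # True for src_batch_###_<NF>
    index: int         # the ### part as an integer
    nf: str            # network function name, lower-cased


BATCH_RE = re.compile(r"^(?P<src>src_)?batch_(?P<index>\d+)_(?P<nf>[A-Za-z0-9]+)$")


def parse_batch_dir(name: str) -> Optional[BatchDir]:
    """Parse a batch directory name; return None if it does not match."""
    m = BATCH_RE.match(name)
    if m is None:
        return None
    return BatchDir(bool(m["src"]), int(m["index"]), m["nf"].lower())


clean_driven = parse_batch_dir("batch_001_amf")
source_only = parse_batch_dir("src_batch_014_upf")
not_a_batch = parse_batch_dir("metadata.json")
```

Returning `None` for non-matching names lets a caller skip files such as metadata.json while walking the output root.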
- --batches N → how many clean-driven anomaly batches per NF (default 8).
- --source-batches N → how many source-only anomaly batches per NF (default 8).
- --all → enable all anomaly families at once.
- --modes → fine-tune families manually (comma-separated).
- --seed → reproducible randomness.
- --no-learn-phases → skip phase mining and use built-in keyword hints.
- --rebuild-phases → force regeneration of learned phase models.
python3 -m src.open5gs_anomaly_gen --open5gs-src data/open5gs_source_code --clean-root data/logs --out-root ./anomalous_logs --batches 2 --source-batches 1 --all
This gives a smaller but representative anomaly set.
If you use this tool, please cite our paper:
Plain text (ACM-style):
Umakant Kulkarni and Sonia Fahmy. 2026. Janus: A Dual-Mask Attention Transformer for Log-based Anomaly Detection in Cellular Networks. Proc. ACM Meas. Anal. Comput. Syst. 10, 1, Article 14 (March 2026), 36 pages. https://doi.org/10.1145/3788096
BibTeX:
@article{10.1145/3788096,
author = {Kulkarni, Umakant and Fahmy, Sonia},
title = {Janus: A Dual-Mask Attention Transformer for Log-based Anomaly Detection in Cellular Networks},
year = {2026},
issue_date = {March 2026},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {10},
number = {1},
url = {https://doi.org/10.1145/3788096},
doi = {10.1145/3788096},
journal = {Proc. ACM Meas. Anal. Comput. Syst.},
month = mar,
articleno = {14},
numpages = {36},
keywords = {anomaly detection, cellular networks, attention, llm, masking, system logs, transformers}
}