UmakantKulkarni/open5gs_anomaly


Open5GS v2.7.2 Log Anomaly Generator

Generate per–Network Function (per-NF) anomalous log batches for Open5GS v2.7.2 using only official log statements from the 2.7.2 codebase. This project builds a template catalog directly from the Open5GS v2.7.2 tag (via Git submodule or tarball), then composes plausible but wrong logs from clean traces.


Features

  • Authentic templates: All message strings come from official Open5GS v2.7.2 sources (kept as a Git submodule under data/open5gs_source_code).
  • Per-NF generation: AMF, SMF, UPF, NRF, AUSF, UDM, PCF, BSF, SCP, NSSF.
  • Strict realism: No fabricated strings; anomalies are realistic mutations of true logs.
  • One-command pipeline: Build catalog + generate anomalies (both clean-driven and source-only).
  • Rich metadata: Every anomaly batch is logged in JSON/CSV for downstream analysis.

Supported Anomaly Families

General Families

  • Order gap – reorder or skip mandatory steps.
  • Missing step – drop key exchanges.
  • Time skew – timestamps shifted or inconsistent.
  • Conflict IDs – mismatched IMSI, TEID, session IDs.
  • Storm – bursts of error/warning logs.
  • Schema drift – key order changes, dropped/duplicated fields, separator variations.
  • Truncate tail – prematurely cut log flows.
  • Rate spike – unrealistic event frequency.
  • Level flip – flip severity (INFO→ERROR, etc.).
  • Range fuzz – fuzz numeric/enumeration values.
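Several of the general families are simple transformations of individual lines or of a line sequence. The sketch below illustrates four of them in plain Python. It is a hypothetical illustration, not the project's code: it assumes an Open5GS-style line layout of "MM/DD HH:MM:SS.mmm: [nf] LEVEL: message", and the function names and toy lines are made up for the example.

```python
import random
import re
from datetime import datetime, timedelta

# Assumed layout (illustrative only): "MM/DD HH:MM:SS.mmm: [nf] LEVEL: message"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): \[(\w+)\] (\w+): (.*)$"
)


def level_flip(line: str, new_level: str = "ERROR") -> str:
    """Level flip: rewrite the severity field and keep everything else."""
    m = LINE_RE.match(line)
    if not m:
        return line  # leave unparseable lines untouched
    ts, nf, _old_level, msg = m.groups()
    return f"{ts}: [{nf}] {new_level}: {msg}"


def time_skew(line: str, seconds: float) -> str:
    """Time skew: shift the timestamp by a fixed offset."""
    m = LINE_RE.match(line)
    if not m:
        return line
    ts, nf, level, msg = m.groups()
    # Prefix a leap year so the year-less timestamp parses unambiguously.
    t = datetime.strptime("2000/" + ts, "%Y/%m/%d %H:%M:%S.%f")
    t += timedelta(seconds=seconds)
    new_ts = t.strftime("%m/%d %H:%M:%S") + f".{t.microsecond // 1000:03d}"
    return f"{new_ts}: [{nf}] {level}: {msg}"


def order_gap(lines: list[str], rng: random.Random) -> list[str]:
    """Order gap: swap one randomly chosen pair of adjacent lines."""
    out = list(lines)
    if len(out) >= 2:
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def truncate_tail(lines: list[str], keep_ratio: float = 0.5) -> list[str]:
    """Truncate tail: keep only the leading fraction of a flow."""
    return list(lines[: max(1, int(len(lines) * keep_ratio))])
```

Because each function returns a new line or list, families can be chained on the same flow (for example, a time skew on one line followed by an order gap on the batch).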

NF-aware Advanced Families

  • pfcp_timer_edge (SMF/UPF): Association/session timeout & retry sequences.
  • sbi_http_anomaly (any): 4xx/5xx SBI/HTTP sequences.
  • association_flap (NRF/SMF/UPF/AMF): rapid register ↔ deregister.
  • heartbeat_drop (NRF/PFCP): missing or delayed heartbeats.
  • policy_churn (PCF): quick install/update/delete cycles.
  • nas_security_mismatch (AMF): inconsistencies across Authentication/Security flows.
  • duplicate_pdr_far (UPF): duplicate PDR/FAR/QER installs.
  • zero_qer_bitrate (UPF): forces numeric params like bitrate to 0.
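Three of the NF-aware families can be approximated in a few lines each. This is a hedged sketch, not the generator's implementation: the bitrate field names (UL, DL, MBR, GBR, bitrate), the "heartbeat" keyword match, and the toy lines used with it are assumptions for illustration, not strings verified against the v2.7.2 templates.

```python
import random
import re

# Hypothetical field names; v2.7.2 templates may label bitrates differently.
BITRATE_RE = re.compile(r"\b(UL|DL|MBR|GBR|bitrate)(\s*[:=]\s*)\d+", re.IGNORECASE)


def zero_qer_bitrate(line: str) -> str:
    """zero_qer_bitrate: force every matched numeric bitrate value to 0."""
    return BITRATE_RE.sub(lambda m: m.group(1) + m.group(2) + "0", line)


def heartbeat_drop(lines, rng, drop_prob=1.0, keyword="heartbeat"):
    """heartbeat_drop: remove heartbeat lines with probability drop_prob."""
    return [
        line for line in lines
        if keyword not in line.lower() or rng.random() >= drop_prob
    ]


def association_flap(register_line, deregister_line, cycles):
    """association_flap: emit rapid register/deregister pairs."""
    out = []
    for _ in range(cycles):
        out.extend([register_line, deregister_line])
    return out
```

In the real tool, the register and deregister lines would come from the template catalog rather than being passed in by hand.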

Repository Setup

Clone with Submodule

git clone https://github.com/UmakantKulkarni/open5gs_anomaly
cd open5gs_anomaly

# initialize Open5GS v2.7.2 submodule under data/open5gs_source_code
git submodule update --init --recursive
cd data/open5gs_source_code
git checkout v2.7.2
cd ../..

Quick Start: Generate Anomalies

Install dependencies:

python -m pip install -r requirements.txt

Run everything (catalog + anomalies, both clean-driven & source-only):

python3 -m src.open5gs_anomaly_gen \
  --open5gs-src data/open5gs_source_code \
  --clean-root data/logs \
  --out-root anomalous_logs \
  --batches 14 --all --build-catalog \
  --anomalous-file-rate 1.0 \
  --per-file-anomaly-line-budget-min 5 \
  --per-file-anomaly-line-budget-max 10 \
  --anomaly-types-per-line-min 1 \
  --anomaly-types-per-line-max 2 \
  --seed 42 --interleave

This produces:

anomalous_logs/
  amf/batch_001_amf/amf_<original>.log
  amf/src_batch_001_amf/amf_src_batch_001.log
  smf/...
  metadata.json
  metadata.csv
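The budget flags in the full-pipeline command (--anomalous-file-rate, the per-file line budget min/max, and the anomaly-types-per-line min/max) read naturally as a two-level sampling plan: first decide whether a file is anomalous, then pick how many lines to touch and how many families to apply per line. The sketch below is one plausible interpretation, not the generator's actual logic; the Budget class, plan_file function, and snake_case family identifiers are hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical family identifiers used only for this sketch.
FAMILIES = ["order_gap", "missing_step", "time_skew", "level_flip", "range_fuzz"]


@dataclass
class Budget:
    anomalous_file_rate: float = 1.0
    line_budget_min: int = 5
    line_budget_max: int = 10
    types_per_line_min: int = 1
    types_per_line_max: int = 2


def plan_file(n_lines: int, budget: Budget, rng: random.Random) -> dict[int, list[str]]:
    """Return {line_index: [families]} for one file, or {} if it stays clean."""
    if rng.random() >= budget.anomalous_file_rate:
        return {}  # this file is left untouched
    # Number of lines to mutate, capped by the file length.
    k = min(n_lines, rng.randint(budget.line_budget_min, budget.line_budget_max))
    plan = {}
    for idx in sorted(rng.sample(range(n_lines), k)):
        n_types = rng.randint(budget.types_per_line_min, budget.types_per_line_max)
        plan[idx] = rng.sample(FAMILIES, n_types)  # distinct families per line
    return plan
```

Seeding the random.Random instance (as --seed 42 does for the real tool) makes the plan reproducible across runs.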

Source-only Generation

Skip clean logs and generate anomalies purely from Open5GS templates:

python3 -m src.open5gs_anomaly_gen \
  --open5gs-src data/open5gs_source_code \
  --out-root ./anomalous_src \
  --source-only \
  --all \
  --build-catalog

Interleaved Anomaly Injection

Use --interleave for a sparse injection strategy. Clean lines are copied verbatim and only a handful of anomalous lines are inserted at phase-aware anchors. All message text still originates from the v2.7.2 templates.

python3 -m src.open5gs_anomaly_gen --open5gs-src data/open5gs_source_code --clean-root data/logs --out-root ./interleaved --interleave --interleave-profile medium
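The sparse strategy described above (copy clean lines verbatim, insert a few anomalous lines at phase-aware anchors) can be sketched as follows. This is an illustrative assumption, not the project's algorithm: here an "anchor" is simply a clean line containing one of a given list of phase terms, and the interleave function and its parameters are hypothetical.

```python
import random


def interleave(clean_lines, anomalous_pool, anchors, max_inserts, rng):
    """Copy clean lines verbatim and add a few anomalous lines after anchors.

    Returns the new line list and the indices of the inserted lines.
    """
    # Candidate anchor positions: clean lines that mention any anchor term.
    anchor_idx = [i for i, line in enumerate(clean_lines)
                  if any(term in line for term in anchors)]
    chosen = set(rng.sample(anchor_idx, min(max_inserts, len(anchor_idx))))
    out, injected = [], []
    for i, line in enumerate(clean_lines):
        out.append(line)  # clean content is never modified
        if i in chosen:
            out.append(rng.choice(anomalous_pool))
            injected.append(len(out) - 1)
    return out, injected
```

A heavier profile such as high could plausibly correspond to a larger max_inserts, though how the real profiles are defined is not documented here.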

By default, the generator learns phase terminology from your clean logs and reuses it in later runs. To disable this behaviour, use --no-learn-phases. Pass --rebuild-phases to force a fresh mining pass and overwrite existing *_learned_phases.json files.
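The caching behaviour described above (reuse a saved phase file, skip mining entirely, or force a rebuild) follows a common load-or-compute pattern. The sketch below assumes one JSON file per NF named <nf>_learned_phases.json in a caller-chosen directory; mine_phases is a deliberately naive placeholder (most frequent capitalised tokens), not the project's miner.

```python
import json
from collections import Counter
from pathlib import Path


def mine_phases(lines, top_k=20):
    """Placeholder miner: the most frequent capitalised tokens."""
    counts = Counter(tok for line in lines for tok in line.split() if tok[:1].isupper())
    return [tok for tok, _ in counts.most_common(top_k)]


def load_or_learn(nf, lines, cache_dir, learn=True, rebuild=False):
    """Mirror the --no-learn-phases / --rebuild-phases semantics (assumed)."""
    if not learn:
        return None  # caller falls back to built-in keyword hints
    path = Path(cache_dir) / f"{nf}_learned_phases.json"
    if path.exists() and not rebuild:
        return json.loads(path.read_text())  # reuse the earlier mining pass
    phases = mine_phases(lines)
    path.write_text(json.dumps(phases, indent=2))  # overwrite on rebuild
    return phases
```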

The phase miner consumes tokens from a regex tagger that covers PFCP/NAS/NGAP messages, identifiers (IMSI, TEID, PDU Session ID, etc.) and HTTP metadata. These tags expand the learned phase vocabulary and enable more targeted anomaly injection. To stress-test the generator with a heavier profile after cloning the Open5GS submodule, run:

python3 -m src.open5gs_anomaly_gen --open5gs-src data/open5gs_source_code --clean-root data/logs --out-root ./interleaved --interleave --interleave-profile high
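A regex tagger of the kind described above can be approximated with a table of compiled patterns applied to each line. The patterns, tag names, and the imsi-, TEID[...], and PSI[...] spellings below are assumptions chosen for illustration; they are not taken from the project or checked against every v2.7.2 template.

```python
import re

# Illustrative patterns only; the project's actual tagger may differ.
TAG_PATTERNS = {
    "IMSI": re.compile(r"\bimsi-(\d{6,15})\b", re.IGNORECASE),
    "TEID": re.compile(r"\bTEID\s*[\[:=]\s*(0x[0-9a-fA-F]+|\d+)", re.IGNORECASE),
    "PDU_SESSION_ID": re.compile(r"\bPSI\s*[\[:=]\s*(\d+)"),
    "HTTP_STATUS": re.compile(r"\bHTTP(?:/\d(?:\.\d)?)?\s+(\d{3})\b"),
    "PROTOCOL": re.compile(r"\b(PFCP|NGAP|NAS|GTP-U)\b"),
}


def tag_line(line: str) -> list[tuple[str, str]]:
    """Return (tag, value) pairs in pattern order, then match order."""
    tags = []
    for name, pattern in TAG_PATTERNS.items():
        for m in pattern.finditer(line):
            tags.append((name, m.group(1)))
    return tags
```

Tags like these could then be counted per phase to grow the learned vocabulary.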

Dry run without writing files:

python3 -m src.open5gs_anomaly_gen --interleave --interleave-dry-run --clean-root data/logs

Verify an existing output folder:

python3 -m src.open5gs_anomaly_gen --verify-interleaved --out-root ./interleaved

Metadata Outputs

Each run produces:

  • metadata.json – config, parameters, anomaly family list, and full row records.
  • metadata.csv – row-per-anomaly with fields: anomaly_id,batch,nf,source,modes,file,notes

Inspect the learned phases and metadata to confirm anomalies were inserted:

# peek at mined phase terminology for AMF
head amf_learned_phases.json

# summarise injection counts for the first five metadata rows
python3 - <<'PY'
import json

with open('anomalous_logs/metadata.json') as fh:
    meta = json.load(fh)
for row in meta.get('rows', [])[:5]:
    # .get() avoids a KeyError if a row lacks one of the count fields
    print(row.get('file'), row.get('injected_count'), row.get('mutated_count'))
PY
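The same kind of check can be run against metadata.csv. The helper below is a hypothetical convenience, not part of the project: it relies only on the nf column listed above and counts anomaly rows per NF with csv.DictReader.

```python
import csv
from collections import Counter
from pathlib import Path


def count_by_nf(csv_path) -> Counter:
    """Count metadata.csv rows per network function (nf column)."""
    with open(csv_path, newline="") as fh:
        return Counter(row["nf"] for row in csv.DictReader(fh))


if __name__ == "__main__":
    path = Path("anomalous_logs/metadata.csv")  # default output root assumed
    if path.exists():
        for nf, n in sorted(count_by_nf(path).items()):
            print(f"{nf}: {n} anomaly rows")
```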

Folder & File Naming Convention

  • Batch directories: batch_###_<NF> (for clean-driven) and src_batch_###_<NF> (for source-only).
  • Log files: Always prefixed with NF name, e.g. amf_<original>.log, smf_src_batch_001.log.
  • Metadata files: One unified JSON + CSV at the root.

This ensures NF identity is preserved everywhere.
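Because the naming scheme is regular, it can be checked mechanically. This hypothetical helper parses batch_###_<NF> and src_batch_###_<NF> directory names and confirms that a log file carries its batch's NF prefix; it assumes NF names consist of lowercase letters, as in the examples above.

```python
import re
from pathlib import Path

# batch_<number>_<nf> or src_batch_<number>_<nf>, with a lowercase NF name.
BATCH_RE = re.compile(r"^(src_)?batch_(\d+)_([a-z]+)$")


def parse_batch_dir(name: str):
    """Return batch metadata for a directory name, or None if it does not match."""
    m = BATCH_RE.match(name)
    if not m:
        return None
    return {"source_only": m.group(1) is not None,
            "batch": int(m.group(2)),
            "nf": m.group(3)}


def check_nf_prefix(log_path) -> bool:
    """True if the log file name starts with the NF of its batch directory."""
    p = Path(log_path)
    info = parse_batch_dir(p.parent.name)
    return info is not None and p.name.startswith(info["nf"] + "_")
```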


Customization

  • --batches N → how many clean-driven anomaly batches per NF (default 8).
  • --source-batches N → how many source-only anomaly batches per NF (default 8).
  • --all → enable all anomaly families at once.
  • --modes → fine-tune families manually (comma-separated).
  • --seed → reproducible randomness.
  • --no-learn-phases → skip phase mining and use built-in keyword hints.
  • --rebuild-phases → force regeneration of learned phase models.
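The sketch below shows one way the options listed above, with their documented defaults, could be declared with argparse, and how --all, --modes, and --seed might interact. It is a hypothetical mirror of the documented options, not the generator's real parser. The advanced family names come from this README, while the snake_case spellings of the general families are assumptions.

```python
import argparse
import random

# Advanced names come from the README; general-family spellings are assumed.
ALL_FAMILIES = [
    "order_gap", "missing_step", "time_skew", "conflict_ids", "storm",
    "schema_drift", "truncate_tail", "rate_spike", "level_flip", "range_fuzz",
    "pfcp_timer_edge", "sbi_http_anomaly", "association_flap", "heartbeat_drop",
    "policy_churn", "nas_security_mismatch", "duplicate_pdr_far", "zero_qer_bitrate",
]


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Sketch of the documented options")
    p.add_argument("--batches", type=int, default=8)
    p.add_argument("--source-batches", type=int, default=8)
    p.add_argument("--all", action="store_true")
    p.add_argument("--modes", default="")
    p.add_argument("--seed", type=int, default=None)
    p.add_argument("--no-learn-phases", action="store_true")
    p.add_argument("--rebuild-phases", action="store_true")
    return p


def resolve_modes(args: argparse.Namespace) -> list[str]:
    """--all wins; otherwise parse and validate the comma-separated --modes."""
    if args.all:
        return list(ALL_FAMILIES)
    modes = [m.strip() for m in args.modes.split(",") if m.strip()]
    unknown = sorted(set(modes) - set(ALL_FAMILIES))
    if unknown:
        raise ValueError(f"unknown modes: {unknown}")
    return modes


def make_rng(args: argparse.Namespace) -> random.Random:
    """A fixed --seed yields the same random stream on every run."""
    return random.Random(args.seed)
```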

Example for Quick Testing

python3 -m src.open5gs_anomaly_gen \
  --open5gs-src data/open5gs_source_code \
  --clean-root data/logs \
  --out-root ./anomalous_logs \
  --batches 2 \
  --source-batches 1 \
  --all

This gives a smaller but representative anomaly set.


Citation

If you use this tool, please cite our paper:

Plain text (ACM-style):
Umakant Kulkarni and Sonia Fahmy. 2026. Janus: A Dual-Mask Attention Transformer for Log-based Anomaly Detection in Cellular Networks. Proc. ACM Meas. Anal. Comput. Syst. 10, 1, Article 14 (March 2026), 36 pages. https://doi.org/10.1145/3788096

BibTeX:

@article{10.1145/3788096,
author = {Kulkarni, Umakant and Fahmy, Sonia},
title = {Janus: A Dual-Mask Attention Transformer for Log-based Anomaly Detection in Cellular Networks},
year = {2026},
issue_date = {March 2026},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {10},
number = {1},
url = {https://doi.org/10.1145/3788096},
doi = {10.1145/3788096},
journal = {Proc. ACM Meas. Anal. Comput. Syst.},
month = mar,
articleno = {14},
numpages = {36},
keywords = {anomaly detection, cellular networks, attention, llm, masking, system logs, transformers}
}
