
Sentinel


Signal validation and anomaly detection for enterprise log data.

Sentinel is a Python library that determines whether unstructured log data contains meaningful signals before investing resources in complex anomaly detection pipelines. It provides a modular architecture spanning ingestion, transformation, exploration, detection, visualization, and simulation.


Installation

# Base install (includes IsolationForest, explorer, transformer)
pip install .

# Development (pytest, ipykernel, nbformat)
pip install -e ".[dev]"

# Deep learning detectors (AutoencoderDetector, LNNDetector)
pip install ".[deep]"

# Visualization (plotly, SHAP, matplotlib)
pip install ".[viz]"

# Robust Random Cut Forest detector
pip install ".[rrcf]"

# Everything
pip install -e ".[all]"

Quick Start

import numpy as np
import pandas as pd
from sentinel.explorer import SignalDiagnostics, Thresholds
from sentinel.detectors import IsolationForestDetector
from sentinel.visualization import AnomalyVisualizer

# 1. Generate synthetic data
rng = np.random.RandomState(42)
n = 210
df = pd.DataFrame({
    "cpu": np.concatenate([rng.normal(50, 10, 200), rng.normal(95, 2, 10)]),
    "memory": np.concatenate([rng.normal(2048, 256, 200), rng.normal(7000, 100, 10)]),
}, index=pd.date_range("2025-01-01", periods=n, freq="15min"))

# 2. Signal diagnostics — is there enough signal to detect?
diag = SignalDiagnostics(df, columns=["cpu", "memory"])
report = diag.quality_report(thresholds=Thresholds.relaxed())
print(report)
print(report.interpret())

# 3. Detect anomalies
detector = IsolationForestDetector(contamination=0.05, random_state=42)
detector.fit(df)
df["anomaly"] = detector.predict(df[["cpu", "memory"]])
df["scores"] = -detector.decision_function(df[["cpu", "memory"]])

# 4. Visualize
viz = AnomalyVisualizer(df, score_col="scores", anomaly_col="anomaly")
viz.plot_static(title="Anomaly Detection Results")
viz.plot_score_distribution(threshold=df.loc[df["anomaly"] == -1, "scores"].mean())

Modules

Ingestion

Transforms raw log files into structured DataFrames. Use a built-in parser directly or dispatch via LogIngestor.

Parser       Log Format
-----------  ------------------------------------
WASParser    WebSphere Application Server
HSMParser    Hardware Security Module
HDCParser    High-Density Computing
IBMMQParser  IBM Message Queue
ZTNAParser   Cloudflare Zero Trust Network Access

from sentinel.ingestion import LogIngestor

# Quick dispatch
df = LogIngestor.ingest("path/to/logfile.log", log_type="WAS")

# Or use a parser directly
from sentinel.ingestion import WASParser
parser = WASParser("path/to/logfile.log")
df = parser.parse()

Custom parsers can be created by subclassing BaseLogParser:

from sentinel.ingestion import BaseLogParser
import pandas as pd

class MyParser(BaseLogParser):
    def parse(self):
        records = []
        with open(self.file_path) as f:
            for line in f:
                # your parsing logic here
                records.append({"timestamp": ..., "value": ...})
        return pd.DataFrame(records)

parser = MyParser("path/to/custom.log")
df = parser.parse()
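As a concrete illustration of the parsing step, here is a hypothetical standalone parser for space-delimited `timestamp level message` lines. The format and field names are illustrative only, not one of Sentinel's built-in formats; a real implementation would subclass BaseLogParser as shown above.

```python
import io
import pandas as pd

# Hypothetical parser: split each "timestamp level message" line into
# three fields and collect them into a DataFrame.
def parse_lines(lines):
    records = []
    for line in lines:
        ts, level, msg = line.strip().split(" ", 2)
        records.append({"timestamp": pd.Timestamp(ts), "level": level, "message": msg})
    return pd.DataFrame(records)

sample = io.StringIO(
    "2025-01-01T00:00:00 INFO service started\n"
    "2025-01-01T00:00:15 ERROR connection refused\n"
)
df = parse_lines(sample)
print(df.shape)  # (2, 3)
```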

Transformer

Rolling and string-based aggregation for time series feature engineering.

from sentinel.transformer import RollingAggregator, StringAggregator

# Rolling statistics over numeric columns
agg = RollingAggregator(window_size=12, aggregation_functions="mean")
rolled = agg.fit_transform(numeric_df)

# Categorical aggregation with time windows
str_agg = StringAggregator(df, timestamp_column="timestamp")
counts = str_agg.create_time_aggregation(
    time_window="1h",
    column_metrics={"status": ["count", "nunique"]},
    category_count_columns={"status": ["OK", "ERROR"]},
)

Explorer

Signal validation and data quality diagnostics. Checks whether your data has enough variance, sufficient entries, and detectable outliers before you invest compute in detection.

from sentinel.explorer import SignalDiagnostics, Thresholds, detect_drift

# Quality report with interpretable results
diag = SignalDiagnostics(df, columns=["cpu", "memory"])
report = diag.quality_report(thresholds=Thresholds.relaxed())
print(f"Score: {report.score:.0%}")
print(report.interpret())

# Distribution drift detection (Kolmogorov-Smirnov)
drift_results = detect_drift(df, column="cpu", window=200)
for r in drift_results:
    print(f"Window [{r['start_idx']}:{r['end_idx']}] — drifted: {r['drifted']}")

Detectors

Detector                 Algorithm                           Install group
-----------------------  ----------------------------------  -------------
IsolationForestDetector  Isolation Forest                    base
RRCFDetector             Robust Random Cut Forest            rrcf
AutoencoderDetector      LSTM Autoencoder                    deep
LNNDetector              Liquid Neural Network               deep
BaseCustomDetector       Abstract base for custom detectors  base

from sentinel.detectors import IsolationForestDetector

detector = IsolationForestDetector(contamination=0.05, random_state=42)
detector.fit(X_train)
predictions = detector.predict(X_test)       # -1 = anomaly, 1 = normal
scores = detector.decision_function(X_test)  # lower = more anomalous
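Because predict() uses the scikit-learn label convention shown above, a common follow-up step is converting the -1/1 labels into a boolean anomaly mask:

```python
import numpy as np

# Map the -1/1 label convention to a boolean mask (True = anomaly).
predictions = np.array([1, -1, 1, 1, -1])
is_anomaly = predictions == -1
print(int(is_anomaly.sum()))  # 2
```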

Visualization

Static (matplotlib) and interactive (Plotly) anomaly plots, score distribution histograms, feature overlays, and SHAP-based model interpretability.

from sentinel.visualization import AnomalyVisualizer, SHAPVisualizer

# AnomalyVisualizer — score timeline, distribution, feature overlay
viz = AnomalyVisualizer(anomaly_df, score_col="scores", anomaly_col="anomaly")
viz.plot_static(threshold=0.5)              # matplotlib scatter with threshold line
viz.plot_dynamic(threshold=0.5)             # interactive Plotly chart
viz.plot_score_distribution(threshold=0.5)  # histogram of normal vs anomaly scores
viz.plot_features()                         # feature time series with anomaly markers

# SHAPVisualizer — model interpretability (tree-based detectors)
shap_viz = SHAPVisualizer(detector)
shap_viz.plot_summary(X)       # beeswarm: global feature importance
shap_viz.plot_bar(X)           # bar chart: mean |SHAP| per feature
shap_viz.plot_waterfall(X, 0)  # waterfall: single sample explanation
shap_viz.plot_force(X, 0)      # force plot: single sample
shap_viz.plot_dependence(X, feature="cpu")  # feature dependence scatter

Simulation

StreamingSimulation simulates real-time data streaming with live anomaly detection and visualization. Works in both Jupyter notebooks and standalone scripts.

from sentinel.simulation import StreamingSimulation

sim = StreamingSimulation(
    data=df,
    chunk_size=50,
    stream_interval=0.3,
    window_size=120,
    threshold=0.15,
    dynamic_threshold=True,
    percentile=95,
    events=events_df,  # optional incident overlay
)

# In Jupyter — inline animated chart
sim.run_notebook()

# In a terminal script — native matplotlib window
# sim.run()
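As the `percentile=95` parameter suggests, a dynamic threshold plausibly tracks a percentile of recent scores in a sliding window. The exact mechanism inside StreamingSimulation is an assumption; this sketch shows the general idea:

```python
import numpy as np

# Hypothetical percentile-based dynamic threshold over a sliding window.
rng = np.random.RandomState(1)
scores = rng.random(500)

window_size, percentile = 120, 95
recent = scores[-window_size:]               # most recent window of scores
threshold = np.percentile(recent, percentile)
flagged = recent > threshold                 # points above the dynamic threshold
print(int(flagged.sum()))                    # 6 (5% of a 120-point window)
```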

Notebooks

01  Ingestion Quickstart: Built-in parsers (WAS, HSM), LogIngestor dispatch, custom parser with BaseLogParser
02  Transformer Quickstart: RollingAggregator and StringAggregator with custom metrics
03  Explorer Quickstart: SignalDiagnostics, QualityReport with interpret(), drift detection
04  Detectors Quickstart: IsolationForestDetector and RRCFDetector with AnomalyVisualizer
05  Deep Detectors Quickstart: AutoencoderDetector and LNNDetector with visualization
06  Visualization Quickstart: Full AnomalyVisualizer suite + all SHAPVisualizer methods
07  Simulation Quickstart: StreamingSimulation with live animated charts (static and dynamic thresholds)
08  End-to-End Pipeline: Complete pipeline: ingestion → transformation → exploration → detection → visualization → SHAP

Project Structure

sentinel/
├── src/sentinel/
│   ├── ingestion/       # Log parsers (WAS, HSM, HDC, IBMMQ, ZTNA, base)
│   ├── transformer/     # RollingAggregator, StringAggregator
│   ├── explorer/        # SignalDiagnostics, Thresholds, drift detection
│   ├── detectors/       # IsolationForest, RRCF, Autoencoder, LNN, custom base
│   ├── visualization/   # AnomalyVisualizer, SHAPVisualizer
│   └── simulation/      # StreamingSimulation, StreamingDataManager
├── tests/               # pytest test suite
├── notebooks/           # Quickstart notebooks (01–08)
├── examples/            # Standalone example scripts
├── paper/               # JOSS paper source
└── pyproject.toml       # Dependencies and build config

Paper

The JOSS paper is in paper/. A draft PDF is built automatically on each push via GitHub Actions.


Contributing

See CONTRIBUTING.md, CODE_OF_CONDUCT.md, and SECURITY.md.


License

Apache 2.0. See LICENSE.


Citation

@software{sentinel2026,
  title   = {Sentinel: Signal Validation and Anomaly Detection for Enterprise Log Data},
  author  = {Vergara Álvarez, José Manuel and Laverde Manotas, Nicolás and Aguilar Calle, Juan Pablo and Niño Castillo, Jeisson Vicente and Muñoz Pertuz, Julián David and Monsalve Muñoz, Daniel and Osorio Agudelo, Sebastián},
  year    = {2026},
  url     = {https://github.com/bancolombia/sentinel}
}
