Development

Rana Faraz edited this page Jun 23, 2026 · 1 revision

Prerequisites

  • Python 3.10, 3.11, or 3.12
  • Git
  • (Optional) Docker for containerised gate runs

Local setup

git clone https://github.com/ranafaraz/GuardrAIl.git
cd GuardrAIl

# Create and activate a virtual environment
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate

# Install offline/dev stack (what CI uses — no downloads, no API keys)
pip install -e ".[dev]"

# Optional extras
pip install -e ".[api]"       # FastAPI middleware
pip install -e ".[presidio]"  # Microsoft Presidio PII NER
pip install -e ".[toxicity]"  # Detoxify transformer toxicity

Running tests and quality checks

# Full offline test suite (45 tests, no downloads)
pytest -q

# One file
pytest tests/test_guard.py -q

# Lint
ruff check .
ruff check . --fix     # auto-fix import ordering etc.

# Regenerate eval report
python -m guardrail.evals.harness

# CI quality gate (exits 1 on metric regression)
python -m guardrail.evals.gate

# Red-team battery
guardrail redteam --policy default
guardrail redteam --policy coppa

# Docker gate run
docker build -t guardrail . && docker run --rm guardrail

Project structure

guardrail/
  config.py              Policy + presets (FERPA/COPPA/GDPR) + env Settings
  guard.py               Guard orchestrator (check_input, check_output)
  types.py               Violation / GuardResult / Action vocabulary
  text_utils.py          Shared tokenization and lexical scoring

  input_guards/
    injection.py         Prompt-injection / jailbreak detection (rules)
    pii.py               PII redaction (regex | presidio)

  output_guards/
    pii_leak.py          Output PII-leak detection and redaction
    toxicity.py          Toxicity detection (lexical | detoxify)
    schema.py            JSON-schema validation

  middleware/
    fastapi.py           ASGI middleware + FastAPI dependency

  evals/
    data/                Labelled sets (injection, pii, toxicity, refusal)
    metrics.py           Precision / recall / F1 / accuracy computation
    redteam.py           15-case adversarial battery
    harness.py           Runs all evals, writes RESULTS.md
    gate.py              CI quality gate (metric floors)

docs/
  ARCHITECTURE.md        Full architecture and pipeline documentation
  DECISIONS.md           Design decision log
  demo.gif               Terminal capture placeholder

examples/
  fastapi_app.py         Runnable FastAPI example with middleware
tests/                   pytest suite (45 offline tests)

How to write a new guard

All guards are pure functions or simple classes returning values from guardrail/types.py. A guard has one job: detect and return a Violation | None (or (text, [Violation]) for redaction guards). It does not decide what to do — that is the Policy's job.
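
The redaction-guard return shape mentioned above, (text, [Violation]), can be sketched as follows. This is a minimal illustration, not the project's pii.py: the Violation dataclass is a stand-in for guardrail.types.Violation (whose real fields may differ), and the email pattern and redact_emails name are hypothetical.

```python
import re
from dataclasses import dataclass


# Stand-in for guardrail.types.Violation so the sketch runs on its own;
# the real class may carry different fields.
@dataclass
class Violation:
    guard: str
    category: str
    severity: str
    score: float


# Hypothetical, deliberately simple email matcher (not the project's regex).
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> tuple[str, list[Violation]]:
    """Return the redacted text plus one Violation per matched address."""
    violations = [
        Violation(guard="email_redactor", category="email", severity="medium", score=1.0)
        for _ in _EMAIL.finditer(text)
    ]
    return _EMAIL.sub("[EMAIL]", text), violations
```

The function only reports what it found and hands back the cleaned text. Whether the caller blocks, redacts, or merely logs is left to the Policy.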

Example: adding a new input guard

  1. Create guardrail/input_guards/my_guard.py:
from guardrail.types import Violation, Action

def detect_my_threat(text: str, threshold: float = 0.5) -> Violation | None:
    score = _compute_score(text)   # your detection logic
    if score >= threshold:
        return Violation(
            guard="my_guard",
            category="my_category",
            severity="high",
            score=score,
            action=Action.block,
        )
    return None
  2. Wire it into Guard.check_input() in guardrail/guard.py:
from guardrail.input_guards.my_guard import detect_my_threat
from guardrail.types import Action  # skip if guard.py already imports it

def check_input(self, text: str) -> GuardResult:
    # existing injection check ...
    my_result = detect_my_threat(text, self.policy.my_threshold)
    # Compare against the enum member, matching the type of Policy.my_action
    if my_result and self.policy.my_action == Action.block:
        return GuardResult(blocked=True, violations=[my_result])
    # ...
  3. Add the threshold and action to Policy in guardrail/config.py:
class Policy(BaseModel):
    # ... existing fields ...
    my_threshold: float = 0.5
    my_action: Action = Action.block
  4. Add a GUARDRAIL_MY_THRESHOLD env var to Settings and document it in the Configuration wiki page.

  5. Add labelled test cases to guardrail/evals/data/ and update the harness to call your new metric.

  6. Run pytest -q and python -m guardrail.evals.gate; both must pass.
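
Before running the full suite, it can help to pin down the threshold behaviour of a new guard with a few focused tests. The sketch below is self-contained and rests on assumptions: Violation is a reduced stand-in for guardrail.types.Violation, and _compute_score is a toy scorer with made-up trigger words that exists only so the tests have something to exercise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Violation:  # reduced stand-in for guardrail.types.Violation
    guard: str
    score: float


# Hypothetical trigger words, purely for illustration.
_TRIGGERS = {"override", "bypass"}


def _compute_score(text: str) -> float:
    """Toy scorer: fraction of whitespace-separated tokens that are triggers."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in _TRIGGERS for token in tokens) / len(tokens)


def detect_my_threat(text: str, threshold: float = 0.5) -> Optional[Violation]:
    score = _compute_score(text)
    if score >= threshold:
        return Violation(guard="my_guard", score=score)
    return None


# pytest collects any function whose name starts with "test_".
def test_benign_text_passes():
    assert detect_my_threat("please summarise this report") is None


def test_score_at_threshold_is_flagged():
    # One trigger out of two tokens scores exactly 0.5, and the check is >=.
    result = detect_my_threat("bypass filters", threshold=0.5)
    assert result is not None and result.score == 0.5


def test_empty_input_is_safe():
    assert detect_my_threat("") is None
```

Testing the exact boundary (a score equal to the threshold) documents whether the comparison is inclusive, which is easy to change by accident during a refactor.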

How to add a compliance policy preset

Presets live in guardrail/config.py as a dict of Policy instances:

POLICY_PRESETS: dict[str, Policy] = {
    "default": Policy(name="default", ...),
    "ferpa": Policy(name="ferpa", ...),
    # Add your preset here:
    "hipaa": Policy(
        name="hipaa",
        detect_pii=True,
        pii_action=Action.redact,
        injection_threshold=0.4,
        detect_toxicity=False,
    ),
}

Then it is available as Guard.from_policy("hipaa").
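
Because each preset repeats its name as both the dict key and the name field, a small consistency check can catch copy-paste slips when a preset is added. This is a sketch under assumptions: Policy here is a frozen dataclass standing in for the real pydantic model, and check_preset_names is a hypothetical helper, not part of the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:  # stand-in; the real Policy is a pydantic BaseModel with more fields
    name: str
    detect_pii: bool = True
    injection_threshold: float = 0.5


POLICY_PRESETS: dict[str, Policy] = {
    "default": Policy(name="default"),
    "hipaa": Policy(name="hipaa", injection_threshold=0.4),
}


def check_preset_names(presets: dict[str, Policy]) -> list[str]:
    """Return every key whose Policy.name does not match the key itself."""
    return [key for key, policy in presets.items() if policy.name != key]
```

A one-line test such as `assert check_preset_names(POLICY_PRESETS) == []` would then fail as soon as a key and its name field drift apart.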

CI

GitHub Actions runs on every push and pull request:

  1. Lint: ruff check .
  2. Tests: pytest -q on Python 3.10, 3.11, 3.12 with offline backends
  3. Eval gate: python -m guardrail.evals.gate with offline backends

All three steps must pass. The eval gate catches safety-metric regressions before merge.
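
The idea behind a metric-floor gate can be sketched in a few lines: compare each measured metric with its minimum and return a non-zero exit code if any falls short. The metric names and floor values below are invented for illustration; the real ones live in guardrail/evals/gate.py and may be structured differently.

```python
import sys

# Hypothetical floors, not the project's actual thresholds.
FLOORS = {"injection_f1": 0.90, "pii_recall": 0.95}


def find_regressions(metrics: dict[str, float], floors: dict[str, float]) -> list[str]:
    """List every metric that is missing or below its floor."""
    failures = []
    for name, floor in floors.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing (floor {floor:.2f})")
        elif value < floor:
            failures.append(f"{name}: {value:.2f} < floor {floor:.2f}")
    return failures


def main(metrics: dict[str, float]) -> int:
    """Print failures to stderr and return a process exit code (0 or 1)."""
    failures = find_regressions(metrics, FLOORS)
    for line in failures:
        print(f"FAIL {line}", file=sys.stderr)
    return 1 if failures else 0
```

A script would finish with `sys.exit(main(measured))`, which is what lets CI fail the job. Treating a missing metric as a failure avoids a silent pass when an eval is accidentally dropped from the harness.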

Optional backend imports must stay lazy

Any import of fastapi, presidio_analyzer, or detoxify must be inside a function body (not at module level), so import guardrail works on the base install without those packages:

# DO NOT do this at module level:
# from presidio_analyzer import AnalyzerEngine   # breaks base install

def _get_presidio():
    try:
        from presidio_analyzer import AnalyzerEngine
        return AnalyzerEngine()
    except ImportError:
        return None   # fall back to regex
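
A generic variant of the same pattern, shown here as a sketch and not as code from the repository, resolves an optional module by name on first use and caches the result. The helper name optional_import is hypothetical.

```python
import importlib
from functools import lru_cache
from types import ModuleType
from typing import Optional


@lru_cache(maxsize=None)
def optional_import(module_name: str) -> Optional[ModuleType]:
    """Import a module on first use; return None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def presidio_available() -> bool:
    """True only when the optional presidio extra is importable."""
    return optional_import("presidio_analyzer") is not None
```

Nothing is imported until one of these functions is called, so the module that defines them stays importable on the base install. Note that the cache also remembers a failed lookup, so a package installed while the process is running is not picked up until restart.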

Windows notes

  • The venv interpreter is at .venv\Scripts\python.exe
  • Windows console is cp1252 — do not print() non-ASCII characters from scripts; write UTF-8 to files or set PYTHONIOENCODING=utf-8
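
Both workarounds from the note above can be sketched with the standard library alone. The function names are hypothetical; the calls themselves (Path.write_text with an explicit encoding, and the reconfigure method on text streams, available since Python 3.7) are standard.

```python
import sys
from pathlib import Path


def write_report(path: Path, text: str) -> None:
    """Write text as UTF-8 regardless of the console or locale encoding."""
    path.write_text(text, encoding="utf-8")


def force_utf8_stdout() -> None:
    """Switch stdout to UTF-8 when the stream supports reconfiguration."""
    # Replaced or captured streams may lack reconfigure(), so look it up first.
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8")
```

Passing encoding="utf-8" explicitly matters because, without it, write_text falls back to the locale encoding, which is typically cp1252 on Windows.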
