Configuration

Rana Faraz edited this page Jun 23, 2026 · 1 revision

GuardrAIl is configured through a Policy object (a pydantic data model), environment variables (GUARDRAIL_* prefix), and an optional .env file. All settings have offline-safe defaults.

Policy presets

Load a preset by name:

from guardrail import Guard
g = Guard.from_policy("default")   # or "ferpa", "coppa", "gdpr"

| Preset | Injection action | Input PII action | Output PII action | Toxicity action |
| --- | --- | --- | --- | --- |
| default | block | redact | redact | flag |
| ferpa | block | redact | redact | flag |
| coppa | block (lower threshold) | block | redact | block |
| gdpr | block | redact | redact | flag |

Custom policy

Override any field:

from guardrail import Guard, Policy

g = Guard(Policy(
    name="custom",
    injection_threshold=0.3,   # lower = more aggressive detection
    detect_pii=True,
    detect_toxicity=False,     # disable the toxicity guard
    toxicity_action="block",   # action applied when the toxicity guard is enabled
    pii_action="redact",
))

Environment variables

All env vars use the GUARDRAIL_ prefix. They override the preset's defaults.

| Env var | Default | Description |
| --- | --- | --- |
| GUARDRAIL_POLICY | default | Policy preset to load (default, ferpa, coppa, gdpr) |
| GUARDRAIL_PII_BACKEND | regex | PII detection backend (regex, presidio) |
| GUARDRAIL_TOXICITY_BACKEND | lexical | Toxicity detection backend (lexical, detoxify) |
| GUARDRAIL_INJECTION_THRESHOLD | 0.5 | Injection score threshold for blocking (0–1) |
| GUARDRAIL_TOXICITY_THRESHOLD | 0.5 | Toxicity score threshold (0–1) |
| GUARDRAIL_DETECT_PII | true | Enable/disable PII guard |
| GUARDRAIL_DETECT_TOXICITY | true | Enable/disable toxicity guard |
| GUARDRAIL_DETECT_INJECTION | true | Enable/disable injection guard |
| GUARDRAIL_PII_ACTION | redact | Action on PII (redact, block, flag) |
| GUARDRAIL_TOXICITY_ACTION | flag | Action on toxicity (block, flag) |
| GUARDRAIL_INJECTION_ACTION | block | Action on injection (block, flag) |
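
The override behaviour described above can be sketched with the standard library alone. This is an illustration of the idea, not GuardrAIl's actual loader: the `resolve_settings` helper, the accepted boolean spellings, and the range check are all assumptions, and only the variable names and defaults come from the table.

```python
import os

# Defaults copied from the table above. Everything else here is an assumption.
DEFAULTS = {
    "POLICY": "default",
    "PII_BACKEND": "regex",
    "TOXICITY_BACKEND": "lexical",
    "INJECTION_THRESHOLD": 0.5,
    "TOXICITY_THRESHOLD": 0.5,
    "DETECT_PII": True,
    "DETECT_TOXICITY": True,
    "DETECT_INJECTION": True,
    "PII_ACTION": "redact",
    "TOXICITY_ACTION": "flag",
    "INJECTION_ACTION": "block",
}


def resolve_settings(environ=None, prefix="GUARDRAIL_"):
    """Hypothetical helper: return DEFAULTS with prefixed variables applied on top."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        raw = environ.get(prefix + key)
        if raw is None:
            continue  # variable not set: keep the default
        if isinstance(default, bool):
            # Assumed boolean spellings; the real parser may accept others.
            settings[key] = raw.strip().lower() in {"1", "true", "yes", "on"}
        elif isinstance(default, float):
            value = float(raw)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{prefix}{key} must be in [0, 1], got {raw!r}")
            settings[key] = value
        else:
            settings[key] = raw.strip()
    return settings


# Passing a plain dict instead of os.environ keeps the example reproducible.
example = resolve_settings({
    "GUARDRAIL_INJECTION_THRESHOLD": "0.3",
    "GUARDRAIL_DETECT_TOXICITY": "false",
})
print(example["INJECTION_THRESHOLD"], example["DETECT_TOXICITY"], example["PII_ACTION"])
```

Variables that are not set leave the corresponding default untouched, which matches the "offline-safe defaults" described at the top of this page.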

.env.example

# Policy preset
GUARDRAIL_POLICY=default

# Backends (offline by default; switch for better coverage)
# GUARDRAIL_PII_BACKEND=presidio       # requires pip install -e ".[presidio]"
# GUARDRAIL_TOXICITY_BACKEND=detoxify  # requires pip install -e ".[toxicity]"

# Thresholds (lower = more aggressive)
# GUARDRAIL_INJECTION_THRESHOLD=0.5
# GUARDRAIL_TOXICITY_THRESHOLD=0.5

# Guard enable/disable
# GUARDRAIL_DETECT_PII=true
# GUARDRAIL_DETECT_TOXICITY=true
# GUARDRAIL_DETECT_INJECTION=true

# Actions
# GUARDRAIL_PII_ACTION=redact
# GUARDRAIL_TOXICITY_ACTION=flag
# GUARDRAIL_INJECTION_ACTION=block

Backend selection

PII backends

| Backend | Install | Description |
| --- | --- | --- |
| regex (default) | included | Fast regex patterns: email, phone, SSN, credit card, IP address |
| presidio | pip install -e ".[presidio]" | Microsoft Presidio NER: adds PERSON, LOCATION, ORG detection |

pip install -e ".[presidio]"
# .env:
# GUARDRAIL_PII_BACKEND=presidio

Toxicity backends

| Backend | Install | Description |
| --- | --- | --- |
| lexical (default) | included | Word-list matching with lexical scoring, F1 = 0.94 offline |
| detoxify | pip install -e ".[toxicity]" | Detoxify transformer classifier (Unitary) |

pip install -e ".[toxicity]"
# .env:
# GUARDRAIL_TOXICITY_BACKEND=detoxify

Threshold tuning

injection_threshold controls sensitivity: lower values catch more injections but raise the false-positive rate. The default (0.5) is tuned for the bundled eval set.

The COPPA preset uses a lower threshold (~0.34) to prioritise recall over precision: a missed injection in a children's app is a more severe failure than a false block.
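
To make the recall/precision trade-off concrete, the sketch below compares the two thresholds on a handful of scores. The scores are invented for illustration and are not from the bundled eval set; the rule that a score at or above the threshold is blocked is also an assumption about how the comparison is made.

```python
def rates(benign, adversarial, threshold):
    """Return (recall, false_positive_rate), assuming score >= threshold means blocked."""
    caught = sum(score >= threshold for score in adversarial)
    false_blocks = sum(score >= threshold for score in benign)
    return caught / len(adversarial), false_blocks / len(benign)


# Hypothetical risk scores, made up for this example.
benign = [0.05, 0.10, 0.20, 0.38, 0.12]
adversarial = [0.36, 0.45, 0.62, 0.80, 0.91]

for threshold in (0.5, 0.34):
    recall, fpr = rates(benign, adversarial, threshold)
    print(f"threshold={threshold}: recall={recall:.2f}, false-positive rate={fpr:.2f}")
```

On this toy data the lower threshold catches every adversarial input but also blocks one benign input, which is the trade the COPPA preset accepts.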

To tune for your deployment:

  1. Collect representative benign and adversarial inputs
  2. Run guardrail check-input "..." on each and observe risk_score
  3. Set GUARDRAIL_INJECTION_THRESHOLD just above the highest benign risk_score
  4. Verify with guardrail redteam that catch rate remains acceptable
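
Steps 3 and 4 can be sketched as follows, assuming the risk_score values from step 2 have already been collected into two lists by hand. The `suggest_threshold` and `catch_rate` helpers, the 0.01 margin, and the sample scores are hypothetical; the real verification step is the guardrail redteam command.

```python
def suggest_threshold(benign_scores, margin=0.01):
    """Step 3: a threshold just above the highest benign risk_score, capped at 1.0."""
    return min(1.0, round(max(benign_scores) + margin, 4))


def catch_rate(adversarial_scores, threshold):
    """Step 4 (approximation): share of adversarial inputs at or above the threshold."""
    caught = sum(score >= threshold for score in adversarial_scores)
    return caught / len(adversarial_scores)


# Hypothetical scores recorded from `guardrail check-input` runs.
benign_scores = [0.08, 0.15, 0.31, 0.22]
adversarial_scores = [0.29, 0.47, 0.66, 0.83]

threshold = suggest_threshold(benign_scores)
print(f"GUARDRAIL_INJECTION_THRESHOLD={threshold}")
print(f"catch rate at that threshold: {catch_rate(adversarial_scores, threshold):.2f}")
```

If the resulting catch rate is too low, the benign and adversarial score ranges overlap, and no single threshold will separate them cleanly; in that case a few false blocks may be the price of acceptable recall.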

FastAPI middleware configuration

from fastapi import FastAPI
from guardrail import Guard
from guardrail.middleware import GuardrailMiddleware

app = FastAPI()
app.add_middleware(
    GuardrailMiddleware,
    guard=Guard.from_policy("coppa"),
    # Fields in POST body that contain the user prompt:
    input_fields=["prompt", "input", "message"],   # default
)
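
As a rough mental model of what input_fields selects, the sketch below pulls the configured fields out of a JSON POST body. It is not the middleware's actual implementation: the `extract_inputs` helper and its handling of non-JSON or non-string values are assumptions.

```python
import json

DEFAULT_INPUT_FIELDS = ["prompt", "input", "message"]


def extract_inputs(raw_body, input_fields=DEFAULT_INPUT_FIELDS):
    """Hypothetical helper: return {field: text} for configured fields holding strings."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes: nothing to screen.
        return {}
    if not isinstance(payload, dict):
        return {}
    return {
        field: payload[field]
        for field in input_fields
        if isinstance(payload.get(field), str)
    }


body = b'{"prompt": "Summarise this chapter", "user_id": 7}'
print(extract_inputs(body))
```

Fields outside input_fields, such as user_id here, are ignored in this sketch, so any field that carries user-supplied prompt text should be listed explicitly.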

Blocked requests receive:

{
  "detail": "Input blocked by guardrail",
  "violations": [{"guard": "injection", "severity": "high", "score": 0.5}]
}
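
A client can turn that body into a readable message with the standard json module. The sketch below assumes only the field names shown in the example response; the `summarize_block` helper is hypothetical, and the HTTP status code is not covered here because this page does not state it.

```python
import json


def summarize_block(body):
    """Hypothetical helper: describe each violation in a blocked-response body."""
    payload = json.loads(body)
    return [
        f"{v['guard']} ({v['severity']}, score {v['score']})"
        for v in payload.get("violations", [])
    ]


blocked_body = """
{
  "detail": "Input blocked by guardrail",
  "violations": [{"guard": "injection", "severity": "high", "score": 0.5}]
}
"""
for line in summarize_block(blocked_body):
    print(line)
```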

CLI

guardrail check-input "text to screen" [--policy PRESET]
guardrail redteam [--policy PRESET]
guardrail check-output "text to screen" [--policy PRESET]
