Citadel Local is configured via `.citadel-local.yaml` (copy it from `.citadel-local.example.yaml`). This guide explains all options and tuning strategies.
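To get started, copy the bundled example into place:

```bash
# Create a local config from the shipped example
cp .citadel-local.example.yaml .citadel-local.yaml
```

A complete example: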
```yaml
# Directories to skip during scan (glob patterns)
ignore:
  - node_modules  # JavaScript dependencies
  - .git          # Git metadata
  - dist          # Build outputs
  - build         # Build artifacts
  - vendor        # PHP dependencies
  - .venv         # Python virtual environment
  - __pycache__   # Python bytecode

# File size limit (default: 2 MB)
# Larger files slow scans; secrets are usually in small config files
max_file_mb: 2

# Lines of context around findings in reports (default: 40)
# More context = more readable reports but larger JSON/reports
context_lines: 40

# LLM Council configuration
ollama:
  # Enable/disable LLM-based analysis
  enabled: true

  # Ollama server endpoint
  base_url: "http://127.0.0.1:11434"

  # Triage model (fast, category/severity labeling)
  triage_model: "llama3.2:3b"

  # Deep analysis model (code reasoning, remediation)
  deep_model: "qwen3-coder:30b"

  # Skeptic model (false-positive reduction)
  skeptic_model: "gpt-oss:20b"

  # API timeout in seconds
  timeout_s: 90
```

## `ignore`

Directories and patterns to skip during file collection. Use glob syntax:
- `node_modules` — matches any directory named `node_modules`
- `.git` — matches `.git` anywhere
- `**/*.pyc` — matches `.pyc` files recursively
- `tests/fixtures` — matches `tests/fixtures` at the repo root
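For example (the non-default patterns here are illustrative):

```yaml
ignore:
  - node_modules    # any directory with this name
  - "**/*.min.js"   # minified bundles, anywhere in the tree
  - tests/fixtures  # a fixture directory at the repo root
```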
Why? Large dependency directories and build outputs waste scan time and inflate false positives.
Default set:

```yaml
ignore:
  - node_modules
  - .git
  - dist
  - build
  - vendor
  - .venv
  - __pycache__
```

Common additions:
```yaml
ignore:
  - coverage       # coverage reports
  - .pytest_cache  # pytest cache
  - .mypy_cache    # mypy cache
  - target         # Java/Rust builds
  - bin            # Go builds
  - venv           # Python venv (alternative spelling)
```

## `max_file_mb`

Skip files larger than this threshold.
Why?
- Secrets are rarely in 100MB log files
- Large files slow deterministic detectors
- Saves LLM API time
Tuning:

- `1` — aggressive, faster (may miss large config files)
- `2` — balanced (recommended)
- `5` — permissive, slower
- `0` — no limit (not recommended)
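To preview what a given threshold would skip, you can list oversized files with GNU/BSD `find` (the path and exclusions here are illustrative):

```bash
# Files over 2 MB that max_file_mb: 2 would exclude
find /path/to/repo -type f -size +2M \
  -not -path '*/node_modules/*' -not -path '*/.git/*'
```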
## `context_lines`

Lines of context before/after a finding in reports and JSON.
Trade-offs:

- Lower (e.g., `10`) — compact reports, faster JSON output
- Higher (e.g., `50`) — easier-to-understand code context
- `0` — no context (not recommended)
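For CI pipelines where report size matters, a compact setting keeps `findings.json` small:

```yaml
# Compact reports for CI
context_lines: 10
```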
## `ollama.enabled`

Set to `false` to run deterministic detectors only (no LLM council).
Use case: Quick feedback loop during development, or if Ollama is unavailable.
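In config:

```yaml
ollama:
  enabled: false
```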
Example:

```bash
# Run without LLM (fast)
citadel scan /path/to/repo
# Output still includes detector findings in findings.json
```

## `ollama.base_url`

Ollama server endpoint.
Examples:

```yaml
# Local development
base_url: "http://127.0.0.1:11434"

# Remote Ollama server
base_url: "http://ollama.internal.company.com:11434"

# Docker container
base_url: "http://host.docker.internal:11434"

# Kubernetes service
base_url: "http://ollama-service.default.svc.cluster.local:11434"
```

## `ollama.triage_model`

Fast model for classifying findings (severity, category, confidence).
Recommended:

| Model | Params | Speed | Memory | Best for |
|---|---|---|---|---|
| `llama3.2:3b` | 3B | ~1-2s | ~3GB | Fast triage (default) |
| `phi3:3.8b` | 3.8B | ~1-2s | ~4GB | Fast, competitive quality |
| `qwen2:7b` | 7B | ~3s | ~7GB | Balanced triage |
| `mistral:7b` | 7B | ~3s | ~8GB | Good English |
Tuning: Stick with 3-7B models here. Larger models are slower without proportional benefit for classification.
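Whichever model you pick, make sure it is pulled locally first:

```bash
# Pull the default triage model (swap in your choice)
ollama pull llama3.2:3b
```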
## `ollama.deep_model`

Strong model for code analysis and remediation guidance.
Recommended:

| Model | Params | Speed | Memory | Best for |
|---|---|---|---|---|
| `qwen3-coder:30b` | 30B | ~10-20s | ~28GB | Code reasoning + fixes (default) |
| `mistral-large:30b` | 30B | ~10-20s | ~28GB | Strong all-rounder |
| `deepseek-coder:33b` | 33B | ~12-25s | ~32GB | Very strong code model |
| `llama2:70b` | 70B | ~30-60s | ~60GB | Top-tier quality, slower |
Tuning: Balance model strength with latency. For CI/CD, prefer faster models (20-30B). For thorough security audits, use 33B+.
## `ollama.skeptic_model`

Model for reducing false positives. Should be thoughtful and critical.
Recommended:

| Model | Params | Speed | Memory | Best for |
|---|---|---|---|---|
| `gpt-oss:20b` | 20B | ~8-15s | ~20GB | FP reduction (default) |
| `mistral:20b` | 20B | ~8-15s | ~20GB | Strong reasoning |
| `neural-chat:7b` | 7B | ~2-5s | ~7GB | Fast, lighter |
| `qwen2:7b` | 7B | ~3s | ~7GB | Good skeptic reasoning |
Tuning: The skeptic's job is to argue against findings. Smaller models (7B) work fine; focus on thoughtful prompts over model size.
## `ollama.timeout_s`

API request timeout in seconds.
Tuning:

- `30` — strict; may time out on slow machines
- `60` — balanced
- `90` — lenient (default)
- `180` — very lenient, for large models or slow GPUs
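To pick a sensible value, time a single request against your largest model (a sketch using Ollama's standard generate endpoint; substitute your own model name):

```bash
# Rough single-request latency for the deep model
time curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model": "qwen3-coder:30b", "prompt": "hello", "stream": false}' \
  > /dev/null
```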
## Optimizing for speed

Independent tweaks; combine as needed (with `enabled: false`, the model choices below are moot):

```yaml
ollama:
  # Run detectors only
  enabled: false

  # ...or keep the council but use smaller models
  triage_model: "phi3:3.8b"
  deep_model: "qwen2:7b"           # smaller
  skeptic_model: "neural-chat:7b"  # smaller

# Skip large files
max_file_mb: 1

# Reduce context
context_lines: 20
```

Expected speedup: 2-3x faster, with minimal quality loss for high-confidence findings.
## Optimizing for quality

```yaml
# Keep the LLM council enabled
ollama:
  enabled: true

  # Stronger models
  triage_model: "qwen2:7b"
  deep_model: "deepseek-coder:33b"  # larger, stronger
  skeptic_model: "mistral:20b"      # larger, more critical

# Larger file limit
max_file_mb: 5

# More context
context_lines: 50
```

Expected trade-off: 2-3x slower, but higher-quality remediation guidance.
## Running with low memory

```yaml
# Use the smallest working models
ollama:
  triage_model: "phi3:3.8b"   # ~4GB
  deep_model: "mistral:7b"    # ~8GB
  skeptic_model: "phi3:3.8b"  # ~4GB

# Reduce other overhead
ignore:
  - "*"  # placeholder: add directories aggressively as needed
max_file_mb: 1
context_lines: 10
```

Note: this setup peaks at roughly 16GB of memory but sacrifices accuracy. Test it on your codebase first.
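To see what is actually loaded and how much memory each model uses:

```bash
# Show running models and their memory footprint
ollama ps
```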
## Model selection guide

Choose based on your priorities:

- **Speed matters** (CI/CD pipelines):

  ```yaml
  triage_model: "llama3.2:3b"
  deep_model: "qwen2:7b"
  skeptic_model: "phi3:3.8b"
  ```

  → ~15-20s total per finding

- **Quality matters** (security audits):

  ```yaml
  triage_model: "qwen2:7b"
  deep_model: "deepseek-coder:33b"
  skeptic_model: "mistral:20b"
  ```

  → ~40-60s total per finding, better fixes

- **Balanced** (recommended for most teams):

  ```yaml
  triage_model: "llama3.2:3b"
  deep_model: "qwen3-coder:30b"
  skeptic_model: "gpt-oss:20b"
  ```

  → ~20-30s per finding, good quality/speed
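If you go with the balanced trio, pull all three models up front:

```bash
ollama pull llama3.2:3b
ollama pull qwen3-coder:30b
ollama pull gpt-oss:20b
```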
## Environment variables

Override config with environment variables:

```bash
CITADEL_OLLAMA_BASE_URL=http://ollama.internal:11434 \
CITADEL_OLLAMA_TRIAGE_MODEL=mistral:7b \
citadel scan /path/to/repo
```

## Per-repository configuration

Store `.citadel-local.yaml` in the repo root to customize scanning for that project:
```yaml
# .citadel-local.yaml (in the repo being scanned)
ignore:
  - node_modules
  - tests/fixtures
  - docs/examples  # ignore example code
max_file_mb: 5     # this repo has large config files
```

## Debugging

Dry run to see which files will be scanned:
```bash
citadel scan /path/to/repo --dry-run
# Output: list of files that will be scanned
```

Run with verbose output:
```bash
citadel scan /path/to/repo --verbose
# Output: detector names, LLM calls, timing info
```

## Example configurations

### Balanced defaults

```yaml
ignore:
  - node_modules
  - .git
  - dist
  - build
  - .env  # already excluded by detectors
  - venv
max_file_mb: 2
context_lines: 40
ollama:
  enabled: true
  triage_model: "llama3.2:3b"
  deep_model: "qwen3-coder:30b"
  skeptic_model: "gpt-oss:20b"
```

### Thorough audit

```yaml
ignore:
  - node_modules
  - .git
  - dist
  - build
  - vendor
  - venv
  - .mypy_cache
  - __pycache__
  - .pytest_cache
  - coverage
max_file_mb: 3
context_lines: 50
ollama:
  enabled: true
  triage_model: "qwen2:7b"          # faster classification
  deep_model: "deepseek-coder:33b"  # strong analysis
  skeptic_model: "mistral:20b"      # thorough FP reduction
  timeout_s: 120
```

### Fast and light

```yaml
ignore:
  - node_modules
  - .git
  - dist
  - build
  - vendor
  - venv
  - tests
max_file_mb: 1
context_lines: 20
ollama:
  enabled: true
  triage_model: "phi3:3.8b"        # smallest, fast
  deep_model: "qwen2:7b"           # smaller, reasonable
  skeptic_model: "neural-chat:7b"  # small, critical
  timeout_s: 60
```

## Troubleshooting

### "Model not found" error
```bash
# List available models
curl http://127.0.0.1:11434/api/tags

# Pull a model
ollama pull llama3.2:3b
```

### Slow scans
- Reduce `max_file_mb`
- Use smaller models (3-7B instead of 30B+)
- Set `ollama.enabled: false` for quick feedback
- Add more directories to `ignore`
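As a quick-feedback config, combining the points above:

```yaml
# Detectors only, small files
ollama:
  enabled: false
max_file_mb: 1
```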
### Out of memory
- Use smaller models (3-7B)
- Reduce `context_lines`
- Check GPU availability (`ollama ps` shows where models are loaded)
- Run Ollama on a separate machine
### Timeouts
- Increase `timeout_s` (more lenient)
- Use smaller models (faster)
- Check network connectivity to Ollama