state-policy-rag-starter is a starter repository for a retrieval-augmented generation workflow that helps state agencies answer policy questions using approved policy text and tightly scoped case data access.
- Ingests policy PDFs into a Qdrant vector store.
- Exposes an MCP service for policy search and a strict SQL stored procedure allowlist.
- Runs a RAG service that only answers from approved context and requires citations.
- Uses Ollama for in-state model serving so policy and case data do not leave state-controlled infrastructure.
- Provides a starter governance, deployment, and security package for State IT, Legal, and Procurement teams.
- Cost target: less than $15K for a starter deployment on a single state-managed VM, plus implementation time.
- Data stays in-state: documents, vectors, prompts, and generated answers stay on infrastructure operated by or for the agency.
- Procurement-ready framing: see Security, Deployment, Architecture, and Hardware Setup.
- Governance companion: ai-rmf-starter
- UI companion: state-policy-rag-ui
- Core project scaffolding and Docker-based service configuration for local and pilot deployments
- RAG service implementation with strict temperature control and citation enforcement
- MCP server integration supporting policy search, a SQL stored procedure allowlist, and audit logging
- Dedicated ingestion pipeline for extracting, chunking, embedding, and indexing policy documents
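As a rough illustration of the citation enforcement mentioned above, the sketch below returns an answer only when it cites at least one retrieved source and cites nothing outside the retrieved set. The `[source: ...]` marker format, the function name, and the refusal text are assumptions made for this example; the repository's RAG service may enforce citations differently.

```python
import re

# Hypothetical citation marker, e.g. "[source: Sample Policy, General]".
CITATION_PATTERN = re.compile(r"\[source:\s*([^\]]+)\]")

# Hypothetical refusal text used when an answer cannot be grounded.
REFUSAL = "I cannot answer this from the approved policy context."


def enforce_citations(answer: str, allowed_sources: set[str]) -> str:
    """Return the answer only if every citation points at a retrieved source.

    An answer with no citations, or with a citation that is not in
    `allowed_sources`, is replaced by the refusal text.
    """
    cited = {match.strip() for match in CITATION_PATTERN.findall(answer)}
    if not cited or not cited <= allowed_sources:
        return REFUSAL
    return answer
```

A check like this runs after generation, so it complements rather than replaces a prompt that instructs the model to answer only from the supplied context.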
This project uses two different model roles:
- Embedding model: used for ingestion and MCP semantic search
- Generation model: used for final answer generation in the RAG service
Current local development defaults:
- Hugging Face embedding model: sentence-transformers/all-MiniLM-L6-v2
- Ollama generation model: llama3:8b-instruct-q4_K_M
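To make the generation role concrete, here is a minimal sketch that builds a non-streaming request for Ollama's `/api/generate` endpoint using the default generation model listed above. The temperature value of 0.0, the prompt wording, and the assumption that Ollama listens on its default port 11434 are illustrative choices, not settings confirmed by this repository.

```python
import json
import urllib.request

# Ollama's default port; assumed unchanged in this sketch.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Local development default named above.
GENERATION_MODEL = "llama3:8b-instruct-q4_K_M"


def build_generate_request(
    question: str, context: str, temperature: float = 0.0
) -> urllib.request.Request:
    """Build (but do not send) a non-streaming Ollama generation request."""
    # Hypothetical prompt wording, for illustration only.
    prompt = (
        "Answer using only the policy context below and cite it.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    payload = {
        "model": GENERATION_MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it, which requires a running Ollama instance with the model already pulled.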
Why this separation matters:
- the embedding model converts policy text and user queries into vectors for semantic retrieval
- the Ollama model generates the final answer from the retrieved policy context
- the ingest embedding model and MCP search embedding model must match, or the vector collection will reject the embeddings because of dimension mismatch
For higher-quality production retrieval, the repository also supports larger embedding models such as BAAI/bge-m3, but the smaller MiniLM model provides a faster and more practical local developer experience.
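The dimension mismatch described above can be caught before any vectors are written. The sketch below compares the model configured for ingestion with the one configured for search; the function name and the idea of running it as a pre-flight check are assumptions for illustration. The vector sizes are the published dense output dimensions of the two models named above.

```python
# Dense output dimensions of the two embedding models named above.
EMBEDDING_DIMS = {
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "BAAI/bge-m3": 1024,
}


def check_embedding_models(ingest_model: str, search_model: str) -> int:
    """Return the shared vector size, or raise if the two roles disagree."""
    if ingest_model != search_model:
        raise ValueError(
            f"Ingest model {ingest_model!r} ({EMBEDDING_DIMS.get(ingest_model)} dims) "
            f"does not match search model {search_model!r} "
            f"({EMBEDDING_DIMS.get(search_model)} dims)"
        )
    if ingest_model not in EMBEDDING_DIMS:
        raise ValueError(f"Unknown embedding model: {ingest_model!r}")
    return EMBEDDING_DIMS[ingest_model]
```

Requiring identical model names is stricter than comparing sizes alone: two different models with the same dimension would be accepted by the vector store but would still produce poor search results.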
This starter supports:
- Docker as the primary validated local path
- Podman as an alternative local runtime path
Runtime switching note:
- Do not run Docker and Podman copies of this stack at the same time on one machine because they compete for the same ports
- If you switch runtimes, bring the current stack down first, then start the other one
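Before starting the other runtime, it can help to confirm that the previous stack has released its ports. The following sketch probes the host ports that appear in the quickstart commands (6333, 8080, and 8081); the helper names are hypothetical, and your compose file may publish additional ports.

```python
import socket

# Host ports used by the quickstart commands: Qdrant, policy search, RAG endpoint.
STACK_PORTS = (6333, 8080, 8081)


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting TCP connections on the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports=STACK_PORTS) -> list[int]:
    """Return the subset of ports that are currently occupied."""
    return [port for port in ports if port_in_use(port)]
```

An empty list from `busy_ports()` suggests the previous stack is fully down and the other runtime can be started.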
macOS is the directly tested local path for this repository. Windows and Linux quickstart instructions below are AI-assisted guidance and should be validated in your environment before production use.
Mac
- Clone the repository and enter it.
git clone <your-fork-or-repo-url>
cd state-policy-rag-starter
- Create a local environment file.
cp .env.example .env
- For local development, set a Hugging Face read token in .env if model downloads are needed.
echo 'HF_TOKEN=your_huggingface_read_token' >> .env
- Start the stack.
docker-compose up --build
Optional faster bootstrap:
bash scripts/bootstrap_local.sh
Optional Podman bootstrap:
bash scripts/bootstrap_local_podman.sh
If you switch back to Docker after using Podman:
podman-compose down
docker-compose up --build
- In a second shell, install ingest dependencies if needed and ingest a first policy PDF.
python3 -m pip install -r ingest/requirements.txt
QDRANT_PORT=6333 EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2 \
python3 ingest/ingest.py \
--file examples/sample_policy.pdf \
--source_name "Sample Policy" \
--section "General"
- Test semantic search.
curl -X POST http://localhost:8080/search_policies \
-H "Content-Type: application/json" \
-H "user: test.user@state.gov" \
-d '{"query":"What does the policy require the State Agency to display on the website home page?"}'
- Test the RAG endpoint.
curl -X POST http://localhost:8081/ask \
-H "Content-Type: application/json" \
-H "user: test.user@state.gov" \
-d '{"query":"What does policy say about termination of rights?"}'
If you hit setup or runtime issues during quickstart, see the Beginner Setup Guide, especially the troubleshooting section with copy-paste recovery commands.
Local networking note:
- If localhost behaves inconsistently on macOS, use 127.0.0.1 for local health, search, and RAG checks instead.
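For scripted checks, the two curl calls above can also be issued from Python using only the standard library. This is a sketch: the ports, paths, `user` header, and request body mirror the curl commands, while the helper names and the assumption that both services return JSON are not confirmed by the repository.

```python
import json
import urllib.request


def build_policy_request(
    base_url: str, path: str, query: str, user: str
) -> urllib.request.Request:
    """Build a POST request equivalent to the quickstart curl commands."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}{path}",
        data=body,
        headers={"Content-Type": "application/json", "user": user},
        method="POST",
    )


def post_json(request: urllib.request.Request, timeout: float = 60.0):
    """Send the request and decode the response, assuming a JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires the stack to be running):
# req = build_policy_request(
#     "http://127.0.0.1:8081", "/ask",
#     "What does policy say about termination of rights?",
#     "test.user@state.gov",
# )
# print(post_json(req))
```

Using 127.0.0.1 in the base URL follows the local networking note above.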
Windows
These steps are AI-assisted guidance. Validate locally before wider rollout.
- Clone the repository and enter it in PowerShell.
git clone <your-fork-or-repo-url>
cd state-policy-rag-starter
- Create a local environment file.
Copy-Item .env.example .env
- Add a Hugging Face read token to .env if model downloads are needed.
Add-Content .env 'HF_TOKEN=your_huggingface_read_token'
- Start the stack.
docker-compose up --build
- Create a virtual environment, install ingest dependencies, and ingest a policy PDF.
py -3 -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install -r ingest/requirements.txt
$env:QDRANT_PORT='6333'
$env:EMBEDDING_MODEL='sentence-transformers/all-MiniLM-L6-v2'
python ingest/ingest.py --file examples/sample_policy.pdf --source_name "Sample Policy" --section "General"
- Test semantic search.
curl.exe -X POST http://localhost:8080/search_policies -H "Content-Type: application/json" -H "user: test.user@state.gov" -d "{\"query\":\"What does the policy require the State Agency to display on the website home page?\"}"
- Test the RAG endpoint.
curl.exe -X POST http://localhost:8081/ask -H "Content-Type: application/json" -H "user: test.user@state.gov" -d "{\"query\":\"What does policy say about termination of rights?\"}"
Linux
These steps are AI-assisted guidance. Validate locally before wider rollout.
- Clone the repository and enter it.
git clone <your-fork-or-repo-url>
cd state-policy-rag-starter
- Create a local environment file.
cp .env.example .env
- Add a Hugging Face read token to .env if model downloads are needed.
echo 'HF_TOKEN=your_huggingface_read_token' >> .env
- Start the stack.
docker-compose up --build
- Create a virtual environment, install ingest dependencies, and ingest a policy PDF.
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r ingest/requirements.txt
QDRANT_PORT=6333 EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2 \
python3 ingest/ingest.py \
--file examples/sample_policy.pdf \
--source_name "Sample Policy" \
--section "General"
- Test semantic search.
curl -X POST http://localhost:8080/search_policies \
-H "Content-Type: application/json" \
-H "user: test.user@state.gov" \
-d '{"query":"What does the policy require the State Agency to display on the website home page?"}'
- Test the RAG endpoint.
curl -X POST http://localhost:8081/ask \
-H "Content-Type: application/json" \
-H "user: test.user@state.gov" \
-d '{"query":"What does policy say about termination of rights?"}'
flowchart LR
A["Teams or Web Client"] --> B["rag_service<br/>FastAPI"]
B --> C["mcp_server<br/>FastAPI"]
C --> D["Qdrant<br/>policies collection"]
C --> E["CaseDB<br/>allowed procedures only"]
B --> F["Ollama<br/>local model serving"]
G["Policy PDF Ingest"] --> H["sentence-transformers<br/>BAAI/bge-m3"]
H --> D
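The "allowed procedures only" boundary between the MCP server and CaseDB in the diagram above can be sketched as an exact-match allowlist check with an audit log entry. The procedure names, logger name, and function name below are hypothetical; the repository's real allowlist and audit format may differ.

```python
import logging

# Hypothetical logger name for audit entries.
logger = logging.getLogger("mcp_audit")

# Hypothetical procedure names, for illustration only.
ALLOWED_PROCEDURES = frozenset({"sp_get_case_status", "sp_get_case_summary"})


def authorize_procedure(name: str, user: str) -> str:
    """Return the procedure name if it is on the allowlist, else raise.

    Matching is exact, so anything that is not a known procedure name
    (including a name with extra SQL appended) is rejected.
    """
    allowed = name in ALLOWED_PROCEDURES
    # Record every attempt, allowed or not, for later review.
    logger.info("user=%s procedure=%r allowed=%s", user, name, allowed)
    if not allowed:
        raise PermissionError(f"Procedure not on allowlist: {name!r}")
    return name
```

Because callers can only select a procedure by name and never supply SQL text, open-ended queries have no path to the database in this design.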
- README.md: project overview and quickstart
- GOVERNANCE.md: usage, privacy, citation, and audit requirements
- docs/ARCHITECTURE.md: runtime topology, diagrams, trust boundaries, and request flows
- docs/HARDWARESETUP.md: hardware sizing and isolated network guidance for Azure or on-prem
- docs/SECURITY.md: threat model and technical controls
- docs/DEPLOY_STATE.md: step-by-step single-VM deployment guide
- docs/AUTOMATED_INGESTION.md: ETL design for scheduled policy refresh and vector synchronization
- docs/SETUP_GUIDE.md: beginner-friendly setup and run guide for Mac, Windows, and Linux
- docs/PODMAN.md: Podman runtime setup and validation guidance
- Architecture Guide for deployment, sequence, class, and state diagrams
- Hardware Setup Guide for VM sizing, storage, and network isolation recommendations
- Security Guide for threat model and mitigations
- State Deployment Guide for pilot rollout steps
- Automated Ingestion Guide for the planned scheduled refresh pipeline
- docker-compose is the validated local command path for this repo
- local ingest writes to host-exposed Qdrant on port 6333
- the ingest embedding model and MCP search embedding model must match
- if you change EMBEDDING_MODEL, delete and recreate the policies collection before re-ingesting
- HF_TOKEN helps avoid slow or rate-limited Hugging Face downloads during first-time model setup
- bash scripts/bootstrap_local.sh is the fastest way to warm the services, ingest a sample policy, and wait for /ready
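One way to delete and recreate the policies collection after changing EMBEDDING_MODEL is through Qdrant's REST API on the host-exposed port. The sketch below only builds the two requests. The Cosine distance metric and the helper name are assumptions, and the ingest script may already create the collection itself, in which case only the delete step is needed.

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"  # host-exposed Qdrant port noted above
COLLECTION = "policies"


def recreate_collection_requests(vector_size: int, distance: str = "Cosine"):
    """Build the DELETE and PUT requests that reset the collection.

    `vector_size` must equal the output dimension of the new embedding
    model; the distance metric here is an assumed default.
    """
    url = f"{QDRANT_URL}/collections/{COLLECTION}"
    delete_req = urllib.request.Request(url, method="DELETE")
    create_body = {"vectors": {"size": vector_size, "distance": distance}}
    create_req = urllib.request.Request(
        url,
        data=json.dumps(create_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    return delete_req, create_req


# Example (requires Qdrant to be running; 1024 matches BAAI/bge-m3):
# for req in recreate_collection_requests(1024):
#     urllib.request.urlopen(req).close()
```

Deleting the collection removes every indexed policy chunk, so all PDFs must be ingested again afterwards.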
- Advanced Semantic Search: improve retrieval precision with re-ranking models layered on top of vector search
- Automated Data Refresh: move from manual ingestion to a scheduled and repeatable pipeline
- Expanded Policy Coverage: support additional policy formats such as HTML and DOCX alongside PDF
- Enhanced UI/UX: develop a dedicated frontend for policy exploration and guided question workflows
This repository is meant to be open and practical.
- Feel free to fork it for your own agency, internal prototype, or public-sector adaptation
- Feedback is welcome from State IT, architects, legal teams, security reviewers, and builders working on responsible AI
- Issues, suggestions, and improvements are all useful, especially around governance, deployment, and in-state operating models
This starter is designed for agencies that need a practical path to policy-grounded assistance without sending protected data to external hosted LLM services and without allowing open-ended SQL access.
If this repository is useful, please consider forking it, sharing feedback, and giving it a star on GitHub.