Production-grade reference architecture for a government regulatory chatbot. Built as a skill-test deliverable for Talent Taiwan.
Live demo: talent-taiwan-rag.streamlit.app
A working Streamlit demo comparing two architectures for answering regulatory questions about Taiwan's Employment Gold Card:
- Naive setup — single system prompt, no retrieval, no guards. The Chatbase-style baseline.
- Guarded RAG — input guard, retrieval over scraped Talent Taiwan content, citations, output guard.
Both run on Gemini 3 Flash. The architecture is the only variable.
- Side-by-side test — same question, both pipelines, instant comparison.
- How it works — every answer with its full forensic trace (input check, retrieval scores, generation, output check, correlation ID).
- Content sync — live incremental sync that re-fetches a page from
goldcard.nat.gov.tw, hash-compares, re-embeds only on diff. Cost projection at production scale.
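The diff step of the sync can be sketched in a few lines of Python; the function names here are illustrative, not the repo's actual API:

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a page's extracted text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def pages_to_reembed(manifest: dict[str, str], fetched: dict[str, str]) -> list[str]:
    """Compare freshly fetched pages against the URL -> content_hash manifest;
    only new or changed pages get re-chunked and re-embedded."""
    return [
        url
        for url, text in fetched.items()
        if manifest.get(url) != content_hash(text)
    ]
```

Unchanged pages cost nothing on a sync pass: no embedding call is made unless the hash differs, which is what keeps the production-scale cost projection low.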
```shell
git clone https://github.com/hmtcelik/futureward-chatbot.git
cd futureward-chatbot
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env — paste your GEMINI_API_KEY
streamlit run app.py
```

The repo includes pre-scraped content, a built ChromaDB index, and the manifest, so the demo runs immediately. To re-crawl from scratch:
```shell
python -m scripts.initial_crawl
```

- LLM — Gemini 3 Flash (`gemini-3-flash-preview`) via the new `google-genai` SDK.
- Embeddings — `gemini-embedding-001`, 3,072 dimensions, asymmetric task types (`RETRIEVAL_DOCUMENT` for indexing, `RETRIEVAL_QUERY` for search).
- Vector store — ChromaDB persistent client, file-based, committed to the repo so Streamlit Cloud has a hot index on first boot.
- Frontend — Streamlit 1.56 with custom editorial CSS (Fraunces serif + Inter Tight + JetBrains Mono).
- Guards — two-layer LLM-judge architecture. Input guard fails closed (refuses on parse error); output guard fails open (escalates rather than blocking). The pipeline parallelizes input guard + embedding via `asyncio.gather`.
- Observability — `structlog` JSON logging, correlation IDs threaded through every pipeline stage, copyable from the UI for log lookup.
```
app.py                     Streamlit entry point + navigation router
views/                     Page implementations (home, comparison, inside, sync)
src/
  config.py                Pydantic settings (env-driven)
  logger.py                structlog setup
  models.py                Pydantic data contracts
  scraper/                 async crawler + change detection
  rag/                     chunking, embedding, vector store, retrieval
  guards/                  input + output LLM judges + prompts
  llm/                     Gemini client + naive/guarded chatbots
  pipeline/                incremental sync logic
  ui/                      shared theme + components + runtime
scripts/
  initial_crawl.py         crawl + chunk + embed (one-shot pipeline)
  snapshot_originals.py    back up extracted content for the sync demo
data/
  scraped/extracted/       cleaned per-doc JSON (committed)
  scraped/originals/       pristine snapshots (committed)
  chroma_db/               persistent vector store (committed)
  manifest.json            URL → content_hash registry
decisions.md               design-decision log
crawl_notes.md             scope notes for the PDF write-up
tests/                     guard + chatbot live-API smoke tests
```
```shell
pip install pytest pytest-asyncio
SKIP_LIVE_TESTS=1 pytest                # imports only
pytest tests/test_guards.py             # 6 live guard cases
pytest tests/test_chatbots.py           # 5 live chatbot cases (~$0.005)
```

The repo is structured so a one-click deploy on Streamlit Community Cloud works:
- Push to a public GitHub repo.
- Connect the repo on Streamlit Cloud, point it at `app.py`.
- Paste `GEMINI_API_KEY = "…"` into the Cloud dashboard's Secrets editor.
- Deploy.
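For reference, the Secrets editor expects TOML, the same format as a local `.streamlit/secrets.toml` file (the key value below is a placeholder):

```toml
# Streamlit Cloud Secrets editor, or .streamlit/secrets.toml locally
GEMINI_API_KEY = "your-gemini-api-key"
```

Streamlit Cloud also exposes top-level string secrets as environment variables, which should let an env-driven Pydantic settings object pick the key up without code changes.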
Pre-built ChromaDB and scraped content travel with the repo, so first boot serves real answers instantly — no crawl on cold start.
Hamit Çelik · Skill test submission for the AI Technical Consultant role at Talent Taiwan, April 2026.