TakeoffPK — AI-Powered Student Visa Guide for Pakistanis 🇵🇰

Helping Pakistani students navigate the complex world of international student visas using Retrieval-Augmented Generation (RAG) and Large Language Models.

Live Demo: http://13.235.238.227:8080

Overview

Thousands of Pakistani students struggle every year to find accurate, up-to-date visa information for studying abroad, and most rely on outdated blogs or expensive consultants. TakeoffPK is an end-to-end RAG chatbot grounded in official government documents. It passes every question in a custom keyword-match evaluation suite spanning 6 countries, and is scored separately by an LLM-as-judge evaluation in LangSmith.

Countries Covered

Country	Visa Types
🇺🇸 USA	F-1 Student Visa (UG, Masters, PhD)
🇬🇧 UK	Student Visa — CAS, Tier 4 (UG, PG, PhD)
🇨🇦 Canada	Study Permit — PAL (UG, Masters, PhD)
🇩🇪 Germany	Student Visa, PhD Visa, EU Blue Card, DAAD
🇦🇺 Australia	Student Visa Subclass 500 (UG, PG, PhD)
🇹🇷 Turkey	Student Visa, Türkiye Burslari Scholarship

Architecture

%%{init: {'theme': 'dark', 'flowchart': {'defaultRenderer': 'elk', 'curve': 'basis'}}}%%
flowchart LR
  subgraph SOURCES["📂 Knowledge Base — 20 Official PDFs"]
    direction TB
    USA["🇺🇸 <b>USA</b><br/>F-1 Visa · SEVIS · Embassy"]
    UK["🇬🇧 <b>UK</b><br/>Student Visa · CAS · UKVI"]
    CA["🇨🇦 <b>Canada</b><br/>Study Permit · PAL · IRCC"]
    DE["🇩🇪 <b>Germany</b><br/>Student Visa · DAAD · PhD"]
    AU["🇦🇺 <b>Australia</b><br/>Subclass 500 · OSHC · GTE"]
    TR["🇹🇷 <b>Turkey</b><br/>Türkiye Burslari · Student Visa"]
  end
  subgraph PIPELINE["⚙️ RAG Pipeline"]
    direction LR
    INGEST["📥 <b>Document Ingestion</b><br/>Load PDFs · Split into chunks"]
    EMBED["🤗 <b>Embedding</b><br/>Convert text → 384d vectors<br/>via HuggingFace Inference API"]
    SEARCH["🌲 <b>Semantic Search</b><br/>Find top-5 relevant chunks<br/>via Pinecone Vector DB"]
    GENERATE["⚡ <b>Answer Generation</b><br/>Grounds answer in context<br/>via Groq LLaMA 3.3 70b"]
    RESPOND["🐍 <b>Response Delivery</b><br/>Adds disclaimer · Returns<br/>formatted answer to user"]
    INGEST --> EMBED
    EMBED --> SEARCH
    SEARCH --> GENERATE
    GENERATE --> RESPOND
  end
  subgraph CICD["🔄 CI/CD — Automated on every push"]
    direction LR
    TEST["✅ <b>Quality Check</b><br/>Linting · 15 unit tests"]
    BUILD["🐳 <b>Containerise</b><br/>Docker image built<br/>4.4 GB → 300 MB optimised"]
    REGISTRY["📦 <b>Image Registry</b><br/>Pushed to AWS ECR<br/>Versioned and stored"]
    DEPLOY["☁️ <b>Deploy</b><br/>AWS EC2 t3.micro<br/>Free tier · port 8080"]
    TEST --> BUILD --> REGISTRY --> DEPLOY
  end
  USER(["👤 <b>Pakistani Student</b><br/>Asks visa question"])
  USA & UK & CA & DE & AU & TR --> INGEST
  RESPOND --> USER
  DEPLOY -.->|"hosts"| RESPOND
  classDef source fill:#1a2a1a,stroke:#2d5a2d,color:#7fc97f
  classDef pipeline fill:#1a1a2e,stroke:#3d3d7a,color:#9090d4
  classDef cicd fill:#2a1a1a,stroke:#6a3030,color:#d49090
  classDef user fill:#1a2a2a,stroke:#2d6a6a,color:#90d4d4
  class USA,UK,CA,DE,AU,TR source
  class INGEST,EMBED,SEARCH,GENERATE,RESPOND pipeline
  class TEST,BUILD,REGISTRY,DEPLOY cicd
  class USER user
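
The ingestion, embedding, and search stages in the diagram can be sketched in a few lines. This is a minimal, dependency-free illustration and not the project's code: `toy_embed` is a bag-of-words stand-in for the 384-dimensional all-MiniLM-L6-v2 vectors, the in-memory list stands in for Pinecone, and the function names, chunk sizes, and sample sentences are all assumptions made for the example.

```python
import math
from collections import Counter


def split_into_chunks(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, stop, step)]


def toy_embed(text):
    """Stand-in embedding: a sparse word-count vector (NOT MiniLM)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, chunks, k=5):
    """Return the k chunks most similar to the question, best first."""
    q_vec = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, toy_embed(c)), reverse=True)
    return ranked[:k]


# Placeholder chunks written for this example, not taken from the indexed PDFs.
sample_chunks = [
    "The F-1 visa requires a valid I-20 form.",
    "A UK Student visa requires a CAS number.",
    "Subclass 500 applicants need OSHC health cover.",
]
best = top_k("What form does the F-1 visa require?", sample_chunks, k=1)
print(best[0])
```

In the real pipeline the retrieved chunks would then be passed to the LLM as context; only the ranking idea carries over from this sketch.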

Tech Stack

Layer	Technology
LLM	Groq — llama-3.3-70b-versatile
Embeddings	HuggingFace Inference API — all-MiniLM-L6-v2 (384d)
Vector Database	Pinecone Serverless
Backend	Python, Flask
Frontend	HTML, CSS, JavaScript
Containerization	Docker
Registry	AWS ECR
Deployment	AWS EC2 t3.micro
CI/CD	GitHub Actions
Testing	pytest · custom batch evaluator · LangSmith
Linting	flake8

Evaluation

Batch Test — 29 Questions across 6 Countries

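A keyword-matching evaluation script (batch_test.py) tests the live app against 28 questions, grouped by country plus a small cross-country set. An answer passes if it contains at least one of the expected keywords for its question, so the pass rate measures keyword presence rather than full factual correctness.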

Country	Questions	Passed	Accuracy
🇬🇧 UK	5	5	100%
🇨🇦 Canada	5	5	100%
🇩🇪 Germany	4	4	100%
🇦🇺 Australia	4	4	100%
🇺🇸 USA	5	5	100%
🇹🇷 Turkey	2	2	100%
Cross-Country	2	2	100%
Total	28	28	100%
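
The pass criterion described above (at least one expected keyword present) can be expressed as a short check. This is a sketch under assumptions: the function names and the two sample cases are hypothetical, and batch_test.py may normalise text or call the app differently.

```python
def passes(answer, expected_keywords):
    """True if the answer contains at least one expected keyword (case-insensitive)."""
    text = answer.lower()
    return any(keyword.lower() in text for keyword in expected_keywords)


def pass_rate(results):
    """Fraction of True values in a list of pass/fail results."""
    return sum(results) / len(results) if results else 0.0


# Hypothetical cases for illustration only.
cases = [
    {"answer": "You need a CAS from your university.", "keywords": ["CAS", "confirmation of acceptance"]},
    {"answer": "I don't know.", "keywords": ["I-20", "SEVIS"]},
]
results = [passes(case["answer"], case["keywords"]) for case in cases]
print(pass_rate(results))  # 0.5
```

Because this is plain substring matching, a short keyword such as "CAS" would also match inside a longer word like "case"; a stricter variant could match on word boundaries instead.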

LangSmith Evaluation — LLM-as-Judge

langsmith_eval.py runs a deeper evaluation using a second LLM as judge (temperature=0.0) across 3 dimensions on a 5-question ground truth dataset:

Metric	What it checks
Correctness	Is the answer factually accurate against a written ground truth?
Groundedness	Is the answer supported by the retrieved Pinecone chunks, or hallucinated?
Relevance	Does the answer actually address the question asked?

Results across 3 experiments:

Experiment	Correctness	Groundedness	Relevance
#1 — baseline	0.60	0.20	1.00
#2	0.80	0.40	1.00
#3 — latest	0.80	0.40	1.00

Between the baseline and experiment #2, correctness rose from 0.60 to 0.80 and groundedness doubled from 0.20 to 0.40. Both held steady in experiment #3, indicating consistent system behavior. Relevance scored 1.00 in every run.
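
The arithmetic behind these figures is easy to check. The sketch below uses only the scores reported in the table; the assumption that the judge gives each of the 5 questions a binary pass/fail verdict is mine (the source does not state the scoring scale), and it is what makes every score a multiple of 0.2.

```python
N_QUESTIONS = 5  # size of the ground truth dataset stated above

# Scores copied from the results table.
reported = {
    "#1 baseline": {"correctness": 0.60, "groundedness": 0.20, "relevance": 1.00},
    "#2": {"correctness": 0.80, "groundedness": 0.40, "relevance": 1.00},
    "#3 latest": {"correctness": 0.80, "groundedness": 0.40, "relevance": 1.00},
}


def implied_passes(score, n=N_QUESTIONS):
    """Questions passed, ASSUMING binary per-question verdicts averaged over n."""
    return round(score * n)


def relative_change(before, after):
    """Ratio of the later score to the earlier one."""
    return after / before


for metric in ("correctness", "groundedness"):
    before = reported["#1 baseline"][metric]
    after = reported["#2"][metric]
    print(
        metric,
        f"{implied_passes(before)}/{N_QUESTIONS} -> {implied_passes(after)}/{N_QUESTIONS}",
        f"x{relative_change(before, after):.2f}",
    )
```

With a dataset this small, one question changing verdict moves a metric by 0.20, so differences between runs should be read with that granularity in mind.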

Note on groundedness: The score is intentionally lower than correctness because the system prompt injects verified 2025 policy facts (e.g. SDS discontinuation, AUD/CAD fund requirements) as a safety layer. The judge only evaluates against retrieved Pinecone chunks and penalizes these additions even though they are correct and deliberate.
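To make the effect in this note concrete, here is a rough sketch of how pinned facts in a prompt can sit outside the retrieved context. It is not the content of src/prompt.py: the function names, prompt wording, and placeholder facts are assumptions, and the literal substring check is only a crude proxy for what an LLM judge does.

```python
def build_prompt(question, retrieved_chunks, pinned_facts):
    """Assemble a prompt from pinned policy facts plus retrieved context."""
    facts = "\n".join(f"- {fact}" for fact in pinned_facts)
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using the information below.\n\n"
        f"Verified policy facts:\n{facts}\n\n"
        f"Retrieved context:\n{context}\n\n"
        f"Question: {question}"
    )


def unsupported_by_retrieval(pinned_facts, retrieved_chunks):
    """Pinned facts that do not appear verbatim in the retrieved chunks.

    A judge that only sees the retrieved chunks has no evidence for these,
    even when they are correct.
    """
    joined = " ".join(retrieved_chunks).lower()
    return [fact for fact in pinned_facts if fact.lower() not in joined]


# Placeholders only; real facts and chunks would come from the project.
pinned = ["<pinned policy fact A>", "<pinned policy fact B>"]
chunks = ["<retrieved chunk 1>", "<retrieved chunk 2>"]

prompt = build_prompt("<student question>", chunks, pinned)
print(unsupported_by_retrieval(pinned, chunks))
```

An answer that repeats a pinned fact is then graded against chunks that never mention it, which is the penalty the note describes.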

Project Structure

TakeoffPK/
├── src/
│   ├── __init__.py
│   ├── helper.py          ← PDF loading, text splitting, embeddings
│   └── prompt.py          ← System prompt for the LLM
├── Data/                  ← Add PDFs here locally (not tracked by Git)
│   ├── usa/
│   ├── uk/
│   ├── canada/
│   ├── germany/
│   ├── australia/
│   └── turkey/
├── templates/
│   └── chat.html          ← Frontend UI
├── tests/
│   └── test_app.py        ← 15 unit tests (pytest)
├── app.py                 ← Flask application
├── store_index.py         ← One-time PDF ingestion into Pinecone
├── batch_test.py          ← 29-question accuracy evaluation (run locally)
├── langsmith_eval.py      ← LLM-as-judge evaluation via LangSmith
├── requirements.txt       ← Dependencies
├── Dockerfile
├── .dockerignore
├── .env.example
├── .gitignore
└── .github/
    └── workflows/
        └── main.yaml      ← CI/CD pipeline

Getting Started

Prerequisites

Python 3.10
Conda or virtualenv
Free API keys: Pinecone · Groq · HuggingFace

Setup

# 1. Clone
git clone https://github.com/slaiba123/TakeoffPK.git
cd TakeoffPK

# 2. Create environment
conda create -n TakeoffPK python=3.10 -y
conda activate TakeoffPK
pip install -r requirements.txt

# 3. Configure environment variables
cp .env.example .env
# Fill in your API keys in .env

# 4. Add official PDFs to the correct Data/ subfolder (see PDF Sources below)

# 5. Index documents into Pinecone (run once)
python store_index.py

# 6. Run the app
python app.py
# Open: http://localhost:8080
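
Before running store_index.py or app.py, it can help to confirm that .env is fully filled in. The sketch below is a hypothetical helper, not a file in the repository. It assumes .env uses the same variable names as the GitHub secrets listed under Deployment (PINECONE_API_KEY, GROQ_API_KEY, HUGGINGFACE_API_KEY) and simple KEY=VALUE lines.

```python
from pathlib import Path

# Assumed names, taken from the secrets list in the Deployment section.
REQUIRED_KEYS = ("PINECONE_API_KEY", "GROQ_API_KEY", "HUGGINGFACE_API_KEY")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key, "").strip()]


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    missing = missing_keys(parse_env_text(text))
    if missing:
        print("Missing or empty in .env:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

The LangSmith evaluation additionally needs LANGCHAIN_API_KEY, which could be passed through the `required` argument when that script is used.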

Running Tests

# Unit tests
pytest tests/ -v

# Accuracy evaluation (requires app running on port 8080)
python batch_test.py

# LangSmith evaluation (requires LANGCHAIN_API_KEY in .env)
python langsmith_eval.py

CI/CD Pipeline

Every push to main triggers:

Push to main
     │
     ▼
① CI  →  flake8 linting + pytest (15 unit tests)
     │
     ▼
② Build  →  Docker image built and pushed to AWS ECR
     │
     ▼
③ Deploy  →  EC2 pulls latest image, restarts container

Deployment (AWS Free Tier)

The app runs on an AWS EC2 t3.micro instance (2 vCPUs, 1 GiB RAM) inside Docker and is deployed automatically via GitHub Actions.

Estimated monthly cost: $0 — within AWS free tier limits (EC2 + ECR + EBS).

To deploy your own instance, you need an EC2 instance running Ubuntu 22.04 with Docker installed, an ECR repository, and the following secrets added to your GitHub repo:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_REGION
AWS_ECR_LOGIN_URI
ECR_REPOSITORY_NAME
PINECONE_API_KEY
GROQ_API_KEY
HUGGINGFACE_API_KEY

Full step-by-step setup: AWS Deployment Guide (or refer to the workflow file at .github/workflows/main.yaml)

PDF Sources

All data sourced from official government and embassy websites:

Country	Source
🇺🇸 USA	travel.state.gov · pk.usembassy.gov
🇬🇧 UK	assets.publishing.service.gov.uk
🇨🇦 Canada	ircc.canada.ca
🇩🇪 Germany	germany.info · daad.de
🇦🇺 Australia	immi.homeaffairs.gov.au
🇹🇷 Turkey	islamabad-emb.mfa.gov.tr

⚠️ Disclaimer

This tool is for informational purposes only. Visa rules change frequently — always verify with the official embassy or consulate before making any application decisions. This project is not affiliated with any government body or embassy.

Author

Laiba Mushtaq, Computer Engineering Student
GitHub: @slaiba123
