
slaiba123/TakeoffPK


TakeoffPK: AI-Powered Student Visa Guide for Pakistanis 🇵🇰

Helping Pakistani students navigate the complex world of international student visas using Retrieval-Augmented Generation (RAG) and Large Language Models.

Live Demo: http://13.235.238.227:8080


Overview

Thousands of Pakistani students struggle every year to find accurate, up-to-date visa information for studying abroad; most rely on outdated blogs or expensive consultants. TakeoffPK is an end-to-end RAG chatbot grounded in official government documents. It scores 100% on a custom 29-question keyword-matching evaluation suite across 6 countries.


Countries Covered

| Country | Visa Types |
|---|---|
| 🇺🇸 USA | F-1 Student Visa (UG, Masters, PhD) |
| 🇬🇧 UK | Student Visa - CAS, Tier 4 (UG, PG, PhD) |
| 🇨🇦 Canada | Study Permit - PAL (UG, Masters, PhD) |
| 🇩🇪 Germany | Student Visa, PhD Visa, EU Blue Card, DAAD |
| 🇦🇺 Australia | Student Visa Subclass 500 (UG, PG, PhD) |
| 🇹🇷 Turkey | Student Visa, Türkiye Burslari Scholarship |

Architecture

%%{init: {'theme': 'dark', 'flowchart': {'defaultRenderer': 'elk', 'curve': 'basis'}}}%%
flowchart LR
  subgraph SOURCES["📂 Knowledge Base - 20 Official PDFs"]
    direction TB
    USA["🇺🇸 <b>USA</b><br/>F-1 Visa · SEVIS · Embassy"]
    UK["🇬🇧 <b>UK</b><br/>Student Visa · CAS · UKVI"]
    CA["🇨🇦 <b>Canada</b><br/>Study Permit · PAL · IRCC"]
    DE["🇩🇪 <b>Germany</b><br/>Student Visa · DAAD · PhD"]
    AU["🇦🇺 <b>Australia</b><br/>Subclass 500 · OSHC · GTE"]
    TR["🇹🇷 <b>Turkey</b><br/>Türkiye Burslari · Student Visa"]
  end
  subgraph PIPELINE["⚙️ RAG Pipeline"]
    direction LR
    INGEST["📥 <b>Document Ingestion</b><br/>Load PDFs · Split into chunks"]
    EMBED["🤗 <b>Embedding</b><br/>Convert text → 384d vectors<br/>via HuggingFace Inference API"]
    SEARCH["🌲 <b>Semantic Search</b><br/>Find top-5 relevant chunks<br/>via Pinecone Vector DB"]
    GENERATE["⚑ <b>Answer Generation</b><br/>Grounds answer in context<br/>via Groq LLaMA 3.3 70b"]
    RESPOND["🐍 <b>Response Delivery</b><br/>Adds disclaimer · Returns<br/>formatted answer to user"]
    INGEST --> EMBED
    EMBED --> SEARCH
    SEARCH --> GENERATE
    GENERATE --> RESPOND
  end
  subgraph CICD["🔄 CI/CD - Automated on every push"]
    direction LR
    TEST["✅ <b>Quality Check</b><br/>Linting · 15 unit tests"]
    BUILD["🐳 <b>Containerise</b><br/>Docker image built<br/>4.4 GB → 300 MB optimised"]
    REGISTRY["📦 <b>Image Registry</b><br/>Pushed to AWS ECR<br/>Versioned and stored"]
    DEPLOY["☁️ <b>Deploy</b><br/>AWS EC2 t3.micro<br/>Free tier · port 8080"]
    TEST --> BUILD --> REGISTRY --> DEPLOY
  end
  USER(["👤 <b>Pakistani Student</b><br/>Asks visa question"])
  USA & UK & CA & DE & AU & TR --> INGEST
  RESPOND --> USER
  DEPLOY -.->|"hosts"| RESPOND
  classDef source fill:#1a2a1a,stroke:#2d5a2d,color:#7fc97f
  classDef pipeline fill:#1a1a2e,stroke:#3d3d7a,color:#9090d4
  classDef cicd fill:#2a1a1a,stroke:#6a3030,color:#d49090
  classDef user fill:#1a2a2a,stroke:#2d6a6a,color:#90d4d4
  class USA,UK,CA,DE,AU,TR source
  class INGEST,EMBED,SEARCH,GENERATE,RESPOND pipeline
  class TEST,BUILD,REGISTRY,DEPLOY cicd
  class USER user
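
The ingestion, embedding, and search stages in the diagram can be illustrated with a toy, standard-library-only sketch. This is not the project's code: `split_into_chunks`, `toy_embed`, `cosine`, and `top_k` are hypothetical names, the chunk sizes are arbitrary, and the bag-of-words vector is only a stand-in for the 384-dimensional all-MiniLM-L6-v2 embeddings and the Pinecone query that the real app uses.

```python
import math
from collections import Counter


def split_into_chunks(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def toy_embed(text):
    """Bag-of-words counts; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question, chunks, k=5):
    """Return the k chunks most similar to the question (k=5 mirrors the diagram)."""
    q = toy_embed(question)
    return sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)[:k]


# Placeholder chunks, not real visa guidance.
chunks = [
    "placeholder text about tuition deposits",
    "placeholder text about health insurance cover",
    "placeholder text about biometrics appointments",
]
print(top_k("health insurance", chunks, k=1))
# prints ['placeholder text about health insurance cover']
```

In the real pipeline the top-ranked chunks are passed to the LLM as context, which is the step that keeps answers grounded in the source PDFs.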

Tech Stack

| Layer | Technology |
|---|---|
| LLM | Groq - llama-3.3-70b-versatile |
| Embeddings | HuggingFace Inference API - all-MiniLM-L6-v2 (384d) |
| Vector Database | Pinecone Serverless |
| Backend | Python, Flask |
| Frontend | HTML, CSS, JavaScript |
| Containerization | Docker |
| Registry | AWS ECR |
| Deployment | AWS EC2 t3.micro |
| CI/CD | GitHub Actions |
| Testing | pytest · custom batch evaluator · LangSmith |
| Linting | flake8 |

Evaluation

Batch Test: 29 Questions across 6 Countries

A keyword-matching evaluation script (batch_test.py) tests the live app against 28 country-specific questions. Any answer containing at least one expected keyword passes.

| Country | Questions | Passed | Accuracy |
|---|---|---|---|
| 🇬🇧 UK | 5 | 5 | 100% |
| 🇨🇦 Canada | 5 | 5 | 100% |
| 🇩🇪 Germany | 4 | 4 | 100% |
| 🇦🇺 Australia | 4 | 4 | 100% |
| 🇺🇸 USA | 5 | 5 | 100% |
| 🇹🇷 Turkey | 2 | 2 | 100% |
| Cross-Country | 2 | 2 | 100% |
| Total | 28 | 28 | 100% |
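
The pass rule described above (an answer passes if it contains at least one expected keyword) can be sketched as follows. This is a minimal illustration, not the contents of batch_test.py: the function names `passes` and `accuracy` are hypothetical, and case-insensitive substring matching is an assumption.

```python
def passes(answer, expected_keywords):
    """True if the answer contains at least one expected keyword.

    Case-insensitive substring matching is an assumption; the actual
    script may compare differently.
    """
    text = answer.lower()
    return any(keyword.lower() in text for keyword in expected_keywords)


def accuracy(results):
    """Fraction of questions that passed; results is a list of booleans."""
    return sum(results) / len(results) if results else 0.0


results = [
    passes("You will need a CAS from your university.", ["CAS", "confirmation of acceptance"]),
    passes("I am not sure about that.", ["SEVIS"]),
]
print(accuracy(results))  # prints 0.5
```

Because a single keyword is enough to pass, this check is lenient: it confirms the answer touches the expected topic but not that every detail is right, which is the gap the LLM-as-judge evaluation below is meant to cover.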

LangSmith Evaluation: LLM-as-Judge

langsmith_eval.py runs a deeper evaluation using a second LLM as judge (temperature=0.0) across 3 dimensions on a 5-question ground truth dataset:

| Metric | What it checks |
|---|---|
| Correctness | Is the answer factually accurate against a written ground truth? |
| Groundedness | Is the answer supported by the retrieved Pinecone chunks, or hallucinated? |
| Relevance | Does the answer actually address the question asked? |

Results across 3 experiments:

| Experiment | Correctness | Groundedness | Relevance |
|---|---|---|---|
| #1 (baseline) | 0.60 | 0.20 | 1.00 |
| #2 | 0.80 | 0.40 | 1.00 |
| #3 (latest) | 0.80 | 0.40 | 1.00 |

Between the baseline and experiment #2, correctness rose from 0.60 to 0.80 and groundedness doubled from 0.20 to 0.40. Both scores were unchanged in experiment #3, indicating consistent system behavior. Relevance has been perfect across all runs.
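
The change between the baseline and experiment #2 can be checked with a few lines of arithmetic. The scores are copied from the results table; the helper name `relative_change` is only illustrative.

```python
# Scores copied from the results table above.
baseline = {"correctness": 0.60, "groundedness": 0.20, "relevance": 1.00}
experiment_2 = {"correctness": 0.80, "groundedness": 0.40, "relevance": 1.00}


def relative_change(before, after):
    """Relative change from before to after, as a fraction of before."""
    return (after - before) / before


for metric, before in baseline.items():
    after = experiment_2[metric]
    print(f"{metric}: {before:.2f} -> {after:.2f} ({relative_change(before, after):+.0%})")
# correctness: 0.60 -> 0.80 (+33%)
# groundedness: 0.20 -> 0.40 (+100%)
# relevance: 1.00 -> 1.00 (+0%)
```

With a 5-question dataset, and assuming each judgment is a binary pass or fail, one question changing its verdict moves a score by 0.20, so these shifts correspond to a single question per metric.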

Note on groundedness: This score is expected to be lower than correctness, by design. The system prompt injects verified 2025 policy facts (e.g. SDS discontinuation, AUD/CAD fund requirements) as a safety layer. The judge evaluates answers only against the retrieved Pinecone chunks, so it penalizes these additions even though they are correct and deliberate.
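
A rough sketch of why the judge penalizes these additions: the prompt sent to the LLM combines two sources, but only one of them (the retrieved chunks) is visible to the groundedness check. The real prompt lives in src/prompt.py and is not reproduced here; `build_prompt`, `deliver`, the prompt wording, and the disclaimer text are all hypothetical.

```python
DISCLAIMER = (
    "This tool is for informational purposes only. "
    "Always verify with the official embassy or consulate."
)


def build_prompt(question, retrieved_chunks, verified_facts):
    """Combine injected policy facts and retrieved context into one prompt.

    A groundedness judge that sees only retrieved_chunks will treat any
    answer content drawn from verified_facts as unsupported.
    """
    facts = "\n".join(f"- {fact}" for fact in verified_facts)
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer using only the context and verified facts below.\n\n"
        f"Verified policy facts:\n{facts}\n\n"
        f"Retrieved context:\n{context}\n\n"
        f"Question: {question}"
    )


def deliver(answer):
    """Append the disclaimer before returning the answer to the user."""
    return f"{answer}\n\n{DISCLAIMER}"


prompt = build_prompt(
    question="<student question>",
    retrieved_chunks=["<chunk 1 from Pinecone>", "<chunk 2 from Pinecone>"],
    verified_facts=["<verified policy fact>"],
)
print(prompt)
```

Passing the injected facts to the judge alongside the retrieved chunks would be one way to make the groundedness score reflect both sources, though the README does not say whether this has been tried.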

LangSmith Results: [image: LangSmith Evaluation Results]


Project Structure

TakeoffPK/
├── src/
│   ├── __init__.py
│   ├── helper.py          ← PDF loading, text splitting, embeddings
│   └── prompt.py          ← System prompt for the LLM
├── Data/                  ← Add PDFs here locally (not tracked by Git)
│   ├── usa/
│   ├── uk/
│   ├── canada/
│   ├── germany/
│   ├── australia/
│   └── turkey/
├── templates/
│   └── chat.html          ← Frontend UI
├── tests/
│   └── test_app.py        ← 15 unit tests (pytest)
├── app.py                 ← Flask application
├── store_index.py         ← One-time PDF ingestion into Pinecone
├── batch_test.py          ← 29-question accuracy evaluation (run locally)
├── langsmith_eval.py      ← LLM-as-judge evaluation via LangSmith
├── requirements.txt       ← Dependencies
├── Dockerfile
├── .dockerignore
├── .env.example
├── .gitignore
└── .github/
    └── workflows/
        └── main.yaml      ← CI/CD pipeline

Getting Started

Prerequisites

Setup

# 1. Clone
git clone https://github.com/slaiba123/TakeoffPK.git
cd TakeoffPK

# 2. Create environment
conda create -n TakeoffPK python=3.10 -y
conda activate TakeoffPK
pip install -r requirements.txt

# 3. Configure environment variables
cp .env.example .env
# Fill in your API keys in .env

# 4. Add official PDFs to the correct Data/ subfolder (see PDF Sources below)

# 5. Index documents into Pinecone (run once)
python store_index.py

# 6. Run the app
python app.py
# Open: http://localhost:8080
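
Before indexing or starting the app, it can help to confirm that the API keys from step 3 are actually set. The sketch below is an assumption-laden example: the key names are borrowed from the GitHub secrets listed under Deployment and may not match .env.example exactly, `missing_keys` is a hypothetical helper, and it assumes the values from .env have already been loaded into the process environment.

```python
import os

# Assumed key names, taken from the GitHub secrets list in this README.
REQUIRED_KEYS = ["PINECONE_API_KEY", "GROQ_API_KEY", "HUGGINGFACE_API_KEY"]


def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


# Example with an explicit mapping instead of the real environment.
print(missing_keys({"GROQ_API_KEY": "example-value"}))
# prints ['PINECONE_API_KEY', 'HUGGINGFACE_API_KEY']
```

Calling `missing_keys()` with no arguments checks the real environment; an empty list means all three keys are present.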

Running Tests

# Unit tests
pytest tests/ -v

# Accuracy evaluation (requires app running on port 8080)
python batch_test.py

# LangSmith evaluation (requires LANGCHAIN_API_KEY in .env)
python langsmith_eval.py

CI/CD Pipeline

Every push to main triggers:

Push to main
     │
     ▼
① CI  →  flake8 linting + pytest (15 unit tests)
     │
     ▼
② Build  →  Docker image built and pushed to AWS ECR
     │
     ▼
③ Deploy  →  EC2 pulls latest image, restarts container

Deployment (AWS Free Tier)

The app runs on AWS EC2 t3.micro (1 vCPU, 1 GB RAM) inside Docker, deployed automatically via GitHub Actions.

Estimated monthly cost: $0, within AWS free tier limits (EC2 + ECR + EBS).

To deploy your own instance, you need an EC2 instance running Ubuntu 22.04 with Docker installed, an ECR repository, and the following secrets added to your GitHub repo:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_REGION
AWS_ECR_LOGIN_URI
ECR_REPOSITORY_NAME
PINECONE_API_KEY
GROQ_API_KEY
HUGGINGFACE_API_KEY

Full step-by-step setup: AWS Deployment Guide (or refer to the workflow file at .github/workflows/main.yaml)


PDF Sources

All data sourced from official government and embassy websites:

| Country | Source |
|---|---|
| 🇺🇸 USA | travel.state.gov · pk.usembassy.gov |
| 🇬🇧 UK | assets.publishing.service.gov.uk |
| 🇨🇦 Canada | ircc.canada.ca |
| 🇩🇪 Germany | germany.info · daad.de |
| 🇦🇺 Australia | immi.homeaffairs.gov.au |
| 🇹🇷 Turkey | islamabad-emb.mfa.gov.tr |

⚠️ Disclaimer

This tool is for informational purposes only. Visa rules change frequently; always verify with the official embassy or consulate before making any application decisions. This project is not affiliated with any government body or embassy.


Author

Laiba Mushtaq, Computer Engineering Student. GitHub: @slaiba123

About

AI-powered student visa guide for Pakistanis 🇵🇰 | RAG + LangChain + Groq | Deployed on AWS EC2 with CI/CD
