
SILAJU — Multi-label Email/Text Classifier

Small project that trains and serves a multi-label classifier for text (email descriptions). It includes utilities to extract emails, label data using a local or remote LLM, store/fetch data from Supabase, and serve predictions via a FastAPI server.

What this repository contains

  • src/ — main source code. Key files:
    • api_server.py — FastAPI app exposing /predict, /data/random, /data/relatedarticles, and /health endpoints.
    • DataLabeler.py — scripts to label data via local LM (LM Studio) or OpenRouter.
    • GmailEmailExtractor.py, DataExporter.py, DataImporter.py — helpers for data ingestion/export.
  • models/ — trained model and TF-IDF vectorizer (joblib files).
  • data/ — raw and processed datasets.
  • .config/credential.example.json — example credentials file (see below).

Workflow Diagram

(Workflow diagram: how each component is used in the project to create the multi-label classifier model.)

Videos

SILAJU - AI-Powered Content Classification and Generation System

[EXPLAINER VIDEO] SILAJU - AI-Powered Content Classification and Generation System

Requirements

  • Python: >= 3.13 (see pyproject.toml)
  • Recommended: create a virtual environment and install the dependencies listed in pyproject.toml (with pip, or with uv for speed and reproducibility).

Quick install (from project root):

# create and activate a venv
python3 -m venv .venv
source .venv/bin/activate

# install dependencies with pip (reads pyproject metadata)
pip install -e .

Using uv to install dependencies

If you want to use uv to install all Python dependencies (recommended for speed and reproducibility):

# install uv (if not installed)
pip install uv

# install all dependencies listed in pyproject.toml
uv pip install -e .

Alternatively, install main packages directly with pip

pip install fastapi uvicorn pandas scikit-learn joblib openai openrouter supabase

Credentials (important)

The repository includes an example credentials file at .config/credential.example.json. The runtime code (e.g., src/api_server.py and src/DataLabeler.py) expects your runtime credentials to be placed at .config/credentials.json (note the plural).

Copy the example and fill in your keys:

cp .config/credential.example.json .config/credentials.json

Open .config/credentials.json and replace the placeholder values. The file contains two main sections: web and supabase.

Example structure (trimmed):

{
	"web": {
		"client_id": "YOUR_CLIENT_ID_HERE",
		"project_id": "YOUR_PROJECT_ID_HERE",
		"client_secret": "YOUR_CLIENT_SECRET_HERE",
		"OPENROUTER_API_KEY": "YOUR_OPENROUTER_API_KEY_HERE"
	},
	"supabase": {
		"URL": "https://YOUR_PROJECT_ID.supabase.co",
		"KEY": "YOUR_ANON_PUBLIC_KEY"
	}
}
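Scripts that need these keys can read the file at startup. As a minimal sketch (the helper name `load_credentials` is illustrative, not a function from this repo — the actual loading lives inside src/api_server.py and src/DataLabeler.py):

```python
import json
from pathlib import Path

def load_credentials(path=".config/credentials.json"):
    """Load the credentials file and return the parsed dict.

    A missing file usually means the example was never copied to
    .config/credentials.json, so fail with a pointed message.
    """
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(
            f"{p} not found - copy .config/credential.example.json "
            "to .config/credentials.json and fill in your keys"
        )
    with p.open() as f:
        return json.load(f)

# Usage (keys mirror the structure shown above):
# creds = load_credentials()
# openrouter_key = creds["web"]["OPENROUTER_API_KEY"]
# supabase_url = creds["supabase"]["URL"]
```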

How to obtain each key

  • Google OAuth credentials (web.client_id, web.client_secret, web.project_id):

    1. Go to Google Cloud Console: https://console.cloud.google.com/
    2. Create a new project (or select an existing one).
    3. In the left menu go to "APIs & Services" → "OAuth consent screen" and configure if prompted.
    4. Go to "Credentials" → "Create Credentials" → "OAuth client ID" → Choose "Web application".
    5. After creation you will have a Client ID and Client Secret — paste them into client_id and client_secret.
    6. The project_id is shown on your project dashboard in the Cloud Console.

    Notes: If you use any Google API (for Gmail extraction), you may need to add authorized redirect URIs to the OAuth client depending on how you run the extractor. For local scripts, a redirect URI is often not required if using service account or installed app flows — check the extractor code if unsure.

  • OpenRouter API Key (web.OPENROUTER_API_KEY):

    1. Sign up at https://openrouter.ai/ (or your chosen OpenRouter provider).
    2. Create or view your API keys from the dashboard.
    3. Place the key under web.OPENROUTER_API_KEY.

    Note: src/DataLabeler.py supports a local LM (LM Studio) when USE_LOCAL_AI = True. If you run LM Studio locally, the OpenRouter key may go unused.

  • Supabase (supabase.URL, supabase.KEY):

    1. Create a project at https://app.supabase.com/ and note the project URL (format: https://<project>.supabase.co).
    2. In your project go to Settings → API → Project API keys.
    3. Copy the anon (public) key or a service-role key, depending on what the server needs. The app uses the client to read and write data; the anon key is often sufficient for read-only requests, but use a service-role key for sensitive write operations.
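With the URL and key in hand, fetching a row looks roughly like the sketch below (it assumes the `supabase` Python package and the `email_data` table used by the /data/random endpoint; the function name is illustrative):

```python
def fetch_random_row(url, key):
    """Fetch one row from the email_data table via the Supabase client.

    `url` and `key` come from the supabase section of credentials.json.
    """
    from supabase import create_client  # deferred: supabase is a project dependency
    client = create_client(url, key)
    # Select a single row from the table that backs /data/random
    resp = client.table("email_data").select("*").limit(1).execute()
    return resp.data
```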

Security

  • Do NOT commit .config/credentials.json to git. Add it to .gitignore.
  • Keep secrets safe. Consider using environment variables or a secret manager in production.

Running the API server

From the project root, with your virtualenv activated and .config/credentials.json populated:

# run the FastAPI server (reload for development)
python -m uvicorn src.api_server:app --reload --host 0.0.0.0 --port 8000

Endpoints to try

  • POST /predict — body: { "description": "Some text to classify" } — returns the 5 binary labels.
  • GET /data/random — fetch a sample row from Supabase table email_data (requires Supabase configured).
  • GET /health — basic health and model timestamp.
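A quick way to exercise /predict from Python, using only the standard library (the response shape is whatever api_server.py returns — the labels dict — so this sketch just parses JSON):

```python
import json
from urllib import request

def build_payload(text):
    """Serialize the request body the /predict endpoint expects."""
    return json.dumps({"description": text}).encode()

def predict(text, base_url="http://localhost:8000"):
    """POST a description to /predict and return the parsed JSON response."""
    req = request.Request(
        f"{base_url}/predict",
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# With the server running:
# labels = predict("Invoice attached for your recent order")
```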

Labeling data with LLMs

  • src/DataLabeler.py can use a local LM (LM Studio) or OpenRouter. Toggle USE_LOCAL_AI at the top of the file.
  • If using local LM Studio, set it up and point LOCAL_AI_BASE_URL accordingly (default http://localhost:1234/v1).
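LM Studio exposes an OpenAI-compatible API, so pointing the `openai` client at LOCAL_AI_BASE_URL is typically all that is needed. A hedged sketch (the helper name is illustrative; LM Studio generally accepts any non-empty API key string):

```python
def make_local_client(base_url="http://localhost:1234/v1"):
    """Return an OpenAI-compatible client pointed at a local LM Studio server."""
    from openai import OpenAI  # deferred: openai is a project dependency
    # The key value is not checked by LM Studio, but the client requires one
    return OpenAI(base_url=base_url, api_key="lm-studio")
```

Swapping `base_url` back to OpenRouter's endpoint (with the real key from credentials.json) is the remote counterpart of the same setup.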

Useful tips

  • Model artifacts expected by api_server.py are in models/ — filenames include a timestamp. Update MODEL_TIMESTAMP in api_server.py if you add new artifacts.
  • A WordPress plugin was developed to use with this API. Check out the project here: https://github.com/wycoconut/silaju-ai-plugin
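Loading the timestamped artifacts follows the usual joblib pattern. A sketch, with an illustrative file-name pattern — check models/ for the real names and keep MODEL_TIMESTAMP in api_server.py in sync:

```python
from pathlib import Path

def load_artifacts(models_dir="models", ts="YYYYMMDD_HHMMSS"):
    """Load the classifier and TF-IDF vectorizer saved under a shared timestamp.

    The file names here are hypothetical; mirror whatever pattern the
    training script actually writes into models/.
    """
    import joblib  # deferred: joblib is a project dependency
    model = joblib.load(Path(models_dir) / f"model_{ts}.joblib")
    vectorizer = joblib.load(Path(models_dir) / f"vectorizer_{ts}.joblib")
    return model, vectorizer
```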
