A lightweight, fully local REST API that simulates the prompt/response behavior of services like ModelVault or OpenAI, with no cloud required. It supports local LLMs via Ollama and falls back to a stub so the project is testable anywhere.
A full walkthrough of setup, stubbed and Ollama-backed responses, and API testing via the CLI, Swagger, and Postman follows below.
- `POST /generate` – Synchronous prompt/response endpoint
- `POST /stream` – Streams output token-by-token (SSE)
- Stub fallback if Ollama is not running
- Logs all requests/responses to `logs/log.jsonl`
- CLI and Postman collection for easy testing
- OpenAPI docs at `/docs`
```
minivault-api/
├── app.py                    # FastAPI app
├── model_handler.py          # Model/stub logic
├── log_writer.py             # Logging utility
├── cli.py                    # CLI tool (supports both /generate and /stream)
├── postman_collection.json   # Postman config
├── requirements.txt          # Python dependencies
├── logs/
│   └── log.jsonl             # Log file
└── README.md                 # This file
```
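The `model_handler.py` file holds the Ollama/stub fallback logic. That file is not reproduced in this README; the following is only a rough sketch of how the fallback pattern can look, assuming Ollama's native `/api/generate` endpoint on port 11434 and the `llama3` model.

```python
# Rough sketch of the Ollama-with-stub-fallback pattern (not the actual model_handler.py).
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's native generate endpoint

def generate_response(prompt: str) -> dict:
    """Use Ollama when it is running; otherwise return a canned stub response."""
    try:
        r = requests.post(
            OLLAMA_URL,
            json={"model": "llama3", "prompt": prompt, "stream": False},
            timeout=60,
        )
        r.raise_for_status()
        return {"response": r.json()["response"], "model": "ollama"}
    except (requests.ConnectionError, requests.Timeout):
        # Ollama is not running -- fall back to the stub so the API still responds.
        return {"response": f"[stub] You said: {prompt}", "model": "stub"}
```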
Install the Python dependencies:

```
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
```

On macOS:

- Download Ollama from https://ollama.com/download. Open the `.dmg`, drag Ollama to Applications, and start the Ollama application (double-click it in the Applications folder).
- Add Ollama to your PATH (if needed). If `ollama` is not found in Terminal, run:

  ```
  export PATH="/Applications/Ollama.app/Contents/MacOS:$PATH"
  ```

  (Add this to your `~/.zshrc` for persistence.)
- Verify the installation:

  ```
  ollama --version
  ```
- Download and run a model:

  ```
  ollama run llama3
  ```

  The first run will download the model (this may take a few minutes). You can interact with the model in the terminal, or just close it after the download.
- Ready! Ollama's API will be available at `http://localhost:11434` for your MiniVault API.
On Windows:

- Download Ollama from https://ollama.com/download. Run the installer and start the Ollama application from the Start Menu.
- Add Ollama to your PATH (if needed). The installer should add Ollama to your PATH automatically; if not, add the install directory (e.g., `C:\Program Files\Ollama`) to your PATH manually.
- Open Command Prompt or PowerShell and verify the installation:

  ```
  ollama --version
  ```
- Download and run a model:

  ```
  ollama run llama3
  ```

  The first run will download the model (this may take a few minutes).
- Ready! Ollama's API will be available at `http://localhost:11434` for your MiniVault API.
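Optionally, you can confirm from Python that Ollama is reachable before starting MiniVault. This snippet is not part of the project files; it simply queries Ollama's `/api/tags` endpoint, which lists locally pulled models.

```python
# Optional sanity check (not part of the project): is Ollama reachable on its default port?
import requests

try:
    r = requests.get("http://localhost:11434/api/tags", timeout=2)
    r.raise_for_status()
    models = [m["name"] for m in r.json().get("models", [])]
    print("Ollama is up. Local models:", ", ".join(models) or "none pulled yet")
except requests.RequestException:
    print("Ollama is not reachable -- MiniVault will fall back to the stub.")
```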
Start the API:

```
uvicorn app:app --reload
```

Then, in another terminal, use the CLI:

```
source env/bin/activate
python cli.py "What is the capital of France?"
python cli.py "Tell me a joke" --stream
```

- The `--stream` flag will print tokens as they arrive from the `/stream` endpoint.
- Omit `--stream` to use the `/generate` endpoint for a full response.
```
$ python cli.py "Tell me a joke" --stream
[Streaming response]
Why did the chicken cross the road? To get to the other side!
```

Note: Swagger UI does not display streaming responses in real time. For real-time streaming, use the CLI or `curl`.
`POST /generate`
- Input: `{ "prompt": "..." }`
- Output: `{ "response": "..." }`
- Behavior: Uses Ollama if available, otherwise the stub.
`POST /stream`
- Input: `{ "prompt": "..." }`
- Output: SSE stream, one token per event
- Behavior: Streams tokens from Ollama or the stub
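Outside the CLI, any SSE-capable client can consume `/stream`. The sketch below uses plain `requests` and assumes the endpoint emits standard `data: ...` SSE lines, one token per event.

```python
# Consume the /stream SSE endpoint token by token.
import requests

with requests.post(
    "http://127.0.0.1:8000/stream",
    json={"prompt": "Tell me a joke"},
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data: "):
            # Print each token as it arrives; delimiting depends on the server's stub.
            print(line[len("data: "):], end=" ", flush=True)
print()
```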
- Import `postman_collection.json`
- Try `/generate` and `/stream`
- Limitation: Postman does not support real-time SSE streaming; you will only see the full response after streaming has finished. For real-time streaming, use the CLI or `curl`:

  ```
  curl -N -X POST http://127.0.0.1:8000/stream -H "Content-Type: application/json" -d '{"prompt": "Tell me a joke"}'
  ```
- Visit http://127.0.0.1:8000/docs
- Note: Streaming responses will only appear after the stream is finished.
- All interactions are logged to `logs/log.jsonl` in JSONL format:

  ```json
  { "timestamp": "...", "prompt": "...", "response": "...", "streamed": true, "model": "ollama" }
  ```
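Since every line is a self-contained JSON object, the log is easy to read back for analysis or replay, for example:

```python
# Read the JSONL log back for quick analysis.
import json

with open("logs/log.jsonl") as f:
    entries = [json.loads(line) for line in f if line.strip()]

streamed = sum(1 for e in entries if e.get("streamed"))
print(f"{len(entries)} interactions logged, {streamed} streamed")
```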
| Topic | Decision | Reason |
|---|---|---|
| Ollama vs Hugging Face | Ollama | Fast setup, small disk, production-ready local LLMs |
| Stub Fallback | Yes | Ensures project runs even without Ollama |
| Streaming | SSE (not WebSockets) | Simpler, natively supported, less boilerplate |
| Split endpoints | /generate and /stream | Clarity, avoids confusion about response type |
| Web UI | Skipped | Out of scope; Postman + CLI + Swagger suffice |
| Token delay simulation | Included in /stream stub | Adds realism, mimics LLM latency |
| Logging format | JSONL | Efficient for appending, analytics, replay |
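For reference, the token-delay idea from the table can be illustrated with a self-contained FastAPI sketch. This is not the project's actual `/stream` stub, just an example of the pattern.

```python
# Illustrative SSE stub with simulated token latency (not the project's actual code).
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class PromptRequest(BaseModel):
    prompt: str

async def stub_token_stream(prompt: str):
    # Yield one SSE event per token, pausing briefly to mimic LLM generation latency.
    for token in f"[stub] You said: {prompt}".split():
        yield f"data: {token}\n\n"
        await asyncio.sleep(0.1)

@app.post("/stream")
async def stream(req: PromptRequest):
    return StreamingResponse(stub_token_stream(req.prompt), media_type="text/event-stream")
```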
- ✅ Local REST API (stub + optional LLM)
- ✅ Logs all prompts/responses
- ✅ Postman, CLI, Swagger support
- ✅ No internet/cloud dependencies
- ✅ Bonus: streaming token-by-token
