Little Scripts

About

A monorepo of utility scripts, tools, and applications for development, automation, and AI-powered tasks. Like what you see or find it useful? Drop a star!

📁 Projects

🚀 ColQwen2.5 FastAPI Service

A FastAPI-based service for generating embeddings from images and text queries using the ColQwen2.5 model.

Key Features:

🖼️ Generate embeddings for images and text
⚡ High-performance inference with Flash Attention 2
🏗️ RESTful API for easy integration
📊 Interactive API documentation
🏥 Built-in health monitoring

Quick Start:

```
cd colqwen_fastapi
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7000 --reload
```
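
Once the server is running, FastAPI serves its OpenAPI schema at /openapi.json by default (the interactive docs live at /docs). As a minimal sketch, the helper below lists the routes found in such a schema. The sample schema and its endpoint names are hypothetical placeholders so the example runs without a server; the real routes are whatever the service defines.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_routes(schema: dict) -> list[str]:
    """Return 'METHOD /path' strings found in an OpenAPI schema dictionary."""
    routes = []
    for path, operations in sorted(schema.get("paths", {}).items()):
        for method in sorted(operations):
            if method in HTTP_METHODS:  # skip non-method keys such as "parameters"
                routes.append(f"{method.upper()} {path}")
    return routes


def fetch_schema(base_url: str = "http://localhost:7000") -> dict:
    """Download the schema from a running service (the server must be up)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


# Hypothetical schema fragment, NOT the service's real routes.
SAMPLE_SCHEMA = {"paths": {"/health": {"get": {}}, "/embed": {"post": {}}}}

if __name__ == "__main__":
    print(list_routes(SAMPLE_SCHEMA))
```

Against a live instance, `list_routes(fetch_schema())` would print the actual endpoints.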

📖 View Full Documentation

🔍 ColModernVBert FastAPI Service

A high-performance FastAPI service for generating embeddings from images and text queries using the ColModernVBert model from the ColPali engine.

Key Features:

🖼️ Generate embeddings for images with automatic boundary detection
📝 Create embeddings for text queries optimized for retrieval
🧩 Calculate patch dimensions for image sizes
🔬 Generate interpretability maps showing query-document token correspondence
⚡ GPU acceleration with Flash Attention 2 support
🔄 Built-in service restart endpoint for container orchestration
🏥 Comprehensive health monitoring and model information
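
To make the patch-dimension and interpretability-map features more concrete, here is a rough sketch of the underlying idea only. It assumes a fixed square patch size and a row-major patch order, both of which are illustrative assumptions; the actual model processor may resize or tile the image first, and the function names are hypothetical.

```python
import math


def patch_grid(width: int, height: int, patch_size: int) -> tuple[int, int]:
    """Return (columns, rows) of patches needed to cover a width x height image."""
    if min(width, height, patch_size) <= 0:
        raise ValueError("width, height, and patch_size must be positive")
    return math.ceil(width / patch_size), math.ceil(height / patch_size)


def similarity_map(query_token, patch_embeddings, columns, rows):
    """Lay out dot products between one query token and every patch as a grid.

    Assumes patch_embeddings is ordered row by row (an assumption of this sketch).
    """
    if len(patch_embeddings) != columns * rows:
        raise ValueError("expected exactly one embedding per patch")
    scores = [
        sum(q * p for q, p in zip(query_token, patch))
        for patch in patch_embeddings
    ]
    return [scores[r * columns:(r + 1) * columns] for r in range(rows)]


if __name__ == "__main__":
    cols, rows = patch_grid(32, 32, patch_size=16)  # hypothetical patch size
    patches = [[1, 0], [0, 1], [1, 1], [0, 0]]      # toy 2-d embeddings
    print(similarity_map([1.0, 0.5], patches, cols, rows))
```

Higher values in the grid mark the patches that respond most strongly to the chosen query token.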

Quick Start:

```
cd colmodernvbert_fastapi
pip install -r requirements.txt
python main.py
```

Docker:

```
cd colmodernvbert_fastapi
docker compose up --build
```

📖 View Full Documentation

🗄️ DuckDB Analytics FastAPI Service

A lightweight FastAPI service providing columnar analytics storage for OCR data using DuckDB. Designed to work seamlessly with document processing pipelines.

What it does:

📄 Document Management: Store and retrieve document metadata with versioning
📊 Page-Level Storage: Efficient storage of OCR results with full text and markdown
🎯 Region Tracking: Store structured bounding boxes and content for document regions
🔍 Full-Text Search: Fast text search across all indexed documents
💾 Custom Queries: Execute read-only SQL queries for analytics
🏗️ Maintenance API: Initialize, clear, or reset database storage

Key technical features:

Columnar storage with DuckDB for analytical queries
Automatic data compression (3-5x typical ratio)
Read-only query API with security filters
Integration-ready for OCR pipelines
Request tracking and performance monitoring
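
As an illustration of what a read-only filter for custom queries could look like, the sketch below accepts a single SELECT or WITH statement and rejects anything containing a write or DDL keyword. This is not the service's actual filter: the deny-list and the `pages` table name are assumptions, and the check is deliberately conservative (it will also reject a harmless query that merely contains one of the keywords).

```python
import re

# Hypothetical deny-list of keywords that change data, schema, or settings.
FORBIDDEN = {
    "insert", "update", "delete", "drop", "alter", "create", "truncate",
    "copy", "attach", "detach", "install", "load", "pragma", "export",
}


def is_read_only(sql: str) -> bool:
    """Return True only for a single SELECT/WITH statement without forbidden keywords."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    tokens = re.findall(r"[a-z_]+", statement.lower())
    if not tokens or tokens[0] not in {"select", "with"}:
        return False
    return not any(token in FORBIDDEN for token in tokens)


if __name__ == "__main__":
    print(is_read_only("SELECT * FROM pages WHERE doc_id = 1;"))
    print(is_read_only("SELECT 1; DELETE FROM pages"))
```

A keyword filter like this is best treated as a first line of defense. DuckDB's Python client can also open a database with `duckdb.connect(path, read_only=True)`, which enforces read-only access at the engine level.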

Quick Start:

```
cd duckdb_fastapi
cp .env.example .env
pip install -r requirements.txt
python main.py
```

Docker:

```
cd duckdb_fastapi
docker compose up -d
```

📖 View Full Documentation

🎵 ColQwen2.5-Omni Audio RAG System

An Audio Retrieval-Augmented Generation (RAG) app that combines the ColQwen2.5-Omni multimodal model with OpenAI's GPT-4 audio capabilities for intelligent audio content analysis.

What it does:

🎵 Audio Processing: Process video URLs and automatically extract the audio track from the video content
🧠 Advanced Audio Understanding: Uses ColQwen2.5-Omni model for creating semantic audio embeddings
💬 Intelligent Q&A: Ask questions about audio content and get contextual answers
🔊 Audio Responses: Receive answers in both text and audio format using OpenAI's audio API
📊 Chunk-based Processing: Configurable audio chunking for optimal processing and retrieval
🌐 Beautiful Web Interface: Intuitive Gradio-based UI with multiple tabs for different functions
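
Chunk-based processing boils down to cutting the audio timeline into fixed-length windows before embedding each one. The sketch below shows one way to compute those windows; the function name, the optional overlap, and the 30-second chunk length in the demo are assumptions for illustration, not the app's actual settings.

```python
def chunk_spans(duration_s: float, chunk_s: float, overlap_s: float = 0.0):
    """Return (start, end) windows in seconds that cover [0, duration_s]."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap_s must be in [0, chunk_s)")
    spans = []
    start = 0.0
    step = chunk_s - overlap_s
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break  # the final window already reaches the end of the audio
        start += step
    return spans


if __name__ == "__main__":
    # A 65-second clip split into 30-second chunks (hypothetical values).
    print(chunk_spans(65.0, 30.0))
```

Shorter chunks give more precise retrieval hits at the cost of more embeddings to compute and store.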

Key technical features:

ColQwen2.5-Omni model for audio embedding generation
OpenAI GPT-4 audio API for natural language responses
GPU acceleration with Flash Attention 2 support
Batch processing for efficient large audio handling
Real-time audio processing pipeline

Usage: Run python main.py and follow the intuitive web interface to process videos and ask questions!

📖 View Full Documentation

🤖 ColPali (ColNomic) + Qdrant + MinIO Retrieval System

A powerful multimodal document retrieval system that combines ColPali embeddings with vector search for intelligent document analysis.

What it does:

🔍 Conversational Search: Just ask questions in natural language - no commands needed
💬 AI-Powered Responses: Get intelligent, contextual answers about your documents
📄 PDF & Image Support: Process complex visual documents with charts, diagrams, and mixed content
⚡ Optimized Performance: 13x faster search with binary quantization and reranking optimization
🤖 Streamlined Interface: Simple conversational CLI that starts ready to use
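
The reranking optimization mentioned above follows a common two-stage pattern: a cheap first pass compares one mean-pooled vector per document, then only the best candidates are rescored with the full multi-vector late-interaction score (MaxSim). The pure-Python sketch below illustrates that pattern under simplified assumptions. In the project the search runs inside Qdrant, so the function names, toy vectors, and in-memory dictionary here are purely illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_pool(vectors):
    """Average a list of token/patch vectors into a single vector."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def maxsim(query_vectors, doc_vectors):
    """Late-interaction score: best document match per query vector, summed."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


def search(query_vectors, documents, prefetch=2, top_k=1):
    """Stage 1: rank by pooled vectors. Stage 2: rerank the shortlist with MaxSim."""
    pooled_query = mean_pool(query_vectors)
    shortlist = sorted(
        documents,
        key=lambda name: dot(pooled_query, mean_pool(documents[name])),
        reverse=True,
    )[:prefetch]
    reranked = sorted(
        shortlist,
        key=lambda name: maxsim(query_vectors, documents[name]),
        reverse=True,
    )
    return reranked[:top_k]


if __name__ == "__main__":
    docs = {  # toy 2-d multi-vector embeddings, hypothetical
        "invoice": [[1.0, 0.0], [0.9, 0.1]],
        "chart": [[0.0, 1.0], [0.1, 0.9]],
        "memo": [[0.5, 0.5], [0.5, 0.5]],
    }
    print(search([[1.0, 0.0], [0.8, 0.2]], docs))
```

The expensive MaxSim computation runs only on the `prefetch` shortlist, which is where the speedup comes from.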

Key technical features:

Binary quantization for 90%+ storage reduction
Mean pooling reranking optimization (enabled by default)
Background image processing pipeline
Docker deployment with Qdrant + MinIO
Graceful handling of optional services (OpenAI, MinIO)
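
To see where the storage figure for binary quantization comes from, the sketch below keeps only the sign of each vector component (one bit per dimension) and compares vectors by Hamming distance. With 32-bit floats that is a 32x reduction, or roughly 97%, consistent with the "90%+" above. Qdrant performs quantization server-side; this standalone code is only an illustration, and the helper names are hypothetical.

```python
def binarize(vector):
    """Pack component signs into an int: bit i is set when component i is positive."""
    bits = 0
    for i, value in enumerate(vector):
        if value > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binarized vectors."""
    return bin(a ^ b).count("1")


def storage_reduction(dim: int, float_bytes: int = 4) -> float:
    """Fraction of storage saved by going from floats to one bit per dimension."""
    original = dim * float_bytes
    quantized = (dim + 7) // 8  # bits rounded up to whole bytes
    return 1 - quantized / original


if __name__ == "__main__":
    a = binarize([0.3, -1.2, 0.0, 2.5])   # bits 0 and 3 set -> 9
    b = binarize([0.1, 0.4, -0.2, 1.0])   # bits 0, 1 and 3 set -> 11
    print(a, b, hamming(a, b))
    print(storage_reduction(128))          # 0.96875 for 128-d float32 vectors
```

Because binarized scores are coarse, systems that use this trick typically rescore the top candidates with the original vectors.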

Usage: Simply run python main.py and start asking questions about your documents!

📖 View Full Documentation

🖼️ EOMT Panoptic Segmentation App

An interactive web application for panoptic segmentation using the EOMT (Encoder-only Mask Transformer) model - a minimalist Vision Transformer approach for image segmentation.

What it does:

🖥️ Interactive web interface for image segmentation
🎨 Multiple visualization types (masks, overlays, contours, analytics)
⚡ Real-time processing with detailed segment statistics
🧪 Built-in test images for experimentation

Key highlights: Up to 4× faster than complex methods, Gradio interface, comprehensive analytics

📖 View Full Documentation

📊 ViDoRe Benchmark Runner

A little script to run the ViDoRe (v1 and v2) benchmarks from MTEB using the official ViDoRe model wrapper.

Quick Start:

```
cd vidore_benchmark
pip install -r requirements.txt \
  && pip install --upgrade --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu129
python app.py
```

📖 View Full Documentation

📝 DeepSeek OCR Service

A GPU-friendly FastAPI wrapper around deepseek-ai/DeepSeek-OCR that turns images and PDFs into cleaned text, markdown with inline figures, bounding boxes, and annotated previews.

What it does:

🖼️ Accepts JPEG/PNG/WebP images or PDFs (multi-page) up to the configured upload limit.
🎛️ Offers five processing modes (Gundam through Large) and task-specific prompts for markdown, plain OCR, descriptions, locate queries, or custom instructions.
🧱 Returns markdown, raw output, structured bounding boxes, cropped figures (base64), and annotated overview images in a single JSON payload.
📡 Exposes /health, /info, and /api/ocr endpoints with automatic CORS handling and detailed startup logs.
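
Because model loading on a GPU can take a while, a client may want to wait for /health to respond before sending work to /api/ocr. Here is a minimal polling sketch using only the standard library. The base URL and port in the usage comment are assumptions (check your .env for the real values), and the helper names are hypothetical. The probe treats any successful HTTP response as healthy and does not inspect the response body, whose format is not documented here.

```python
import time
from urllib.request import urlopen


def http_probe(url: str) -> None:
    """Raise OSError (URLError/HTTPError) unless the URL answers successfully."""
    with urlopen(url, timeout=5):
        pass


def wait_until_healthy(base_url, probe=http_probe, attempts=10, delay_s=1.0) -> bool:
    """Poll <base_url>/health until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        try:
            probe(f"{base_url}/health")
            return True
        except OSError:
            if attempt + 1 < attempts:
                time.sleep(delay_s)
    return False


# Usage (hypothetical address):
#   if wait_until_healthy("http://localhost:8000"):
#       ...  # safe to call /api/ocr
```

The `probe` parameter is injectable so the retry logic can be exercised without a running server.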

Quick Start:

```
cd deepseek-ocr
cp .env.example .env
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python main.py
```

Docker (GPU):

```
cd deepseek-ocr
docker compose up --build
```

📖 View Full Documentation

📝 PaddleOCR-VL Service

A GPU-aware FastAPI service based on PaddleOCR-VL for extracting rich document structure from images and PDFs.

What it does:

📄 Accepts images (JPEG/PNG/BMP/TIFF) and PDFs up to 50 MB
🧠 Returns structured blocks with bounding boxes, aggregated text, and Markdown
⚙️ Lazy-loads PaddleOCR-VL for quick startup and GPU-friendly concurrency
🩺 Exposes health status along with interactive API docs

Key highlights: Docker + Compose manifests with CUDA 13.0 base image, injectable pipeline factory for testing, detailed logging.
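
Lazy loading combined with an injectable factory is a small pattern worth spelling out: the heavy pipeline is built on first use rather than at startup, and tests can pass in a cheap fake instead of the real model. The class below is a generic sketch of that pattern, not the service's actual code; `LazyPipeline` and its method names are hypothetical.

```python
import threading
from typing import Any, Callable, Optional


class LazyPipeline:
    """Build an expensive object on first access, at most once, thread-safely."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory          # injected: real loader or a test fake
        self._pipeline: Optional[Any] = None
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._pipeline is not None

    def get(self) -> Any:
        if self._pipeline is None:           # fast path once loaded
            with self._lock:
                if self._pipeline is None:   # re-check inside the lock
                    self._pipeline = self._factory()
        return self._pipeline


if __name__ == "__main__":
    lazy = LazyPipeline(lambda: {"name": "fake-pipeline"})  # stand-in factory
    print(lazy.loaded)   # nothing built yet
    print(lazy.get())
    print(lazy.loaded)
```

In production the factory would construct the real OCR pipeline. In tests a lambda returning a stub keeps the suite fast and GPU-free.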

📖 View Full Documentation

🎨 Z-Image-Turbo

A high-performance image generation application powered by the Z-Image-Turbo model with a Gradio interface for text-to-image synthesis. Based on the official Gradio demo, refactored with additional features.

What it does:

🖼️ Text-to-Image Generation: Generate high-quality images from text prompts using the Z-Image-Turbo diffusion transformer model
📐 Multiple Resolutions: Support for 20+ resolution options across 1024px and 1280px bases with various aspect ratios
✨ Prompt Enhancement: AI-powered prompt expansion via Qwen API for better image generation
⚡ Performance Optimization: Automatic Flash Attention 2/3 detection and optional PyTorch compilation
🌐 Multi-Language UI: Interface available in English, Chinese, Korean, Spanish, Japanese, French, German, and Portuguese
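
Automatic attention-backend detection can be approximated by checking which optional packages are importable. The sketch below is an assumption-laden illustration rather than the app's real logic: it assumes Flash Attention 3 is importable as `flash_attn_interface` and Flash Attention 2 as `flash_attn`, and the returned backend labels are hypothetical names.

```python
from importlib.util import find_spec


def module_available(name: str) -> bool:
    """True when a top-level module can be found without importing it."""
    return find_spec(name) is not None


def pick_attention_backend(is_available=module_available) -> str:
    """Prefer the newest Flash Attention build found, else fall back to PyTorch SDPA."""
    candidates = [
        ("flash_attn_interface", "flash_attention_3"),  # assumed FA3 module name
        ("flash_attn", "flash_attention_2"),            # assumed FA2 module name
    ]
    for module_name, backend in candidates:
        if is_available(module_name):
            return backend
    return "sdpa"  # hypothetical label for the built-in fallback


if __name__ == "__main__":
    print(pick_attention_backend())
```

Checking with `find_spec` avoids importing a CUDA extension just to learn that it exists.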

Quick Start:

```
cd z-image-turbo
cp .env.example .env
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
uv pip install -r requirements.txt
python app.py
```

Key highlights: Gradio-based web UI, configurable attention backends, model warmup for production, 12GB+ VRAM recommended.

📖 View Full Documentation

🔧 Future Projects

More utility scripts and tools will be added to this monorepo over time. Each project will have its own directory with dedicated documentation.

🚀 Quick Start

Prerequisites

Python 3.10+
Docker & Docker Compose (for projects requiring infrastructure)

Getting Started

Clone the repository:

```
git clone https://github.com/athrael-soju/little-scripts.git
cd little-scripts
```

Navigate to a specific project:
```
cd colnomic_qdrant_rag
```
Follow the project-specific README for detailed setup instructions.

📖 Project Structure

```
little-scripts/
├── colqwen_fastapi/               # ColQwen2.5 FastAPI embedding service
├── colmodernvbert_fastapi/        # ColModernVBert FastAPI embedding service
├── duckdb_fastapi/                # DuckDB analytics service for OCR data
├── colqwen_omni/                  # Audio RAG system with ColQwen2.5-Omni
├── colnomic_qdrant_rag/           # Multimodal document retrieval system
├── eomt_panoptic_seg/             # Image segmentation web app
├── deepseek-ocr/                  # FastAPI wrapper for DeepSeek-OCR
├── paddleocr_vl/                  # PaddleOCR-VL FastAPI service
├── vidore_benchmark/              # ViDoRe benchmark runner
├── z-image-turbo/                 # Z-Image-Turbo text-to-image generation
└── [future-projects]/             # Additional projects will be added here
```

🤝 Contributing

We welcome contributions to any of the projects in this monorepo!

Development Setup

Before contributing, please set up pre-commit hooks to ensure code quality:

Install pre-commit:
```
pip install pre-commit
```
Install the hooks:
```
pre-commit install
```
Run hooks on all files (optional):
```
pre-commit run --all-files
```

The pre-commit hooks will automatically run on each commit to check for:

Code formatting and style
Import sorting
Trailing whitespace and other common issues
Project-specific linting rules

Adding a New Project

Create a new directory for your project
Include a comprehensive README.md with:
- Project description and features
- Installation instructions
- Usage examples
- Configuration details
Add your project to the main README's project list
Follow the existing code style and documentation patterns

Contributing to Existing Projects

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

🛠️ Troubleshooting

Pre-commit Hook Issues

Problem: "pre-commit not found" when committing from Cursor/VS Code

Symptoms:

Commits work fine from terminal but fail from IDE with error: pre-commit not found. Did you forget to activate your virtualenv?
Pre-commit hooks are installed and work in terminal

Cause: The pre-commit hook is trying to use a Python executable path that's not accessible from the IDE's environment (e.g., WSL paths when running on Windows).

Solution: Modify the .git/hooks/pre-commit file to use the system Python instead of a specific virtualenv path:

Open the pre-commit hook file:
```
# Edit .git/hooks/pre-commit
```

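Find the conditional block (around lines 12-21) and replace its first branch:

```
# Change this:
if [ -x "$INSTALL_PYTHON" ]; then
    exec "$INSTALL_PYTHON" -mpre_commit "${ARGS[@]}"

# To this (leave the branches that follow, up to the closing fi, unchanged;
# use python3 if your system has no python command):
if command -v python > /dev/null; then
    exec python -mpre_commit "${ARGS[@]}"
```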

Test the fix:

```
# This should now work from both terminal and IDE
git commit -m "Test commit"
```

Alternative Solutions:

Option A: Commit from terminal where virtualenv is activated
Option B: Skip hooks temporarily: git commit --no-verify -m "message"
Option C: Use IDE's integrated terminal with virtualenv activated

Problem: Ruff formatting conflicts

Symptoms:

Pre-commit hooks fail with ruff formatting errors
Code appears correctly formatted but hooks still fail

Solution:

Run ruff manually to see specific issues:
```
ruff check .
ruff format .
```

Configure ruff settings in pyproject.toml (if needed):

```
[tool.ruff]
line-length = 88
target-version = "py310"
```

Run pre-commit on all files to fix batch issues:
```
pre-commit run --all-files
```

Docker and Infrastructure Issues

Problem: Docker services not starting

Symptoms:

Services fail to start with port conflicts
Docker containers exit immediately

Common Solutions:

Check if ports are already in use:

```
# Windows
netstat -ano | findstr :6333

# Linux/macOS
lsof -i :6333
```
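
If you prefer a check that behaves the same on Windows, Linux, and macOS, a few lines of Python can test whether something is already listening on a port. This is a small sketch using only the standard library. The function name is hypothetical, and 6333 is simply the port used in the commands above.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True when a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print(port_in_use(6333))
```

Unlike `netstat` or `lsof`, this only tells you that the port is taken, not which process holds it.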

Stop conflicting services:

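```
docker compose down
# Caution: removes all stopped containers, unused networks, dangling images, and build cache
docker system prune -f
```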

Restart Docker daemon and try again

Problem: Permission issues with Docker volumes

Solution:

```
# Fix volume permissions
sudo chown -R "$USER:$USER" ./data
```

Python Environment Issues

Problem: Module not found errors

Symptoms:

ImportError or ModuleNotFoundError when running scripts
Works in one environment but not another

Solutions:

Verify you're in the correct environment:

```
which python                       # Windows: where python
pip list | grep -i "package-name"  # substitute the package you are looking for
```

Reinstall dependencies:
```
pip install -r requirements.txt
```
Check Python path conflicts:
```
python -c "import sys; print(sys.path)"
```
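
The same checks can be gathered in one place from inside Python, which helps when the IDE and the terminal disagree about which interpreter is active. The snippet below is a minimal sketch. `environment_report` is a hypothetical helper, and note that it takes import names, which can differ from the names used with pip (for example, the `pillow` distribution is imported as `PIL`).

```python
import sys
from importlib.util import find_spec


def environment_report(modules):
    """Summarize the active interpreter and which top-level modules cannot be found."""
    return {
        "executable": sys.executable,
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "missing": [name for name in modules if find_spec(name) is None],
    }


if __name__ == "__main__":
    # Replace the list with the modules your script fails to import.
    print(environment_report(["json", "fastapi"]))
```

Run it with the exact interpreter that fails (for example, the one your IDE is configured to use) and compare the output with a run from your terminal.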

Getting Help

If you encounter issues not covered here:

Check project-specific READMEs for additional troubleshooting
Search existing issues in the repository
Create a new issue with:
- Clear problem description
- Steps to reproduce
- Environment details (OS, Python version, etc.)
- Error messages and logs

📝 License

Open source - feel free to use and modify as needed.

🏷️ Repository Topics

ai-tools
analytics
audio-processing
automation
colmodernvbert
colpali
colqwen
computer-vision
diffusion-models
document-processing
document-retrieval
duckdb
embeddings
fastapi
gradio
image-generation
machine-learning
multimodal-search
ocr
openai-api
panoptic-segmentation
python
qdrant
rag-system
reranking
speech-to-text
text-to-image
transformers
utilities
vector-database

⭐ If you find this repository useful, please consider giving it a star!
