A monorepo containing various utility scripts, tools, and applications for development, automation, and AI-powered tasks. Like what you see, or find it useful? Drop a star!
ColQwen2.5 FastAPI Service
A FastAPI-based service for generating embeddings from images and text queries using the ColQwen2.5 model.
Key Features:
- Generate embeddings for images and text
- High-performance inference with Flash Attention 2
- RESTful API for easy integration
- Interactive API documentation
- Built-in health monitoring
Quick Start:
cd colqwen_fastapi
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7000 --reload
ColModernVBert FastAPI Service
A high-performance FastAPI service for generating embeddings from images and text queries using the ColModernVBert model from the ColPali engine.
Key Features:
- Generate embeddings for images with automatic boundary detection
- Create embeddings for text queries optimized for retrieval
- Calculate patch dimensions for image sizes
- Generate interpretability maps showing query-document token correspondence
- GPU acceleration with Flash Attention 2 support
- Built-in service restart endpoint for container orchestration
- Comprehensive health monitoring and model information
Quick Start:
cd colmodernvbert_fastapi
pip install -r requirements.txt
python main.py
Docker:
cd colmodernvbert_fastapi
docker compose up --build
DuckDB Analytics FastAPI Service
A lightweight FastAPI service providing columnar analytics storage for OCR data using DuckDB. Designed to work seamlessly with document processing pipelines.
What it does:
- Document Management: Store and retrieve document metadata with versioning
- Page-Level Storage: Efficient storage of OCR results with full text and markdown
- Region Tracking: Store structured bounding boxes and content for document regions
- Full-Text Search: Fast text search across all indexed documents
- Custom Queries: Execute read-only SQL queries for analytics
- Maintenance API: Initialize, clear, or reset database storage
Key technical features:
- Columnar storage with DuckDB for analytical queries
- Automatic data compression (3-5x typical ratio)
- Read-only query API with security filters
- Integration-ready for OCR pipelines
- Request tracking and performance monitoring
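As a rough illustration of what a read-only query filter can look like, the sketch below accepts a single SELECT or WITH statement and rejects anything that contains a write or DDL keyword. The function name `is_read_only_query` and the keyword list are assumptions made for this example, not taken from the service's code, and a keyword check on its own is not a complete security boundary.

```python
import re

# Hypothetical keyword list for illustration; the service's real filter may differ.
FORBIDDEN_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE", "ATTACH",
    "DETACH", "COPY", "EXPORT", "IMPORT", "INSTALL", "LOAD", "PRAGMA", "SET",
}


def is_read_only_query(sql: str) -> bool:
    """Return True if `sql` looks like a single read-only statement."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        # Empty input, or more than one statement in the same string.
        return False
    # Split into upper-cased word tokens so `created_at` does not match CREATE.
    tokens = re.findall(r"[A-Za-z_]+", statement.upper())
    if not tokens or tokens[0] not in {"SELECT", "WITH"}:
        return False
    return not any(token in FORBIDDEN_KEYWORDS for token in tokens)
```

The check is deliberately conservative: a query whose string literal contains a semicolon or a word such as "update" is also rejected, which seems a reasonable trade-off for an analytics endpoint.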
Quick Start:
cd duckdb_fastapi
cp .env.example .env
pip install -r requirements.txt
python main.py
Docker:
cd duckdb_fastapi
docker compose up -d
ColQwen2.5-Omni Audio RAG System
An Audio Retrieval-Augmented Generation (RAG) app that combines the ColQwen2.5-Omni multimodal model with OpenAI's GPT-4 audio capabilities for intelligent audio content analysis.
What it does:
- Audio Processing: Process video URLs and automatically extract the audio track from video content
- Advanced Audio Understanding: Uses the ColQwen2.5-Omni model to create semantic audio embeddings
- Intelligent Q&A: Ask questions about audio content and get contextual answers
- Audio Responses: Receive answers in both text and audio format using OpenAI's audio API
- Chunk-based Processing: Configurable audio chunking for optimal processing and retrieval
- Beautiful Web Interface: Intuitive Gradio-based UI with multiple tabs for different functions
Key technical features:
- ColQwen2.5-Omni model for audio embedding generation
- OpenAI GPT-4 audio API for natural language responses
- GPU acceleration with Flash Attention 2 support
- Batch processing for efficient large audio handling
- Real-time audio processing pipeline
Usage: Run python main.py and follow the intuitive web interface to process videos and ask questions!
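To make the idea of configurable audio chunking concrete, here is a minimal sketch that computes chunk boundaries in seconds for a given duration. The function name `chunk_boundaries` and its `chunk_s` and `overlap_s` parameters are hypothetical; the app's actual chunking options may be named and implemented differently.

```python
def chunk_boundaries(duration_s: float, chunk_s: float, overlap_s: float = 0.0):
    """Return (start, end) pairs in seconds that cover [0, duration_s]."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap_s must satisfy 0 <= overlap_s < chunk_s")
    step = chunk_s - overlap_s  # distance between consecutive chunk starts
    boundaries = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)  # the last chunk may be shorter
        boundaries.append((start, end))
        if end >= duration_s:
            break
        start += step
    return boundaries
```

For a 65-second clip split into 30-second chunks, `chunk_boundaries(65, 30)` yields three chunks, the last one covering seconds 60 to 65. A small overlap can help when a spoken sentence straddles a chunk boundary.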
ColPali (ColNomic) + Qdrant + MinIO Retrieval System
A powerful multimodal document retrieval system that combines ColPali embeddings with vector search for intelligent document analysis.
What it does:
- Conversational Search: Just ask questions in natural language - no commands needed
- AI-Powered Responses: Get intelligent, contextual answers about your documents
- PDF & Image Support: Process complex visual documents with charts, diagrams, and mixed content
- Optimized Performance: 13x faster search with binary quantization and reranking optimization
- Streamlined Interface: Simple conversational CLI that starts ready to use
Key technical features:
- Binary quantization for 90%+ storage reduction
- Mean pooling reranking optimization (enabled by default)
- Background image processing pipeline
- Docker deployment with Qdrant + MinIO
- Graceful handling of optional services (OpenAI, MinIO)
Usage: Simply run python main.py and start asking questions about your documents!
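The storage figure quoted for binary quantization follows from simple arithmetic: a float32 component takes 32 bits, while its sign takes 1 bit. The sketch below illustrates the general technique in plain Python. It is not the project's code (Qdrant performs quantization internally), and the 128-dimensional vector size used in the usage note is an assumption for illustration.

```python
def binarize(vector):
    """Pack the sign of each component into bits (1 if > 0, else 0)."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    n_bytes = (len(vector) + 7) // 8  # round up to whole bytes
    return bits.to_bytes(n_bytes, "big")


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two equally sized binary codes."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


def storage_reduction(dim: int) -> float:
    """Fraction of storage saved versus float32 for a `dim`-dimensional vector."""
    float32_bytes = dim * 4
    binary_bytes = (dim + 7) // 8
    return 1 - binary_bytes / float32_bytes
```

Under the 128-dimension assumption, `storage_reduction(128)` gives 0.96875: 16 bytes per vector instead of 512. Because binary codes lose precision, a reranking pass over the top candidates is the usual way to recover accuracy.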
EOMT Panoptic Segmentation App
An interactive web application for panoptic segmentation using the EOMT (Encoder-only Mask Transformer) model - a minimalist Vision Transformer approach for image segmentation.
What it does:
- Interactive web interface for image segmentation
- Multiple visualization types (masks, overlays, contours, analytics)
- Real-time processing with detailed segment statistics
- Built-in test images for experimentation
Key highlights: Up to 4× faster than complex methods, Gradio interface, comprehensive analytics
ViDoRe Benchmark Runner
A little script to run the ViDoRe (v1 and v2) benchmarks from MTEB using the official ViDoRe model wrapper.
Quick Start:
cd vidore_benchmark
pip install -r requirements.txt \
&& pip install --upgrade --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu129
python app.py
DeepSeek OCR Service
A GPU-friendly FastAPI wrapper around deepseek-ai/DeepSeek-OCR that turns images and PDFs into cleaned text, markdown with inline figures, bounding boxes, and annotated previews.
What it does:
- Accepts JPEG/PNG/WebP images or PDFs (multi-page) up to the configured upload limit.
- Offers five processing modes (Gundam through Large) and task-specific prompts for markdown, plain OCR, descriptions, locate queries, or custom instructions.
- Returns markdown, raw output, structured bounding boxes, cropped figures (base64), and annotated overview images in a single JSON payload.
- Exposes /health, /info, and /api/ocr endpoints with automatic CORS handling and detailed startup logs.
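A quick way to confirm the service is up is to call its /health or /info endpoint from Python. The sketch below uses only the standard library. The helper names are hypothetical, the base URL must be supplied by you (the host and port depend on your configuration), and decoding the response as JSON is an assumption based on FastAPI's default behavior.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def endpoint_url(base_url: str, path: str) -> str:
    """Join the service base URL with an endpoint path such as /health."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def fetch_json(base_url: str, path: str, timeout: float = 5.0):
    """GET an endpoint and decode the body as JSON (assumed response format)."""
    with urlopen(endpoint_url(base_url, path), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `fetch_json(base_url, "/health")` should return whatever status payload the service reports, where `base_url` is the address you configured in `.env`.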
Quick Start:
cd deepseek-ocr
cp .env.example .env
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python main.py
Docker (GPU):
cd deepseek-ocr
docker compose up --build
PaddleOCR-VL Service
A GPU-aware FastAPI service based on PaddleOCR-VL for extracting rich document structure from images and PDFs.
What it does:
- Accepts images (JPEG/PNG/BMP/TIFF) and PDFs up to 50 MB
- Returns structured blocks with bounding boxes, aggregated text, and Markdown
- Lazy-loads PaddleOCR-VL for quick startup and GPU-friendly concurrency
- Exposes health status along with interactive API docs
Key highlights: Docker + Compose manifests with CUDA 13.0 base image, injectable pipeline factory for testing, detailed logging.
Z-Image-Turbo
A high-performance image generation application powered by the Z-Image-Turbo model with a Gradio interface for text-to-image synthesis. Based on the official Gradio demo, refactored with additional features.
What it does:
- Text-to-Image Generation: Generate high-quality images from text prompts using the Z-Image-Turbo diffusion transformer model
- Multiple Resolutions: Support for 20+ resolution options across 1024px and 1280px bases with various aspect ratios
- Prompt Enhancement: AI-powered prompt expansion via Qwen API for better image generation
- Performance Optimization: Automatic Flash Attention 2/3 detection and optional PyTorch compilation
- Multi-Language UI: Interface available in English, Chinese, Korean, Spanish, Japanese, French, German, and Portuguese
Quick Start:
cd z-image-turbo
cp .env.example .env
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
uv pip install -r requirements.txt
python app.py
Key highlights: Gradio-based web UI, configurable attention backends, model warmup for production, 12GB+ VRAM recommended.
Future Projects
More utility scripts and tools will be added to this monorepo over time. Each project will have its own directory with dedicated documentation.
Prerequisites:
- Python 3.10+
- Docker & Docker Compose (for projects requiring infrastructure)
Getting started:
- Clone the repository:
  git clone https://github.com/athrael.soju/little-scripts.git
  cd little-scripts
- Navigate to a specific project:
  cd colnomic_qdrant_rag
- Follow the project-specific README for detailed setup instructions.
little-scripts/
├── colqwen_fastapi/ # ColQwen2.5 FastAPI embedding service
├── colmodernvbert_fastapi/ # ColModernVBert FastAPI embedding service
├── duckdb_fastapi/ # DuckDB analytics service for OCR data
├── colqwen_omni/ # Audio RAG system with ColQwen2.5-Omni
├── colnomic_qdrant_rag/ # Multimodal document retrieval system
├── eomt_panoptic_seg/ # Image segmentation web app
├── deepseek-ocr/ # FastAPI wrapper for DeepSeek-OCR
├── paddleocr_vl/ # PaddleOCR-VL FastAPI service
├── vidore_benchmark/ # ViDoRe benchmark runner
├── z-image-turbo/ # Z-Image-Turbo text-to-image generation
└── [future-projects]/ # Additional projects will be added here
We welcome contributions to any of the projects in this monorepo!
Before contributing, please set up pre-commit hooks to ensure code quality:
- Install pre-commit:
  pip install pre-commit
- Install the hooks:
  pre-commit install
- Run hooks on all files (optional):
  pre-commit run --all-files
The pre-commit hooks will automatically run on each commit to check for:
- Code formatting and style
- Import sorting
- Trailing whitespace and other common issues
- Project-specific linting rules
- Create a new directory for your project
- Include a comprehensive README.md with:
  - Project description and features
  - Installation instructions
  - Usage examples
  - Configuration details
- Add your project to the main README's project list
- Follow the existing code style and documentation patterns
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Symptoms:
- Commits work fine from terminal but fail from IDE with error: pre-commit not found. Did you forget to activate your virtualenv?
- Pre-commit hooks are installed and work in terminal
Cause: The pre-commit hook is trying to use a Python executable path that's not accessible from the IDE's environment (e.g., WSL paths when running on Windows).
Solution:
Modify the .git/hooks/pre-commit file to use the system Python instead of a specific virtualenv path:
- Open the pre-commit hook file:
  # Edit .git/hooks/pre-commit
- Find the conditional block (around lines 12-21) and replace:
  # Change this:
  if [ -x "$INSTALL_PYTHON" ]; then
      exec "$INSTALL_PYTHON" -mpre_commit "${ARGS[@]}"
  # To this:
  if command -v python > /dev/null; then
      exec python -mpre_commit "${ARGS[@]}"
- Test the fix:
  # This should now work from both terminal and IDE
  git commit -m "Test commit"
Alternative Solutions:
- Option A: Commit from terminal where virtualenv is activated
- Option B: Skip hooks temporarily: git commit --no-verify -m "message"
- Option C: Use IDE's integrated terminal with virtualenv activated
Symptoms:
- Pre-commit hooks fail with ruff formatting errors
- Code appears correctly formatted but hooks still fail
Solution:
- Run ruff manually to see specific issues:
  ruff check .
  ruff format .
- Configure ruff settings in pyproject.toml (if needed):
  [tool.ruff]
  line-length = 88
  target-version = "py310"
- Run pre-commit on all files to fix batch issues:
  pre-commit run --all-files
Symptoms:
- Services fail to start with port conflicts
- Docker containers exit immediately
Common Solutions:
- Check if ports are already in use:
  # Windows
  netstat -ano | findstr :6333
  # Linux/macOS
  lsof -i :6333
- Stop conflicting services:
  docker compose down
  docker system prune -f
- Restart the Docker daemon and try again
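If you prefer a check that behaves the same on Windows, Linux, and macOS, a few lines of Python can test whether something is already listening on a port. This is a minimal sketch; the helper name `port_in_use` is hypothetical, and it only detects listeners reachable on the given host (127.0.0.1 by default).

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(6333)` returning True before you start the stack suggests another process already holds the port used in the commands above.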
Solution:
# Fix volume permissions
sudo chown -R $USER:$USER ./data
Symptoms:
- ImportError or ModuleNotFoundError when running scripts
- Works in one environment but not another
Solutions:
- Verify you're in the correct environment:
  which python
  pip list | grep [package-name]
- Reinstall dependencies:
  pip install -r requirements.txt
- Check Python path conflicts:
  python -c "import sys; print(sys.path)"
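The same checks can be combined into one small diagnostic that reports which interpreter is running and whether a given package is importable from it. This is a sketch using only the standard library; the function name `describe_environment` is hypothetical, and it expects a top-level package name rather than a dotted submodule path.

```python
import importlib.util
import sys


def describe_environment(package: str) -> dict:
    """Report which interpreter is running and whether `package` is importable."""
    spec = importlib.util.find_spec(package)  # None if the package is not found
    return {
        "python": sys.executable,
        "version": sys.version.split()[0],
        "package": package,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }
```

Running `describe_environment("fastapi")` from both the terminal and the IDE makes it easy to spot when the two are using different interpreters.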
If you encounter issues not covered here:
- Check project-specific READMEs for additional troubleshooting
- Search existing issues in the repository
- Create a new issue with:
  - Clear problem description
  - Steps to reproduce
  - Environment details (OS, Python version, etc.)
  - Error messages and logs
Open source - feel free to use and modify as needed.
If you find this repository useful, please consider giving it a star!