athrael-soju/little-scripts

Little Scripts

About

A monorepo containing various utility scripts, tools, and applications for development, automation, and AI-powered tasks. Like what you see, or find it useful? Drop a star!!!

📁 Projects

🚀 ColQwen2.5 FastAPI Service

A FastAPI-based service for generating embeddings from images and text queries using the ColQwen2.5 model.

Key Features:

  • 🖼️ Generate embeddings for images and text
  • ⚡ High-performance inference with Flash Attention 2
  • 🏗️ RESTful API for easy integration
  • 📊 Interactive API documentation
  • 🏥 Built-in health monitoring

Quick Start:

cd colqwen_fastapi
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7000 --reload

📖 View Full Documentation

🔍 ColModernVBert FastAPI Service

A high-performance FastAPI service for generating embeddings from images and text queries using the ColModernVBert model from the ColPali engine.

Key Features:

  • 🖼️ Generate embeddings for images with automatic boundary detection
  • 📝 Create embeddings for text queries optimized for retrieval
  • 🧩 Calculate patch dimensions for image sizes
  • 🔬 Generate interpretability maps showing query-document token correspondence
  • ⚡ GPU acceleration with Flash Attention 2 support
  • 🔄 Built-in service restart endpoint for container orchestration
  • 🏥 Comprehensive health monitoring and model information

Quick Start:

cd colmodernvbert_fastapi
pip install -r requirements.txt
python main.py

Docker:

cd colmodernvbert_fastapi
docker compose up --build

📖 View Full Documentation

🗄️ DuckDB Analytics FastAPI Service

A lightweight FastAPI service providing columnar analytics storage for OCR data using DuckDB. Designed to work seamlessly with document processing pipelines.

What it does:

  • 📄 Document Management: Store and retrieve document metadata with versioning
  • 📊 Page-Level Storage: Efficient storage of OCR results with full text and markdown
  • 🎯 Region Tracking: Store structured bounding boxes and content for document regions
  • 🔍 Full-Text Search: Fast text search across all indexed documents
  • 💾 Custom Queries: Execute read-only SQL queries for analytics
  • 🏗️ Maintenance API: Initialize, clear, or reset database storage

Key technical features:

  • Columnar storage with DuckDB for analytical queries
  • Automatic data compression (3-5x typical ratio)
  • Read-only query API with security filters
  • Integration-ready for OCR pipelines
  • Request tracking and performance monitoring
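
To illustrate what a read-only query filter can look like, here is a minimal sketch in Python. It is not the service's actual implementation: the function name `is_read_only` and the keyword blocklist are hypothetical, and the check is deliberately conservative (it will also reject a harmless query whose string literal happens to contain a blocked keyword).

```python
import re

# Hypothetical blocklist for illustration; the service's real filter may differ.
FORBIDDEN = {
    "insert", "update", "delete", "drop", "alter", "create",
    "attach", "detach", "copy", "pragma", "install", "load", "export",
}


def is_read_only(sql: str) -> bool:
    """Return True if `sql` is a single SELECT/WITH statement with no blocked keywords."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # reject empty input and multi-statement batches
    # Tokenize on identifier-like words so "created_at" is not mistaken for "create".
    tokens = re.findall(r"[a-z_]+", statement.lower())
    if not tokens or tokens[0] not in {"select", "with"}:
        return False
    return not FORBIDDEN.intersection(tokens)
```

A keyword filter like this is best treated as a first line of defense. DuckDB's Python API also accepts `read_only=True` in `duckdb.connect(...)`, which enforces read-only access at the database level.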

Quick Start:

cd duckdb_fastapi
cp .env.example .env
pip install -r requirements.txt
python main.py

Docker:

cd duckdb_fastapi
docker compose up -d

📖 View Full Documentation

🎵 ColQwen2.5-Omni Audio RAG System

An Audio Retrieval-Augmented Generation (RAG) app that combines the ColQwen2.5-Omni multimodal model with OpenAI's GPT-4 audio capabilities for intelligent audio content analysis.

What it does:

  • 🎵 Audio Processing: Process video URLs and automatically extract audio from video content
  • 🧠 Advanced Audio Understanding: Uses ColQwen2.5-Omni model for creating semantic audio embeddings
  • 💬 Intelligent Q&A: Ask questions about audio content and get contextual answers
  • 🔊 Audio Responses: Receive answers in both text and audio format using OpenAI's audio API
  • 📊 Chunk-based Processing: Configurable audio chunking for optimal processing and retrieval
  • 🌐 Beautiful Web Interface: Intuitive Gradio-based UI with multiple tabs for different functions
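
As a rough illustration of chunk-based processing, the sketch below computes chunk boundaries for an audio track of a given length. The function name `chunk_bounds` and the `overlap_s` parameter are hypothetical and not taken from the project; the app's own chunking options may work differently.

```python
def chunk_bounds(duration_s: float, chunk_s: float, overlap_s: float = 0.0):
    """Return (start, end) pairs in seconds that cover [0, duration_s]."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("chunk_s must be positive and overlap_s must be in [0, chunk_s)")
    if duration_s <= 0:
        return []
    bounds = []
    start = 0.0
    step = chunk_s - overlap_s  # how far each new chunk advances
    while True:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        if end >= duration_s:
            break  # the last chunk reached the end of the audio
        start += step
    return bounds
```

For example, a 65-second clip with 30-second chunks yields three chunks, the last one covering only the final 5 seconds.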

Key technical features:

  • ColQwen2.5-Omni model for audio embedding generation
  • OpenAI GPT-4 audio API for natural language responses
  • GPU acceleration with Flash Attention 2 support
  • Batch processing for efficient large audio handling
  • Real-time audio processing pipeline

Usage: Run python main.py and follow the intuitive web interface to process videos and ask questions!

📖 View Full Documentation

🤖 ColPali(ColNomic) + Qdrant + MinIO Retrieval System

A powerful multimodal document retrieval system that combines ColPali embeddings with vector search for intelligent document analysis.

What it does:

  • 🔍 Conversational Search: Just ask questions in natural language - no commands needed
  • 💬 AI-Powered Responses: Get intelligent, contextual answers about your documents
  • 📄 PDF & Image Support: Process complex visual documents with charts, diagrams, and mixed content
  • ⚡ Optimized Performance: 13x faster search with binary quantization and reranking optimization
  • 🤖 Streamlined Interface: Simple conversational CLI that starts ready to use

Key technical features:

  • Binary quantization for 90%+ storage reduction
  • Mean pooling reranking optimization (enabled by default)
  • Background image processing pipeline
  • Docker deployment with Qdrant + MinIO
  • Graceful handling of optional services (OpenAI, MinIO)
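
The storage figure for binary quantization follows from simple arithmetic: keeping one bit per dimension instead of a 32-bit float shrinks each vector by roughly 32x. The sketch below is a plain-Python illustration of the idea, not the project's code (in this project the quantization is presumably handled by Qdrant). The function names are hypothetical, and float32 source vectors are an assumption.

```python
def binarize(vector):
    """Pack the sign of each dimension into one bit (1 if the value is > 0)."""
    out = bytearray((len(vector) + 7) // 8)
    for i, value in enumerate(vector):
        if value > 0:
            out[i // 8] |= 1 << (7 - i % 8)  # most significant bit first
    return bytes(out)


def hamming(a: bytes, b: bytes) -> int:
    """Count differing bits between two equally sized packed vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def storage_reduction(dims: int, float_bytes: int = 4) -> float:
    """Fraction of storage saved by 1 bit per dimension versus floats."""
    return 1 - ((dims + 7) // 8) / (dims * float_bytes)
```

Under these assumptions, a 128-dimensional float32 vector drops from 512 bytes to 16 bytes, a reduction of about 96.9%. Comparing packed vectors by Hamming distance is cheap but lossy, which is why a reranking step on the candidates is commonly paired with it.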

Usage: Simply run python main.py and start asking questions about your documents!

📖 View Full Documentation

🖼️ EOMT Panoptic Segmentation App

An interactive web application for panoptic segmentation using the EOMT (Encoder-only Mask Transformer) model - a minimalist Vision Transformer approach for image segmentation.

What it does:

  • 🖥️ Interactive web interface for image segmentation
  • 🎨 Multiple visualization types (masks, overlays, contours, analytics)
  • ⚡ Real-time processing with detailed segment statistics
  • 🧪 Built-in test images for experimentation

Key highlights: Up to 4× faster than complex methods, Gradio interface, comprehensive analytics

📖 View Full Documentation

📊 ViDoRe Benchmark Runner

A little script to run the ViDoRe (v1 and v2) benchmarks from MTEB using the official ViDoRe model wrapper.

Quick Start:

cd vidore_benchmark
pip install -r requirements.txt \
  && pip install --upgrade --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu129
python app.py

📖 View Full Documentation

📝 DeepSeek OCR Service

A GPU-friendly FastAPI wrapper around deepseek-ai/DeepSeek-OCR that turns images and PDFs into cleaned text, markdown with inline figures, bounding boxes, and annotated previews.

What it does:

  • 🖼️ Accepts JPEG/PNG/WebP images or PDFs (multi-page) up to the configured upload limit.
  • 🎛️ Offers five processing modes (Gundam through Large) and task-specific prompts for markdown, plain OCR, descriptions, locate queries, or custom instructions.
  • 🧱 Returns markdown, raw output, structured bounding boxes, cropped figures (base64), and annotated overview images in a single JSON payload.
  • 📡 Exposes /health, /info, and /api/ocr endpoints with automatic CORS handling and detailed startup logs.
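
The /health and /info endpoints listed above can be queried from Python once the service is running. This is a minimal sketch using only the standard library. The base URL and port are placeholders (check your .env for the real values), the helper names are hypothetical, and it assumes both endpoints return JSON, which this README does not state.

```python
import json
import urllib.request

# Assumption: placeholder host and port; use the values from your .env.
BASE_URL = "http://localhost:8000"


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """GET an endpoint and decode the body, assuming the service answers with JSON."""
    with urllib.request.urlopen(endpoint(path, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With the service up, `get_json("/health")` and `get_json("/info")` should return the decoded responses. The request format for /api/ocr is described in the project's own documentation.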

Quick Start:

cd deepseek-ocr
cp .env.example .env
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python main.py

Docker (GPU):

cd deepseek-ocr
docker compose up --build

📖 View Full Documentation

📝 PaddleOCR-VL Service

A GPU-aware FastAPI service based on PaddleOCR-VL for extracting rich document structure from images and PDFs.

What it does:

  • 📄 Accepts images (JPEG/PNG/BMP/TIFF) and PDFs up to 50 MB
  • 🧠 Returns structured blocks with bounding boxes, aggregated text, and Markdown
  • ⚙️ Lazy-loads PaddleOCR-VL for quick startup and GPU-friendly concurrency
  • 🩺 Exposes health status along with interactive API docs

Key highlights: Docker + Compose manifests with CUDA 13.0 base image, injectable pipeline factory for testing, detailed logging.

📖 View Full Documentation

🎨 Z-Image-Turbo

A high-performance image generation application powered by the Z-Image-Turbo model with a Gradio interface for text-to-image synthesis. Based on the official Gradio demo, refactored with additional features.

What it does:

  • 🖼️ Text-to-Image Generation: Generate high-quality images from text prompts using the Z-Image-Turbo diffusion transformer model
  • 📐 Multiple Resolutions: Support for 20+ resolution options across 1024px and 1280px bases with various aspect ratios
  • ✨ Prompt Enhancement: AI-powered prompt expansion via Qwen API for better image generation
  • ⚡ Performance Optimization: Automatic Flash Attention 2/3 detection and optional PyTorch compilation
  • 🌐 Multi-Language UI: Interface available in English, Chinese, Korean, Spanish, Japanese, French, German, and Portuguese

Quick Start:

cd z-image-turbo
cp .env.example .env
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
uv pip install -r requirements.txt
python app.py

Key highlights: Gradio-based web UI, configurable attention backends, model warmup for production, 12GB+ VRAM recommended.

📖 View Full Documentation

🔧 Future Projects

More utility scripts and tools will be added to this monorepo over time. Each project will have its own directory with dedicated documentation.

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Docker & Docker Compose (for projects requiring infrastructure)

Getting Started

  1. Clone the repository:

    git clone https://github.com/athrael-soju/little-scripts.git
    cd little-scripts
  2. Navigate to a specific project:

    cd colnomic_qdrant_rag
  3. Follow the project-specific README for detailed setup instructions.

📖 Project Structure

little-scripts/
├── colqwen_fastapi/               # ColQwen2.5 FastAPI embedding service
├── colmodernvbert_fastapi/        # ColModernVBert FastAPI embedding service
├── duckdb_fastapi/                # DuckDB analytics service for OCR data
├── colqwen_omni/                  # Audio RAG system with ColQwen2.5-Omni
├── colnomic_qdrant_rag/           # Multimodal document retrieval system
├── eomt_panoptic_seg/             # Image segmentation web app
├── deepseek-ocr/                  # FastAPI wrapper for DeepSeek-OCR
├── paddleocr_vl/                  # PaddleOCR-VL FastAPI service
├── vidore_benchmark/              # ViDoRe benchmark runner
├── z-image-turbo/                 # Z-Image-Turbo text-to-image generation
└── [future-projects]/             # Additional projects will be added here

🤝 Contributing

We welcome contributions to any of the projects in this monorepo!

Development Setup

Before contributing, please set up pre-commit hooks to ensure code quality:

  1. Install pre-commit:

    pip install pre-commit
  2. Install the hooks:

    pre-commit install
  3. Run hooks on all files (optional):

    pre-commit run --all-files

The pre-commit hooks will automatically run on each commit to check for:

  • Code formatting and style
  • Import sorting
  • Trailing whitespace and other common issues
  • Project-specific linting rules

Adding a New Project

  1. Create a new directory for your project
  2. Include a comprehensive README.md with:
    • Project description and features
    • Installation instructions
    • Usage examples
    • Configuration details
  3. Add your project to the main README's project list
  4. Follow the existing code style and documentation patterns

Contributing to Existing Projects

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Commit your changes (git commit -m 'Add some amazing feature')
  5. Push to the branch (git push origin feature/amazing-feature)
  6. Open a Pull Request

🛠️ Troubleshooting

Pre-commit Hook Issues

Problem: "pre-commit not found" when committing from Cursor/VS Code

Symptoms:

  • Commits work fine from terminal but fail from IDE with error: pre-commit not found. Did you forget to activate your virtualenv?
  • Pre-commit hooks are installed and work in terminal

Cause: The pre-commit hook is trying to use a Python executable path that's not accessible from the IDE's environment (e.g., WSL paths when running on Windows).

Solution: Modify the .git/hooks/pre-commit file to use the system Python instead of a specific virtualenv path:

  1. Open the pre-commit hook file:

    # Edit .git/hooks/pre-commit
  2. Find the conditional block (around lines 12-21) and replace the interpreter check:

    # Change this:
    if [ -x "$INSTALL_PYTHON" ]; then
        exec "$INSTALL_PYTHON" -mpre_commit "${ARGS[@]}"
    
    # To this:
    if command -v python > /dev/null; then
        exec python -mpre_commit "${ARGS[@]}"
  3. Test the fix:

    # This should now work from both terminal and IDE
    git commit -m "Test commit"

Alternative Solutions:

  • Option A: Commit from terminal where virtualenv is activated
  • Option B: Skip hooks temporarily: git commit --no-verify -m "message"
  • Option C: Use IDE's integrated terminal with virtualenv activated
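
Before editing the hook, it can help to confirm that the interpreter path is really the problem. The sketch below reads the hook file and reports whether the interpreter it points to is reachable from the current environment. It assumes the generated hook assigns the path on a line of the form `INSTALL_PYTHON=...` (the variable used in the snippet above); the function names are hypothetical.

```python
import os
import re
from pathlib import Path
from typing import Optional


def install_python_from_hook(hook_text: str) -> Optional[str]:
    """Extract the interpreter path from an INSTALL_PYTHON=... line, if present."""
    match = re.search(r"^INSTALL_PYTHON=(.+)$", hook_text, flags=re.MULTILINE)
    if match is None:
        return None
    return match.group(1).strip().strip("'\"")  # drop whitespace and shell quotes


def diagnose_hook(hook_path: str = ".git/hooks/pre-commit") -> str:
    """Describe whether the hook's interpreter is usable from this process."""
    path = Path(hook_path)
    if not path.is_file():
        return f"{hook_path}: hook not installed"
    python = install_python_from_hook(path.read_text(encoding="utf-8"))
    if python is None:
        return f"{hook_path}: no INSTALL_PYTHON line found"
    if os.path.isfile(python) and os.access(python, os.X_OK):
        return f"{python}: executable from this environment"
    return f"{python}: not accessible from this environment"
```

Running `print(diagnose_hook())` from the repository root, once in the terminal and once from the IDE's Python environment, should show whether the two environments see the same interpreter.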

Problem: Ruff formatting conflicts

Symptoms:

  • Pre-commit hooks fail with ruff formatting errors
  • Code appears correctly formatted but hooks still fail

Solution:

  1. Run ruff manually to see specific issues:

    ruff check .
    ruff format .
  2. Configure ruff settings in pyproject.toml (if needed):

    [tool.ruff]
    line-length = 88
    target-version = "py310"
  3. Run pre-commit on all files to fix batch issues:

    pre-commit run --all-files

Docker and Infrastructure Issues

Problem: Docker services not starting

Symptoms:

  • Services fail to start with port conflicts
  • Docker containers exit immediately

Common Solutions:

  1. Check if ports are already in use:

    # Windows
    netstat -ano | findstr :6333
    
    # Linux/macOS
    lsof -i :6333
  2. Stop conflicting services:

    docker compose down
    # Caution: this also removes all stopped containers, unused networks, and dangling images
    docker system prune -f
  3. Restart Docker daemon and try again
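
The port check in step 1 can also be done from Python, which behaves the same on Windows, Linux, and macOS. This is a small sketch using the standard library; the function name `port_in_use` is hypothetical, and it only detects services that accept TCP connections on the given host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(6333)` returning True before the containers are started suggests another process already holds the port used in the commands above.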

Problem: Permission issues with Docker volumes

Solution:

# Fix volume permissions
sudo chown -R $USER:$USER ./data

Python Environment Issues

Problem: Module not found errors

Symptoms:

  • ImportError or ModuleNotFoundError when running scripts
  • Works in one environment but not another

Solutions:

  1. Verify you're in the correct environment:

    which python                      # Windows: where python
    pip list | grep [package-name]    # Windows: pip list | findstr [package-name]
  2. Reinstall dependencies:

    pip install -r requirements.txt
  3. Check Python path conflicts:

    python -c "import sys; print(sys.path)"
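
The checks above can be combined into one short script that reports which interpreter is running and whether a given module is importable from it. This is a sketch using only the standard library; the function name `module_available` is hypothetical.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be found on this interpreter's import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised for dotted names whose parent package is missing, or for bad names
        return False


def describe_environment(modules):
    """Summarize the active interpreter and the availability of each module."""
    lines = [f"interpreter: {sys.executable}"]
    for name in modules:
        status = "found" if module_available(name) else "MISSING"
        lines.append(f"{name}: {status}")
    return "\n".join(lines)
```

Calling `print(describe_environment(["fastapi", "torch"]))` in both the working and the failing environment makes it easy to spot which interpreter is missing the package. The two module names here are only examples.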

Getting Help

If you encounter issues not covered here:

  1. Check project-specific READMEs for additional troubleshooting
  2. Search existing issues in the repository
  3. Create a new issue with:
    • Clear problem description
    • Steps to reproduce
    • Environment details (OS, Python version, etc.)
    • Error messages and logs

📝 License

Open source - feel free to use and modify as needed.

🏷️ Repository Topics

  • ai-tools
  • analytics
  • audio-processing
  • automation
  • colmodernvbert
  • colpali
  • colqwen
  • computer-vision
  • diffusion-models
  • document-processing
  • document-retrieval
  • duckdb
  • embeddings
  • fastapi
  • gradio
  • image-generation
  • machine-learning
  • multimodal-search
  • ocr
  • openai-api
  • panoptic-segmentation
  • python
  • qdrant
  • rag-system
  • reranking
  • speech-to-text
  • text-to-image
  • transformers
  • utilities
  • vector-database

⭐ If you find this repository useful, please consider giving it a star!
