Semantixel is a semantic media retrieval system for local and connected image sources. It indexes visual content using CLIP embeddings, extracts on-image text via OCR, and exposes text-to-image, image-to-image, and OCR-backed search through a lightweight web interface.
The project is intended for personal knowledge bases, research datasets, screenshot archives, and other media collections where keyword search alone is insufficient.
- Natural-Language Image Retrieval: Search images with free-text queries using CLIP text and image embeddings.
- Visual Similarity Search: Find related images efficiently from a reference image.
- OCR-Assisted Retrieval: Search through screenshots, documents, and images containing text.
- Video Frame Extraction and Indexing: Enable semantic search across video assets by analyzing extracted frames.
- Multi-Source Media Support: Index items under source-aware media identifiers so that local and remote media can coexist.
- Google Drive Integration: Authenticate via OAuth to index and serve cloud-based images.
- Interactive Web Interface: Browse results, preview media, and explore relationships in an intuitive graph-based view.
- Configurable Indexing Behavior: Adjust indexing settings through the desktop application.
Semantixel integrates three distinct retrieval strategies:
- Visual Retrieval: Embeds image and text queries into a shared CLIP space.
- OCR Retrieval: Extracts and indexes text from images for semantic and BM25 search.
- Metadata-Aware Serving: Resolves indexed items via source-aware media identifiers instead of relying purely on local file paths.
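The visual retrieval strategy can be illustrated with a minimal sketch. It is not Semantixel's actual code: the 3-dimensional vectors stand in for real CLIP embeddings, and the `local:` / `drive:` identifier format is a hypothetical example of a source-aware media identifier. The point is only that once text and images live in the same vector space, search reduces to ranking by cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=3):
    """Return the k media ids whose embeddings are closest to query_vec."""
    scored = [(cosine(query_vec, vec), media_id) for media_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [media_id for _, media_id in scored[:k]]

# Toy vectors standing in for CLIP image embeddings (assumption).
# The "source:name" identifiers are hypothetical.
index = {
    "local:cat.jpg": [0.9, 0.1, 0.0],
    "local:dog.jpg": [0.7, 0.6, 0.1],
    "drive:chart.png": [0.0, 0.2, 0.9],
}

# Pretend this vector came from the CLIP text encoder for some query.
query = [1.0, 0.0, 0.0]
results = top_k(query, index, k=2)
```

In the real system the nearest-neighbour lookup is delegated to ChromaDB rather than computed by hand.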
At a high level:
- Media is discovered from configured local directories and connected sources.
- Images and extracted video frames are embedded with CLIP.
- OCR text is extracted and stored for both semantic and BM25 search.
- Embeddings and metadata are stored in ChromaDB and the BM25 index.
- The REST API serves search results and media content directly to the web UI.
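For readers unfamiliar with BM25, the keyword side of OCR retrieval can be sketched as follows. This is a plain Okapi BM25 scorer with whitespace tokenization and commonly used default parameters (`k1=1.5`, `b=0.75`), written for illustration. The tokenizer, parameters, and BM25 library Semantixel actually uses are not stated here, so treat all of them as assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in docs against query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = (sum(len(t) for t in tokenized) / n_docs) or 1.0
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

# Made-up OCR strings, one per indexed image.
ocr_texts = [
    "build failed with exit code 1",
    "invoice number 20431 due on receipt",
    "warning disk usage above 90 percent",
]
scores = bm25_scores("build failed", ocr_texts)
best = max(range(len(ocr_texts)), key=scores.__getitem__)
```

BM25 rewards exact term matches, which complements the embedding-based search when the query is a literal string such as an invoice number or an error code.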
- Python 3.11
- CUDA-capable GPU (recommended for faster indexing and search)
- Conda or an alternative Python environment manager
Create and activate a new environment:
conda create -n semantixel python=3.11 -y
conda activate semantixel
pip install -r requirements.txt
Launch the settings utility:
python settings.py
Run a full local scan to index your files:
python main.py --scan
Start the application server:
python main.py --serve
Alternatively, execute the default combined workflow:
python main.py
Runtime configuration is maintained in config.yaml. Key settings include:
- include_directories: Local directories to scan.
- exclude_directories: Local directories to ignore.
- batch_size: The number of items processed per indexing batch.
- clip: Configuration for the CLIP provider and model checkpoints.
- text_embed: Settings for the text embedding provider.
- ocr_provider: Selection of the OCR backend.
- google_drive: Configuration for Google Drive integration.
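To make the interplay of include_directories, exclude_directories, and batch_size concrete, here is a hedged sketch of how a scanner might discover files and group them into batches. The function names and the set of image extensions are assumptions for illustration, not Semantixel's actual implementation.

```python
from pathlib import Path

# Assumed extension set; the real scanner may accept more formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}

def discover_images(include_directories, exclude_directories=()):
    """Recursively collect image files under the included directories,
    skipping anything located inside an excluded directory."""
    excluded = [Path(p).resolve() for p in exclude_directories]
    found = []
    for root in include_directories:
        for path in Path(root).resolve().rglob("*"):
            if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            if any(path.is_relative_to(ex) for ex in excluded):
                continue
            found.append(path)
    return sorted(found)

def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

A scan loop would then call `discover_images(...)` once and pass each batch from `batched(paths, batch_size)` to the embedding step.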
Semantixel can index and serve images directly from Google Drive.
Example Configuration:
google_drive:
  enabled: true
  client_secret_file: path/to/client_secret.json
  token_file: google_drive_token.json
  redirect_uri: http://localhost:23107/integrations/google_drive/auth/callback
  folder_ids: []
  include_shared_drives: false
  page_size: 100
Integration Steps:
- Create a Google Cloud OAuth client of type Web application.
- Set the redirect URI to http://localhost:23107/integrations/google_drive/auth/callback.
- Download the client secret JSON file.
- Update config.yaml accordingly.
- Start the application and authenticate via Connect Google Drive in the web UI.
- Run python main.py --scan to start indexing Drive images.
Note: OAuth secrets and token files must remain secure and excluded from version control.
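For context on why the redirect URI must match exactly, the sketch below builds the kind of consent URL an OAuth 2.0 web-application flow sends the browser to. It uses Google's public authorization endpoint. The `drive.readonly` scope, the placeholder client ID, and the helper name are assumptions; the scopes Semantixel actually requests are not documented here.

```python
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
# Assumed scope: read-only Drive access is enough to list and download images.
DRIVE_READONLY_SCOPE = "https://www.googleapis.com/auth/drive.readonly"

def build_consent_url(client_id, redirect_uri, state):
    """Build the URL for the first step of the authorization-code flow."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # must match the URI registered in Google Cloud
        "response_type": "code",
        "scope": DRIVE_READONLY_SCOPE,
        "access_type": "offline",      # ask for a refresh token
        "state": state,                # opaque value echoed back to the callback
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"

url = build_consent_url(
    client_id="example-client-id.apps.googleusercontent.com",  # placeholder
    redirect_uri="http://localhost:23107/integrations/google_drive/auth/callback",
    state="random-anti-csrf-token",
)
```

After the user consents, Google redirects to the callback with a `code` parameter, which the server exchanges for tokens. In Semantixel those tokens end up in the file named by token_file.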
Semantixel supports the following search modes:
- Caption Search: Retrieve images or video frames using natural language descriptions.
- Similar Image Search: Discover visually related images starting from a reference image or identifier.
- Text Content Search: Locate images based on OCR-detected text.
- Graph Exploration: Analyze and inspect similarity relationships between indexed assets.
- Query a screenshot archive using natural language such as "dashboard with a warning banner" or "terminal output showing build failure".
- Locate visually similar product photography, design mockups, or duplicate assets across a large catalog.
- Identify images containing specific OCR-detected phrases like invoice numbers, application labels, or error messages.
- Jump to relevant moments in video files by retrieving semantically matching extracted frames.
- Build a single searchable collection that combines local storage with cloud-hosted libraries.
Key directories:
- semantixel/: Core backend services, API endpoints, providers, and source integrations.
- settings/: Desktop configuration interface.
- UI/: Web interface and Flow Launcher integrations.
- docs/: Technical documentation and system design notes.
- db/: ChromaDB and BM25 artifacts generated during runtime.
- Local media access is confined to the configured include directories.
- External URLs supplied for image queries are validated before ingestion.
- Google Drive access is delegated via OAuth 2.0.
- Credentials and tokens must stay local and be ignored by Git.
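The directory-confinement rule can be sketched as a small check. This is an illustrative helper under assumed names, not the project's actual code. It resolves the requested path first so that `..` segments and symlinks cannot escape the allowed roots, and it compares path components rather than string prefixes.

```python
from pathlib import Path

def is_within_allowed(candidate, include_directories):
    """Return True only if candidate resolves to a location inside
    one of the configured include directories."""
    resolved = Path(candidate).resolve()
    return any(
        resolved.is_relative_to(Path(root).resolve())
        for root in include_directories
    )

allowed = ["/srv/media"]  # hypothetical include_directories value
inside = is_within_allowed("/srv/media/photos/a.png", allowed)
traversal = is_within_allowed("/srv/media/../secrets/key.pem", allowed)
lookalike = is_within_allowed("/srv/media-private/a.png", allowed)
```

The last case shows why a plain `startswith` test is not enough: `/srv/media-private` shares a string prefix with `/srv/media` but is a different directory.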
