A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models such as YOLO, FastVLM, and more.
Updated Jan 1, 2026 - Rust
🎭 Real-time voice-controlled 3D avatar with multimodal AI - speak naturally and watch your AI companion respond with perfect lip-sync
Real-time webcam demo using SmolVLM with vLLM backend
This repository contains the implementation of the AlignVLM paper, which proposes a novel method for vision-language alignment
A small VLM that sees everything
Scripts for combining SmolVLM with an LLM
⭐ Comparing VLMs with CNNs for garbage classification
Real-time vision demo using SmolVLM with llama.cpp backend
A simple web application for real-time AI vision analysis using SmolVLM-500M-Instruct with live camera feed processing and text-to-speech.
A benchmark suite for lightweight generative multimodal Vision-Language Models, comparing ViLT and SmolVLM under resource-constrained inference environments. Demonstrates CPU-only deployment, model evaluation, and multimodal reasoning with images and text, highlighting practical GenAI engineering for real-world applications.
A Flask-based web app for managing multimodal datasets (text and images) with CRUD operations via SQLite, and seamless export as a structured Parquet dataset to Hugging Face Hub.
Receipt OCR using Fine-tuned VLMs
A somewhat optimized implementation of some lightweight, popular models
Vision-Language Model (VLM) for real-time video analysis and description via webcam.