mutimodal

dwain-barnes / llama3.2-vision-ocr-streamlit

A private, local OCR solution using Meta's Llama 3.2 Vision model with a Streamlit interface. Processes images entirely offline, supporting formats like JPEG, PNG, and BMP.

open-source ocr streamlit mutimodal llm meta-ai ollama llama-3-2-vision local-ocr

Updated Nov 21, 2024
Python

J1mL1 / DocMMIR

[EMNLP 2025 Findings] Official code for "DocMMIR: A Framework for Document Multi-modal Information Retrieval".

information-retrieval document-retrieval mutimodal

Updated Nov 3, 2025
Python

gogotalk / furkids-ai-confounder-recruitment

Furkids AI is recruiting a prospective technical co-founder | Decode the silent language of pets, build the world’s leading multimodal intelligence system 🐾🚀

opencv computer-vision deep-learning pytorch health-tech emotion-recognition distributed-training aws-deployment mutimodal animal-behavior-modelling emotion-ai ai-startup pet-tech ai-cofounder startup-equtity founder-track taiwan-startup

Updated Feb 16, 2026

ashutoshkr45 / QD-RetNet

QD-RetNet: Efficient Retinal Disease Classification via Quantized Knowledge Distillation [MIUA-2025]

knowledge-distillation quantization-aware-training retinal-disease-detection mutimodal

Updated Jul 20, 2025
Python

akml2013 / GeoHeightBEV

Current BEV methods face two major limitations. First, height prediction relies solely on cameras, which makes the estimates inherently unstable. Second, sensor calibration errors cause feature misalignment in BEV space, degrading fusion performance. To overcome these issues, we propose GeoHeightBEV, a multimodal roadside BEV perception framework.

camera detection lidar bev 3d-object-detection mutimodal bev-perception roadside-perception

Updated Mar 18, 2026
Python

PRITHIVSAKTHIUR / Cheers-HF-Demo

Cheers-HF-Demo is an advanced, highly optimized full-stack web application built on the Gradio framework, engineered to interface seamlessly with the ai9stars/Cheers multimodal

web-app scikit-learn pillow pytorch scipy matplotlib gradio torchvision huggingface-transformers mutimodal einops huggingface-spaces

Updated Mar 20, 2026
Python

Here are 12 public repositories matching this topic...

duyu09 / MKTY-System

johnnyhank / MIRA-Multimodal-Intelligent-Robotic-Assistant

video-db / videodb-node

rekkles2 / Gaze-CIFAR-10

kingabzpro / Gemini-2-Pro-Chat

anusha-chebolu / multimodal-rag

dwain-barnes / llama3.2-vision-ocr-streamlit

J1mL1 / DocMMIR

gogotalk / furkids-ai-confounder-recruitment

ashutoshkr45 / QD-RetNet

akml2013 / GeoHeightBEV

PRITHIVSAKTHIUR / Cheers-HF-Demo
