[AI LLM + Medicine and Healthcare] Minh Khoe Tue Y (MKTY, 明康慧医) Smart Healthcare System: a graduation project combining large AI models with medicine and healthcare
Updated Oct 26, 2025 · Vue
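A multimodal large model built on the Qwen Agent framework, integrating a JAKA robotic arm, visual detection, speech recognition and synthesis, and an MCP database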
Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification
Gemini 2 Pro app for Image, Audio, and Document understanding + Code Execution.
A multimodal RAG application using Qwen 2.5 VL, ColPali, and QdrantDB for text- and image-based retrieval.
A private, local OCR solution using Meta's Llama 3.2 Vision model with a Streamlit interface. Processes images entirely offline, supporting formats like JPEG, PNG, and BMP.
[EMNLP 2025 Findings] Official code for "DocMMIR: A Framework for Document Multi-modal Information Retrieval".
Furkids AI is recruiting prospective technical partners | Decode the silent language of pets, build the world’s leading multimodal intelligence system 🐾🚀
QD-RetNet: Efficient Retinal Disease Classification via Quantized Knowledge Distillation [MIUA-2025]
Current BEV methods face two major limitations. First, height prediction relies solely on cameras, which makes the estimates inherently unstable. Second, sensor calibration errors cause feature misalignment in BEV space, degrading fusion performance. To overcome these issues, we propose GeoHeightBEV, a multimodal roadside BEV perception framework.
Cheers-HF-Demo is an advanced, highly optimized full-stack web application built on the Gradio framework, engineered to interface seamlessly with the ai9stars/Cheers multimodal