An AI-powered smart glasses project designed for the visually impaired, combining ESP32 hardware with cloud/local Large Vision-Language Model (VLM) inference capabilities to enable real-time scene description and voice interaction.
```
.
├── CameraWebServer_PDM_Audio/          # ESP32 firmware code (C++)
│   ├── CameraWebServer_PDM_Audio.ino   # Main program
│   ├── app_httpd.cpp                   # HTTP & WebSocket server logic
│   └── camera_pins.h                   # Hardware pin definitions
└── eval_benchmark/                     # Experiment evaluation & inference scripts (Python)
    ├── configs/                        # Experiment configuration files (.yaml)
    ├── scripts/                        # Common run scripts (.py, .sh, .bat)
    ├── src/                            # Core evaluation logic
    ├── 02_batch_infer_stream_tts_realtime_eval_v6_wifi_capture.py  # Real-time inference & TTS evaluation
    ├── manifest_nlp_v6_mixedbest.csv   # Evaluation dataset manifest
    └── requirements.txt                # Python dependencies
```
The CameraWebServer_PDM_Audio folder contains firmware to be flashed to an ESP32-S3 development board (e.g., Seeed Studio XIAO ESP32S3).
- Concurrent Mode: Camera (DVP) and PDM microphone (I2S) run simultaneously.
- Web Services: Provides JPEG capture, MJPEG video stream, and WebSocket PCM16 audio stream.
- Open `CameraWebServer_PDM_Audio.ino` in the Arduino IDE.
- Select the `XIAO_ESP32S3` board and enable `OPI PSRAM`. For other settings under `Tools`, refer to the Seeed Studio XIAO ESP32S3 documentation.
- Fill in your WiFi credentials (`ssid`, `password`).
- Compile and flash.
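After flashing, the JPEG capture endpoint can be used as a quick connectivity check from Python. This is a minimal sketch: the `/capture` path matches the `--camera_url` used by the evaluation scripts below, while the `is_jpeg`/`grab_frame` helper names are illustrative and not part of this repository.

```python
import urllib.request


def is_jpeg(data: bytes) -> bool:
    """True if the bytes begin with the JPEG start-of-image marker (FF D8)."""
    return data[:2] == b"\xff\xd8"


def grab_frame(esp32_ip: str, out_path: str = "frame.jpg") -> int:
    """Fetch one JPEG frame from the ESP32 /capture endpoint and save it."""
    with urllib.request.urlopen(f"http://{esp32_ip}/capture", timeout=5) as resp:
        data = resp.read()
    if not is_jpeg(data):
        raise ValueError("response is not a JPEG image")
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)
```

If the call succeeds and the saved file opens as an image, the camera pipeline and WiFi connection are working.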
This module is used to run inference experiments with various large models (Gemini, Qwen, MiniCPM) and evaluate performance (latency, quality).
```
cd eval_benchmark
pip install -r requirements.txt
```

For model setup, refer to the MiniCPM-V-CookBook.
```
cd PATH_To_YOUR_LLAMACPP\llama.cpp\build\bin\Release
llama-server.exe -m "PATH_TO_YOUR_MODEL\ggml-model-Q4_K_M.gguf" --mmproj "PATH_TO_YOUR_PROJMODEL\mmproj-model-f16.gguf" -c 4096 -ngl 99 --port 8080 --host 0.0.0.0
```
Similarly, MiniCPM-o can also be deployed via the llama.cpp server.
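Once running, `llama-server` exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the configured port. A sketch of assembling a vision request with an inline base64 JPEG (the `build_vision_request` helper and the prompt text are illustrative; the field layout follows the OpenAI vision message format):

```python
import base64


def build_vision_request(jpeg_bytes: bytes, prompt: str,
                         model: str = "ggml-model-Q4_K_M.gguf") -> dict:
    """Build an OpenAI-style chat payload carrying one inline JPEG image."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "max_tokens": 128,
    }

# POST this payload as JSON to http://127.0.0.1:8080/v1/chat/completions
```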
```
python 02_batch_infer_llamacpp_nlp_v3_nothink.py
```

- `02_batch_infer_llamacpp_nlp_v3_nothink.py`: non-thinking mode; sometimes gives quick judgments, with lower latency.
- `manifest_nlp_v6_mixedbest.csv`: collection of the best prompts for each subtask, selected through multiple experiments.
- `predictions_nlp_v6_mixedbest_nothink.csv`: model output in non-thinking mode.
```
python eval_benchmark/scripts/run_local_only.py
python -m eval_benchmark.src.run_eval --config eval_benchmark/configs/ablation_resize_448.yaml
python eval_benchmark/scripts/run_wifi_e2e.py --camera_url http://<ESP32_IP>/capture
python eval_benchmark/scripts/run_omni_experiments.py
```
Before running the cloud experiments, set the corresponding API key and proxy:
```
# Anaconda Prompt example
set HTTP_PROXY=http://127.0.0.1:7897
set HTTPS_PROXY=http://127.0.0.1:7897

# Run Gemini 2.5 Flash experiment
set GOOGLE_API_KEY="YOUR_GOOGLE_API_KEY"
python eval_benchmark/scripts/run_cloud_api.py --provider gemini25

# Run Qwen experiment
set DASHSCOPE_API_KEY="YOUR_QWEN_API_KEY"
python eval_benchmark/scripts/run_cloud_api.py --provider qwen
```

```
python eval_benchmark/02_batch_infer_stream_tts_realtime_eval_v6_wifi_capture.py \
    --manifest manifest_nlp_v6_mixedbest.csv \
    --camera_url http://<ESP32_IP>/capture \
    --use_camera 1 \
    --pack_mode raw \
    --out predictions_v6_wifi_raw.csv \
    --log predictions_v6_wifi_raw_log.txt
```

```
python -m eval_benchmark.src.aggregate --runs_dir eval_benchmark/runs --out_dir eval_benchmark
```

```
python demo_asr_vlm_stream_tts_glasses_esp32mic_v4_vad.py --mic_ws ws://YOUR_ESP32_IP/ws_audio --camera_url http://YOUR_ESP32_IP/capture --openai_base http://127.0.0.1:8080/v1 --openai_model "ggml-model-Q4_K_M.gguf" --whisper_model tiny --max_edge 896 --rotate 90
```
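The `ws_audio` WebSocket delivers raw PCM16 audio frames. A minimal sketch of decoding one frame into normalized samples, assuming little-endian mono 16-bit signed PCM (the usual PCM16 layout; the `pcm16_to_floats` helper is illustrative, not part of this repo):

```python
import array
import sys


def pcm16_to_floats(frame: bytes) -> list[float]:
    """Decode a little-endian PCM16 frame into floats in [-1.0, 1.0)."""
    samples = array.array("h")      # signed 16-bit integers
    samples.frombytes(frame)
    if sys.byteorder == "big":
        samples.byteswap()          # frames on the wire are little-endian
    return [s / 32768.0 for s in samples]
```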
This project is licensed under the Apache-2.0 License.