StreamASR - Real-time Speech Recognition Service

🎯 OpenAI Realtime API Compatible Real-time Speech Recognition Service

🌐 Language: English | 中文

📋 Project Overview

StreamASR is a high-performance real-time speech recognition service that exposes a WebSocket interface for transcribing live audio streams. It wraps an OpenAI-standard v1/audio/transcriptions endpoint and turns it into a real-time speech recognition service. The project integrates VAD (Voice Activity Detection) and supports multiple ASR engines, with flexible configuration and deployment.

✨ Core Features

🎤 Real-time Speech Recognition - Low-latency audio stream processing based on WebSocket
🧠 Smart VAD Detection - Integrated Sherpa-ONNX voice activity detection with automatic audio submission trigger
🔇 AI Noise Reduction - Built-in denoiser using GTCRN model for enhanced speech recognition in noisy environments
🔄 OpenAI Compatible - Works with any OpenAI-compatible ASR interface, with multiple configurable models
📊 Structured Logging - Detailed logging and monitoring based on logrus
🐳 Docker Support - Complete containerized deployment solution
🔧 Version Management - Automated version management and build process
🌐 Multi-language SDK - Provides Go and TypeScript client SDKs

🚀 Quick Start

📋 Prerequisites

Go 1.23+ - Server runtime environment
VAD Model File - Sherpa-ONNX VAD model (silero_vad.onnx)
Denoiser Model File - Sherpa-ONNX GTCRN model (gtcrn_simple.onnx)
ASR Service - OpenAI-compatible speech recognition API

⚡ Quick Launch

Method 1: Using Makefile (Recommended)

# Clone the project
git clone https://github.com/go-restream/stt.git
cd stt

# Install dependencies and build
make install
make build

# Start the service
make run

Method 2: Manual Compilation

# Install dependencies
go mod download

# Build the project
go build -o streamASR main.go

# Start the service
./streamASR -c config.yaml

Method 3: Docker Deployment

# Using docker-compose
make docker-deploy

# Or manual build
make docker-build
make docker-compose-up

🎯 Verify Installation

After the service starts, verify it with:

# View version information (use ./streamASR if you built manually with Method 2)
./build/streamASR -v

# Health check
curl http://localhost:8088/health

# Check service status
curl http://localhost:8088/status

🌐 Web Interface

StreamASR provides a built-in Web UI tool that allows users to perform real-time speech recognition testing directly through their browser.

🎯 Accessing Web UI

After starting the service, open one of these URLs in your browser:

# Main interface
http://localhost:8088/

# Or directly access static files
http://localhost:8088/static/index.html

✨ Web UI Features

🎤 Real-time Audio Visualization - Dynamic display of audio waveforms and volume levels
🔧 Configuration Options - Support for sample rate selection (16kHz/48kHz) and VAD toggle
⚡ Real-time Transcription - Real-time display of speech recognition results
🎨 Theme Switching - Support for multiple visual themes (Deep Blue Tech, Purple Cyber, Green Matrix)
💾 Result Saving - Support for copying and saving transcription results
🤖 AI Summary - Integrated AI functionality for intelligent summarization of transcription content

🎮 Usage Steps

Open Browser - Visit http://localhost:8088
Configure Parameters - Select sample rate and VAD detection toggle
Click Start - Launch speech recognition
Authorize Microphone - Browser will request microphone permission
Start Speaking - View real-time transcription results
Save Results - Use save button to copy transcription text

🔧 Technical Features

WebSocket Connection - Low-latency communication based on WebSocket
Auto Reconnection - Support for automatic reconnection on disconnection
Heartbeat Detection - 30-second heartbeat to maintain stable connection
Error Handling - Comprehensive error prompts and status display

🔧 Configuration

Service Configuration File (config.yaml)

# Service port configuration
service_port: "8088"

# OpenAI compatible ASR interface configuration
asr:
  base_url: "http://localhost:3000/v1"        # ASR interface base URL
  api_key: "your-api-key"                    # ASR interface API key
  model: "FireRed-large"                     # ASR model name

# OpenAI compatible LLM interface configuration (optional)
llm:
  base_url: "https://api.deepseek.com/v1"    # LLM interface base URL
  api_key: "your-llm-api-key"                # LLM interface API key
  model: "deepseek-chat"

# Audio configuration
audio:
  enable: true
  save_dir: "./audio"                        # Audio file save directory
  keep_files: 10                             # Number of recent WAV files to keep
  sample_rate: 16000                         # Sample rate in Hz (16000 or 48000)
  channels: 1                                # Number of channels
  bit_depth: 16                              # Bit depth
  buffer_size: 10                            # 10-second buffer

# VAD configuration
vad:
  enable: true
  model: "./model/silero_vad.onnx"          # VAD model path
  threshold: 0.5                             # Speech detection threshold
  min_silence_duration: 1                    # Minimum silence duration (seconds)
  min_speech_duration: 0.1                   # Minimum speech duration (seconds)
  window_size: 512                           # Window size
  max_speech_duration: 8.0                   # Maximum speech duration (seconds)
  sample_rate: 16000                         # Sample rate
  num_threads: 1                             # Number of threads
  provider: "cpu"                            # Compute provider

# Denoiser configuration (AI Noise Reduction)
denoiser:
  enable: true                               # Enable/disable denoiser
  model: "./model/gtcrn_simple.onnx"         # GTCRN denoiser model path
  sample_rate: 16000                         # Sample rate
  num_threads: 1                             # Number of threads
  debug: 0                                   # Debug level (0-3)
  bypass_for_testing: false                  # Bypass denoiser for testing
  max_processing_time_ms: 50                 # Maximum processing time (ms)

# Logging configuration
logging:
  level: "info"                              # Log level
  file: ""                                   # Log file path, empty means output to stderr
  format: "json"                             # Log format: json, text
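
To make the audio and VAD numbers above more concrete, the following sketch derives two quantities from them: the size of the configured buffer as raw PCM and the duration of one VAD window. It assumes that buffer_size is expressed in seconds (as the config comment suggests) and that the buffer holds uncompressed PCM; the audioConfig type and its methods are hypothetical and do not come from the StreamASR source.

```go
package main

import "fmt"

// audioConfig mirrors a few keys from the audio and vad sections of config.yaml.
// The struct itself is illustrative; the service's real config types may differ.
type audioConfig struct {
	SampleRate    int // samples per second
	Channels      int // number of channels
	BitDepth      int // bits per sample
	BufferSeconds int // audio.buffer_size, assumed to be in seconds
	VADWindow     int // vad.window_size, in samples
}

// bufferBytes returns the size of the configured buffer as raw PCM bytes.
func (c audioConfig) bufferBytes() int {
	return c.SampleRate * c.Channels * (c.BitDepth / 8) * c.BufferSeconds
}

// vadWindowMillis returns the duration of one VAD window in milliseconds.
func (c audioConfig) vadWindowMillis() float64 {
	return float64(c.VADWindow) * 1000 / float64(c.SampleRate)
}

func main() {
	cfg := audioConfig{SampleRate: 16000, Channels: 1, BitDepth: 16, BufferSeconds: 10, VADWindow: 512}
	fmt.Println("buffer bytes:", cfg.bufferBytes())        // 320000
	fmt.Println("VAD window ms:", cfg.vadWindowMillis())   // 32
}
```

With the defaults shown, a 10-second mono 16-bit buffer at 16 kHz is 320000 bytes, and a 512-sample VAD window covers 32 ms of audio.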

🐳 Docker Deployment

Quick Start with Docker Compose (Recommended)

# One-command deployment (build and start all services)
make docker-deploy

# Check service status
make docker-ps

# View real-time logs
make docker-compose-logs

# Stop all services
make docker-compose-down

Docker Compose Configuration

The docker-compose.yml provides complete service orchestration:

version: '3.8'
services:
  streamASR:
    build: .
    ports:
      - "8088:8088"
    volumes:
      - ./config/config.yaml:/app/config/config.yaml:ro
      - ./build/model:/app/model:ro
      - ./audio:/app/audio
      - ./logs:/app/logs
    environment:
      - VERSION=v0.1.2
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8088/health"]
      interval: 30s
      timeout: 10s
      retries: 3

Manual Docker Commands

# Build Docker image
make docker-build

# Run container with volume mounts
make docker-run

# Enter container for debugging
make docker-exec

# View container logs
make docker-logs

# Stop and remove container
make docker-stop

Development Mode

# Build development image and run interactive container
make docker-debug

# Run in development mode with hot reload
make docker-dev

Dockerfile Features

Multi-stage build - Optimizes image size
Version information injection - Automatically injects version, build time and other information
Health check - Built-in health check mechanism
Non-root user - Secure container runtime environment
Production-ready - Optimized for production deployment

Container Management

# List all containers
docker ps -a

# Monitor resource usage
docker stats

# Clean up unused resources
make docker-clean

For a detailed Docker deployment guide, see docs/DOCKER.md.

📋 Command Line Options

# Display version information
./streamASR -v
./streamASR --version

# Specify configuration file
./streamASR -c config.yaml

# View help information
./streamASR -h

🧪 Client SDK

Go SDK

package main

import (
    "streamASR/sdk/golang/client"
)

func main() {
    // Create client
    recognizer := client.NewRecognizer("ws://localhost:8088")

    // Connect and start recognition
    err := recognizer.Connect()
    if err != nil {
        panic(err)
    }
    defer recognizer.Close()

    // Handle audio...
}
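
The "Handle audio..." step usually involves turning captured samples into bytes before sending them. The sketch below converts float32 samples to 16-bit little-endian PCM, which matches the bit_depth: 16 setting in config.yaml. Whether the SDK expects exactly this byte layout is an assumption, and floatToPCM16 is a hypothetical helper, not part of the SDK.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// floatToPCM16 converts samples in the range [-1, 1] to 16-bit
// little-endian PCM bytes. Out-of-range samples are clamped.
// This helper is illustrative and not part of the StreamASR SDK.
func floatToPCM16(samples []float32) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		v := math.Max(-1, math.Min(1, float64(s)))
		scaled := int16(math.Round(v * 32767))
		binary.LittleEndian.PutUint16(out[2*i:], uint16(scaled))
	}
	return out
}

func main() {
	pcm := floatToPCM16([]float32{0, 1, -1, 0.5})
	fmt.Println(len(pcm), pcm)
}
```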

TypeScript SDK

import { StreamASRClient } from '@streamasr/typescript-sdk';

const client = new StreamASRClient({
    url: 'ws://localhost:8088',
    autoConnect: true
});

// Listen for transcription results
client.on('transcription', (result) => {
    console.log('Recognition result:', result.text);
});

📊 Monitoring and Logging

Structured Logging

The service uses logrus for structured logging:

{
  "component": "mont_srv_status",
  "action": "health_check_status",
  "version": "v0.1.2-171f62c",
  "build_time": "2025-11-02T05:24:39Z",
  "git_commit": "171f62c",
  "level": "info",
  "msg": "✔ Starting StreamASR v0.1.2-171f62c with config: config.yaml"
}

Health Check

# Basic health check
curl http://localhost:8088/health

# Example response
{
  "status": "healthy",
  "version": "v0.1.2-171f62c",
  "uptime": "2h30m15s",
  "asr_engine": "available"
}

🔧 Version Management

The project uses semantic versioning and supports automated version releases:

# View current version
make version

# Version upgrade
make version-bump-patch    # v0.1.2 -> v0.1.3
make version-bump-minor    # v0.1.2 -> v0.2.0
make version-bump-major    # v0.1.2 -> v1.0.0

# Create Git tag
make tag

# Build Docker image
make docker-build          # Generate streamasr:latest and streamasr:v0.1.2

For a detailed version management guide, see docs/VERSION.md.
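
The bump rules shown in the comments above follow the usual semantic versioning convention: bumping a component resets every component to its right. The sketch below illustrates that rule in Go. It is not the project's Makefile logic, and the bump function is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bump returns the next version for part "major", "minor" or "patch".
// It accepts versions of the form vMAJOR.MINOR.PATCH.
func bump(version, part string) (string, error) {
	fields := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(fields) != 3 {
		return "", fmt.Errorf("expected vMAJOR.MINOR.PATCH, got %q", version)
	}
	var n [3]int
	for i, f := range fields {
		v, err := strconv.Atoi(f)
		if err != nil {
			return "", fmt.Errorf("invalid component %q: %w", f, err)
		}
		n[i] = v
	}
	switch part {
	case "major":
		n[0], n[1], n[2] = n[0]+1, 0, 0
	case "minor":
		n[1], n[2] = n[1]+1, 0
	case "patch":
		n[2]++
	default:
		return "", fmt.Errorf("unknown part %q", part)
	}
	return fmt.Sprintf("v%d.%d.%d", n[0], n[1], n[2]), nil
}

func main() {
	for _, part := range []string{"patch", "minor", "major"} {
		next, err := bump("v0.1.2", part)
		if err != nil {
			panic(err)
		}
		fmt.Println(part, "->", next)
	}
}
```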

🛠️ Development Guide

Development Environment Setup

# Clone project
git clone https://github.com/go-restream/stt.git
cd stt

# Install dependencies
make install

# Run tests
make test

# Build
make build

# Run
make run

Development Mode

# Docker development mode
make docker-debug

# View logs
make docker-logs

# Enter container for debugging
make docker-exec

Testing

# Run unit tests
make test

# Run integration tests
go test ./...

🐛 Troubleshooting

Common Issues

VAD Model File Missing

# Ensure VAD model file exists
ls -la model/silero_vad.onnx

Denoiser Model File Missing

# Ensure denoiser model file exists
ls -la model/gtcrn_simple.onnx

ASR Service Connection Failed

# Check ASR service configuration (the transcription endpoint expects a multipart form upload)
# Replace /path/to/audio.wav with a real audio file
curl -H "Authorization: Bearer $API_KEY" \
     -F "model=FireRed-large" \
     -F "file=@/path/to/audio.wav" \
     "$ASR_BASE_URL/audio/transcriptions"
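
The same connectivity check can be scripted in Go. OpenAI-style transcription endpoints take the audio as a multipart/form-data upload, so the sketch below builds such a request with the standard library. The function name newTranscriptionRequest, the sample file name, and the placeholder audio bytes are assumptions for illustration; the base URL, API key, and model are the example values from config.yaml. The request is only constructed here, not sent.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newTranscriptionRequest builds a multipart/form-data POST for an
// OpenAI-compatible /audio/transcriptions endpoint.
func newTranscriptionRequest(baseURL, apiKey, model, filename string, audio []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	if err := w.WriteField("model", model); err != nil {
		return nil, err
	}
	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/audio/transcriptions", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	// The audio bytes are a placeholder; read a real WAV file in practice.
	req, err := newTranscriptionRequest("http://localhost:3000/v1", "your-api-key", "FireRed-large", "sample.wav", []byte("RIFF"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println(req.Header.Get("Content-Type"))
}
```

To actually run the check, pass the request to http.DefaultClient.Do and inspect the status code and body.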

Port Occupied

# Check port occupation
lsof -i :8088

# Modify port in configuration file
vim config.yaml

Debug Mode

Enable verbose logging:

# Modify configuration file
vim config.yaml
# Set logging.level: "debug"

# Or set environment variable
export LOG_LEVEL=debug
./streamASR

📊 Performance Metrics

Response Latency: < 500ms end-to-end recognition latency
Concurrency Support: Supports multiple concurrent WebSocket connections
Audio Processing: Supports 16kHz/48kHz sample rates
VAD Latency: < 100ms voice activity detection latency
Denoiser Latency: < 20ms additional processing time for noise reduction
Noise Reduction: Improved ASR accuracy in noisy environments

🤝 Contributing

We welcome community contributions! Please follow these steps:

Fork the project repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Create a Pull Request

Development Guidelines

Follow Go coding standards
Add unit tests
Update relevant documentation
Pass all CI checks

📞 Support & Help

📚 Complete Documentation

Docker Deployment Guide - Complete Docker deployment instructions
Version Management Documentation - Version management specifications and usage
Changelog - Detailed version change records

🆘 Getting Help

Method	Description	Response Time
GitHub Issues	Bug reports and feature requests	24-48 hours
GitHub Discussions	Technical discussions and Q&A	Community response

🏷️ Version Updates

v0.1.2 (2025-11-03)

✨ New Features

🔇 AI Noise Reduction - Built-in denoiser using GTCRN model for enhanced speech recognition
🏷️ Version Management System - Complete version management and build process
🐳 Docker Support - Complete containerized deployment solution
📋 Makefile Integration - Automated build and deployment scripts
📖 Documentation Enhancement - Detailed deployment and development documentation

🔧 Technical Improvements

🎯 Audio Pipeline Enhancement - Integrated denoiser between VAD and ASR processing
🔧 Project Structure Optimization - Clearer code organization and module division
📝 Logging Enhancement - Startup logs include version information
🛠️ Build Process - Support for automatic version information injection
🧪 Comprehensive Testing - Unit tests for denoiser functionality

v0.1.1

✨ New Features

🎤 Real-time Speech Recognition - WebSocket-based audio stream processing
🧠 VAD Integration - Sherpa-ONNX voice activity detection
🔄 ASR Interface - OpenAI-compatible speech recognition API
📊 Health Check - Service status monitoring interface

🎯 Summary

StreamASR is a feature-complete, easy-to-deploy real-time speech recognition service. With Docker containerization, a version management system, and comprehensive documentation, it provides a reliable speech recognition solution for production environments.

⭐ If this project helps you, please give us a Star!

🎯 StreamASR - Making Speech Recognition Simple and Powerful
