Audio fingerprinting system built from scratch in Go. Identify songs from short audio clips using Shazam-like algorithms, with optional client-side WebAssembly processing for complete privacy.
- Shazam-Grade Matching - Identifies songs from 5-15 second clips, even with background noise
- Privacy-Preserving - Optional WASM processing keeps audio in the browser
- Universal Audio Support - MP3, WAV, FLAC, AAC, M4A, OGG via FFmpeg
- YouTube Integration - Auto-download and extract metadata from URLs
- Multiple Interfaces - CLI tool, REST API, and WASM web frontend
Prerequisites:
# Clone and build
git clone https://github.com/himanishpuri/AcousticDNA.git
cd AcousticDNA
go mod download
# Build CLI
go build -o acousticDNA ./cmd/cli/
# Build server
go build -o server ./cmd/server/
# Build WASM (optional)
GOOS=js GOARCH=wasm go build -o web/public/fingerprint.wasm ./cmd/wasm/

# Add song from file
./acousticDNA add song.mp3 --title "Sandstorm" --artist "Darude"
# Add from YouTube
./acousticDNA add --youtube-url "https://youtube.com/watch?v=VIDEO_ID"
# Match audio
./acousticDNA match recording.wav
# List songs
./acousticDNA list
# Delete song
./acousticDNA delete <song-id>

# Start server
./server -port 8080
# Add song
curl -X POST http://localhost:8080/api/songs \
-F "audio=@song.mp3" \
-F "title=Sandstorm" \
-F "artist=Darude"
# Match audio
curl -X POST http://localhost:8080/api/match \
-F "audio=@clip.wav"
# List songs
curl http://localhost:8080/api/songs

# Serve frontend
cd web/public && python3 -m http.server 8080
# or
cd web && npx serve public
# Open http://localhost:8080
# Upload audio → Generate fingerprint → Match

┌─────────────────────────────────────────────────────────────┐
│                      CLIENT OPTIONS                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Option 1: CLI Tool (Local)                                 │
│  ┌─────────────────┐                                        │
│  │  ./acousticDNA  │ → Direct database access               │
│  │  add/match/list │                                        │
│  └─────────────────┘                                        │
│                                                             │
│  Option 2: WASM Frontend (Privacy-Preserving)               │
│  ┌─────────────────┐                                        │
│  │     Browser     │                                        │
│  │  ┌──────────┐   │                                        │
│  │  │   WASM   │───┼─→ Hashes only (14 KB)                  │
│  │  │Processing│   │   Audio never uploaded!                │
│  │  └──────────┘   │                                        │
│  └─────────────────┘                                        │
│                                                             │
│  Option 3: Traditional Upload                               │
│  ┌─────────────────┐                                        │
│  │     Browser     │                                        │
│  │   Upload file   │─→ Full audio file (3 MB)               │
│  └─────────────────┘                                        │
│                                                             │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────────┐
│                     BACKEND SERVER (Go)                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │   REST API   │   │ Fingerprint  │   │   Database   │     │
│  │   Handlers   │──→│  Processor   │──→│   (SQLite)   │     │
│  └──────────────┘   └──────────────┘   └──────────────┘     │
│                                                             │
│  Endpoints:                                                 │
│  • POST /api/match/hashes  ← WASM hashes                    │
│  • POST /api/match         ← File upload                    │
│  • POST /api/songs         ← Add song                       │
│  • GET  /api/songs         ← List songs                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
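The file-upload path in the diagram can be exercised from Go as well as from curl. The sketch below builds the same multipart request as the `curl -F "audio=@clip.wav"` example; the `audio` field name and the `/api/match` path come from that example, while the helper name `newMatchRequest` is hypothetical and not part of the project.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newMatchRequest (hypothetical helper) builds a multipart POST to /api/match
// that carries the clip bytes in the "audio" form field.
func newMatchRequest(baseURL, filename string, audio []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	part, err := w.CreateFormFile("audio", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	// Close writes the trailing multipart boundary.
	if err := w.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/match", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := newMatchRequest("http://localhost:8080", "clip.wav", []byte("RIFF"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Passing req to http.DefaultClient.Do would send it to a running server.
	fmt.Println(req.Method, req.URL.Path)
}
```

The response format is not described here, so the sketch stops at building the request.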
┌─────────────────┐
│   Input Audio   │  (MP3, WAV, FLAC, etc.)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ FFmpeg Convert  │  → Mono 16-bit PCM @ 11,025 Hz
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  STFT + Peaks   │  → Spectrogram → Constellation Points
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Fingerprints   │  → Combinatorial Hashes (32-bit)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ SQLite Storage  │  → hash → (songID, anchorTimeMs)
└─────────────────┘
Query Audio → Fingerprints → Database Lookup
                                   │
                                   ▼
                          Time-Offset Voting
                          ┌─────────────────┐
                          │ For each match: │
                          │ offset = db_time│
                          │        - query  │
                          │ votes[song][off]│
                          │      += 1       │
                          └────────┬────────┘
                                   │
                                   ▼
                          Rank by Max Votes
                                   │
                                   ▼
                             Top Matches
- Convert any audio format to mono 16-bit PCM WAV @ 11,025 Hz using FFmpeg
- Normalize sample rate for consistent fingerprint generation
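As a rough illustration of the conversion step, the sketch below assembles an FFmpeg command line that produces mono 16-bit PCM WAV at a given sample rate. The flags (`-i`, `-ac`, `-ar`, `-c:a pcm_s16le`, `-y`) are standard FFmpeg options; the function names are hypothetical, and the project's own `processor.go` may invoke FFmpeg differently.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// ffmpegArgs (hypothetical helper) returns FFmpeg arguments that convert any
// supported input to mono 16-bit PCM WAV at sampleRate Hz.
func ffmpegArgs(input, output string, sampleRate int) []string {
	return []string{
		"-y",        // overwrite the output file without prompting
		"-i", input, // source file in any format FFmpeg can decode
		"-ac", "1", // downmix to a single channel
		"-ar", strconv.Itoa(sampleRate), // resample
		"-c:a", "pcm_s16le", // signed 16-bit little-endian PCM
		output,
	}
}

// convertToPCM runs FFmpeg and includes its output in any error.
// It requires an ffmpeg binary on PATH.
func convertToPCM(input, output string, sampleRate int) error {
	out, err := exec.Command("ffmpeg", ffmpegArgs(input, output, sampleRate)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("ffmpeg failed: %w: %s", err, out)
	}
	return nil
}

func main() {
	// Print the command instead of running it, so the example works without FFmpeg.
	fmt.Println("ffmpeg", ffmpegArgs("song.mp3", "song.wav", 11025))
}
```

Calling `convertToPCM("song.mp3", "song.wav", 11025)` would perform the actual conversion on a machine with FFmpeg installed.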
- Window Size: 1024 samples (~93ms)
- Hop Size: 256 samples (75% overlap)
- Window Function: Hamming window
- Frequency Resolution: ~10.77 Hz/bin
- Identify spectral peaks (constellation points) in time-frequency space
- Filter by intensity threshold and local maxima
- Each peak represents a significant acoustic event
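One simple way to realize "intensity threshold plus local maxima" is a neighborhood scan over the spectrogram. The sketch below is an assumption about how such a filter could look, not the logic in `peaks.go`; the `Peak` type, the square neighborhood, and the threshold value are all illustrative.

```go
package main

import "fmt"

// Peak is a constellation point: a frame index, a frequency bin, and a magnitude.
type Peak struct {
	Frame, Bin int
	Mag        float64
}

// isLocalMax reports whether spec[t][f] is strictly greater than every
// neighbor within radius frames and radius bins.
func isLocalMax(spec [][]float64, t, f, radius int) bool {
	for dt := -radius; dt <= radius; dt++ {
		for df := -radius; df <= radius; df++ {
			tt, ff := t+dt, f+df
			if dt == 0 && df == 0 {
				continue
			}
			if tt < 0 || tt >= len(spec) || ff < 0 || ff >= len(spec[tt]) {
				continue // neighbor falls outside the spectrogram
			}
			if spec[tt][ff] >= spec[t][f] {
				return false
			}
		}
	}
	return true
}

// findPeaks keeps cells that reach the threshold and dominate their neighborhood.
func findPeaks(spec [][]float64, threshold float64, radius int) []Peak {
	var peaks []Peak
	for t := range spec {
		for f := range spec[t] {
			if spec[t][f] >= threshold && isLocalMax(spec, t, f, radius) {
				peaks = append(peaks, Peak{Frame: t, Bin: f, Mag: spec[t][f]})
			}
		}
	}
	return peaks
}

func main() {
	spec := [][]float64{
		{0.1, 0.2, 0.1},
		{0.2, 0.9, 0.3},
		{0.1, 0.3, 0.2},
	}
	fmt.Println(findPeaks(spec, 0.5, 1))
}
```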
- Pair anchor peaks with target peaks within time window
- Generate 32-bit hash: `[anchorFreq(9) | targetFreq(9) | deltaTime(14)]`
- Store hash with precise anchor timestamp
- Query hashes against database (batch SQL query for 10-100x speedup)
- Calculate time offsets: `offset = db_time - query_time`
- Vote for song/offset pairs
- Return matches ranked by vote count (confidence score)
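The voting step can be sketched as a histogram keyed by (song, offset): hashes from the true song agree on one offset, while chance collisions scatter. The types and function below are hypothetical, and the sketch counts exact millisecond offsets; a production matcher might bucket nearby offsets to tolerate jitter.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is one database row whose hash also occurs in the query clip.
type Hit struct {
	SongID      string
	DBTimeMs    int // anchor time stored for the song
	QueryTimeMs int // anchor time of the same hash in the query
}

// Result is the best-supported offset for one song.
type Result struct {
	SongID   string
	OffsetMs int
	Votes    int
}

// rankByVotes counts votes[song][offset] and returns one result per song,
// sorted by descending vote count (ties broken by song ID).
func rankByVotes(hits []Hit) []Result {
	type key struct {
		song   string
		offset int
	}
	votes := make(map[key]int)
	for _, h := range hits {
		votes[key{h.SongID, h.DBTimeMs - h.QueryTimeMs}]++
	}

	best := make(map[string]Result)
	for k, v := range votes {
		b, seen := best[k.song]
		if !seen || v > b.Votes || (v == b.Votes && k.offset < b.OffsetMs) {
			best[k.song] = Result{SongID: k.song, OffsetMs: k.offset, Votes: v}
		}
	}

	results := make([]Result, 0, len(best))
	for _, r := range best {
		results = append(results, r)
	}
	sort.Slice(results, func(i, j int) bool {
		if results[i].Votes != results[j].Votes {
			return results[i].Votes > results[j].Votes
		}
		return results[i].SongID < results[j].SongID
	})
	return results
}

func main() {
	hits := []Hit{
		{"song-a", 12000, 2000}, {"song-a", 13500, 3500}, {"song-a", 15000, 5000},
		{"song-a", 40000, 1000}, {"song-b", 7000, 2000},
	}
	for _, r := range rankByVotes(hits) {
		fmt.Printf("%s offset=%dms votes=%d\n", r.SongID, r.OffsetMs, r.Votes)
	}
}
```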
[Figure: example spectrogram of "Sandstorm" by Darude. Frequency vs. time representation showing spectral characteristics; brighter regions indicate higher energy.]
- Auto-download videos using yt-dlp
- Auto-extract metadata (title, artist) from video info
- Audio extraction from video containers
# CLI
./acousticDNA youtube "https://youtube.com/watch?v=dQw4w9WgXcQ"
# API
curl -X POST http://localhost:8080/api/songs/youtube \
-H "Content-Type: application/json" \
-d '{"youtube_url": "https://youtube.com/watch?v=dQw4w9WgXcQ"}'

- Format conversion: MP3, WAV, FLAC, AAC, M4A, OGG, etc.
- Metadata extraction: Duration, sample rate, channels
- Audio normalization: Consistent 11,025 Hz mono output
- Client-side processing: Audio fingerprinting in browser
- Privacy preservation: Only hashes sent to server (not audio)
- Bandwidth optimization: 14 KB vs 3 MB (99.5% reduction)
| Variable | Default | Description |
|---|---|---|
| `ACOUSTIC_DB_PATH` | `acousticdna.sqlite3` | SQLite database file path |
| `ACOUSTIC_TEMP_DIR` | `/tmp` | Temporary file directory |
| `PORT` | `8080` | HTTP server port |
Server:
./server \
-port 8080 \
-db acousticdna.sqlite3 \
-temp /tmp \
-rate 11025 \
-origins "*"

| Parameter | Value | Description |
|---|---|---|
| Sample Rate | 11,025 Hz | Optimized for fingerprinting |
| Bit Depth | 16-bit PCM | Signed integer format |
| Channels | Mono | Stereo averaged to mono |
| Window Size | 1024 samples | STFT frame length |
| Hop Size | 256 samples | 75% overlap |
| Window Function | Hamming | 0.54 - 0.46×cos(2πn/(N-1)) |
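The derived figures quoted for these parameters (about 93 ms per window, about 10.77 Hz per bin, 75% overlap) follow directly from the sample rate, window size, and hop size. The short sketch below recomputes them; the function names are illustrative only.

```go
package main

import "fmt"

const (
	sampleRate = 11025 // Hz
	windowSize = 1024  // samples per STFT frame
	hopSize    = 256   // samples between frame starts
)

// windowMs is the duration of one STFT frame in milliseconds.
func windowMs() float64 { return 1000 * float64(windowSize) / float64(sampleRate) }

// binHz is the width of one frequency bin.
func binHz() float64 { return float64(sampleRate) / float64(windowSize) }

// overlap is the fraction of each frame shared with the next one.
func overlap() float64 { return 1 - float64(hopSize)/float64(windowSize) }

// samplesFor is the number of samples in a clip after resampling.
func samplesFor(seconds int) int { return seconds * sampleRate }

// frameCount is the number of complete frames that fit in n samples.
func frameCount(n int) int {
	if n < windowSize {
		return 0
	}
	return (n-windowSize)/hopSize + 1
}

func main() {
	n := samplesFor(10)
	fmt.Printf("window: %.1f ms, bin: %.2f Hz, overlap: %.0f%%\n", windowMs(), binHz(), 100*overlap())
	fmt.Printf("10 s clip: %d samples, %d frames\n", n, frameCount(n))
}
```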
| Database Size | Hashes/Query | Query Time | Accuracy |
|---|---|---|---|
| 100 songs | ~10,000 | 50-100ms | 95%+ |
| 1,000 songs | ~10,000 | 200-400ms | 90%+ |
| 10,000 songs | ~10,000 | 1-2s | 85%+ |
| Duration | Samples (44.1 kHz source) | Hashes | Processing Time |
|---|---|---|---|
| 10s | 441,000 | ~1,200 | 500-800ms |
| 30s | 1,323,000 | ~3,600 | 1.5-2.5s |
| 3min | 7,938,000 | ~21,600 | 8-12s |
- Old (N queries): 10,000 hashes × 2ms = 20 seconds
- New (1 query): 10,000 hashes = 50-200ms
- Improvement: 10-100x faster
- Traditional upload: 3 MB audio file
- WASM hash upload: 14 KB hashes
- Bandwidth reduction: 99.5%
.
├── acousticdna.sqlite3             # Fingerprint database
├── cmd
│   ├── cli
│   │   └── main.go                 # Terminal commands (add/match/list)
│   ├── server
│   │   ├── handlers.go             # What happens when API called
│   │   ├── main.go                 # Starts the HTTP server
│   │   ├── routes.go               # Maps URLs to handlers
│   │   └── types.go                # Server data structures
│   └── wasm
│       └── main.go                 # Runs in browser
├── go.mod
├── go.sum
├── pkg
│   ├── acousticdna
│   │   ├── audio
│   │   │   ├── metadata.go         # Gets audio info via FFprobe
│   │   │   ├── processor.go        # Converts audio via FFmpeg
│   │   │   └── reader.go           # Reads audio files
│   │   ├── config.go               # App settings
│   │   ├── fingerprint
│   │   │   ├── generator.go        # Orchestrates fingerprinting
│   │   │   ├── hasher.go           # Creates hashes from peaks
│   │   │   ├── peaks.go            # Finds peaks in spectrum
│   │   │   └── spectrogram.go      # Builds time-frequency map
│   │   ├── interfaces.go           # Defines contracts
│   │   ├── service.go              # Main business logic
│   │   ├── storage
│   │   │   └── sqlite.go           # Talks to database
│   │   ├── storage_adapter.go      # Bridges interfaces
│   │   └── types.go                # Core data structures
│   ├── logger
│   │   └── logger.go               # Logging helper
│   ├── models
│   │   ├── api.go                  # HTTP request/response shapes
│   │   ├── database.go             # Database table structures
│   │   └── domain.go               # Business objects
│   └── utils
│       ├── crypto.go               # Hashing helpers
│       ├── files.go                # File operations
│       ├── uuid.go                 # Unique ID generator
│       └── youtube.go              # Downloads with yt-dlp
├── README.md
├── refrence_scripts
│   ├── download_yt.go              # Example YouTube downloader
│   └── make-spectorgram.go         # Example spectrogram maker
├── scripts
│   └── build-wasm.sh               # Compiles to WebAssembly
├── test/
├── wasm
│   └── acousticdna.wasm
└── web
    ├── public
    │   ├── fingerprint.wasm        # Browser-side processor
    │   ├── index.html              # The web interface
    │   ├── wasm_exec.js            # Go's WASM glue code
    │   └── wasm.js                 # Loads the WASM module
    └── src
        └── api
            └── wasm.js             # JS wrapper for WASM calls
- Custom STFT implementation with Hamming windowing
- Combinatorial hash generation from spectral peaks
- Time-coherence voting for robust matching
- Batch SQL optimization for hash retrieval
- Optional client-side processing via WebAssembly
- Only fingerprint hashes transmitted to server
- Server cannot reconstruct original audio from hashes
- Clean architecture with interface-based design
- Comprehensive error handling and logging
- Context-based timeout management
"No peaks found in audio"
- Audio is too quiet or silent
- Try normalizing audio volume
- Ensure the clip is at least 5 seconds long (the matcher targets 5-15 second clips)
"WASM initialization failed"
- Run `./scripts/build-wasm.sh` to build the WASM module
- Ensure `fingerprint.wasm` exists in `web/public/`
CORS errors in browser
- Set the server `-origins` flag: `./server -origins "http://localhost:3000"`
Database locked
- SQLite allows only one writer at a time
- Wait for current operation to complete
⭐ Star this repo if you find it useful!
Made with ❤️ by Himanish Puri
