himanishpuri/AcousticDNA

🎵 AcousticDNA

Audio fingerprinting system built from scratch in Go. Identify songs from short audio clips using Shazam-like algorithms, with optional client-side WebAssembly processing for complete privacy.


✨ Features

  • 🎵 Shazam-Grade Matching - Identifies songs from 5-15 second clips with background noise
  • 🔒 Privacy-Preserving - Optional WASM processing keeps audio in browser
  • 🎼 Universal Audio Support - MP3, WAV, FLAC, AAC, M4A, OGG via FFmpeg
  • 📹 YouTube Integration - Auto-download and extract metadata from URLs
  • 💻 Multiple Interfaces - CLI tool, REST API, and WASM web frontend

🚀 Installation

Local Installation

Prerequisites: Go, FFmpeg (including ffprobe), and yt-dlp for YouTube imports.

# Clone and build
git clone https://github.com/himanishpuri/AcousticDNA.git
cd AcousticDNA
go mod download

# Build CLI
go build -o acousticDNA ./cmd/cli/

# Build server
go build -o server ./cmd/server/

# Build WASM (optional)
GOOS=js GOARCH=wasm go build -o web/public/fingerprint.wasm ./cmd/wasm/

📖 Usage

CLI

# Add song from file
./acousticDNA add song.mp3 --title "Sandstorm" --artist "Darude"

# Add from YouTube
./acousticDNA add --youtube-url "https://youtube.com/watch?v=VIDEO_ID"

# Match audio
./acousticDNA match recording.wav

# List songs
./acousticDNA list

# Delete song
./acousticDNA delete <song-id>

REST API

# Start server
./server -port 8080

# Add song
curl -X POST http://localhost:8080/api/songs \
  -F "audio=@song.mp3" \
  -F "title=Sandstorm" \
  -F "artist=Darude"

# Match audio
curl -X POST http://localhost:8080/api/match \
  -F "audio=@clip.wav"

# List songs
curl http://localhost:8080/api/songs

WASM Web Interface

# Serve frontend
cd web/public && python3 -m http.server 8080

# or
cd web && npx serve public

# Open http://localhost:8080 (or the URL printed by npx serve)
# Upload audio → Generate fingerprint → Match

πŸ—οΈ Architecture

System Overview

┌──────────────────────────────────────────────────────────────┐
│                        CLIENT OPTIONS                        │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Option 1: CLI Tool (Local)                                  │
│  ┌─────────────────┐                                         │
│  │ ./acousticDNA   │ → Direct database access                │
│  │ add/match/list  │                                         │
│  └─────────────────┘                                         │
│                                                              │
│  Option 2: WASM Frontend (Privacy-Preserving)                │
│  ┌─────────────────┐                                         │
│  │   Browser       │                                         │
│  │  ┌──────────┐   │                                         │
│  │  │   WASM   │───┼─→ Hashes only (14 KB)                   │
│  │  │Processing│   │   Audio never uploaded!                 │
│  │  └──────────┘   │                                         │
│  └─────────────────┘                                         │
│                                                              │
│  Option 3: Traditional Upload                                │
│  ┌─────────────────┐                                         │
│  │   Browser       │                                         │
│  │  Upload file    │───→ Full audio file (3 MB)              │
│  └─────────────────┘                                         │
│                                                              │
└──────────────────┬───────────────────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────────────────────┐
│                     BACKEND SERVER (Go)                      │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │  REST API    │  │  Fingerprint │  │   Database   │        │
│  │  Handlers    │─→│  Processor   │─→│   (SQLite)   │        │
│  └──────────────┘  └──────────────┘  └──────────────┘        │
│                                                              │
│  Endpoints:                                                  │
│  • POST /api/match/hashes  ← WASM hashes                     │
│  • POST /api/match         ← File upload                     │
│  • POST /api/songs         ← Add song                        │
│  • GET  /api/songs         ← List songs                      │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Audio Processing Flow

┌─────────────────┐
│  Input Audio    │  (MP3, WAV, FLAC, etc.)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ FFmpeg Convert  │  → Mono 16-bit PCM @ 11,025 Hz
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  STFT + Peaks   │  → Spectrogram → Constellation Points
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Fingerprints   │  → Combinatorial Hashes (32-bit)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ SQLite Storage  │  → hash → (songID, anchorTimeMs)
└─────────────────┘

Matching Algorithm

Query Audio → Fingerprints → Database Lookup
                                   │
                                   ▼
                         Time-Offset Voting
                         ┌─────────────────┐
                         │ For each match: │
                         │ offset = db_time│
                         │        - query  │
                         │ votes[song][off]│
                         │        += 1     │
                         └────────┬────────┘
                                  │
                                  ▼
                         Rank by Max Votes
                                  │
                                  ▼
                            Top Matches 🎯

🔬 How It Works

1. Audio Preprocessing

  • Convert any audio format to mono 16-bit PCM WAV @ 11,025 Hz using FFmpeg
  • Normalize sample rate for consistent fingerprint generation
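
As a rough sketch of this step, the conversion can be expressed as an FFmpeg argument list built in Go. The function name `ffmpegArgs` and the file names are hypothetical and not taken from the repository; the flags themselves (`-ac`, `-ar`, `-c:a pcm_s16le`, `-f wav`) are standard FFmpeg options.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// ffmpegArgs (hypothetical helper) builds the arguments that convert any
// decodable input into mono, 16-bit signed PCM WAV at sampleRate Hz.
func ffmpegArgs(inPath, outPath string, sampleRate int) []string {
	return []string{
		"-y", // overwrite the output file if it already exists
		"-i", inPath,
		"-ac", "1", // downmix to a single channel
		"-ar", strconv.Itoa(sampleRate), // resample to the target rate
		"-c:a", "pcm_s16le", // 16-bit little-endian signed PCM
		"-f", "wav",
		outPath,
	}
}

func main() {
	cmd := exec.Command("ffmpeg", ffmpegArgs("song.mp3", "song_11025.wav", 11025)...)
	// Print the command instead of running it, so the sketch works
	// without FFmpeg installed. Call cmd.Run() to perform the conversion.
	fmt.Println(cmd.String())
}
```

Fixing the sample rate before fingerprinting means every frequency bin and time offset is comparable between the stored song and the query clip.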

2. Spectrogram Generation (STFT)

  • Window Size: 1024 samples (~93ms)
  • Hop Size: 256 samples (75% overlap)
  • Window Function: Hamming window
  • Frequency Resolution: ~10.77 Hz/bin

3. Peak Extraction

  • Identify spectral peaks (constellation points) in time-frequency space
  • Filter by intensity threshold and local maxima
  • Each peak represents a significant acoustic event
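
A minimal sketch of the idea, assuming the spectrogram is a `[][]float64` indexed as `[frame][bin]`. The neighborhood radius and threshold are illustrative parameters; the repository's actual peak picker may use different criteria.

```go
package main

import "fmt"

// Peak is one constellation point in time-frequency space.
type Peak struct {
	Frame, Bin int
	Magnitude  float64
}

// isLocalMax reports whether spec[t][f] is strictly greater than every
// neighbor within radius cells in both time and frequency.
func isLocalMax(spec [][]float64, t, f, radius int) bool {
	for dt := -radius; dt <= radius; dt++ {
		for df := -radius; df <= radius; df++ {
			tt, ff := t+dt, f+df
			if (dt == 0 && df == 0) || tt < 0 || tt >= len(spec) || ff < 0 || ff >= len(spec[tt]) {
				continue
			}
			if spec[tt][ff] >= spec[t][f] {
				return false
			}
		}
	}
	return true
}

// findPeaks keeps cells that pass the intensity threshold and are local maxima.
func findPeaks(spec [][]float64, radius int, threshold float64) []Peak {
	var peaks []Peak
	for t := range spec {
		for f, v := range spec[t] {
			if v >= threshold && isLocalMax(spec, t, f, radius) {
				peaks = append(peaks, Peak{Frame: t, Bin: f, Magnitude: v})
			}
		}
	}
	return peaks
}

func main() {
	spec := [][]float64{
		{1, 1, 1},
		{1, 9, 1},
		{1, 1, 5},
	}
	fmt.Println(findPeaks(spec, 1, 2))
}
```

In the toy spectrogram, the cell with value 5 passes the threshold but is rejected because its neighbor (9) is larger.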

4. Combinatorial Hashing

  • Pair each anchor peak with target peaks that fall within a time window after it
  • Generate 32-bit hash: [anchorFreq(9) | targetFreq(9) | deltaTime(14)]
  • Store each hash with its anchor timestamp in milliseconds

5. Time-Coherence Voting

  • Look up all query hashes in the database with one batched SQL query (10-100x faster than one query per hash)
  • Calculate time offsets: offset = db_time - query_time
  • Vote for song/offset pairs
  • Return matches ranked by vote count (confidence score)
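
The voting step can be sketched with an in-memory index standing in for the database. The types and the `rankMatches` name are illustrative; this version votes on exact millisecond offsets as described above, whereas a production matcher might bin nearby offsets together.

```go
package main

import (
	"fmt"
	"sort"
)

// Posting is one stored occurrence of a hash in a song.
type Posting struct {
	SongID       string
	AnchorTimeMs int
}

// QueryHash is one hash extracted from the query clip.
type QueryHash struct {
	Hash         uint32
	AnchorTimeMs int
}

// Match is the best-supported offset for one song.
type Match struct {
	SongID   string
	OffsetMs int
	Votes    int
}

// rankMatches counts votes per (song, offset) pair and returns each song's
// strongest offset, ordered by vote count (ties broken by song ID).
func rankMatches(query []QueryHash, index map[uint32][]Posting) []Match {
	type key struct {
		song   string
		offset int
	}
	votes := make(map[key]int)
	for _, q := range query {
		for _, p := range index[q.Hash] {
			votes[key{p.SongID, p.AnchorTimeMs - q.AnchorTimeMs}]++
		}
	}
	best := make(map[string]Match)
	for k, n := range votes {
		cur, seen := best[k.song]
		if !seen || n > cur.Votes || (n == cur.Votes && k.offset < cur.OffsetMs) {
			best[k.song] = Match{SongID: k.song, OffsetMs: k.offset, Votes: n}
		}
	}
	out := make([]Match, 0, len(best))
	for _, m := range best {
		out = append(out, m)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Votes != out[j].Votes {
			return out[i].Votes > out[j].Votes
		}
		return out[i].SongID < out[j].SongID
	})
	return out
}

func main() {
	index := map[uint32][]Posting{
		1: {{"A", 1000}},
		2: {{"A", 1500}, {"B", 300}},
		3: {{"A", 2000}},
	}
	query := []QueryHash{{1, 0}, {2, 500}, {3, 1000}}
	fmt.Println(rankMatches(query, index))
}
```

A true match produces many hashes that agree on one offset, while accidental hash collisions scatter across offsets and collect few votes each.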

Spectrogram Visualization

Example spectrogram of "Sandstorm" by Darude:

[Figure: Sandstorm spectrogram]

Frequency vs. Time representation showing spectral characteristics. Brighter regions indicate higher energy.


🔗 Integrations

YouTube Integration

  • Auto-download videos using yt-dlp
  • Auto-extract metadata (title, artist) from video info
  • Audio extraction from video containers

# CLI
./acousticDNA youtube "https://youtube.com/watch?v=dQw4w9WgXcQ"

# API
curl -X POST http://localhost:8080/api/songs/youtube \
  -H "Content-Type: application/json" \
  -d '{"youtube_url": "https://youtube.com/watch?v=dQw4w9WgXcQ"}'

FFmpeg Integration

  • Format conversion: MP3, WAV, FLAC, AAC, M4A, OGG, etc.
  • Metadata extraction: Duration, sample rate, channels
  • Audio normalization: Consistent 11,025 Hz mono output

WebAssembly Integration

  • Client-side processing: Audio fingerprinting in browser
  • Privacy preservation: Only hashes sent to server (not audio)
  • Bandwidth optimization: 14 KB vs 3 MB (99.5% reduction)

⚙️ Configuration

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| ACOUSTIC_DB_PATH | acousticdna.sqlite3 | SQLite database file path |
| ACOUSTIC_TEMP_DIR | /tmp | Temporary file directory |
| PORT | 8080 | HTTP server port |

CLI Flags

Server:

./server \
  -port 8080 \
  -db acousticdna.sqlite3 \
  -temp /tmp \
  -rate 11025 \
  -origins "*"

DSP Parameters

| Parameter | Value | Description |
| --- | --- | --- |
| Sample Rate | 11,025 Hz | Optimized for fingerprinting |
| Bit Depth | 16-bit PCM | Signed integer format |
| Channels | Mono | Stereo averaged to mono |
| Window Size | 1024 samples | STFT frame length |
| Hop Size | 256 samples | 75% overlap |
| Window Function | Hamming | 0.54 - 0.46×cos(2πn/(N-1)) |

📊 Performance

Matching Speed

| Database Size | Hashes/Query | Query Time | Accuracy |
| --- | --- | --- | --- |
| 100 songs | ~10,000 | 50-100ms | 95%+ |
| 1,000 songs | ~10,000 | 200-400ms | 90%+ |
| 10,000 songs | ~10,000 | 1-2s | 85%+ |

Audio Processing

| Duration | Samples (44.1 kHz source) | Hashes | Processing Time |
| --- | --- | --- | --- |
| 10s | 441,000 | ~1,200 | 500-800ms |
| 30s | 1,323,000 | ~3,600 | 1.5-2.5s |
| 3min | 7,938,000 | ~21,600 | 8-12s |

Batch Hash Retrieval Optimization

  • Old (N queries): 10,000 hashes × 2ms = 20 seconds
  • New (1 query): 10,000 hashes = 50-200ms
  • Improvement: 10-100x faster

Privacy-Preserving Mode

  • Traditional upload: 3 MB audio file
  • WASM hash upload: 14 KB hashes
  • Bandwidth reduction: 99.5%

🏢 Project Structure

├── acousticdna.sqlite3          # Fingerprint database
├── cmd
│   ├── cli
│   │   └── main.go              # Terminal commands (add/match/list)
│   ├── server
│   │   ├── handlers.go          # What happens when API called
│   │   ├── main.go              # Starts the HTTP server
│   │   ├── routes.go            # Maps URLs to handlers
│   │   └── types.go             # Server data structures
│   └── wasm
│       └── main.go              # Runs in browser
├── go.mod
├── go.sum
├── pkg
│   ├── acousticdna
│   │   ├── audio
│   │   │   ├── metadata.go      # Gets audio info via FFprobe
│   │   │   ├── processor.go     # Converts audio via FFmpeg
│   │   │   └── reader.go        # Reads audio files
│   │   ├── config.go            # App settings
│   │   ├── fingerprint
│   │   │   ├── generator.go     # Orchestrates fingerprinting
│   │   │   ├── hasher.go        # Creates hashes from peaks
│   │   │   ├── peaks.go         # Finds peaks in spectrum
│   │   │   └── spectrogram.go   # Builds time-frequency map
│   │   ├── interfaces.go        # Defines contracts
│   │   ├── service.go           # Main business logic
│   │   ├── storage
│   │   │   └── sqlite.go        # Talks to database
│   │   ├── storage_adapter.go   # Bridges interfaces
│   │   └── types.go             # Core data structures
│   ├── logger
│   │   └── logger.go            # Logging helper
│   ├── models
│   │   ├── api.go               # HTTP request/response shapes
│   │   ├── database.go          # Database table structures
│   │   └── domain.go            # Business objects
│   └── utils
│       ├── crypto.go            # Hashing helpers
│       ├── files.go             # File operations
│       ├── uuid.go              # Unique ID generator
│       └── youtube.go           # Downloads with yt-dlp
├── README.md
├── refrence_scripts
│   ├── download_yt.go           # Example YouTube downloader
│   └── make-spectorgram.go      # Example spectrogram maker
├── scripts
│   └── build-wasm.sh            # Compiles to WebAssembly
├── test/
├── wasm
│   └── acousticdna.wasm
└── web
    ├── public
    │   ├── fingerprint.wasm     # Browser-side processor
    │   ├── index.html           # The web interface
    │   ├── wasm_exec.js         # Go's WASM glue code
    │   └── wasm.js              # Loads the WASM module
    └── src
        └── api
            └── wasm.js          # JS wrapper for WASM calls

🎓 Technical Highlights

Algorithm Implementation

  • Custom STFT implementation with Hamming windowing
  • Combinatorial hash generation from spectral peaks
  • Time-coherence voting for robust matching
  • Batch SQL optimization for hash retrieval

Privacy Design

  • Optional client-side processing via WebAssembly
  • Only fingerprint hashes (32-bit values derived from spectral peaks) are transmitted to the server
  • Server cannot reconstruct original audio from hashes

Engineering Practices

  • Clean architecture with interface-based design
  • Comprehensive error handling and logging
  • Context-based timeout management

πŸ› Troubleshooting

"No peaks found in audio"

  • Audio is too quiet or silent
  • Try normalizing audio volume
  • Ensure the clip is at least 5 seconds long (matching targets 5-15 second clips)

"WASM initialization failed"

  • Run ./scripts/build-wasm.sh to build WASM module
  • Ensure fingerprint.wasm exists in web/public/

CORS errors in browser

  • Set server -origins flag: ./server -origins "http://localhost:3000"

Database locked

  • SQLite allows only one writer at a time
  • Wait for current operation to complete


⭐ Star this repo if you find it useful!

Made with ❤️ by Himanish Puri
