🎵 AcousticDNA

Audio fingerprinting system built from scratch in Go. Identify songs from short audio clips using Shazam-like algorithms, with optional client-side WebAssembly processing for complete privacy.

✨ Features

🎵 Shazam-Grade Matching - Identifies songs from 5-15 second clips, even with background noise
🔒 Privacy-Preserving - Optional WASM processing keeps audio in browser
🎼 Universal Audio Support - MP3, WAV, FLAC, AAC, M4A, OGG via FFmpeg
📹 YouTube Integration - Auto-download and extract metadata from URLs
💻 Multiple Interfaces - CLI tool, REST API, and WASM web frontend

🚀 Installation

Local Installation

Prerequisites:

Go 1.25+
FFmpeg and FFprobe
yt-dlp

# Clone and build
git clone https://github.com/himanishpuri/AcousticDNA.git
cd AcousticDNA
go mod download

# Build CLI
go build -o acousticDNA ./cmd/cli/

# Build server
go build -o server ./cmd/server/

# Build WASM (optional)
GOOS=js GOARCH=wasm go build -o web/public/fingerprint.wasm ./cmd/wasm/

📖 Usage

CLI

# Add song from file
./acousticDNA add song.mp3 --title "Sandstorm" --artist "Darude"

# Add from YouTube
./acousticDNA add --youtube-url "https://youtube.com/watch?v=VIDEO_ID"

# Match audio
./acousticDNA match recording.wav

# List songs
./acousticDNA list

# Delete song
./acousticDNA delete <song-id>

REST API

# Start server
./server -port 8080

# Add song
curl -X POST http://localhost:8080/api/songs \
  -F "audio=@song.mp3" \
  -F "title=Sandstorm" \
  -F "artist=Darude"

# Match audio
curl -X POST http://localhost:8080/api/match \
  -F "audio=@clip.wav"

# List songs
curl http://localhost:8080/api/songs

WASM Web Interface

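# Serve frontend (if the API server is already on port 8080, choose another port here)
cd web/public && python3 -m http.server 8080

# or
cd web && npx serve public

# Open the URL printed by the static server (http://localhost:8080 for the python3 command above)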
# Upload audio → Generate fingerprint → Match

🏗️ Architecture

System Overview

┌─────────────────────────────────────────────────────────────┐
│                      CLIENT OPTIONS                          │
├─────────────────────────────────────────────────────────────┤
│                                                               │
│  Option 1: CLI Tool (Local)                                 │
│  ┌─────────────────┐                                        │
│  │ ./acousticDNA   │ → Direct database access              │
│  │ add/match/list  │                                        │
│  └─────────────────┘                                        │
│                                                               │
│  Option 2: WASM Frontend (Privacy-Preserving)              │
│  ┌─────────────────┐                                        │
│  │   Browser       │                                        │
│  │  ┌──────────┐   │                                        │
│  │  │   WASM   │───┼─→ Hashes only (14 KB)                │
│  │  │Processing│   │   Audio never uploaded!               │
│  │  └──────────┘   │                                        │
│  └─────────────────┘                                        │
│                                                               │
│  Option 3: Traditional Upload                               │
│  ┌─────────────────┐                                        │
│  │   Browser       │                                        │
│  │  Upload file    │───→ Full audio file (3 MB)            │
│  └─────────────────┘                                        │
│                                                               │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────────┐
│                    BACKEND SERVER (Go)                       │
├─────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │  REST API    │  │  Fingerprint │  │   Database   │      │
│  │  Handlers    │─→│  Processor   │─→│   (SQLite)   │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
│                                                               │
│  Endpoints:                                                  │
│  • POST /api/match/hashes  ← WASM hashes                   │
│  • POST /api/match         ← File upload                    │
│  • POST /api/songs         ← Add song                       │
│  • GET  /api/songs         ← List songs                     │
│                                                               │
└─────────────────────────────────────────────────────────────┘

Audio Processing Flow

┌─────────────────┐
│  Input Audio    │  (MP3, WAV, FLAC, etc.)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ FFmpeg Convert  │  → Mono 16-bit PCM @ 11,025 Hz
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  STFT + Peaks   │  → Spectrogram → Constellation Points
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Fingerprints   │  → Combinatorial Hashes (32-bit)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ SQLite Storage  │  → hash → (songID, anchorTimeMs)
└─────────────────┘

Matching Algorithm

Query Audio → Fingerprints → Database Lookup
                                   │
                                   ▼
                         Time-Offset Voting
                         ┌─────────────────┐
                         │ For each match: │
                         │ offset = db_time│
                         │        - query  │
                         │ votes[song][off]│
                         │        += 1     │
                         └────────┬────────┘
                                  │
                                  ▼
                         Rank by Max Votes
                                  │
                                  ▼
                           Top Matches 🎯

🔬 How It Works

1. Audio Preprocessing

Convert any audio format to mono 16-bit PCM WAV @ 11,025 Hz using FFmpeg
Normalize sample rate for consistent fingerprint generation
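
As an illustration of this step, the sketch below shells out to FFmpeg with a timeout. The flags are standard FFmpeg options, but the function names (`ffmpegArgs`, `convert`) and the two-minute timeout are assumptions for the example and are not taken from the project's `processor.go`.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"time"
)

// ffmpegArgs (hypothetical helper) returns arguments that convert any input
// FFmpeg can decode into mono 16-bit PCM WAV at the given sample rate.
func ffmpegArgs(in, out string, sampleRate int) []string {
	return []string{
		"-y",     // overwrite the output file if it exists
		"-i", in, // input file
		"-vn",      // ignore any video stream
		"-ac", "1", // downmix to a single channel
		"-ar", strconv.Itoa(sampleRate), // resample
		"-c:a", "pcm_s16le", // signed 16-bit little-endian PCM
		"-f", "wav",
		out,
	}
}

// convert runs FFmpeg and stops it if ctx expires.
func convert(ctx context.Context, in, out string) error {
	cmd := exec.CommandContext(ctx, "ffmpeg", ffmpegArgs(in, out, 11025)...)
	if msg, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("ffmpeg: %w: %s", err, msg)
	}
	return nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: convert <input> <output.wav>")
		return
	}
	// Assumed timeout; tune it for long inputs.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()
	if err := convert(ctx, os.Args[1], os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```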

2. Spectrogram Generation (STFT)

Window Size: 1024 samples (~93ms)
Hop Size: 256 samples (75% overlap)
Window Function: Hamming window
Frequency Resolution: ~10.77 Hz/bin

3. Peak Extraction

Identify spectral peaks (constellation points) in time-frequency space
Filter by intensity threshold and local maxima
Each peak represents a significant acoustic event
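
One simple way to implement this is a neighbourhood test: a cell is kept if it exceeds the threshold and is strictly larger than every neighbour within a small radius in both time and frequency. The sketch below takes that approach; the `Peak` type, the radius, and the threshold value are assumptions, and the project's `peaks.go` may use a different strategy.

```go
package main

import "fmt"

// Peak is one constellation point: frame index, frequency bin, magnitude.
type Peak struct {
	Frame, Bin int
	Mag        float64
}

// isLocalMax reports whether spec[t][f] is strictly greater than all
// neighbours within radius cells in time and frequency.
func isLocalMax(spec [][]float64, t, f, radius int) bool {
	for dt := -radius; dt <= radius; dt++ {
		for df := -radius; df <= radius; df++ {
			tt, ff := t+dt, f+df
			if (dt == 0 && df == 0) || tt < 0 || tt >= len(spec) || ff < 0 || ff >= len(spec[tt]) {
				continue
			}
			if spec[tt][ff] >= spec[t][f] {
				return false
			}
		}
	}
	return true
}

// findPeaks returns every cell at or above threshold that is a local maximum.
func findPeaks(spec [][]float64, radius int, threshold float64) []Peak {
	var peaks []Peak
	for t := range spec {
		for f, mag := range spec[t] {
			if mag >= threshold && isLocalMax(spec, t, f, radius) {
				peaks = append(peaks, Peak{Frame: t, Bin: f, Mag: mag})
			}
		}
	}
	return peaks
}

func main() {
	// Toy magnitude spectrogram: rows are frames, columns are frequency bins.
	spec := [][]float64{
		{0.1, 0.2, 0.1, 0.0, 0.0},
		{0.2, 0.9, 0.2, 0.0, 0.3},
		{0.1, 0.2, 0.1, 0.3, 0.8},
		{0.0, 0.0, 0.0, 0.2, 0.3},
	}
	for _, p := range findPeaks(spec, 1, 0.5) {
		fmt.Printf("frame=%d bin=%d mag=%.1f\n", p.Frame, p.Bin, p.Mag)
	}
}
```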

4. Combinatorial Hashing

Pair anchor peaks with target peaks within a time window
Generate 32-bit hash: [anchorFreq(9) | targetFreq(9) | deltaTime(14)]
Store hash with precise anchor timestamp

5. Time-Coherence Voting

Query hashes against database (batch SQL query for 10-100x speedup)
Calculate time offsets: offset = db_time - query_time
Vote for song/offset pairs
Return matches ranked by vote count (confidence score)
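
The voting step can be sketched as a per-song histogram of offsets. A true match produces many hits that agree on one offset, while chance collisions scatter across many. The `Hit` and `Match` types, the string song ID, and the tie-breaking rules are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is one database row whose hash also appeared in the query.
type Hit struct {
	SongID      string // assumed ID type
	DBTimeMs    int    // anchor time stored for the song
	QueryTimeMs int    // anchor time of the same hash in the query clip
}

// Match is the best-supported offset for one song.
type Match struct {
	SongID   string
	OffsetMs int
	Votes    int
}

// rankMatches counts votes[song][offset] with offset = db_time - query_time,
// keeps the strongest offset per song, and sorts songs by vote count.
func rankMatches(hits []Hit) []Match {
	votes := make(map[string]map[int]int)
	for _, h := range hits {
		if votes[h.SongID] == nil {
			votes[h.SongID] = make(map[int]int)
		}
		votes[h.SongID][h.DBTimeMs-h.QueryTimeMs]++
	}

	out := make([]Match, 0, len(votes))
	for song, hist := range votes {
		best := Match{SongID: song}
		for off, n := range hist {
			// On equal votes prefer the smaller offset so results are stable.
			if n > best.Votes || (n == best.Votes && off < best.OffsetMs) {
				best.OffsetMs, best.Votes = off, n
			}
		}
		out = append(out, best)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Votes != out[j].Votes {
			return out[i].Votes > out[j].Votes
		}
		return out[i].SongID < out[j].SongID
	})
	return out
}

func main() {
	hits := []Hit{
		{"a", 5100, 100}, {"a", 5400, 400}, {"a", 5900, 900}, // agree on 5000 ms
		{"a", 1500, 300}, // stray collision
		{"b", 700, 400}, {"b", 1800, 900}, // no agreement
	}
	for _, m := range rankMatches(hits) {
		fmt.Printf("song=%s offset=%dms votes=%d\n", m.SongID, m.OffsetMs, m.Votes)
	}
}
```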

Spectrogram Visualization

Example spectrogram of "Sandstorm" by Darude:

Frequency vs. Time representation showing spectral characteristics. Brighter regions indicate higher energy.

🔗 Integrations

YouTube Integration

Auto-download videos using yt-dlp
Auto-extract metadata (title, artist) from video info
Audio extraction from video containers

# CLI
./acousticDNA youtube "https://youtube.com/watch?v=dQw4w9WgXcQ"

# API
curl -X POST http://localhost:8080/api/songs/youtube \
  -H "Content-Type: application/json" \
  -d '{"youtube_url": "https://youtube.com/watch?v=dQw4w9WgXcQ"}'

FFmpeg Integration

Format conversion: MP3, WAV, FLAC, AAC, M4A, OGG, etc.
Metadata extraction: Duration, sample rate, channels
Audio normalization: Consistent 11,025 Hz mono output

WebAssembly Integration

Client-side processing: Audio fingerprinting in browser
Privacy preservation: Only hashes sent to server (not audio)
Bandwidth optimization: 14 KB vs 3 MB (99.5% reduction)

⚙️ Configuration

Environment Variables

Variable	Default	Description
`ACOUSTIC_DB_PATH`	`acousticdna.sqlite3`	SQLite database file path
`ACOUSTIC_TEMP_DIR`	`/tmp`	Temporary file directory
`PORT`	`8080`	HTTP server port

CLI Flags

Server:

./server \
  -port 8080 \
  -db acousticdna.sqlite3 \
  -temp /tmp \
  -rate 11025 \
  -origins "*"
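
For illustration, the flags and environment variables above could be wired together with the standard `flag` package as sketched below. The precedence shown (flag over environment variable over built-in default), the `Config` type, and the `"*"` default for `-origins` are assumptions, not confirmed behaviour of the actual server.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// Config holds the server settings (hypothetical type for this sketch).
type Config struct {
	Port       int
	DBPath     string
	TempDir    string
	SampleRate int
	Origins    string
}

// envOr returns the environment variable's value, or def if unset or empty.
func envOr(key, def string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return def
}

// parseConfig applies flags on top of environment-derived defaults.
func parseConfig(args []string) (Config, error) {
	defaultPort, err := strconv.Atoi(envOr("PORT", "8080"))
	if err != nil {
		defaultPort = 8080 // fall back if PORT is not a number
	}

	var c Config
	fs := flag.NewFlagSet("server", flag.ContinueOnError)
	fs.IntVar(&c.Port, "port", defaultPort, "HTTP server port")
	fs.StringVar(&c.DBPath, "db", envOr("ACOUSTIC_DB_PATH", "acousticdna.sqlite3"), "SQLite database file path")
	fs.StringVar(&c.TempDir, "temp", envOr("ACOUSTIC_TEMP_DIR", "/tmp"), "temporary file directory")
	fs.IntVar(&c.SampleRate, "rate", 11025, "target sample rate in Hz")
	fs.StringVar(&c.Origins, "origins", "*", "allowed CORS origins (assumed default)")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	return c, nil
}

func main() {
	cfg, err := parseConfig(os.Args[1:])
	if err != nil {
		return // the flag package has already printed the error
	}
	fmt.Printf("%+v\n", cfg)
}
```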

DSP Parameters

Parameter	Value	Description
Sample Rate	11,025 Hz	Optimized for fingerprinting
Bit Depth	16-bit PCM	Signed integer format
Channels	Mono	Stereo averaged to mono
Window Size	1024 samples	STFT frame length
Hop Size	256 samples	75% overlap
Window Function	Hamming	0.54 - 0.46×cos(2πn/(N-1))

📊 Performance

Matching Speed

Database Size	Hashes/Query	Query Time	Accuracy
100 songs	~10,000	50-100ms	95%+
1,000 songs	~10,000	200-400ms	90%+
10,000 songs	~10,000	1-2s	85%+

Audio Processing

Duration	Input Samples (44.1 kHz)	Hashes	Processing Time
10s	441,000	~1,200	500-800ms
30s	1,323,000	~3,600	1.5-2.5s
3min	7,938,000	~21,600	8-12s

Batch Hash Retrieval Optimization

Old (N queries): 10,000 hashes × 2ms = 20 seconds
New (1 query): 10,000 hashes = 50-200ms
Improvement: 10-100x faster

Privacy-Preserving Mode

Traditional upload: 3 MB audio file
WASM hash upload: 14 KB hashes
Bandwidth reduction: 99.5%

🏢 Project Structure

├── acousticdna.sqlite3          # Fingerprint database
├── cmd
│   ├── cli
│   │   └── main.go              # Terminal commands (add/match/list)
│   ├── server
│   │   ├── handlers.go          # Handles API requests
│   │   ├── main.go              # Starts the HTTP server
│   │   ├── routes.go            # Maps URLs to handlers
│   │   └── types.go             # Server data structures
│   └── wasm
│       └── main.go              # Runs in browser
├── go.mod
├── go.sum
├── pkg
│   ├── acousticdna
│   │   ├── audio
│   │   │   ├── metadata.go      # Gets audio info via FFprobe
│   │   │   ├── processor.go     # Converts audio via FFmpeg
│   │   │   └── reader.go        # Reads audio files
│   │   ├── config.go            # App settings
│   │   ├── fingerprint
│   │   │   ├── generator.go     # Orchestrates fingerprinting
│   │   │   ├── hasher.go        # Creates hashes from peaks
│   │   │   ├── peaks.go         # Finds peaks in spectrum
│   │   │   └── spectrogram.go   # Builds time-frequency map
│   │   ├── interfaces.go        # Defines contracts
│   │   ├── service.go           # Main business logic
│   │   ├── storage
│   │   │   └── sqlite.go        # Talks to database
│   │   ├── storage_adapter.go   # Bridges interfaces
│   │   └── types.go             # Core data structures
│   ├── logger
│   │   └── logger.go            # Logging helper
│   ├── models
│   │   ├── api.go               # HTTP request/response shapes
│   │   ├── database.go          # Database table structures
│   │   └── domain.go            # Business objects
│   └── utils
│       ├── crypto.go            # Hashing helpers
│       ├── files.go             # File operations
│       ├── uuid.go              # Unique ID generator
│       └── youtube.go           # Downloads with yt-dlp
├── README.md
├── refrence_scripts
│   ├── download_yt.go           # Example YouTube downloader
│   └── make-spectorgram.go      # Example spectrogram maker
├── scripts
│   └── build-wasm.sh            # Compiles to WebAssembly
├── test/
├── wasm
│   └── acousticdna.wasm
└── web
    ├── public
    │   ├── fingerprint.wasm     # Browser-side processor
    │   ├── index.html           # The web interface
    │   ├── wasm_exec.js         # Go's WASM glue code
    │   └── wasm.js              # Loads the WASM module
    └── src
        └── api
            └── wasm.js          # JS wrapper for WASM calls

🎓 Technical Highlights

Algorithm Implementation

Custom STFT implementation with Hamming windowing
Combinatorial hash generation from spectral peaks
Time-coherence voting for robust matching
Batch SQL optimization for hash retrieval

Privacy Design

Optional client-side processing via WebAssembly
Only fingerprint hashes (never raw audio) are transmitted to the server
Server cannot reconstruct original audio from hashes

Engineering Practices

Clean architecture with interface-based design
Comprehensive error handling and logging
Context-based timeout management

🐛 Troubleshooting

"No peaks found in audio"

Audio is too quiet or silent
Try normalizing audio volume
Ensure the clip is at least 5 seconds long (5-15 seconds works best)

"WASM initialization failed"

Run ./scripts/build-wasm.sh to build WASM module
Ensure fingerprint.wasm exists in web/public/

CORS errors in browser

Set server -origins flag: ./server -origins "http://localhost:3000"

Database locked

SQLite allows only one writer at a time
Wait for current operation to complete

⭐ Star this repo if you find it useful!

Made with ❤️ by Himanish Puri
