# EdgeGuard

EdgeGuard is a low-power, multimodal edge AI pipeline for wildlife monitoring deployed on a Raspberry Pi. It performs:
- Animal detection using YOLOv8
- Conditional audio health classification using a CNN on spectrograms
- Event logging and media storage (JPG, MP4, CSV)
- Edge vs Desktop benchmarking
## Table of Contents

- Overview
- System Architecture
- Installation
- Pipeline Workflow
- Decision Logic
- Outputs
- CSV Output Schema
- Full Pipeline Example
- Audio CNN Architecture
- Benchmarking
- Benchmark Results: Intel Xeon vs Raspberry Pi 5
- Energy Optimization
- Repository Structure
- Success Metrics
- Non-Goals
- Long-Term Vision
## Overview

Traditional camera traps suffer from:

- Manual review of footage
- Centralized cloud processing
- High transmission costs
- Lack of real-time alerting
EdgeGuard moves intelligence directly to the edge device.
## System Architecture

Key design characteristics:

- Event-driven (PIR-triggered)
- No continuous inference
- No required cloud connection
- Local SD card storage
- Quantized models for efficiency
- Measurable latency and accuracy comparison against a desktop baseline

> [!IMPORTANT]
> All inference runs locally. No cloud dependency is required.
## Installation

### Hardware

- Raspberry Pi 4 or 5
- PIR motion sensor
- USB camera
- USB microphone
- MicroSD card (32 GB+ recommended)
- Optional battery pack

### Software

- Python 3.9+
- PyTorch
- Ultralytics YOLOv8
- TensorFlow/Keras
- OpenCV
- Librosa
- NumPy
- Matplotlib

```bash
pip install torch torchvision
pip install ultralytics
pip install opencv-python pillow numpy
pip install librosa matplotlib
pip install tensorflow
```
## Pipeline Workflow

When the PIR sensor detects motion, the pipeline runs the following steps (sketched below):

1. Capture an image
2. Record a 5–10 second audio clip
3. Launch inference
4. Return to idle
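A minimal sketch of this event-driven loop, assuming the `gpiozero` library with the PIR data pin on GPIO 4; `capture_image`, `record_audio`, and `run_inference` are hypothetical helpers, not functions from this repository:

```python
from gpiozero import MotionSensor

pir = MotionSensor(4)  # assumption: PIR data pin wired to GPIO 4

while True:
    pir.wait_for_motion()                   # idle (no inference) until triggered
    image_path = capture_image()            # hypothetical camera helper
    audio_path = record_audio(seconds=10)   # hypothetical microphone helper
    run_inference(image_path, audio_path)
    pir.wait_for_no_motion()                # return to idle
```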
> [!TIP]
> Audio inference runs only if a deer is detected in the image, conserving energy.
### Vision Detection (YOLOv8)

```bash
python edgeguard_pipeline.py \
    --weights yolov8n.pt \
    --image ./input/frame.jpg \
    --conf 0.25
```

Output:
- Bounding boxes
- Detection confidence
- Class labels
Detections are filtered using YOLO's native class labels (see the sketch after this list):
- If class ≠ target animal (e.g., deer) → discard detection
- If no target animals remain → skip audio stage
- If target animal detected → run audio model
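A minimal sketch of this filtering step using the Ultralytics API; it assumes a model whose label set includes the target class `"deer"` (the stock COCO weights do not):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model("./input/frame.jpg", conf=0.25)

# Keep only boxes whose predicted class matches the target animal
names = results[0].names
deer_boxes = [b for b in results[0].boxes if names[int(b.cls)] == "deer"]

# Gate the audio stage on the vision result
run_audio_stage = len(deer_boxes) > 0
```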
### Audio Health Classification

If a deer is confirmed, the audio stage (sketched below):

1. Converts the waveform to a Mel spectrogram
2. Normalizes it
3. Runs the CNN classifier
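A minimal preprocessing sketch using Librosa; the sample rate, mel-band count, and normalization scheme are assumptions, not the repository's exact settings:

```python
import librosa
import numpy as np

y, sr = librosa.load("./input/audio.wav", sr=22050)  # assumed sample rate

# Waveform -> log-scaled Mel spectrogram
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)  # assumed band count
log_mel = librosa.power_to_db(mel, ref=np.max)

# Min-max normalize to [0, 1] before the CNN
x = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
```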
```bash
python edgeguard_pipeline.py \
    --audio ./input/audio.wav \
    --run-audio-classifier
```

Outputs:

- `anomaly_score`
- `health_status`
## Decision Logic

```
IF deer_detected == TRUE:
    Run audio CNN
    IF anomaly_score > threshold:
        health_status = "poor_health_indicator"
    ELSE:
        health_status = "normal"
ELSE:
    Skip audio classification
```
## Outputs

All outputs are currently stored locally on the SD card.

### Annotated Image

Includes:
- Bounding boxes
- YOLO confidence
- Class label
- Health classification
- Timestamp
Saved to: `/outputs/images/frame_annotated.jpg`

### Video Clip

Includes:
- Species label
- Health status
- Confidence scores
- Timestamp
Saved to: `/outputs/videos/event_001.mp4`

### Event Log

Saved to: `/outputs/logs/edgeguard_events.csv`

## CSV Output Schema

One row per confirmed deer event.
| Column | Type | Description |
|---|---|---|
| timestamp | ISO datetime | Event time |
| image_path | String | Annotated image |
| video_path | String | MP4 clip |
| yolo_conf | Float | Detection confidence |
| yolo_class | String | YOLO class label |
| anomaly_score | Float | Audio model output |
| health_status | String | Classification result |
| device | String | edge / desktop |
| inference_latency_ms | Float | Total latency |
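A minimal sketch of appending one event row matching this schema; the `log_event` helper is hypothetical, not part of the repository:

```python
import csv
import os

EVENT_LOG = "outputs/logs/edgeguard_events.csv"
FIELDS = ["timestamp", "image_path", "video_path", "yolo_conf", "yolo_class",
          "anomaly_score", "health_status", "device", "inference_latency_ms"]

def log_event(row: dict) -> None:
    """Append one confirmed deer event, writing the header on first use."""
    write_header = not os.path.exists(EVENT_LOG)
    with open(EVENT_LOG, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```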
## Full Pipeline Example

```bash
python edgeguard_pipeline.py \
    --weights yolov8n.pt \
    --image ./input/frame.jpg \
    --audio ./input/audio.wav \
    --conf 0.25 \
    --output-dir ./outputs \
    --device edge
```

## Audio CNN Architecture

The audio health classification model uses a lightweight CNN architecture optimized for ARM-based edge devices such as the Raspberry Pi 5.
```python
from tensorflow.keras import layers, models

# input_shape = (24, 1): 24 acoustic features, 1 channel
model = models.Sequential([
    # Separable convolutions are easier on ARM CPUs
    layers.SeparableConv1D(32, kernel_size=3, padding='same', input_shape=input_shape),
    layers.Activation('relu'),
    layers.MaxPooling1D(2),
    layers.SeparableConv1D(64, kernel_size=3, padding='same'),
    layers.Activation('relu'),
    # Flatten is faster for tiny feature sets (24 features)
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(1, activation='sigmoid')
])
```

### Why Separable Convolutions?

Standard convolutions perform a single operation that combines spatial filtering and channel mixing. Separable convolutions split this into two steps:
- Depthwise convolution: Applies a single filter per input channel
- Pointwise convolution: 1×1 convolution to combine channels
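A sketch of this two-step decomposition in Keras, assuming TensorFlow 2.9+ for `DepthwiseConv1D`; shapes are chosen to match the model above, and this illustrates the equivalence rather than reproducing the training code:

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 24, 1))  # (batch, 24 features, 1 channel)

# Step 1: depthwise convolution, one 3-tap filter per input channel
depthwise = layers.DepthwiseConv1D(kernel_size=3, padding="same")
# Step 2: pointwise (1x1) convolution to mix channels
pointwise = layers.Conv1D(32, kernel_size=1)

y = pointwise(depthwise(x))  # same shape a SeparableConv1D(32, 3) would produce: (1, 24, 32)
```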
Why this matters for edge deployment:
| Metric | Standard Conv | Separable Conv | Benefit |
|---|---|---|---|
| Parameters | k × C_in × C_out | k × C_in + C_in × C_out | ~3-9x fewer |
| Multiply-Adds | O(k × C_in × C_out × L) | O(k × C_in × L + C_in × C_out × L) | Significantly reduced |
| ARM Efficiency | Memory-bound | Compute-efficient | Better cache utilization |
For the first layer of our model (k = 3, C_in = 1, C_out = 32; bias terms excluded):

- Standard Conv1D: 3 × 1 × 32 = 96 weights
- Separable Conv1D: 3 × 1 + 1 × 32 = 35 weights (~63% reduction)
ARM Cortex-A76 cores (Pi 5) have far smaller caches than the Xeon (2 MB vs 27.5 MB L3). Separable convolutions reduce memory bandwidth requirements, leading to faster inference.
### Why Flatten Instead of GlobalAveragePooling?

For tiny feature sets (24 acoustic features), Flatten outperforms GlobalAveragePooling:
| Approach | Operation | Best For |
|---|---|---|
| GlobalAveragePooling | Averages across spatial dimension | Large feature maps (images) |
| Flatten | Preserves all features | Small feature sets (<100 features) |
With only 24 input features and 2 conv layers, the intermediate feature map is small enough that:
- Flatten adds minimal overhead
- No information loss from averaging
- Dense layers can learn fine-grained patterns
### Model Summary

```
Layer (type)              Output Shape      Param #
====================================================
separable_conv1d          (None, 24, 32)    67
activation                (None, 24, 32)    0
max_pooling1d             (None, 12, 32)    0
separable_conv1d_1        (None, 12, 64)    2,176
activation_1              (None, 12, 64)    0
flatten                   (None, 768)       0
dense                     (None, 32)        24,608
dropout                   (None, 32)        0
dense_1                   (None, 1)         33
====================================================
Total params: 26,884
Trainable params: 26,884
```
### Input Features

The CNN processes 24 pre-extracted acoustic features:
| Category | Features |
|---|---|
| Temporal | duration_s, time_to_peak_s |
| Signal Quality | snr_db |
| Pitch (F0) | f0_mean_hz, f0_min_hz, f0_max_hz, voiced_ratio |
| Intensity | intensity_min_db, intensity_max_db |
| Formants | f1_mean_hz, f2_mean_hz |
| Energy | rms_energy_mean, rms_energy_std |
| Spectral | spectral_centroid_mean, spectral_bandwidth_mean, spectral_rolloff_95 |
| Zero-Crossing | zcr_mean, zcr_std |
| MFCCs | mfcc1_mean, mfcc1_std, mfcc2_mean, mfcc2_std, mfcc3_mean, mfcc3_std |
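A hedged sketch of how a few of these features could be computed with Librosa; the repository's actual extraction lives in `src/main.py`, and these calls and parameters are illustrative assumptions:

```python
import librosa
import numpy as np

def extract_some_features(path: str) -> dict:
    """Compute a subset of the 24 features above (illustrative only)."""
    y, sr = librosa.load(path, sr=None)
    rms = librosa.feature.rms(y=y)[0]
    zcr = librosa.feature.zero_crossing_rate(y)[0]
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=3)
    return {
        "rms_energy_mean": float(rms.mean()),
        "rms_energy_std": float(rms.std()),
        "zcr_mean": float(zcr.mean()),
        "zcr_std": float(zcr.std()),
        "spectral_centroid_mean": float(centroid.mean()),
        "mfcc1_mean": float(mfcc[0].mean()),
        "mfcc1_std": float(mfcc[0].std()),
    }
```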
### Optimization Results

On the Raspberry Pi 5, the optimized architecture reduced runtime by 36%:
| Version | Runtime | Peak RAM |
|---|---|---|
| Standard CNN | 94.10 s | 1,173.83 MB |
| Optimized (Separable) | 60.58 s | 1,013.39 MB |
## Benchmarking

The same pipeline runs on both platforms:

- Raspberry Pi (edge)
- Desktop CPU/GPU (baseline)
Metrics collected:

- Vision latency (ms)
- Audio latency (ms)
- Total pipeline latency
- RAM usage (MB)
- Model size (MB)
- CPU utilization
- Precision / Recall
- Accuracy degradation
Accuracy degradation formula:

```
degradation_% = ((desktop_accuracy - edge_accuracy) / desktop_accuracy) * 100
```
> [!NOTE]
> Benchmarking is automatically logged per event when the `--device` flag is used.
## Benchmark Results: Intel Xeon vs Raspberry Pi 5

We conducted comparative benchmarks between a high-performance server CPU and an edge device to quantify the tradeoffs between compute power and energy efficiency.

### Test Platforms

Intel Xeon Gold 6148 (server baseline, OSC):
| Specification | Details |
|---|---|
| CPU | Intel Xeon Gold 6148 |
| Clock Speed | 2.4 GHz (base) / 3.7 GHz (turbo) |
| Cores | 20 cores / 40 threads |
| Architecture | Skylake-SP (14nm) |
| L3 Cache | 27.5 MB |
| Memory Bandwidth | 128 GB/s (6-channel DDR4-2666) |
| TDP | 150W |
| Compute Mode | CPU-only (CUDA disabled) |
Raspberry Pi 5 (edge device):

| Specification | Details |
|---|---|
| CPU | Broadcom BCM2712 (Arm Cortex-A76) |
| Clock Speed | 2.4 GHz |
| Cores | 4 cores / 4 threads |
| Architecture | Arm v8.2-A (16nm) |
| L3 Cache | 2 MB shared |
| Memory | 8 GB LPDDR4X-4267 |
| TDP | ~12W (under load) |
| Compute Mode | CPU-only |
### Test Configuration

- Model: YOLOv8n (nano)
- Inference Mode: CPU-only
- Confidence Threshold: 0.25
- Keep Threshold: 0.50
- Input Resolution: Native (no preprocessing resize)
- Batch Size: 1 (per-image inference)
- Resource Sampling: 1-second intervals (OSC) / 0.5-second intervals (Pi)

### Computer Vision (YOLOv8) Results

| Metric | Intel Xeon Gold 6148 | Raspberry Pi 5 | Comparison |
|---|---|---|---|
| Total Runtime | 40.46 s | 34.03 s | Pi 16% faster |
| Peak Process RAM | 2,802.58 MB | 2,461.81 MB | Pi uses 12% less |
| Peak System Memory | 7.1% | 44.2% | Xeon has more headroom |
| Peak CPU Utilization | 6.1% | 44.5% | Pi runs near capacity |
### Audio CNN Results

| Metric | Intel Xeon Gold 6148 | Raspberry Pi 5 (Optimized) |
|---|---|---|
| Total Runtime | 112.95 s | 60.58 s |
| Peak Process RAM | 12.19 MB | 1,013.39 MB |
| Peak CPU Utilization | 17.2% | 46.5% |
### Combined Pipeline

| Metric | Intel Xeon Gold 6148 | Raspberry Pi 5 |
|---|---|---|
| Total Pipeline Time | ~153.4 s | ~94.6 s |
| Vision + Audio Combined | CV + CNN | CV + CNN (Optimized) |
### Resource Utilization: Intel Xeon Gold 6148

Computer Vision (YOLO):
- Startup RAM: 404.8 MB → Peak: 2,802.58 MB
- CPU remained low (0.3% – 6.1%) due to 20-core architecture
- Disk usage stable at ~29.1 GB used / 160.9 GB free
Audio CNN:
- Extremely lightweight: constant 12.19 MB RAM
- CPU: ~10-17% utilization
- Two-phase execution visible: preprocessing (~70s) + inference (~35s)
### Resource Utilization: Raspberry Pi 5

Computer Vision (YOLO):
- Startup RAM: 1.05 MB → Peak: 2,461.81 MB
- CPU consistently high (35–45%), fully utilizing all 4 cores
- System memory peaked at 44.2%
Audio CNN:
- RAM: peaked at 1,013.39 MB (optimized)
- CPU: 36-48% utilization
- Optimized version reduced runtime by 36%
### Analysis

Both platforms operate at the same base clock speed (2.4 GHz), making this a direct comparison of:
- Core count: 20 cores (Xeon) vs 4 cores (Pi5)
- Architecture efficiency: Skylake x86 vs Cortex-A76 ARM
- Memory subsystem: 128 GB/s bandwidth vs ~34 GB/s
- Cache hierarchy: 27.5 MB L3 vs 2 MB L3
**Surprising result:** the Raspberry Pi 5 outperformed the Intel Xeon on both pipelines. This counterintuitive outcome is explained by:
- Single-threaded workload: YOLOv8n inference is largely single-threaded, negating the Xeon's core count advantage
- Memory efficiency: ARM architecture's tighter memory integration benefits inference workloads
- CPU-only mode: The Xeon's strengths (AVX-512, massive parallelism) are underutilized without GPU offload
- Optimized CNN architecture: Separable convolutions are specifically designed for ARM efficiency
Intel Xeon Gold 6148 Advantages:
- Massive memory headroom (6.5-7.1% system usage vs 44%)
- Better suited for parallel batch processing
- Lower per-core utilization leaves room for concurrent tasks
- Superior for training workloads
Raspberry Pi 5 Advantages:
- Faster single-stream inference
- ~12.5x lower power consumption (12W vs 150W)
- Deployable in remote/off-grid locations
- Lower cost per node ($80 vs $3,000+)
- Sufficient for real-time single-image inference
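The joule figures in the table below follow from approximating energy as TDP multiplied by the measured CV-pipeline runtime:

$$E \approx P_{\mathrm{TDP}} \cdot t:\quad E_{\mathrm{Xeon}} \approx 150\,\mathrm{W} \times 40.46\,\mathrm{s} \approx 6{,}069\,\mathrm{J}, \qquad E_{\mathrm{Pi5}} \approx 12\,\mathrm{W} \times 34.03\,\mathrm{s} \approx 408\,\mathrm{J}$$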
| Metric | Intel Xeon Gold 6148 | Raspberry Pi 5 |
|---|---|---|
| TDP | 150W | ~12W |
| CV Pipeline Energy | ~6,069 J | ~408 J |
| Energy Ratio | 1x | 14.9x more efficient |
> [!IMPORTANT]
> Both platforms produce identical inference results when using the same model weights. The accuracy degradation is 0%; only latency and resource usage differ.
### Reproducing the Benchmarks

On OSC:

```bash
# SSH into OSC
ssh username@owens.osc.edu

# Navigate to the project directory
cd /fs/scratch/PAS2136/EdgeGuard

# Run the benchmark script
python src/osc_monitor.py
```

On the Raspberry Pi:

```bash
# Run the benchmark script
python src/animalClassifierTelemtry.py
```

Both scripts generate:
| File | Description |
|---|---|
| `detections.csv` | Detection results with confidence scores |
| `resource_log_cpu_only.csv` | Time-series resource utilization |
Resource log schema:

| Column | Type | Description |
|---|---|---|
| timestamp | ISO datetime | Sample time |
| elapsed_s | Float | Seconds since start |
| cpu_percent_total | Float | System CPU utilization |
| proc_rss_mb | Float | Process memory (MB) |
| sys_mem_percent | Float | System memory utilization |
| disk_used_gb | Float | Disk space used |
| disk_free_gb | Float | Disk space available |
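A hedged sketch of how such a resource logger might be implemented with `psutil`; the actual logic lives in `src/osc_monitor.py` and `src/animalClassifierTelemtry.py`, and this loop is an illustrative assumption:

```python
import csv
import time
from datetime import datetime, timezone

import psutil

def log_resources(out_path: str, interval_s: float = 1.0, duration_s: float = 60.0) -> None:
    """Write one sample per interval, matching the schema above."""
    proc = psutil.Process()
    start = time.monotonic()
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "elapsed_s", "cpu_percent_total", "proc_rss_mb",
                         "sys_mem_percent", "disk_used_gb", "disk_free_gb"])
        while (elapsed := time.monotonic() - start) < duration_s:
            disk = psutil.disk_usage("/")
            writer.writerow([
                datetime.now(timezone.utc).isoformat(),
                round(elapsed, 2),
                psutil.cpu_percent(interval=None),   # system-wide CPU %
                proc.memory_info().rss / 1e6,        # process RSS in MB
                psutil.virtual_memory().percent,     # system memory %
                disk.used / 1e9,
                disk.free / 1e9,
            ])
            time.sleep(interval_s)
```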
To generate comparison charts from the benchmark logs:

```bash
python scripts/plot_benchmarks.py \
    --xeon results/xeon_resource_log.csv \
    --pi5 results/pi5_resource_log.csv \
    --output benchmark_comparison.png
```

## Energy Optimization

- Event-driven execution
- Batch size = 1
- INT8 quantization (sketched after this list)
- Reduced input resolution
- Skip audio when no deer detected
- Immediate return to idle
- Separable convolutions for ARM efficiency
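A hedged sketch of the INT8 quantization step using TensorFlow Lite post-training quantization; `model`, `representative_samples`, and the output path are assumptions rather than the repository's actual code:

```python
import tensorflow as tf

# model: the trained Keras audio CNN from above (assumed in scope)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# representative_samples: hypothetical generator yielding calibration batches
converter.representative_dataset = representative_samples
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("models/lightweight_cnn_int8.tflite", "wb") as f:  # hypothetical path
    f.write(converter.convert())
```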
## Repository Structure

```
EdgeGuard/
│
├── edgeguard_pipeline.py
├── src/
│   ├── main.py                      # Audio feature extraction + CNN
│   ├── osc_monitor.py               # OSC benchmark script
│   ├── animalClassifierTelemtry.py  # Pi benchmark script
│   └── Requirements.txt
│
├── models/
│   ├── yolov8n.pt
│   ├── lightweight_cnn_model.h5     # Optimized audio CNN
│   └── scaler.pkl
│
├── outputs/
│   ├── images/
│   ├── videos/
│   └── logs/
│
├── results/
│   ├── xeon_resource_log.csv
│   └── pi5_resource_log.csv
│
└── README.md
```
## Success Metrics

- ≥ 80% precision
- ≥ 75% recall
- Detect deviation from baseline grazing
- Inference latency under 500ms (vision)
- Minimal accuracy degradation
- Reduced memory footprint
- Document performance gap between server and edge
- Validate identical accuracy across platforms
- Quantify energy efficiency advantage of edge deployment
## Non-Goals

- No cloud infrastructure
- No live GPS tracking
- No multi-species classification
- No medical diagnosis
- No ecological forecasting
## Long-Term Vision

Future expansions may include:
- Remote Cloud Upload
- Multi-species classification
- Real-time conservation alerts
- Federated learning
- Habitat stress analytics
EdgeGuard demonstrates:
- Deployable edge AI in remote ecosystems
- Low-cost conservation monitoring
- Multimodal intelligence (vision + audio)
- Measurable optimization tradeoffs
The innovation is not just detection: it is optimized, measurable, deployable AI for environmental impact.