Project // Gunshot Classifier

Acoustic Firearm Classification

A multi-iteration classification system trained on the CADRE ballistic audio dataset, modeled on research by Robert C. Maher. Classical machine learning on physics-based features outperformed a deep neural network approach — and the reasons why are as instructive as the results.

truth@evades:~$ forensic-analysis

Foundation // Robert C. Maher

This work is modeled on the research of Robert C. Maher (Montana State University), whose publications on gunshot acoustics established the scientific basis for acoustic firearm analysis. Key findings from his research that shaped this classifier:

  • Gunshot waveforms consist of separable components — muzzle blast, supersonic shockwave (N-wave), and reflections — each carrying discriminative information
  • Shot-to-shot variability is significant even under controlled conditions (~10% peak pressure variation for rifles, higher for handguns)
  • Consumer recording devices introduce 5–10ms time smearing on gunshot onsets (0.010 s × 343 m/s ≈ 3.4 meters of localization error at the speed of sound)
  • "Conventional audio recordings have not been shown to be reliable for identifying particular firearms" — a limitation this project directly confronts
References: Maher & Shaw, AES 2008, 2014, 2016 // IEEE DSP 2006 // IEEE SAFE 2007

The Problem

Why This Matters

In forensic audio analysis, identifying the firearm used in an incident from audio recordings alone is a problem with real investigative value. Different calibers produce distinct acoustic signatures — muzzle blast characteristics, supersonic crack profiles, spectral energy distribution — but these differences are subtle and environment-dependent.

Why It's Hard

  • Recording conditions vary wildly (distance, environment, device)
  • Echoes and reflections contaminate the signal
  • Different firearms can share the same caliber
  • Consumer device microphones clip, compress, and apply AGC
  • Domain shift: lab recordings don't sound like field recordings

Training Data // CADRE Dataset

Trained on the CADRE (Community Accessible Digital Resource for Forensics) ballistic audio dataset, combining two collection efforts: a multi-firearm, multi-orientation edge-collected dataset from the Center for Advanced Data Analytics & Systems (CADAS) and the Air Force Research Laboratory (AFRL) (Kabealo et al., Data in Brief), plus CADRE's own iPhone-recorded collection.

The CADAS/AFRL recordings were captured with multichannel microphone arrays (up to 7 channels) across multiple orientations and firing styles, then cropped to 2-second clips containing at least one gunshot. Metadata includes firearm, caliber, GPS, recording device, microphone, and per-shot timestamps.

30 FIREARMS // 5,036 WAV FILES // 14 CALIBER CLASSES // 7 MIC CHANNELS
CADAS / AFRL EDGE-COLLECTED
Multi-channel array recordings (up to 7 channels) with UUID tracking. Each original recording cropped to 2-second clips guaranteed to contain at least one gunshot. Multiple variation captures per session. Includes channel means for mono analysis.
Kabealo, Wyatt, Aravamudan, Zhang, et al. — CADAS (Florida Institute of Technology) & AFRL
CADRE iPHONE RECORDINGS
Single-channel consumer device captures. Session-identified (IP_XXXX_SXX format). 18 firearm types, ~120 files each. Introduces real-world device characteristics (AGC, clipping, compression) but also device-specific confounds.
FIREARMS IN DATASET:
S&W .38 Special (503), Glock 17 9mm (669), Ruger AR-556 .223 (597), Remington 870 12ga (379), AK-47 (72), AK-12 (98), M16 (100+120), M4 (100), M249 SAW (99), MG-42 (100), MP5 (100), MP40-1 (120), MP40-2 (120), IMI Desert Eagle (100), Colt 1911 .45 (120), Glock 45 (120), Kimber .45 (120), HK USP .45 (120), SIG P226 9mm (120), Lorcin .380 (120), Rem 700 .308 (60), Ruger .22 LR (119), Ruger .357 Mag (120), S&W .22 (118), S&W .38 SP (120), Sp. King .22 (120), Bolt Action .22 (118), WASR 7.62x39 (120), Win M14 .308 (62), Zastava M92 (82)
NOTE: Significant class imbalance — Glock 17 has 669 files, Rem 700 has 60. iPhone vs. professional array recordings introduce device confounds that the model can learn instead of actual ballistic signatures.
CALIBER CLASSES (v4 mapping — 30 firearms → 14 calibers):
9mm (pistol), .45 ACP (pistol), .223/5.56 (rifle), 7.62x39 (rifle), .308/7.62x51 (rifle), 12 gauge (shotgun), .38 Special (revolver), .22 LR (rimfire), .380 ACP (pistol), .357 Mag (revolver), .50 AE (pistol), 5.45x39 (rifle), 7.62x54R (rifle), 5.56 belt (support)
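The firearm-to-caliber abstraction above can be sketched as a simple lookup. The entries and label strings below are a partial, hypothetical reconstruction for illustration, not the project's actual v4 mapping table:

```python
# Illustrative sketch of the firearm -> caliber abstraction.
# Label spellings are hypothetical; the real mapping covers all 30 firearms.
CALIBER_MAP = {
    "glock_17":      "9mm",
    "sig_p226":      "9mm",
    "colt_1911":     ".45 ACP",
    "hk_usp_45":     ".45 ACP",
    "ruger_ar556":   ".223/5.56",
    "wasr":          "7.62x39",
    "remington_700": ".308/7.62x51",
    "remington_870": "12 gauge",
    "sw_38_special": ".38 Special",
    "ruger_22lr":    ".22 LR",
}

def to_caliber(firearm_label: str) -> str:
    """Collapse a firearm-level label to its ballistic caliber class."""
    return CALIBER_MAP.get(firearm_label, "unknown")
```

Collapsing labels this way is what strips device- and firearm-specific confounds before training, trading identity resolution for the caliber-level accuracy gain described later.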

Feature Engineering

Feature Distribution

~200 handcrafted features per audio sample

Gunshot-Specific Features

Beyond standard audio features, the system extracts physics-based signatures derived from Maher's research on ballistic acoustics:

  • Muzzle Blast Energy — Low-frequency (<500Hz) content from propellant gas expansion
  • N-Wave Detection — Bipolar shockwave from supersonic projectiles. Rise time, duration, amplitude ratio
  • Impulse Kurtosis — Statistical measure of signal impulsiveness (high for gunshots vs. other transients)
  • Crack-to-Blast Ratio — High-frequency crack energy vs. low-frequency blast. Caliber-dependent
  • Attack/Decay Profile — Rise time to peak and decay to -20dB. Differs by barrel length and caliber
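A few of the features above can be sketched with numpy/scipy. The function name and exact band edges (500 Hz blast band, 2 kHz crack band, taken from the descriptions above) are illustrative assumptions; the real extractor computes ~200 features and measures rise time from a detected onset rather than from the clip start:

```python
import numpy as np
from scipy.signal import butter, sosfilt
from scipy.stats import kurtosis

def ballistic_features(x: np.ndarray, sr: int = 48000) -> dict:
    """Sketch of four physics-based features (band edges are assumptions)."""
    # Muzzle blast: low-frequency (< 500 Hz) energy from propellant gas expansion
    sos_lo = butter(4, 500, btype="lowpass", fs=sr, output="sos")
    blast = sosfilt(sos_lo, x)
    # Crack: high-frequency (> 2 kHz) energy, where N-wave content lives
    sos_hi = butter(4, 2000, btype="highpass", fs=sr, output="sos")
    crack = sosfilt(sos_hi, x)
    total = np.sum(x**2) + 1e-12
    return {
        "muzzle_blast_energy": float(np.sum(blast**2) / total),
        "crack_to_blast":      float(np.sum(crack**2) / (np.sum(blast**2) + 1e-12)),
        "impulse_kurtosis":    float(kurtosis(x)),  # high for impulsive transients
        # Crude attack proxy: time to peak from clip start (CADRE clips are
        # cropped around the shot); a real extractor measures rise from onset.
        "attack_time_ms":      float(np.argmax(np.abs(x)) / sr * 1000.0),
    }
```

Each value maps directly to a physical claim about the shot, which is what makes the classical pipeline auditable later via feature attribution.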

SUPERSONIC vs SUBSONIC DISCRIMINATION

Supersonic (>343 m/s)

Generates an N-wave — a characteristic bipolar shockwave from the supersonic projectile's Mach cone. High-frequency content (>2kHz), rapid sign changes, detectable via high-freq kurtosis.

9mm, .223/5.56, 7.62x39, .308, .357 Mag
Subsonic (<343 m/s)

Muzzle blast only — no crack. Smooth low-frequency energy, longer decay, distinct spectral envelope. Absence of N-wave is itself a classification signal.

.45 ACP, .38 Special (std), 12 gauge slugs
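The supersonic/subsonic cue described above reduces to a binary test: an N-wave leaves impulsive energy above ~2 kHz, so high kurtosis in the high-passed signal suggests a supersonic projectile. A minimal sketch, where the kurtosis threshold is an illustrative assumption rather than a tuned value:

```python
import numpy as np
from scipy.signal import butter, sosfilt
from scipy.stats import kurtosis

def has_supersonic_crack(x: np.ndarray, sr: int = 48000,
                         hf_cut: float = 2000.0,
                         kurt_thresh: float = 10.0) -> bool:
    """High-frequency kurtosis test for an N-wave signature.
    kurt_thresh is a hypothetical value, not the project's tuned one."""
    sos = butter(4, hf_cut, btype="highpass", fs=sr, output="sos")
    hf = sosfilt(sos, x)
    # Supersonic crack -> sharp bipolar transient survives the high-pass,
    # driving excess kurtosis far above the Gaussian baseline of ~0.
    return bool(kurtosis(hf) > kurt_thresh)
```

Because this is a physics-based binary detection rather than a learned classification, it degrades far less under domain shift, which is why it remained reliable on the field recordings discussed later.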

Two Approaches // ML vs Neural Network

The classifier was developed along two parallel tracks: classical machine learning on handcrafted features, and deep transfer learning via a pretrained neural network. The results reveal something important about this problem domain.

CLASSICAL ML

AdaBoost / XGBoost / Random Forest / Ensemble

Physics-based feature engineering derived from Maher's research. 200+ handcrafted features targeting known ballistic acoustic phenomena — muzzle blast energy, N-wave characteristics, spectral band ratios, wavelet decomposition. The features encode domain knowledge about how gunshots actually work.

Best CV accuracy: 83.5% (caliber mode)
uvu_workflow iterations v1–v4

NEURAL NETWORK

YAMNet (MobileNet v1) Transfer Learning

Google's YAMNet — a deep neural network pretrained on AudioSet (over 2 million labeled 10-second clips of general audio) for sound event classification. Embeddings (3,072 dimensions) extracted and fused with acoustic features for a 4,116-dimension feature vector. YAMNet understands sound in general, but the subtle ballistic physics that distinguish calibers aren't in AudioSet's training distribution.

Train: 100% / Validation: 55.6% / CV: 59.7%
44-point overfitting gap
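The fusion step's dimensions are consistent with pooling YAMNet's per-frame 1,024-dim embeddings by mean/max/std (3 × 1,024 = 3,072) and appending 1,044 acoustic dimensions (4,116 − 3,072). That pooling scheme is an assumption inferred from the stated sizes, and placeholder arrays stand in for a real TF Hub forward pass to keep the sketch dependency-free:

```python
import numpy as np

def fuse_features(yamnet_frames: np.ndarray, acoustic: np.ndarray) -> np.ndarray:
    """Pool per-frame YAMNet embeddings, then concatenate acoustic features.
    The mean/max/std pooling is an assumption consistent with 3,072 dims."""
    pooled = np.concatenate([
        yamnet_frames.mean(axis=0),
        yamnet_frames.max(axis=0),
        yamnet_frames.std(axis=0),
    ])                                         # shape: (3072,)
    return np.concatenate([pooled, acoustic])  # shape: (3072 + len(acoustic),)

# Placeholder shapes standing in for a real YAMNet forward pass:
frames = np.zeros((4, 1024))   # ~4 embedding frames from a 2-second clip
acoustic = np.zeros(1044)      # handcrafted acoustic feature vector
vec = fuse_features(frames, acoustic)
assert vec.shape == (4116,)
```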

WHY CLASSICAL ML OUTPERFORMED THE NEURAL NETWORK

  • 01 Domain knowledge matters more than data volume here. The handcrafted features encode acoustic physics (muzzle blast frequency content, N-wave timing, crack-to-blast ratios) that directly correspond to what makes calibers physically different. YAMNet's pretrained knowledge of "what a gunshot sounds like" is too general for caliber-level discrimination.
  • 02 YAMNet memorized the dataset. 100% training accuracy with 55.6% validation = the network learned to recognize individual CADRE recordings, not generalizable ballistic signatures. With ~5,000 samples across 22 classes, there isn't enough data to fine-tune a deep network without massive overfitting.
  • 03 Feature engineering is interpretable. With classical ML, you can inspect which features drive predictions (via SHAP). When the model says ".308" you can verify it's looking at the right spectral bands. A neural network's reasoning is opaque — a problem for forensic work where you need to explain your conclusions.
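The interpretability point above can be illustrated in a toy setting. The project uses SHAP for per-prediction attributions; plain impurity-based importances are shown here instead to keep the sketch dependency-light, and the feature names are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data where only the first feature carries signal, mimicking a
# caliber cue that the model should (verifiably) rely on.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] > 0).astype(int)          # only feature 0 is informative
names = ["crack_to_blast", "impulse_kurtosis", "attack_time_ms"]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranked = sorted(zip(names, clf.feature_importances_), key=lambda t: -t[1])
print(ranked[0][0])  # the informative feature ranks first
```

With named physical features, a claim like "the model predicted .308 because of its crack-to-blast ratio" is checkable; no equivalent statement exists for a 3,072-dim embedding.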

Evolution // Model Iterations

Each iteration addressed specific limitations discovered in the previous version.

Key insight: classifying by caliber instead of specific firearm dramatically improves accuracy. Calibrated ensemble with isotonic probability calibration for forensically reliable confidence scores. Highest cross-validation accuracy of any iteration.

ARCHITECTURE
Classical ML: CalibratedEnsembleClassifier (RF + GB + LightGBM + XGBoost → soft voting → isotonic calibration)
CAPABILITIES
  • Caliber-level classification (30 firearms → 14 calibers)
  • Ballistic-focused feature extraction (80+ features)
  • Isotonic probability calibration for honest confidence scores
  • Device confound removal (strips iPhone/array labels)
  • Ensemble voting with confidence assessment
KEY INNOVATION
Caliber abstraction — strips device artifacts, maps to ballistic class, achieves 83.5% vs ~60% at firearm level

The Hard Truth // Domain Shift

The best cross-validation accuracy on CADRE data is 83.5%. But when the classifier processes real-world forensic recordings (UVU evidence audio), confidence drops dramatically. This is the domain shift problem — the core challenge Maher's research predicted.

CADRE (TRAINING)
  • Quasi-anechoic, controlled environment
  • Multi-channel professional array
  • Known firearm, distance, position
  • Clean WAV, no compression
  • Minimal background noise
FIELD RECORDINGS (REALITY)
  • Reverberant, uncontrolled environment
  • Single phone mic with AGC and clipping
  • Unknown distance, angle, obstructions
  • Compressed, re-encoded, shared via social media
  • Wind, traffic, crowd noise, overlapping events
UVU EVIDENCE ANALYSIS (9 recordings):
  • HIGH confidence: 0/9
  • MODERATE: 1/9
  • LOW / UNRELIABLE: 8/9
  • Weighted consensus: 24.5%
CONFIRMED FINDING: SUPERSONIC SHOCKWAVE DETECTED

While caliber-level classification confidence was low, the classifier's N-wave detection returned a definitive result: all 8 successfully processed UVU recordings contain supersonic shockwave signatures (1 of 9 files errored during processing). This is a physics-based detection — not a statistical prediction — and it immediately constrains the problem space. The rounds fired were supersonic, ruling out subsonic calibers (.45 ACP, standard .38 Special, 12 gauge slugs). This is the classifier operating at its most reliable: binary acoustic physics, not probabilistic classification.

The caliber classification confidence being low isn't a failure of the model — it's the model honestly reporting that it's outside its training distribution. The isotonic calibration ensures the confidence scores mean something: when it says "low confidence," it means it. That honesty is more valuable than false certainty. Meanwhile, the physics-based detections (N-wave, muzzle blast characterization) still deliver actionable findings regardless of domain shift.

Synthetic Augmentation

The GunShotSynthesizer generates realistic acoustic variations of training samples, expanding the dataset 5× while simulating real-world recording conditions that the clean CADRE recordings don't capture. This is one approach to narrowing the domain gap.

ENVIRONMENT SIMULATION
  • Anechoic (reference)
  • Outdoor open / urban
  • Indoor small / large
  • Parking garage
  • Forest / Canyon
PHYSICS MODELING
  • Inverse square law attenuation
  • Atmospheric absorption
  • Mach cone delay calculation
  • Barrel length effects
  • Suppression modeling
NOISE & DEVICE
  • AWGN (0–100 dB SNR)
  • Environmental noise mixing
  • Mic response simulation
  • Directional patterns
  • SNR-controlled blending
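Two of the synthesizer's transforms can be sketched directly: inverse-square-law attenuation for distance, then additive white Gaussian noise at a controlled SNR. The function name is hypothetical, and reverberation, atmospheric absorption, and device modeling from the lists above are omitted:

```python
import numpy as np

def augment(x: np.ndarray, distance_m: float, snr_db: float,
            ref_distance_m: float = 1.0, seed: int = 0) -> np.ndarray:
    """Distance attenuation + AWGN at a target SNR (sketch only)."""
    rng = np.random.default_rng(seed)
    # Inverse square law for intensity means pressure amplitude falls as 1/r.
    y = x * (ref_distance_m / distance_m)
    # Scale the noise so that 10*log10(P_signal / P_noise) == snr_db.
    p_signal = np.mean(y**2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return y + rng.normal(0.0, np.sqrt(p_noise), size=y.shape)
```

Sweeping distance and SNR over each clean CADRE clip yields the 5× expansion while exposing the model to the degraded conditions field recordings actually exhibit.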

Stack

Python scikit-learn AdaBoost XGBoost LightGBM TensorFlow YAMNet librosa scipy numpy SHAP Praat Audacity

This classifier is a research tool — not a definitive forensic instrument. Classical ML with physics-based features outperformed deep learning here because domain knowledge matters more than model complexity when training data is limited and the problem is grounded in known physics. All classifications include calibrated confidence scores, and results should be interpreted alongside traditional acoustic analysis methods.

// Based on research by Robert C. Maher // CADRE dataset via cadreforensics.com // Methodology at /methods