Stop Ignoring Your Snore: Building a Local Sleep Apnea Detector with OpenAI Whisper and Librosa
Source: Dev.to
Introduction
Sleep is supposed to be the time when our bodies recharge, but for millions it’s a battle for air. Sleep apnea is a silent killer, often going undiagnosed because clinical sleep studies are expensive and intrusive. What if you could use audio‑signal processing and OpenAI Whisper to monitor breathing patterns locally, ensuring total privacy?
Overview
This tutorial demonstrates a hybrid approach to sleep‑apnea detection:
- Acoustic track – Librosa computes short‑time Fourier transforms (STFT) to separate “silence” from “snoring” frequency bands.
- Semantic track – Whisper’s encoder extracts high‑level audio features to distinguish gasps, chokes, and background noise.
The two tracks feed into an event‑detection engine that produces an Apnea‑Hypopnea Index (AHI) report stored locally.
```mermaid
graph TD
    A[Raw Audio Input .wav/.mp3] --> B{FFmpeg Processing}
    B --> C[Segmented Audio Chunks]
    C --> D[Librosa: Spectral Analysis]
    C --> E[Whisper: Feature Extraction]
    D --> F[Frequency & Amplitude Thresholding]
    E --> G[Acoustic Pattern Recognition]
    F --> H[Event Detection Engine]
    G --> H
    H --> I[Apnea-Hypopnea Index - AHI Report]
    I --> J[Local Storage / Privacy First]
```
Prerequisites
- Python 3.9+
- OpenAI Whisper (audio feature extraction)
- Librosa (audio analysis)
- PyTorch (run Whisper models)
- FFmpeg (audio decoding)
```bash
pip install openai-whisper librosa torch matplotlib ffmpeg-python
```
Energy‑Based Breathing Analysis
First we extract the Short‑Time Fourier Transform (STFT) to examine energy distribution. Snoring typically occupies the 60 Hz – 2000 Hz range, while apnea events appear as sudden energy drops.
```python
import librosa
import numpy as np

def analyze_breathing_energy(audio_path):
    # Load audio (downsampled to 16 kHz for Whisper compatibility)
    y, sr = librosa.load(audio_path, sr=16000)

    # STFT magnitude for frequency-domain inspection, plus frame-level RMS
    # energy (default hop length: 512 samples, i.e. 32 ms per frame at 16 kHz)
    stft = np.abs(librosa.stft(y))
    energy = librosa.feature.rms(y=y)

    # Mark frames whose energy falls below the ambient-noise floor
    threshold = 0.01  # Adjust based on ambient noise
    silent_frames = energy < threshold  # boolean mask, shape (1, n_frames)

    return y, sr, silent_frames

def detect_apnea_events(audio_path):
    y, sr, silent_frames = analyze_breathing_energy(audio_path)
    events = []

    # Simplified sliding-window logic: 50 consecutive silent frames is
    # roughly 1.6 s at this hop length. A clinical apnea event lasts 10 s
    # or more, so widen the window (to ~310 frames) for real recordings.
    for i in range(0, silent_frames.shape[1], 100):
        if np.all(silent_frames[0, i : i + 50]):  # Potential apnea duration
            start_time = librosa.frames_to_time(i, sr=sr)
            events.append(f"Apnea warning at {start_time:.2f} seconds")
    return events

print(detect_apnea_events("sleep_record.wav"))
```
Visualizing Breathing Patterns
Using Matplotlib (and Librosa’s display utilities) we can plot amplitude over time to spot flatlines and compensatory spikes.
```python
import matplotlib.pyplot as plt
import librosa.display

def plot_breathing(y, sr):
    plt.figure(figsize=(12, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.5)
    plt.title("Nocturnal Breathing Pattern")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.show()
```
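Turning the detected events into the AHI report from the pipeline is a matter of normalizing by recording length. A rough sketch (function names are mine): note that a true AHI also counts hypopneas scored from airflow and oximetry, so this audio‑only number is a proxy, not a diagnosis.

```python
def estimate_ahi(num_events: int, recording_seconds: float) -> float:
    """Events per hour of recording -- a rough, audio-only AHI proxy."""
    hours = recording_seconds / 3600.0
    return num_events / hours if hours > 0 else 0.0

def ahi_severity(ahi: float) -> str:
    """Commonly cited bands: <5 normal, 5-15 mild, 15-30 moderate, >30 severe."""
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"
```

For example, 12 events over a 6‑hour recording gives `estimate_ahi(12, 6 * 3600)` → 2.0 events/hour, which falls in the "normal" band.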
Next Steps
- Train a snore classifier (e.g., Random Forest) on top of Whisper embeddings.
- Integrate the pipeline with a mobile app (Flutter, React Native) for real‑time bedside alerts.
- Explore high‑concurrency audio streaming and medical‑LLM summarization as described in the WellAlly Tech Blog.
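The first next step, a snore classifier over Whisper embeddings, could look like the sketch below. It uses scikit‑learn (not in the prerequisites list, so `pip install scikit-learn` first), and the training arrays here are random placeholders standing in for real pooled encoder embeddings and labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: in practice each row would be a time-pooled Whisper
# encoder embedding (e.g. the mean over frames) for one labeled audio chunk.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))   # 512-dim embeddings (whisper "base" width)
y = rng.integers(0, 2, size=200)  # 0 = normal breathing, 1 = snore

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
probs = clf.predict_proba(X[:5])  # per-chunk snore probabilities, shape (5, 2)
```

A Random Forest is a reasonable starting point here because it needs little tuning and handles the few‑hundred‑sample datasets a single night of labeled audio yields.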
Disclaimer
This project is for educational purposes only and is not a substitute for professional medical advice. Always consult a qualified healthcare provider for sleep‑related health concerns.