Stop Ignoring Your Snore: Building a Local Sleep Apnea Detector with OpenAI Whisper and Librosa

Published: January 17, 2026 at 07:50 PM EST
3 min read
Source: Dev.to

Introduction

Sleep is supposed to be the time when our bodies recharge, but for millions it’s a battle for air. Sleep apnea is a silent killer, often going undiagnosed because clinical sleep studies are expensive and intrusive. What if you could use audio‑signal processing and OpenAI Whisper to monitor breathing patterns entirely on your own machine, so the recordings never leave it?

Overview

This tutorial demonstrates a hybrid approach to sleep‑apnea detection:

  • Acoustic track – Librosa applies the Fast Fourier Transform (FFT) to tell “silence” apart from “snoring” frequencies.
  • Semantic track – Whisper’s encoder extracts high‑level audio features to distinguish gasps, chokes, and background noise.

The two tracks feed into an event‑detection engine that produces an Apnea‑Hypopnea Index (AHI) report stored locally.

graph TD
    A[Raw Audio Input .wav/.mp3] --> B{FFmpeg Processing}
    B --> C[Segmented Audio Chunks]
    C --> D[Librosa: Spectral Analysis]
    C --> E[Whisper: Feature Extraction]
    D --> F[Frequency & Amplitude Thresholding]
    E --> G[Acoustic Pattern Recognition]
    F --> H[Event Detection Engine]
    G --> H[Event Detection Engine]
    H --> I[Apnea‑Hypopnea Index - AHI Report]
    I --> J[Local Storage / Privacy First]
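
The final stage of the diagram reduces detected events to an Apnea‑Hypopnea Index. A minimal sketch of that arithmetic follows, assuming the AHI is taken as events per hour and that the whole recording counts as sleep time (audio alone cannot tell sleep from wakefulness, so this is an estimate). The function names `compute_ahi` and `classify_ahi` are hypothetical, and the severity bands are the commonly cited adult cut‑offs, shown for illustration rather than diagnosis.

```python
def compute_ahi(event_count, recording_seconds):
    """Events per hour; assumes the whole recording counts as sleep time."""
    if recording_seconds <= 0:
        raise ValueError("recording_seconds must be positive")
    return event_count / (recording_seconds / 3600.0)

def classify_ahi(ahi):
    """Commonly cited adult severity bands; illustrative, not diagnostic."""
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"

# Example: 40 detected events over an 8-hour recording
ahi = compute_ahi(event_count=40, recording_seconds=8 * 3600)
severity = classify_ahi(ahi)
print(f"AHI = {ahi:.1f} events/hour ({severity})")
```

Keeping the calculation separate from detection means the same report code works regardless of which track produced the events.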

Prerequisites

  • Python 3.9+
  • OpenAI Whisper (audio feature extraction)
  • Librosa (audio analysis)
  • PyTorch (run Whisper models)
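  • FFmpeg (audio decoding)

Install the Python packages with pip. FFmpeg itself is a system binary and must be installed separately; the ffmpeg-python package is only a wrapper around it.

pip install openai-whisper librosa torch matplotlib ffmpeg-python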
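
Before recording a full night, it can help to confirm that everything is actually importable and that the FFmpeg binary is on the PATH. The following is a small sketch using only the standard library; `check_environment` and `REQUIRED_MODULES` are hypothetical names, not part of any of the packages above.

```python
import importlib.util
import shutil

# Import names differ from some pip names: openai-whisper installs "whisper"
# and ffmpeg-python installs "ffmpeg".
REQUIRED_MODULES = ["whisper", "librosa", "torch", "matplotlib", "ffmpeg"]

def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each requirement to True (found) or False."""
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # The FFmpeg command-line tool is separate from the Python wrapper
    status["ffmpeg (binary)"] = shutil.which("ffmpeg") is not None
    return status

for name, ok in check_environment().items():
    print(f"{name:16} {'ok' if ok else 'MISSING'}")
```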

Energy‑Based Breathing Analysis

First, we compute the Short‑Time Fourier Transform (STFT) and the frame‑wise RMS energy to examine how energy is distributed over time. Snoring energy typically falls in roughly the 60 Hz – 2000 Hz range, while apnea events appear as sudden energy drops that last 10 s or longer.

import librosa
import numpy as np

HOP_LENGTH = 512  # librosa's default hop; 32 ms per frame at 16 kHz

def analyze_breathing_energy(audio_path, threshold=0.01):
    # Load audio (resampled to 16 kHz for Whisper compatibility)
    y, sr = librosa.load(audio_path, sr=16000)

    # Compute STFT magnitude (kept for spectral inspection; unused below)
    # and frame-wise RMS energy, which has shape (1, n_frames)
    stft = np.abs(librosa.stft(y, hop_length=HOP_LENGTH))
    energy = librosa.feature.rms(y=y, hop_length=HOP_LENGTH)

    # Mark frames whose energy falls below the threshold as "silent".
    # Adjust the threshold based on ambient noise.
    silent_frames = energy[0] < threshold
    return y, sr, silent_frames

def detect_apnea_events(audio_path, min_silence_s=10.0):
    y, sr, silent_frames = analyze_breathing_energy(audio_path)

    events = []
    run_start = None
    # A trailing False closes a silent run that reaches the end of the file
    for i, silent in enumerate(np.append(silent_frames, False)):
        if silent and run_start is None:
            run_start = i
        elif not silent and run_start is not None:
            start, end = librosa.frames_to_time(
                [run_start, i], sr=sr, hop_length=HOP_LENGTH
            )
            # Only silent runs of at least min_silence_s are potential apnea
            if end - start >= min_silence_s:
                events.append(f"Apnea warning at {start:.2f} seconds")
            run_start = None

    return events

print(detect_apnea_events("sleep_record.wav"))
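
The detector above looks only at overall loudness. To make use of the 60 Hz – 2000 Hz snoring band mentioned earlier, one option is to measure what fraction of each frame's spectral energy falls inside that band. The sketch below is an assumption about how this could be done, not the article's method: it uses plain NumPy (Hann‑windowed frames and `np.fft.rfft`) so it runs on a synthetic signal without an audio file, and `band_energy_ratio` is a hypothetical helper name.

```python
import numpy as np

def band_energy_ratio(y, sr, f_lo=60.0, f_hi=2000.0, n_fft=2048, hop=512):
    """Per-frame fraction of spectral energy inside [f_lo, f_hi] Hz."""
    window = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    ratios = []
    for start in range(0, len(y) - n_fft + 1, hop):
        frame = y[start:start + n_fft] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        total = power.sum()
        # Guard against division by zero on digitally silent frames
        ratios.append(power[in_band].sum() / total if total > 0 else 0.0)
    return np.array(ratios)

# Synthetic check: a 200 Hz tone sits inside the band, a 5 kHz tone outside it
sr = 16000
t = np.arange(sr) / sr
low_tone = np.sin(2 * np.pi * 200 * t)
high_tone = np.sin(2 * np.pi * 5000 * t)
low_ratio = band_energy_ratio(low_tone, sr).mean()
high_ratio = band_energy_ratio(high_tone, sr).mean()
print(f"in-band tone: {low_ratio:.3f}, out-of-band tone: {high_ratio:.3f}")
```

A frame with high overall energy but a low in‑band ratio is more likely to be background noise than snoring, which could be used to filter false positives before counting events.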

Visualizing Breathing Patterns

Using Matplotlib (and Librosa’s display utilities) we can plot amplitude over time to spot flatlines and compensatory spikes.

import matplotlib.pyplot as plt
import librosa.display

def plot_breathing(y, sr):
    plt.figure(figsize=(12, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.5)
    plt.title("Nocturnal Breathing Pattern")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.show()

Next Steps

  • Train a snore classifier (e.g., Random Forest) on top of Whisper embeddings.
  • Integrate the pipeline with a mobile app (Flutter, React Native) for real‑time bedside alerts.
  • Explore high‑concurrency audio streaming and medical‑LLM summarization as described in the WellAlly Tech Blog.
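
The first item above relies on Whisper embeddings, which this tutorial does not otherwise show. A minimal sketch of one way to obtain them follows, assuming the recording is split into 30 s chunks (Whisper's input window) and that each chunk's encoder output is mean‑pooled into a single vector. The helper names `chunk_bounds` and `embed_recording`, the choice of the "tiny" model, and the mean pooling are all assumptions for illustration.

```python
def chunk_bounds(n_samples, sr=16000, chunk_s=30):
    """Return (start, end) sample indices for consecutive chunks."""
    step = sr * chunk_s
    return [(s, min(s + step, n_samples)) for s in range(0, n_samples, step)]

def embed_recording(audio_path, model_name="tiny"):
    """Return one mean-pooled Whisper encoder vector per 30 s chunk."""
    # Imported lazily so the chunking helper works without these packages
    import numpy as np
    import torch
    import whisper

    model = whisper.load_model(model_name)
    audio = whisper.load_audio(audio_path)  # 16 kHz mono float32, decoded via FFmpeg
    vectors = []
    for start, end in chunk_bounds(len(audio)):
        chunk = whisper.pad_or_trim(audio[start:end])  # pad the last chunk to 30 s
        mel = whisper.log_mel_spectrogram(chunk, n_mels=model.dims.n_mels)
        with torch.no_grad():
            features = model.embed_audio(mel.unsqueeze(0).to(model.device))
        # Average over the time axis to get one fixed-size vector per chunk
        vectors.append(features.squeeze(0).mean(dim=0).cpu().numpy())
    return np.stack(vectors)

# Chunk layout for a 70 s recording: two full windows and a 10 s remainder
bounds = chunk_bounds(70 * 16000)
print(bounds)
```

Each row returned by `embed_recording` could then serve as a feature vector for a classifier such as a Random Forest, provided labeled chunks (snore, gasp, background) are available for training.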

Disclaimer

This project is for educational purposes only and is not a substitute for professional medical advice. Always consult a qualified healthcare provider for sleep‑related health concerns.
