Quantified Self: Building a Blazing Fast Health Dashboard with DuckDB and Streamlit
Source: Dev.to
Why This Stack?
If you’ve been following modern data stacks, you know that OLAP for wearables is becoming a hot topic. Traditional Python libraries like Pandas are great, but an Apple Health export can be a multi‑gigabyte XML file, and parsing it straight into a DataFrame means holding every row in RAM at once.
| Component | Why It Matters |
|---|---|
| DuckDB | “SQLite for analytics.” An in‑process columnar database that runs SQL at the speed of light. |
| PyArrow | Zero‑copy bridge for moving data between formats. |
| Streamlit | Fastest way to turn data scripts into shareable web apps. |
Architecture 🏗️
The biggest challenge with wearable data is the ETL (Extract‑Transform‑Load) process. We need to turn a hierarchical XML file into a flattened, queryable Parquet format that DuckDB can devour.
```mermaid
graph TD
    A[Apple Health export.xml] --> B[Python XML Parser]
    B --> C[PyArrow Table]
    C --> D[Parquet Storage]
    D --> E[DuckDB Engine]
    E --> F[Streamlit Dashboard]
    F --> G[Millisecond Insights 🚀]
```
Prerequisites
Before we start, make sure you have your export.xml ready and the required tools installed:
```bash
pip install duckdb streamlit pandas pyarrow
```
Step 1 – From XML Chaos to Parquet Order
Apple’s XML format is… “unique.” We use PyArrow to define a schema and convert those records into a compressed Parquet file. This reduces the file size by up to 90 % and optimizes it for columnar reads.
```python
import xml.etree.ElementTree as ET
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

def parse_health_data(xml_path: str, parquet_path: str = "health_data.parquet"):
    """Parse Apple Health export XML and write a Parquet file."""
    tree = ET.parse(xml_path)
    root = tree.getroot()

    # Keep only Record elements for this dashboard
    records = [
        {
            "type": rec.get("type"),
            "value": rec.get("value"),
            "unit": rec.get("unit"),
            "creationDate": rec.get("creationDate"),
            "startDate": rec.get("startDate"),
        }
        for rec in root.findall("Record")
    ]
    df = pd.DataFrame(records)

    # Convert dates and numeric values
    df["startDate"] = pd.to_datetime(df["startDate"])
    df["value"] = pd.to_numeric(df["value"], errors="coerce")

    # Arrow Table → Parquet
    table = pa.Table.from_pandas(df)
    pq.write_table(table, parquet_path, compression="snappy")
    print("✅ Transformation complete!")

# parse_health_data("export.xml")
```
Step 2 – The Secret Sauce: DuckDB Logic
Instead of loading the entire Parquet file into RAM with Pandas, we query it directly using DuckDB. This lets us perform complex aggregations (e.g., heart‑rate variability vs. sleep quality) in milliseconds.
```python
import duckdb
import pandas as pd

def get_heart_rate_summary(parquet_path: str = "health_data.parquet") -> pd.DataFrame:
    """Return daily average & max heart-rate from the Parquet file."""
    con = duckdb.connect(database=":memory:")
    query = f"""
        SELECT
            date_trunc('day', startDate) AS day,
            AVG(value) AS avg_heart_rate,
            MAX(value) AS max_heart_rate
        FROM '{parquet_path}'
        WHERE type = 'HKQuantityTypeIdentifierHeartRate'
        GROUP BY 1
        ORDER BY 1 DESC
    """
    return con.execute(query).df()
```
Step 3 – Building the Streamlit UI 🎨
Streamlit makes it incredibly easy to visualize these SQL results. Add sliders, date pickers, and interactive charts with just a few lines of code.
```python
import streamlit as st
import plotly.express as px

# get_heart_rate_summary is the function from Step 2 — import it if it
# lives in a separate module, or paste it above this block.

# ----------------------------------------------------------------------
# Page configuration
# ----------------------------------------------------------------------
st.set_page_config(page_title="Quantified Self Dashboard", layout="wide")

st.title("🏃‍♂️ My Quantified Self Dashboard")
st.markdown("Analyzing millions of health records with **DuckDB** speed.")

# ----------------------------------------------------------------------
# Load data
# ----------------------------------------------------------------------
df_hr = get_heart_rate_summary()

# ----------------------------------------------------------------------
# Layout
# ----------------------------------------------------------------------
col1, col2 = st.columns(2)

with col1:
    st.subheader("Heart Rate Trends")
    fig = px.line(
        df_hr,
        x="day",
        y="avg_heart_rate",
        title="Average Daily Heart Rate",
        markers=True,
    )
    st.plotly_chart(fig, use_container_width=True)

with col2:
    st.subheader("Raw DuckDB Query Speed")
    st.code(
        """SELECT avg(value) FROM 'health_data.parquet'
WHERE type = 'HKQuantityTypeIdentifierHeartRate'"""
    )
    st.success("Query executed in ~0.002 s")
```
Run the app with:
```bash
streamlit run your_script.py
```
The “Official” Way to Scale 🥑
While building locally is fun, production‑grade data engineering requires more robust patterns: multi‑user environments, automated ingestion, and advanced machine‑learning pipelines. For a deep dive into those topics, check out the WellAlly Blog. They provide excellent architectural patterns and production‑ready examples that go far beyond a local script.
Conclusion
By switching from raw XML parsing to a DuckDB + Parquet workflow, we’ve turned a sluggish data problem into a high‑performance analytical tool. You no longer need a massive cluster to analyze millions of health records—just a laptop, a few Python libraries, and a dash of curiosity. Happy quantifying!
What are you tracking?
Whether it’s steps, sleep, or coding hours, let me know in the comments how you’re visualizing your life! 👇