How I Built SilentEar — A Real-Time AI Accessibility Agent for Deaf Users with Gemini Live API

Published: March 15, 2026 at 04:39 PM EDT
7 min read
Source: Dev.to

Why I Built This

My son was born profoundly deaf. As he began learning Pakistan Sign Language (PSL) at school, I started basic training alongside him. Through this journey I saw how daily life can be isolating and even dangerous for Deaf individuals and their families:

  • Feeling irrelevant or isolated in group settings
  • Struggling to communicate smoothly with hearing people
  • Missing critical environmental alerts (fire alarms, door knocks, baby cries, etc.)

While AI has advanced dramatically, most accessibility tools still rely on simple speech‑to‑text transcription. Transcripts miss the context that makes a sound urgent. Inspired by my son’s experience and the power of the Gemini Live API, I set out to build an agent that listens, interprets, and delivers life‑saving cues in formats Deaf users actually need (haptic feedback, screen flashes, visual sign language).

What SilentEar Does

SilentEar continuously monitors ambient audio, extracts meaning, and alerts the user in real time.

| Core Capability | How It Works |
| --- | --- |
| Environmental sound detection | Gemini Live API streams bidirectional audio; function calling (trigger_alert) fires custom alerts (dog bark, doorbell, siren, name‑call, etc.). |
| Context‑aware transcription | Gemini 3 Flash refines noisy speech into clean sentences and adds scene intelligence. |
| Visual sign language support | SignMoji, a library of sign‑language videos that appear with alerts. Users can add custom SignMojis via video upload, URL, or web search. |
| Two‑way communication | AI‑powered Voice Deck provides text‑to‑speech with smart, context‑aware phrase prediction (Gemini 3 Flash). |
| Caregiver dashboard | Trusted contacts can view live alerts, device status, and history remotely. |

Architecture Overview

Frontend

  • Framework: React 19 + TypeScript (PWA)
  • Styling: Tailwind CSS
  • Audio processing: Web Audio API + local FFT for ultra‑low‑latency alarm detection
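
To illustrate what local alarm detection can look like, here is a minimal sketch. It swaps the full FFT for the Goertzel algorithm, which measures a single frequency and is cheap enough to run on every audio block. The function names, the 3100 Hz target (a common smoke-alarm pitch), and the 0.5 threshold are assumptions for illustration, not values taken from SilentEar.

```typescript
// Goertzel algorithm: power of one target frequency in a block of samples,
// computed without a full FFT.
function goertzelPower(
  samples: Float32Array,
  sampleRate: number,
  targetHz: number,
): number {
  const omega = (2 * Math.PI * targetHz) / sampleRate;
  const coeff = 2 * Math.cos(omega);
  let s1 = 0;
  let s2 = 0;
  for (let i = 0; i < samples.length; i++) {
    const s0 = samples[i] + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  // Squared magnitude of the target frequency component.
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Returns true when the target tone carries most of the block's energy.
// targetHz and minRatio are hypothetical defaults; tune them per alarm type.
function isAlarmTone(
  samples: Float32Array,
  sampleRate: number,
  targetHz: number = 3100,
  minRatio: number = 0.5,
): boolean {
  let totalEnergy = 0;
  for (let i = 0; i < samples.length; i++) {
    totalEnergy += samples[i] * samples[i];
  }
  if (totalEnergy === 0) return false; // silence

  // For a pure tone at targetHz this ratio approaches 1.
  const power = goertzelPower(samples, sampleRate, targetHz);
  const ratio = (2 * power) / (samples.length * totalEnergy);
  return ratio >= minRatio;
}
```

Because the check is a single pass over the block, it can run on the device and fire an alert before any audio reaches the cloud.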

Backend

  • Runtime: Node.js + Express on Google Cloud Run
  • Streaming: WebSocket proxy that forwards PCM audio (16 kHz) to Gemini Live API
  • AI integration: @google/genai SDK – live audio streaming, function calling (trigger_alert), and REST calls to Gemini 3 Flash for transcript refinement & scene analysis

Data & Media

| Service | Role |
| --- | --- |
| Supabase (PostgreSQL + Realtime + Storage) | User profiles, custom SignMoji libraries, trigger definitions, caregiver sync |
| Cloud Firestore | Alert history, device status, trigger configurations |
| Google Cloud Run | Hosts Express + WebSocket backend, runs server‑side REST endpoints for Gemini 3 Flash processing |

Audio Flow

Device Microphone → PCM Audio (16 kHz) → WebSocket → Cloud Run → Gemini Live API

Gemini Live API → Function Call (trigger_alert) → Haptic + Visual Alerts
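
Browsers usually capture audio at 44.1 or 48 kHz, so the first hop of this flow needs a step that brings the signal down to 16 kHz. The sketch below shows one simple way to do that with linear interpolation. The function name is hypothetical, and it skips the low-pass filter a production resampler would apply first to avoid aliasing.

```typescript
// Resamples a block of audio by linear interpolation between neighbours.
// Assumption: no anti-aliasing filter is applied, so this is a sketch only.
function resampleLinear(
  input: Float32Array,
  inputRate: number,
  outputRate: number,
): Float32Array {
  if (inputRate === outputRate) return input.slice();

  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);

  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio; // fractional position in the input
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

For example, a 480-sample block captured at 48 kHz becomes a 160-sample block at 16 kHz, which covers the same 10 ms of audio.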

Gemini Live Function Calling

Instead of naïve keyword matching, SilentEar gives Gemini a trigger_alert tool that knows the user’s custom categories. When the model hears a matching sound or phrase, it calls the tool, instantly notifying the device.

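// Types come from the @google/genai SDK used by the backend.
import { FunctionDeclaration, Type } from '@google/genai';

const triggerTool: FunctionDeclaration = {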
  name: 'trigger_alert',
  description: 'Call this when an environmental sound or keyword matches alert categories.',
  parameters: {
    type: Type.OBJECT,
    properties: {
      alert_id: {
        type: Type.STRING,
        description: 'The ID of the alert to trigger.'
      },
      context: {
        type: Type.STRING,
        description: 'Short summary of what was heard.'
      }
    },
    required: ['alert_id']
  }
};

Result: because the model hears the surrounding context, it can distinguish a dog barking on TV from a real dog at the door, which cuts false alarms compared with keyword matching.
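
The declaration above only tells the model that the tool exists; the backend still has to react when a call arrives. Below is a minimal sketch of that dispatch step. The interfaces are simplified stand-ins for the SDK's function-call and function-response shapes, and handleToolCalls, AlertEvent, and the notify callback are hypothetical names. As far as I understand the @google/genai SDK, calls arrive on the live message's toolCall.functionCalls and the returned array is what you would pass to session.sendToolResponse, but check that against the SDK version you use.

```typescript
// Simplified stand-ins for the SDK types (assumption: fields mirror the SDK).
interface FunctionCallLike {
  id?: string;
  name?: string;
  args?: Record<string, unknown>;
}

interface FunctionResponseLike {
  id?: string;
  name: string;
  response: Record<string, unknown>;
}

// Hypothetical event pushed to the device over the WebSocket.
interface AlertEvent {
  alertId: string;
  context: string;
}

function asString(value: unknown): string {
  return typeof value === 'string' ? value : '';
}

// Validates each trigger_alert call, notifies the device for known alert IDs,
// and builds one response per call so the model can continue the session.
function handleToolCalls(
  calls: FunctionCallLike[],
  knownAlertIds: readonly string[],
  notify: (event: AlertEvent) => void,
): FunctionResponseLike[] {
  const responses: FunctionResponseLike[] = [];
  for (const call of calls) {
    if (call.name !== 'trigger_alert') continue; // ignore other tools

    const alertId = asString(call.args?.alert_id);
    const context = asString(call.args?.context);
    const known = knownAlertIds.indexOf(alertId) !== -1;

    if (known) notify({ alertId, context });

    responses.push({
      id: call.id,
      name: 'trigger_alert',
      response: known
        ? { result: 'ok' }
        : { error: 'unknown alert_id: ' + alertId },
    });
  }
  return responses;
}
```

Rejecting unknown IDs with an error response, rather than silently dropping them, gives the model a chance to correct itself on the next turn.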

Gemini 3 Flash Enhancements

| Feature | Benefit |
| --- | --- |
| Scene Analysis | Periodic summaries (“Two people are talking nearby. Someone mentioned your name.”) |
| Transcript Refinement | Turns choppy fragments into clean, readable sentences |
| Trigger Auto‑Discovery | Analyzes ambient patterns and suggests new alert categories for the user |

All of these run as lightweight REST endpoints on Cloud Run, keeping the mobile client fast and responsive.

Full Stack Diagram (Simplified)

+----------------+      WebSocket      +----------------+      Gemini Live API
|  Mobile Device | ──────────────────► |  Cloud Run     | ──────────────────► |
|  (React PWA)   |                     |  Express WS    |                      |
+----------------+                     +----------------+                      |
        │                                   │                                 |
        │                                   ▼                                 |
        │                         +----------------+                         |
        │                         | Gemini 3 Flash |                         |
        │                         +----------------+                         |
        │                                   │                                 |
        ▼                                   ▼                                 ▼
   Haptic / Visual Alerts          Refined Transcripts               Scene Summaries

Getting Started (Quick‑Start)

  1. Clone the repo

    git clone https://github.com/your-username/silent-ear.git
    cd silent-ear
  2. Set up environment variables (.env.local)

    GOOGLE_API_KEY=your_google_api_key
    SUPABASE_URL=...
    SUPABASE_ANON_KEY=...
    FIRESTORE_PROJECT_ID=...
  3. Run locally

    # Frontend
    npm install && npm run dev
    
    # Backend
    cd backend && npm install && npm start
  4. Deploy (optional) – push the backend to Cloud Run and the frontend to Firebase Hosting or any static‑site host.
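
A missing variable from step 2 tends to surface later as a confusing runtime error, so it can help to check the environment once at startup. This is a small sketch of such a check; the function name is hypothetical, and the variable list simply mirrors the .env.local example above.

```typescript
// Variables listed in the .env.local example above.
const REQUIRED_ENV = [
  'GOOGLE_API_KEY',
  'SUPABASE_URL',
  'SUPABASE_ANON_KEY',
  'FIRESTORE_PROJECT_ID',
] as const;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((key) => (env[key] ?? '').trim() === '');
}
```

In the backend you could call missingEnvVars(process.env) before starting the server and exit with a clear message if the returned list is not empty.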

Closing Thoughts

SilentEar shows how context‑aware AI can move beyond transcription to truly interpret the world for Deaf users. By leveraging Gemini Live’s streaming + function calling and Gemini 3 Flash’s scene intelligence, we deliver timely, multimodal alerts that keep users safe and connected.

If you’re interested in collaborating, testing, or extending the platform, feel free to open an issue or reach out directly.


| Service | Role |
| --- | --- |
| Gemini Live API | Real‑time bidirectional audio streaming with tool calling |
| Gemini 3 Flash | Scene intelligence, NLP post‑processing |
| Cloud Build | Automated CI/CD pipeline (Docker build → deploy) |

Automated Deployment

Deployment is fully automated via a single cloudbuild.yaml file:

steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/silentear-backend', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/silentear-backend']
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args: [
      'run', 'deploy', 'silentear-backend',
      '--image=gcr.io/$PROJECT_ID/silentear-backend',
      # …additional flags…
    ]

A single gcloud builds submit command builds the Docker image, pushes it, and deploys it to Cloud Run, with no further manual steps.

SilentEar – Not Just a Demo

SilentEar is a production‑ready app built for real Deaf users, featuring:

  • Customizable Triggers – Users define their own alert words (doorbell, fire, baby, their name) with unique vibration patterns and colors.
  • Sign Language Videos – Alerts can include ASL, BSL, or PSL sign‑language video demonstrations.
  • SignMoji – A companion sign‑language library where users can record, search, or link sign videos with AI‑generated icons, synced across devices.
  • Voice Deck – A text‑to‑speech tool with AI‑powered phrase suggestions, letting Deaf users “speak” through their device.
  • Caregiver Dashboard – Family members monitor alerts in real time via Supabase real‑time subscriptions.
  • Offline Mode – Falls back to the browser Speech Recognition API when the cloud isn’t available.
  • Multi‑Language – Supports 10 languages for transcript processing.

“I’m especially proud of how seamless the SignMoji integration feels. Allowing users to instantly search the web, record their own sign‑language videos, and sync them securely into their trigger system makes the platform deeply personal and culturally meaningful. Achieving ultra‑low latency alerts through Gemini Live function calling also feels transformative in real‑world testing.”

Technical Highlights & Learnings

  • Web Audio API & Real‑Time Streaming – Gained deep experience with the Web Audio API and the constraints of real‑time streaming in modern browsers.
  • Accessibility‑First Development – Learned the nuance of Deaf culture: transcription alone is insufficient; combining environmental intelligence, visual signals, haptics, and sign language is essential for true inclusion.

Challenges Overcome

  1. WebSocket Session Management on Cloud Run – Ensured stable, long‑lived connections despite Cloud Run’s request‑based scaling.
  2. Audio Format Compatibility – The browser captures audio as Float32 PCM, while the Gemini Live API expects raw 16‑bit PCM (streamed here at 16 kHz). I implemented a real‑time PCM encoder that converts and chunks the audio for streaming.
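
The conversion in the second challenge can be sketched as follows. This is an illustrative version, not the encoder SilentEar ships; the function names and the chunk size are assumptions. It clamps each Float32 sample to [-1, 1], scales it to a signed 16-bit integer, and splits the result into fixed-size chunks for streaming.

```typescript
// Converts Float32 samples in [-1, 1] to signed 16-bit PCM.
// Out-of-range input is clamped rather than allowed to wrap around.
function floatToInt16(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    // Negative and positive ranges differ by one step (-32768..32767).
    output[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return output;
}

// Splits samples into chunks of at most chunkSize samples.
// Chunks are views over the same buffer, so no audio data is copied.
function chunkSamples(samples: Int16Array, chunkSize: number): Int16Array[] {
  if (chunkSize <= 0) throw new RangeError('chunkSize must be positive');
  const chunks: Int16Array[] = [];
  for (let start = 0; start < samples.length; start += chunkSize) {
    chunks.push(samples.subarray(start, start + chunkSize));
  }
  return chunks;
}
```

At 16 kHz, a hypothetical chunk size of 1600 samples corresponds to 100 ms of audio per WebSocket message; smaller chunks lower latency at the cost of more messages.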