How I Built SilentEar — A Real-Time AI Accessibility Agent for Deaf Users with Gemini Live API

Published: March 15, 2026 at 04:39 PM EDT

7 min read

Source: Dev.to

Why I Built This

My son was born profoundly deaf. As he began learning Pakistan Sign Language (PSL) at school, I started basic training alongside him. Through this journey I saw how daily life can be isolating and even dangerous for Deaf individuals and their families:

Feeling irrelevant or isolated in group settings
Struggling to communicate smoothly with hearing people
Missing critical environmental alerts (fire alarms, door knocks, baby cries, etc.)

While AI has advanced dramatically, most accessibility tools still rely on simple speech‑to‑text transcription. Transcripts miss the context that makes a sound urgent. Inspired by my son’s experience and the power of the Gemini Live API, I set out to build an agent that listens, interprets, and delivers life‑saving cues in formats Deaf users actually need (haptic feedback, screen flashes, visual sign language).

What SilentEar Does

SilentEar continuously monitors ambient audio, extracts meaning, and alerts the user in real time.

Core Capability	How It Works
Environmental sound detection	Gemini Live API streams bidirectional audio; function calling (`trigger_alert`) fires custom alerts (dog bark, doorbell, siren, name‑call, etc.).
Context‑aware transcription	Gemini 3 Flash refines noisy speech into clean sentences and adds scene intelligence.
Visual sign language support	SignMoji – a library of sign‑language videos that appear with alerts. Users can add custom SignMojis via video upload, URL, or web search.
Two‑way communication	AI‑powered Voice Deck provides text‑to‑speech with smart, context‑aware phrase prediction (Gemini 3 Flash).
Caregiver dashboard	Trusted contacts can view live alerts, device status, and history remotely.

Architecture Overview

Frontend

Framework: React 19 + TypeScript (PWA)
Styling: Tailwind CSS
Audio processing: Web Audio API + local FFT for ultra‑low‑latency alarm detection
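
To illustrate the local alarm-detection idea, here is a minimal sketch of checking one audio frame for a dominant tone. It swaps a full FFT for the Goertzel algorithm, which computes a single DFT bin and is a common choice when only one or two alarm frequencies matter. The function names, the 0.5 threshold, and the idea of testing one target frequency are assumptions for illustration, not SilentEar's actual detector.

```typescript
// Power of one frequency bin in a frame, computed with the Goertzel algorithm.
// The target frequency is snapped to the nearest DFT bin for this frame length.
function goertzelPower(samples: Float32Array, sampleRate: number, targetHz: number): number {
  const n = samples.length;
  const k = Math.round((n * targetHz) / sampleRate);
  const coeff = 2 * Math.cos((2 * Math.PI * k) / n);
  let sPrev = 0;
  let sPrev2 = 0;
  for (let i = 0; i < n; i++) {
    const s = samples[i] + coeff * sPrev - sPrev2;
    sPrev2 = sPrev;
    sPrev = s;
  }
  return sPrev * sPrev + sPrev2 * sPrev2 - coeff * sPrev * sPrev2;
}

// Hypothetical helper: true when the target tone carries at least `minRatio`
// of the frame's energy. For a pure tone on an exact bin the ratio is about 1.
function looksLikeAlarmTone(
  samples: Float32Array,
  sampleRate: number,
  targetHz: number,
  minRatio: number = 0.5
): boolean {
  let energy = 0;
  for (let i = 0; i < samples.length; i++) {
    energy += samples[i] * samples[i];
  }
  if (energy === 0) return false; // silent frame
  const normalized = goertzelPower(samples, sampleRate, targetHz) / ((energy * samples.length) / 2);
  return normalized >= minRatio;
}
```

A check like this runs entirely on the device, so it can flag a steady alarm tone without waiting for a network round trip. Real alarms vary in pitch and pattern, so a production detector would need more than one bin and some temporal logic.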

Backend

Runtime: Node.js + Express on Google Cloud Run
Streaming: WebSocket proxy that forwards PCM audio (16 kHz) to Gemini Live API
AI integration: @google/genai SDK – live audio streaming, function calling (trigger_alert), and REST calls to Gemini 3 Flash for transcript refinement & scene analysis

Data & Media

Service	Role
Supabase (PostgreSQL + Realtime + Storage)	User profiles, custom SignMoji libraries, trigger definitions, caregiver sync
Cloud Firestore	Alert history, device status, trigger configurations
Google Cloud Run	Hosts Express + WebSocket backend, runs server‑side REST endpoints for Gemini 3 Flash processing

Audio Flow

Device Microphone → PCM Audio (16 kHz) → WebSocket → Cloud Run → Gemini Live API
                                                                      ↓
                                 Haptic + Visual Alerts ← Function Call (trigger_alert)
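
The first hop in this flow, turning microphone samples into PCM, can be sketched as follows. The Web Audio API delivers samples as Float32 values in the range -1 to 1; this sketch assumes the stream should be 16-bit PCM at 16 kHz and that audio is sent in fixed-size chunks. The function names and the chunk size are hypothetical, and resampling to 16 kHz is assumed to have happened already.

```typescript
// Convert Web Audio Float32 samples (-1..1) to signed 16-bit PCM.
// Int16Array uses the platform byte order, which is little-endian on
// essentially every device a browser runs on.
function floatTo16BitPcm(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const clamped = Math.max(-1, Math.min(1, input[i]));
    out[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  }
  return out;
}

// Split a PCM buffer into chunks of at most `samplesPerChunk` samples.
// The chunks are views over the same memory, so no audio data is copied.
function chunkPcm(pcm: Int16Array, samplesPerChunk: number): Int16Array[] {
  if (samplesPerChunk <= 0) {
    throw new RangeError('samplesPerChunk must be positive');
  }
  const chunks: Int16Array[] = [];
  for (let start = 0; start < pcm.length; start += samplesPerChunk) {
    chunks.push(pcm.subarray(start, start + samplesPerChunk));
  }
  return chunks;
}
```

At 16 kHz, a chunk of 1600 samples is 100 ms of audio. Smaller chunks lower latency at the cost of more WebSocket messages.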

Gemini Live Function Calling

Instead of naïve keyword matching, SilentEar gives Gemini a trigger_alert tool that knows the user’s custom categories. When the model hears a matching sound or phrase, it calls the tool, instantly notifying the device.

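import { FunctionDeclaration, Type } from '@google/genai';

// Tool declaration passed to the Live session so the model can raise an alert itself.
const triggerTool: FunctionDeclaration = {
  name: 'trigger_alert',
  description: 'Call this when an environmental sound or keyword matches alert categories.',
  parameters: {
    type: Type.OBJECT,
    properties: {
      // Must match one of the user's configured alert IDs.
      alert_id: {
        type: Type.STRING,
        description: 'The ID of the alert to trigger.'
      },
      // Optional free-text note shown alongside the alert.
      context: {
        type: Type.STRING,
        description: 'Short summary of what was heard.'
      }
    },
    required: ['alert_id']
  }
};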

Result: because Gemini hears each sound in context rather than matching keywords, it can distinguish a dog barking on TV from a real dog at the door, which cuts down on false alarms.
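
On the receiving side, the tool call has to be mapped back to one of the user's configured triggers before the device vibrates or flashes. The following is a minimal sketch of that step, not the app's actual handler. The AlertTrigger, ToolCall, and AlertEvent shapes and the resolveAlert name are assumptions for illustration; only the trigger_alert name and its alert_id and context arguments come from the declaration above.

```typescript
// Hypothetical shape of a user-defined trigger (vibration pattern in ms, color as CSS).
interface AlertTrigger {
  id: string;
  label: string;
  vibrationPattern: number[];
  color: string;
}

// Minimal view of a function call coming back from the model.
interface ToolCall {
  name: string;
  args?: Record<string, unknown>;
}

interface AlertEvent {
  trigger: AlertTrigger;
  context: string;
}

// Validate a tool call and resolve it to a configured trigger.
// Returns null for other tools, malformed arguments, or unknown alert IDs,
// so a bad model output never fires an alert.
function resolveAlert(call: ToolCall, triggers: Record<string, AlertTrigger>): AlertEvent | null {
  if (call.name !== 'trigger_alert' || !call.args) return null;
  const alertId = call.args['alert_id'];
  if (typeof alertId !== 'string') return null;
  if (!Object.prototype.hasOwnProperty.call(triggers, alertId)) return null;
  const context = call.args['context'];
  return {
    trigger: triggers[alertId],
    context: typeof context === 'string' ? context : ''
  };
}
```

Checking the ID against the user's own trigger list matters because the model's arguments are generated text, and an ID that does not exist should be dropped rather than shown as an alert.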

Gemini 3 Flash Enhancements

Feature	Benefit
Scene Analysis	Periodic summaries (“Two people are talking nearby. Someone mentioned your name.”)
Transcript Refinement	Turns choppy fragments into clean, readable sentences
Trigger Auto‑Discovery	Analyzes ambient patterns and suggests new alert categories for the user

All of these run as lightweight REST endpoints on Cloud Run, keeping the mobile client fast and responsive.

Full Stack Diagram (Simplified)

+----------------+      WebSocket      +----------------+      Gemini Live API
|  Mobile Device | ──────────────────► |  Cloud Run     | ──────────────────► |
|  (React PWA)   |                     |  Express WS    |                      |
+----------------+                     +----------------+                      |
        │                                   │                                 |
        │                                   ▼                                 |
        │                         +----------------+                         |
        │                         | Gemini 3 Flash |                         |
        │                         +----------------+                         |
        │                                   │                                 |
        ▼                                   ▼                                 ▼
   Haptic / Visual Alerts          Refined Transcripts               Scene Summaries

Getting Started (Quick‑Start)

Clone the repo

git clone https://github.com/your-username/silent-ear.git
cd silent-ear

Set up environment variables (.env.local)

GOOGLE_API_KEY=your_google_api_key
SUPABASE_URL=...
SUPABASE_ANON_KEY=...
FIRESTORE_PROJECT_ID=...

Run locally

# Frontend
npm install && npm run dev

# Backend
cd backend && npm install && npm start

Deploy (optional) – push the backend to Cloud Run and the frontend to Firebase Hosting or any static‑site host.

Closing Thoughts

SilentEar shows how context‑aware AI can move beyond transcription to truly interpret the world for Deaf users. By leveraging Gemini Live’s streaming + function calling and Gemini 3 Flash’s scene intelligence, we deliver timely, multimodal alerts that keep users safe and connected.

If you’re interested in collaborating, testing, or extending the platform, feel free to open an issue or reach out directly.

Trigger Configurations

Service	Role
Gemini Live API	Real‑time bidirectional audio streaming with tool calling
Gemini 3 Flash	Scene intelligence, NLP post‑processing
Cloud Build	Automated CI/CD pipeline (Docker build → deploy)

Automated Deployment

Deployment is fully automated via a single cloudbuild.yaml file:

steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/silentear-backend', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/silentear-backend']
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args: [
      'run', 'deploy', 'silentear-backend',
      '--image=gcr.io/$PROJECT_ID/silentear-backend',
      # …additional flags…
    ]

A single gcloud builds submit command builds the Docker image, pushes it to the registry, and deploys it to Cloud Run, with no further manual steps.

SilentEar – Not Just a Demo

SilentEar is a production‑ready app built for real Deaf users, featuring:

Customizable Triggers – Users define their own alert words (doorbell, fire, baby, their name) with unique vibration patterns and colors.
Sign Language Videos – Alerts can include ASL, BSL, or PSL sign‑language video demonstrations.
SignMoji – A companion sign‑language library where users can record, search, or link sign videos with AI‑generated icons, synced across devices.
Voice Deck – A text‑to‑speech tool with AI‑powered phrase suggestions, letting Deaf users “speak” through their device.
Caregiver Dashboard – Family members monitor alerts in real time via Supabase real‑time subscriptions.
Offline Mode – Falls back to the browser Speech Recognition API when the cloud isn’t available.
Multi‑Language – Supports 10 languages for transcript processing.
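
For the offline mode in the list above, one plausible approach is to match the locally recognized transcript against each trigger's alert words. This is a sketch under that assumption. The KeywordTrigger shape and the function names are hypothetical, and the punctuation handling is deliberately simple.

```typescript
// Hypothetical trigger shape for offline matching: an ID plus the user's alert words.
interface KeywordTrigger {
  id: string;
  keywords: string[];
}

// Lowercase, strip common punctuation, and collapse whitespace.
function cleanText(text: string): string {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"'()\[\]]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Return the IDs of triggers whose keyword or phrase appears as whole words
// in the transcript. Padding with spaces keeps "fire" from matching "fireplace".
function matchKeywordTriggers(transcript: string, triggers: KeywordTrigger[]): string[] {
  const haystack = ' ' + cleanText(transcript) + ' ';
  const matched: string[] = [];
  for (const trigger of triggers) {
    const found = trigger.keywords.some((keyword) => {
      const needle = cleanText(keyword);
      return needle !== '' && haystack.indexOf(' ' + needle + ' ') !== -1;
    });
    if (found) matched.push(trigger.id);
  }
  return matched;
}
```

Plain keyword matching has none of the context awareness of the Live API path, which is why it only makes sense as a fallback when the cloud is unreachable.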

“I’m especially proud of how seamless the SignMoji integration feels. Allowing users to instantly search the web, record their own sign‑language videos, and sync them securely into their trigger system makes the platform deeply personal and culturally meaningful. Achieving ultra‑low latency alerts through Gemini Live function calling also feels transformative in real‑world testing.”

Technical Highlights & Learnings

Web Audio API & Real‑Time Streaming – Gained deep experience with the Web Audio API and the constraints of real‑time streaming in modern browsers.
Accessibility‑First Development – Learned the nuance of Deaf culture: transcription alone is insufficient; combining environmental intelligence, visual signals, haptics, and sign language is essential for true inclusion.

Challenges Overcome

WebSocket Session Management on Cloud Run – Ensured stable, long‑lived connections despite Cloud Run’s request‑based scaling.
Audio Format Compatibility – The browser captures audio as Float32 PCM, while Gemini expects specific formats. Implemented a real‑time PCM encoder that converts and chunks audio for optimal streaming.
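
On the WebSocket session point above: Cloud Run treats a WebSocket as a long-running request that is subject to the service's request timeout, so the client needs to be ready to reconnect. One common pattern is exponential backoff with jitter. The sketch below shows how the delay could be computed. The function name and default values are assumptions for illustration, not the app's actual settings.

```typescript
// Delay before reconnect attempt number `attempt` (0-based), using exponential
// backoff with full jitter: a random value between 0 and min(maxMs, baseMs * 2^attempt).
// `random` is injectable so the function can be tested deterministically.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 30000,
  random: () => number = Math.random
): number {
  const exponent = Math.max(0, Math.floor(attempt));
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, exponent));
  return Math.floor(random() * ceiling);
}
```

In the browser this would typically be called from the socket's close handler, with the result passed to setTimeout before opening a new connection, and the attempt counter reset once a connection succeeds. The jitter spreads out reconnects so that many clients dropped at the same moment do not all return at once.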
