Realtime Multimodal AI on Ray-Ban Meta Glasses with Gemini Live & LiveKit

Published: February 3, 2026 at 11:27 AM EST
3 min read
Source: Dev.to

Architecture

The setup involves several layers to ensure low‑latency, secure communication between the wearable device and the AI:

  • Ray‑Ban Meta Glasses – Capture video and audio, connecting via Bluetooth to your phone.
  • Phone (Android/iOS) – Acts as the gateway, connecting via WebRTC to LiveKit Cloud.
  • LiveKit Cloud – Serves as a secure, high‑performance proxy for the Gemini Live API.
  • Gemini Live API – Processes the stream via WebSockets, enabling real‑time multimodal interaction.

Architecture diagram

Backend: Building the Gemini Live Agent

We use the LiveKit Agents framework to act as a secure WebRTC proxy for the Gemini Live API. This agent joins the LiveKit room, listens to the audio, and processes the video stream from the glasses.

Setting up the Assistant

The core of our agent is the AgentSession. We use the google.beta.realtime.RealtimeModel to interface with Gemini and enable video_input in the RoomOptions so the agent can “see.”

# Imports and setup omitted for brevity — see the full Gemini Live vision agent
# example in the LiveKit docs.

@server.rtc_session()
async def entrypoint(ctx: JobContext):
    # Tag this job's logs with the room name for easier debugging.
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        llm=google.beta.realtime.RealtimeModel(
            model="gemini-2.5-flash-native-audio-preview-12-2025",
            proactivity=True,              # let Gemini decide when to speak up on its own
            enable_affective_dialog=True,  # adapt responses to the user's tone of voice
        ),
        vad=ctx.proc.userdata["vad"],      # voice activity detector preloaded per worker
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            video_input=True,  # subscribe to the video track published by the glasses
        ),
    )
    await ctx.connect()
    await session.generate_reply()

By setting video_input=True, the agent automatically requests the video track from the room, which in this case is the 1 FPS stream coming from the glasses.
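The snippet above also references two pieces that aren't shown: the Assistant agent and the VAD stored in ctx.proc.userdata. Here is a minimal sketch of what they might look like, assuming the Silero VAD plugin and purely illustrative instructions (the actual definitions in the LiveKit docs example may differ):

from livekit.agents import Agent, JobProcess
from livekit.plugins import silero


def prewarm(proc: JobProcess):
    # Load the Silero VAD once per worker process so every session can reuse it.
    proc.userdata["vad"] = silero.VAD.load()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "You can see through the user's glasses. Describe what is in "
                "front of them and answer questions about it concisely."
            )
        )

How the prewarm hook is registered depends on your Agents version; in the classic worker API it is passed as prewarm_fnc when configuring the worker.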

Running the Agent

To start your agent in development mode and make it accessible globally via LiveKit Cloud:

uv run agent.py dev

Find the full Gemini Live vision agent example in the LiveKit docs.
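Before the agent can register with LiveKit Cloud and reach Gemini, it also needs credentials in its environment. A minimal sketch for the top of agent.py, assuming the standard variable names used by LiveKit Agents and the Google plugin and a local .env file loaded with python-dotenv:

from dotenv import load_dotenv

# Expected in .env (values come from your LiveKit Cloud project and Google AI Studio):
#   LIVEKIT_URL=wss://your-project.livekit.cloud
#   LIVEKIT_API_KEY=...
#   LIVEKIT_API_SECRET=...
#   GOOGLE_API_KEY=...
load_dotenv()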

Connection & Authentication

CLI Token Generation

For testing and demos, you can quickly generate a short‑lived access token using the LiveKit CLI:

lk token create \
  --api-key <api-key> \
  --api-secret <api-secret> \
  --join \
  --room <room-name> \
  --identity <identity> \
  --valid-for 24h

In a production environment, always issue tokens from a secure backend to keep your API secrets safe (see LiveKit’s authentication guide).
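For reference, here is a minimal sketch of server-side token minting with the LiveKit Python server SDK (the livekit-api package); the room, identity, and function name are placeholders that would normally come from your authenticated request:

from datetime import timedelta

from livekit import api  # provided by the livekit-api package


def create_join_token(room: str, identity: str) -> str:
    # AccessToken() reads LIVEKIT_API_KEY / LIVEKIT_API_SECRET from the environment.
    token = (
        api.AccessToken()
        .with_identity(identity)
        .with_grants(api.VideoGrants(room_join=True, room=room))
        .with_ttl(timedelta(hours=1))
    )
    return token.to_jwt()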

Frontend: Meta Wearables Integration

This example targets Android devices (e.g., Google Pixel). You’ll need the Meta Wearables Toolkit and the sample project.

  1. Clone the sample – Get the Android client example.

  2. Configure local.properties – Add your GitHub token as required by the Meta SDK.

  3. Update connection details – In StreamScreen.kt, replace the server URL and token with your LiveKit details:

    // streamViewModel.connectToLiveKit
    connectToLiveKit(
        url = "wss://your-project.livekit.cloud",
        token = "your-generated-token"
    )
  4. Run the app – Connect your device via USB and deploy from Android Studio.

Conclusion

By bridging Meta Wearables with Gemini Live via LiveKit, we’ve created a powerful, low‑latency vision AI experience. The architecture is scalable and secure, providing a foundation for the next generation of wearable AI applications.

Happy hacking! 🚀
