Blazing fast on-device GenAI with LiteRT-LM

Published: May 26, 2026 at 10:50 PM EDT

1 min read

Source: Google Developers Blog

Overview

Google AI Edge's LiteRT-LM is a production-proven, highly optimized runtime for running Gemma 4 on mobile and edge platforms. It exposes the model's native multimodal and agentic features on-device through memory-efficient dynamic loading, Multi-Token Prediction (up to a 2.2× speedup), and orchestration features such as Thinking Mode and Constrained Decoding.

The engine is also expanding its integration surfaces beyond Android, adding native Swift APIs for Apple platforms and WebGPU-accelerated JavaScript APIs for serverless, in-browser inference.
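To make the Constrained Decoding idea mentioned above more concrete, here is a minimal sketch of the general technique: at each step, tokens the output format does not allow are masked out before the next token is chosen. This is not the LiteRT-LM API. The vocabulary, the tiny hand-written grammar, and the names `allowed_tokens`, `constrained_greedy_step`, `decode`, and `fake_model` are all hypothetical and exist only for illustration.

```python
import math

# Toy vocabulary; hypothetical, not taken from Gemma or LiteRT-LM.
VOCAB = ["{", "}", '"ok"', ":", "true", "false", "hello"]


def allowed_tokens(prefix):
    """Return the tokens a tiny hand-written grammar permits next.

    The grammar accepts exactly: { "ok" : true }  or  { "ok" : false }.
    An empty set means the output is complete.
    """
    expected = [{"{"}, {'"ok"'}, {":"}, {"true", "false"}, {"}"}]
    return expected[len(prefix)] if len(prefix) < len(expected) else set()


def constrained_greedy_step(logits, prefix):
    """Pick the highest-scoring token among those the grammar allows."""
    allowed = allowed_tokens(prefix)
    # Disallowed tokens get -inf so they can never win the argmax.
    masked = [
        score if token in allowed else -math.inf
        for token, score in zip(VOCAB, logits)
    ]
    best = max(range(len(VOCAB)), key=lambda i: masked[i])
    return VOCAB[best]


def decode(model):
    """Run constrained greedy decoding until the grammar is exhausted."""
    prefix = []
    while allowed_tokens(prefix):
        prefix.append(constrained_greedy_step(model(prefix), prefix))
    return prefix


def fake_model(prefix):
    """Stand-in for a real model: always scores "hello" highest."""
    return [0.1, 0.2, 0.3, 0.4, 0.5, 2.0, 3.0]


if __name__ == "__main__":
    print(" ".join(decode(fake_model)))
```

Left unconstrained, this stand-in model would emit `hello` at every step. With the mask applied, each choice is restricted to what the grammar permits, so the output always matches the expected structure, which is the property that makes constrained decoding useful for tool calls and structured responses.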

Related posts

A Smarter Google AI Edge Gallery: MCP integration, notifications, and session continuity

Announcing ADK for Kotlin and ADK for Android 0.1.0: Building AI Agents on Android and Beyond

One Year of Innovation: Celebrating 100k Members in the Google Cloud x NVIDIA Developer Community
