MediaTek NPU and LiteRT: Powering the next generation of on-device AI

Published: December 8, 2025 at 01:04 PM EST
5 min read

Source: Google Developers Blog

The Neural Processing Unit (NPU) has become the critical enabler for the next generation of on‑device AI. By delivering peak performance of tens of TOPS (Tera Operations Per Second) with minimal power consumption, NPUs allow devices to run sophisticated, computationally heavy generative AI models that were previously impossible on standard edge devices.

These powerful NPUs are the engine behind a massive, diverse ecosystem of products, from flagship smartphones, laptops, and tablets to smart home hubs and IoT devices. However, deploying AI on NPUs has often been difficult, hindering broad adoption. The NPU space is highly diverse, with hundreds of SoC variants targeting different device types, which leaves developers to juggle compilers and distribute runtimes for each one. Existing on‑device ML infrastructure is typically tailored for CPUs and GPUs and lacks deep integration with specialized NPU SDKs and their unique compilation needs. The result has been complex, ad‑hoc deployment workflows. Moreover, getting sophisticated GenAI models to run efficiently on NPUs requires advanced optimization and specialized kernels, going far beyond simple operator delegation.

Together with MediaTek, we are excited to announce the new LiteRT NeuroPilot Accelerator, a ground‑up successor to the TFLite NeuroPilot delegate that brings a seamless deployment experience, state‑of‑the‑art LLM support, and advanced performance to millions of devices worldwide.

Key features of the LiteRT NeuroPilot Accelerator

Moving well beyond basic acceleration, the LiteRT NeuroPilot Accelerator provides a unified development workflow and sophisticated features designed to productionize AI on MediaTek NPUs. Highlights include:

  • Seamless and unified deployment workflow – Easy access to various MediaTek NPUs via a unified API, abstracting away SDK complexities. Choose between offline (Ahead‑of‑Time, AOT) and online (on‑device) compilation workflows.
  • Rich generative AI capabilities – Unlock the full potential of state‑of‑the‑art models like the Gemma family, enabling advanced text generation and multimodal applications directly on NPU.
  • Efficient, cross‑platform development – A new, simplified C++ API (improved from the previous C API) works seamlessly with Native Hardware Buffer Interoperability, allowing zero‑copy data passing from AHardwareBuffer to the NPU and automatic conversion from OpenGL/OpenCL buffers. This is critical for high‑throughput, real‑time camera and video applications.

Seamless and unified deployment workflow

Traditionally, developers needed to build for various combinations of SoC providers and versions and manage the distribution of compiled models and runtimes for each combination. To solve this, we created a simple 3‑step workflow to get models running with NPU acceleration.

The full, detailed guide, including a Colab notebook and a sample app, is available in our LiteRT NPU documentation.

Step 1: AOT compilation for target SoCs (optional)

Use the LiteRT Python library to compile your .tflite model for the supported target SoCs. See the LiteRT AOT Compilation Tutorial for details. While optional, AOT compilation is highly recommended for larger models because it significantly reduces on‑device initialization time. This step is not required if you use on‑device compilation.

Step 2: Deploy with Google Play for On‑device AI (PODAI) (Android)

Export the model assets and required runtime libraries into an AI Pack, the format used by PODAI, and copy the AI Pack into your Android app project. When users install your app from Google Play, the service analyzes their device and automatically delivers the matching model and runtime if the device is compatible.
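For context, an AI Pack is a separate Gradle module in your app project, conceptually similar to a Play asset pack. The hypothetical sketch below (Gradle Kotlin DSL) shows roughly what such a module's build file could look like; the plugin id, the aiPack block, and its property names are assumptions modeled on the Play asset‑pack configuration rather than a verified PODAI API, and the AI Pack you export in this step already ships with the correct build files, so treat this as orientation only.

```kotlin
// ai_pack/build.gradle.kts -- illustrative sketch only; all identifiers are assumptions.
plugins {
    // Assumption: a PODAI AI-pack Gradle plugin, analogous to com.android.asset-pack.
    id("com.android.ai-pack")
}

aiPack {
    // Name the app uses to reference this pack at runtime.
    packName.set("npu_model_pack")
    dynamicDelivery {
        // Assumption: install-time delivery so the model is available at first launch.
        deliveryType.set("INSTALL_TIME")
    }
}
```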

Step 3: Inference using LiteRT Runtime

LiteRT abstracts away hardware fragmentation. For both AOT and on‑device compilation, simply load the model and specify Accelerator.NPU in the options. LiteRT handles the rest and includes a robust fallback mechanism: you can specify GPU or CPU as secondary options, and LiteRT will automatically use them if the NPU is unavailable.
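To make Step 3 concrete, here is a minimal Kotlin sketch built around LiteRT's CompiledModel API. The package name, the CompiledModel.create overload, and the buffer helpers follow the shape shown in the LiteRT documentation but should be treated as assumptions; in particular, passing several accelerators in priority order to express the NPU → GPU → CPU fallback is an assumption here, so consult the LiteRT NPU guide linked above for the authoritative API.

```kotlin
import android.content.Context
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Minimal sketch: load a .tflite model from assets and request NPU acceleration.
// "my_model.tflite" is a placeholder; the multi-accelerator Options call is an
// assumption standing in for whatever fallback configuration the API exposes.
fun runOnNpu(context: Context, input: FloatArray): FloatArray {
    val model = CompiledModel.create(
        context.assets,
        "my_model.tflite",
        // Assumption: accelerators are tried in the order given (NPU, then GPU, then CPU).
        CompiledModel.Options(Accelerator.NPU, Accelerator.GPU, Accelerator.CPU),
    )

    // Allocate runtime-managed input/output tensor buffers.
    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()

    // Write the input, run inference, and read the result back.
    inputBuffers[0].writeFloat(input)
    model.run(inputBuffers, outputBuffers)
    return outputBuffers[0].readFloat()
}
```

The same call works whether the model was AOT‑compiled in Step 1 or left as a plain .tflite for on‑device compilation; only the initialization cost differs, as discussed in the next section.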

AOT and on‑device compilation

With the new LiteRT NeuroPilot Accelerator, we moved from a high‑level wrapper to a direct, native integration with the NeuroPilot compiler and runtime. This enables powerful Ahead‑of‑Time (AOT) compilation workflows that were previously out of reach, giving developers flexibility in their deployment strategy:

  • Offline (AOT) compilation – Best suited for large, complex models where the target SoC is known. Compiling ahead‑of‑time significantly reduces initialization costs and lowers memory usage when the user launches the app.
  • Online (on‑device) compilation – Ideal for platform‑agnostic distribution of small models. The model is compiled on the user’s device during initialization, requiring no extra preparation step but incurring a higher first‑run cost.
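If you want to quantify the first‑run cost of on‑device compilation for your own model, one simple approach is to time the model‑initialization call on a target device. The sketch below reuses the same assumed Kotlin API shape as the Step 3 example together with kotlin.system.measureTimeMillis; the CompiledModel signature and the close() call are assumptions, and the model file name is a placeholder.

```kotlin
import android.content.Context
import android.util.Log
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel
import kotlin.system.measureTimeMillis

// Rough comparison of initialization cost: call this once with an AOT-precompiled
// model asset and once with a plain .tflite that is compiled on-device, then
// compare the logged times.
fun logNpuInitTime(context: Context, modelAsset: String) {
    val elapsedMs = measureTimeMillis {
        val model = CompiledModel.create(
            context.assets,
            modelAsset,
            CompiledModel.Options(Accelerator.NPU),
        )
        model.close() // Assumption: the compiled model exposes a close/release method.
    }
    Log.i("LiteRtInit", "$modelAsset initialized in $elapsedMs ms")
}
```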

Comparison example

For a large model such as Gemma 3 270M, on‑device compilation can take over a minute, making AOT the more practical choice for production.

[Chart: Initialization time for Gemma 3 270M, AOT vs. on‑device (JIT) compilation]

Rich generative AI capabilities with Gemma and other open‑weight models

On supported Android devices you can use Gemini Nano through ML Kit. For markets where Gemini Nano is not supported or for use cases requiring deeper customization, we now unlock the full potential of open‑weight models. This includes Google’s Gemma model family, a set of lightweight, state‑of‑the‑art open models optimized specifically for on‑device use cases.

As announced at MediaTek’s recent Dimensity 9500 event, our collaboration brings optimized, production‑ready support for the following models on their latest chipsets:

  • Qwen3 0.6B – A foundation model powering new AI experiences from Mainland China OEMs such as Xiaomi, Huawei, and Vivo.
  • Gemma 3 270M – A hyper‑efficient and compact base model designed for task‑specific fine‑tuning, enabling high‑speed, low‑latency features like sentiment analysis or entity extraction.