MediaTek NPU and LiteRT: Powering the next generation of on-device AI

Published: December 8, 2025 at 01:04 PM EST
5 min read

Source: Google Developers Blog

The Neural Processing Unit (NPU) has become the critical enabler for the next generation of on‑device AI. By delivering peak performance of tens of TOPS (Tera Operations Per Second) with minimal power consumption, NPUs allow devices to run sophisticated, computationally heavy generative AI models that were previously impossible on standard edge devices.

These powerful NPUs are the engine behind a massive, diverse ecosystem of products, from flagship smartphones, laptops, and tablets to smart home hubs and IoT devices. However, deploying AI on NPUs has often been difficult, hindering broad adoption. The NPU space is highly diverse, with hundreds of SoC variants targeting different device types, which leaves developers to juggle compilers and distribute runtimes for each one. Existing on‑device ML infrastructure is typically tailored for CPUs and GPUs and lacks deep integration with specialized NPU SDKs and their unique compilation needs. The result has been complex, ad‑hoc deployment workflows. Moreover, getting sophisticated GenAI models to run efficiently on NPUs requires advanced optimization and specialized kernels, going far beyond simple operator delegation.

Together with MediaTek, we are excited to announce the new LiteRT NeuroPilot Accelerator, a ground‑up successor to the TFLite NeuroPilot delegate that brings a seamless deployment experience, state‑of‑the‑art LLM support, and advanced performance to millions of devices worldwide.

Key features of the LiteRT NeuroPilot Accelerator

Moving well beyond basic acceleration, the LiteRT NeuroPilot Accelerator provides a unified development workflow and sophisticated features designed to productionize AI on MediaTek NPUs. Highlights include:

  • Seamless and unified deployment workflow – Easy access to various MediaTek NPUs via a unified API, abstracting away SDK complexities. Choose between offline (Ahead‑of‑Time, AOT) and online (on‑device) compilation workflows.
  • Rich generative AI capabilities – Unlock the full potential of state‑of‑the‑art models like the Gemma family, enabling advanced text generation and multimodal applications directly on NPU.
  • Efficient, cross‑platform development – A new, simplified C++ API (improved from the previous C API) works seamlessly with Native Hardware Buffer Interoperability, allowing zero‑copy data passing from AHardwareBuffer to the NPU and automatic conversion from OpenGL/OpenCL buffers. This is critical for high‑throughput, real‑time camera and video applications.

Seamless and unified deployment workflow

Traditionally, developers needed to build for various combinations of SoC providers and versions and manage the distribution of compiled models and runtimes for each combination. To solve this, we created a simple 3‑step workflow to get models running with NPU acceleration.

The full, detailed guide, including a Colab notebook and a sample app, is available in our LiteRT NPU documentation.

Step 1: AOT compilation for target SoCs (optional)

Use the LiteRT Python library to compile your .tflite model for the supported target SoCs. See the LiteRT AOT Compilation Tutorial for details. While optional, AOT compilation is highly recommended for larger models because it significantly reduces on‑device initialization time. This step is not required if you use on‑device compilation.

Step 2: Deploy with Google Play for On‑device AI (PODAI) (Android)

Export the model assets and required runtime libraries into an AI Pack, the format used by PODAI, and copy the AI Pack into your Android app project. When users install your app from Google Play, the service analyzes their device and automatically delivers the matching model and runtime if the device is compatible.
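For context, an AI Pack is a separate Gradle module in your app project, conceptually similar to a Play asset pack. The hypothetical sketch below (Gradle Kotlin DSL) shows roughly what such a module's build file could look like; the plugin id, the aiPack block, and its property names are assumptions modeled on the Play asset‑pack configuration rather than a verified PODAI API, and the AI Pack you export in this step already ships with the correct build files, so treat this as orientation only.

```kotlin
// ai_pack/build.gradle.kts -- illustrative sketch only; all identifiers are assumptions.
plugins {
    // Assumption: a PODAI AI-pack Gradle plugin, analogous to com.android.asset-pack.
    id("com.android.ai-pack")
}

aiPack {
    // Name the app uses to reference this pack at runtime.
    packName.set("npu_model_pack")
    dynamicDelivery {
        // Assumption: install-time delivery so the model is available at first launch.
        deliveryType.set("INSTALL_TIME")
    }
}
```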

Step 3: Inference using LiteRT Runtime

LiteRT abstracts away hardware fragmentation. For both AOT and on‑device compilation, simply load the model and specify Accelerator.NPU in the options. LiteRT handles the rest and includes a robust fallback mechanism: you can specify GPU or CPU as secondary options, and LiteRT will automatically use them if the NPU is unavailable.
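To make Step 3 concrete, here is a minimal Kotlin sketch built around LiteRT's CompiledModel API. The package name, the CompiledModel.create overload, and the buffer helpers follow the shape shown in the LiteRT documentation but should be treated as assumptions; in particular, passing several accelerators in priority order to express the NPU → GPU → CPU fallback is an assumption here, so consult the LiteRT NPU guide linked above for the authoritative API.

```kotlin
import android.content.Context
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Minimal sketch: load a .tflite model from assets and request NPU acceleration.
// "my_model.tflite" is a placeholder; the multi-accelerator Options call is an
// assumption standing in for whatever fallback configuration the API exposes.
fun runOnNpu(context: Context, input: FloatArray): FloatArray {
    val model = CompiledModel.create(
        context.assets,
        "my_model.tflite",
        // Assumption: accelerators are tried in the order given (NPU, then GPU, then CPU).
        CompiledModel.Options(Accelerator.NPU, Accelerator.GPU, Accelerator.CPU),
    )

    // Allocate runtime-managed input/output tensor buffers.
    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()

    // Write the input, run inference, and read the result back.
    inputBuffers[0].writeFloat(input)
    model.run(inputBuffers, outputBuffers)
    return outputBuffers[0].readFloat()
}
```

The same call works whether the model was AOT‑compiled in Step 1 or left as a plain .tflite for on‑device compilation; only the initialization cost differs, as discussed in the next section.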

AOT and on‑device compilation

With the new LiteRT NeuroPilot Accelerator, we moved from a high‑level wrapper to a direct, native integration with the NeuroPilot compiler and runtime. This enables powerful Ahead‑of‑Time (AOT) compilation workflows that were previously out of reach, giving developers flexibility in their deployment strategy:

  • Offline (AOT) compilation – Best suited for large, complex models where the target SoC is known. Compiling ahead‑of‑time significantly reduces initialization costs and lowers memory usage when the user launches the app.
  • Online (on‑device) compilation – Ideal for platform‑agnostic distribution of small models. The model is compiled on the user’s device during initialization, requiring no extra preparation step but incurring a higher first‑run cost.
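If you want to quantify the first‑run cost of on‑device compilation for your own model, one simple approach is to time the model‑initialization call on a target device. The sketch below reuses the same assumed Kotlin API shape as the Step 3 example together with kotlin.system.measureTimeMillis; the CompiledModel signature and the close() call are assumptions, and the model file name is a placeholder.

```kotlin
import android.content.Context
import android.util.Log
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel
import kotlin.system.measureTimeMillis

// Rough comparison of initialization cost: call this once with an AOT-precompiled
// model asset and once with a plain .tflite that is compiled on-device, then
// compare the logged times.
fun logNpuInitTime(context: Context, modelAsset: String) {
    val elapsedMs = measureTimeMillis {
        val model = CompiledModel.create(
            context.assets,
            modelAsset,
            CompiledModel.Options(Accelerator.NPU),
        )
        model.close() // Assumption: the compiled model exposes a close/release method.
    }
    Log.i("LiteRtInit", "$modelAsset initialized in $elapsedMs ms")
}
```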

Comparison example

For a large model such as Gemma 3 270M, on‑device compilation can take over a minute, making AOT the more practical choice for production.

[Chart: Initialization time for Gemma 3 270M, AOT vs. on‑device (JIT) compilation]

Rich generative AI capabilities with Gemma and other open‑weight models

On supported Android devices you can use Gemini Nano through ML Kit. For markets where Gemini Nano is not supported or for use cases requiring deeper customization, we now unlock the full potential of open‑weight models. This includes Google’s Gemma model family, a set of lightweight, state‑of‑the‑art open models optimized specifically for on‑device use cases.

As announced at MediaTek’s recent Dimensity 9500 event, our collaboration brings optimized, production‑ready support for the following models on their latest chipsets:

  • Qwen3 0.6B – A foundation model powering new AI experiences from Mainland China OEMs such as Xiaomi, Huawei, and Vivo.
  • Gemma 3 270M – A hyper‑efficient and compact base model designed for task‑specific fine‑tuning, enabling high‑speed, low‑latency features like sentiment analysis or entity extraction.