[Paper] AME: An Efficient Heterogeneous Agentic Memory Engine for Smartphones

Published: November 24, 2025 at 10:03 AM EST
Source: arXiv

Abstract

On‑device agents on smartphones increasingly require continuously evolving memory to support personalized, context‑aware, and long‑term behaviors. To meet both privacy and responsiveness demands, user data is embedded as vectors and stored in a vector database for fast similarity search. However, most existing vector databases target server‑class environments. When ported directly to smartphones, two gaps emerge:

  1. (G1) Hardware mismatch – mobile SoC constraints differ from vector‑database assumptions, including tight bandwidth budgets, limited on‑chip memory, and stricter data‑type and layout constraints.
  2. (G2) Workload mismatch – on‑device usage resembles a continuously learning memory, where queries must coexist with frequent inserts, deletions, and ongoing index maintenance.
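The "continuously learning memory" workload of G2 can be illustrated with a toy sketch: items are embedded as vectors, new memories are inserted at any time, and queries retrieve the nearest stored items by cosine similarity. This is not AME's implementation; the `embed` function below is a deliberately simple character-bigram hash, standing in for a real on-device encoder.

```python
# Toy sketch of an agentic vector memory (NOT AME's implementation):
# embed text into small vectors, store them, and answer queries by
# cosine similarity, with inserts interleaved freely between queries.
import math

def embed(text, dim=8):
    # Stand-in embedding: hash character bigrams into a fixed-size vector
    # (a real system would use a learned on-device encoder instead).
    vec = [0.0] * dim
    for i in range(len(text) - 1):
        bucket = (ord(text[i]) + ord(text[i + 1])) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]          # unit-normalize for cosine

class VectorMemory:
    def __init__(self):
        self.items = []                      # (text, unit-norm vector) pairs

    def insert(self, text):
        self.items.append((text, embed(text)))

    def query(self, text, k=2):
        q = embed(text)
        # Brute-force scan; a production engine would use an ANN index.
        scored = sorted(self.items,
                        key=lambda it: -sum(a * b for a, b in zip(q, it[1])))
        return [t for t, _ in scored[:k]]

mem = VectorMemory()
for note in ["meeting with Alice at 3pm",
             "buy groceries",
             "meeting notes for project"]:
    mem.insert(note)
print(mem.query("meeting", k=2))  # both "meeting" notes outrank "buy groceries"
```

Even this trivial version shows why G2 matters: every insert changes the candidate set the very next query must search, so query serving and index maintenance cannot be separated into offline phases the way server-side deployments often assume.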

To address these challenges, we propose AME, an on‑device Agentic Memory Engine co‑designed with modern smartphone SoCs. AME introduces two key techniques:

  1. A hardware‑aware, high‑efficiency matrix pipeline that maximizes compute‑unit utilization and exploits multi‑level on‑chip storage to sustain high throughput.
  2. A hardware‑ and workload‑aware scheduling scheme that coordinates querying, insertion, and index rebuilding to minimize latency.
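One common way to coordinate querying, insertion, and rebuilding, shown here as a hypothetical sketch rather than AME's actual scheduler, is a two-tier design: fresh inserts land in a small write buffer that queries scan directly, and the main index is rebuilt only when the buffer exceeds a threshold, so inserts never block queries on a full rebuild. All names and the threshold policy below are illustrative assumptions.

```python
# Hypothetical two-tier scheduling sketch (NOT AME's actual scheduler):
# a write buffer absorbs inserts cheaply, queries search both tiers so new
# items are visible immediately, and index rebuilds are batched/amortized.
class ScheduledMemory:
    def __init__(self, rebuild_threshold=4):
        self.index = {}                      # "built" tier: id -> vector
        self.buffer = {}                     # recent, not-yet-indexed inserts
        self.rebuild_threshold = rebuild_threshold
        self.rebuild_count = 0

    def insert(self, key, vec):
        self.buffer[key] = vec               # cheap append, no maintenance
        if len(self.buffer) >= self.rebuild_threshold:
            self._rebuild()                  # one rebuild per batch of inserts

    def _rebuild(self):
        self.index.update(self.buffer)       # merge buffer into the main tier
        self.buffer.clear()
        self.rebuild_count += 1

    def query(self, vec, k=1):
        # Search both tiers so fresh inserts are visible before any rebuild.
        def dist(v):
            return sum((a - b) ** 2 for a, b in zip(v, vec))
        candidates = list(self.index.items()) + list(self.buffer.items())
        return [key for key, _ in sorted(candidates,
                                         key=lambda kv: dist(kv[1]))[:k]]

mem = ScheduledMemory(rebuild_threshold=3)
mem.insert("a", [0.0, 0.0])
mem.insert("b", [1.0, 0.0])
print(mem.query([0.1, 0.0]))   # ['a'] — visible before any rebuild has run
mem.insert("c", [0.0, 1.0])    # third insert triggers the batched rebuild
print(mem.rebuild_count)       # 1
```

The design choice mirrors the paper's stated goal: by deciding *when* to pay index-maintenance cost relative to the live query stream, the scheduler keeps both query latency and insertion throughput from degrading under concurrent load.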

We implement AME on Snapdragon 8‑series SoCs and evaluate it on HotpotQA. In our experiments, AME improves query throughput by up to 1.4× at matched recall, accelerates index construction, and raises insertion throughput under concurrent query workloads.

Subjects

  • Distributed, Parallel, and Cluster Computing (cs.DC)

Submission History

  • v1 – Qingyu Ma, Mon, 24 Nov 2025 15:03:06 UTC (621 KB)