Google PM open-sources Always On Memory Agent, ditching vector databases for LLM-driven persistent memory
Source: VentureBeat
Author: Shubham Saboo – Senior AI Product Manager, Google
Release: Open‑sourced on the official Google Cloud Platform GitHub organization (MIT License)
Built With:
- Google’s Agent Development Kit (ADK) – introduced Spring 2025
- Gemini 3.1 Flash‑Lite – low‑cost, high‑throughput model launched Mar 3 2026
The repo provides a practical reference implementation for a persistent‑memory agent that can:
- Continuously ingest information (files, API input, etc.)
- Consolidate memories in the background
- Retrieve stored data later without a conventional vector database
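The ingest → consolidate → retrieve cycle can be sketched in a few lines. This is an illustrative reconstruction, not the repo's actual API: the class and method names (`MemoryAgent`, `ingest`, `consolidate`, `retrieve`) are invented here, and the consolidation pass is reduced to simple deduplication where the real agent would invoke an LLM.

```python
import sqlite3

class MemoryAgent:
    """Hypothetical sketch of a persistent-memory agent backed by SQLite."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, topic TEXT, content TEXT)"
        )

    def ingest(self, topic, content):
        # In the real agent an LLM would extract structure from raw
        # input (files, API payloads) before this write happens.
        self.db.execute(
            "INSERT INTO memories (topic, content) VALUES (?, ?)",
            (topic, content),
        )

    def consolidate(self):
        # Stand-in for the LLM-driven merge pass: here we only
        # deduplicate identical rows per topic.
        self.db.execute(
            "DELETE FROM memories WHERE id NOT IN ("
            "SELECT MIN(id) FROM memories GROUP BY topic, content)"
        )

    def retrieve(self, topic):
        rows = self.db.execute(
            "SELECT content FROM memories WHERE topic = ?", (topic,)
        ).fetchall()
        return [r[0] for r in rows]

agent = MemoryAgent()
agent.ingest("pricing", "Flash-Lite input tokens cost $0.25 per 1M")
agent.ingest("pricing", "Flash-Lite input tokens cost $0.25 per 1M")
agent.consolidate()
print(agent.retrieve("pricing"))  # one entry after deduplication
```

The point of the sketch is the shape of the loop: writes are cheap and continuous, while consolidation runs as a separate, periodic pass over the store.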
For enterprise developers, the release is less a product launch than a signal about the future direction of agent infrastructure. It showcases a view of long‑running autonomy useful for support systems, research assistants, internal copilots, and workflow automation—while also sharpening governance concerns once memory is no longer session‑bound.
## What the repo appears to do, and what it does not clearly claim
- The repository uses a multi‑agent internal architecture with specialist components for ingestion, consolidation, and querying.
- However, it does not explicitly claim to be a shared‑memory framework for multiple independent agents.
- This distinction matters: while ADK supports multi‑agent systems, this repo is best described as an always‑on memory agent (or memory layer) built with specialist sub‑agents and persistent storage.
- Even at this narrower scope, it tackles a core infrastructure problem many teams are actively solving.
## The architecture favors simplicity over a traditional retrieval stack
- Continuous operation: The agent runs nonstop, ingesting text, image, audio, video, and PDF inputs.
- Storage: Structured memories are stored in SQLite.
- Consolidation: By default, memory consolidation runs every 30 minutes.
- Interfaces: Includes a local HTTP API and a Streamlit dashboard.
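The 30‑minute consolidation default is easy to picture as a self‑rescheduling background timer. This is a minimal sketch, not the repo's implementation; the function names are invented, and only the interval value comes from the article.

```python
import threading
import time

CONSOLIDATION_INTERVAL_S = 30 * 60  # the repo's reported default

def start_consolidation_loop(consolidate, interval_s=CONSOLIDATION_INTERVAL_S):
    """Run `consolidate()` every `interval_s` seconds in the background."""
    def tick():
        consolidate()
        start_consolidation_loop(consolidate, interval_s)  # reschedule

    timer = threading.Timer(interval_s, tick)
    timer.daemon = True  # don't block process exit
    timer.start()
    return timer

# Demo with a short interval so the effect is visible immediately.
runs = []
start_consolidation_loop(lambda: runs.append(time.time()), interval_s=0.05)
time.sleep(0.2)
print(f"consolidation ran {len(runs)} time(s)")
```

A production service would likely use a proper scheduler or an async event loop, but the structural idea is the same: consolidation is decoupled from ingestion and runs on its own cadence.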
> **Provocative claim:** “No vector database. No embeddings. Just an LLM that reads, thinks, and writes structured memory.”
## Why this matters
- Traditional retrieval stacks require separate embedding pipelines, vector stores, indexing logic, and synchronization work.
- Saboo’s design leans on the LLM to organize and update memory directly, simplifying prototypes and reducing infrastructure sprawl—especially for smaller or medium‑memory agents.
- The performance trade‑off shifts from vector‑search overhead to model latency, memory‑compaction logic, and long‑run behavioral stability.
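To make the "no embeddings" trade-off concrete, here is a hedged sketch of what LLM-driven retrieval looks like: the whole structured store is serialized into the prompt and the model picks the relevant entries. `fake_llm` is a stand-in for a real Gemini call (it just keyword-matches the question), and the prompt format is invented for illustration.

```python
import json

MEMORY = [
    {"id": 1, "topic": "pricing", "content": "$0.25 per 1M input tokens"},
    {"id": 2, "topic": "speed", "content": "45% higher output speed"},
]

def build_retrieval_prompt(question, memories):
    # The whole store rides along in the prompt: no embeddings, no index.
    # This is exactly why the trade-off shifts from vector-search overhead
    # to model latency and context-window limits as the store grows.
    return (
        "You maintain structured memory. Given the memories below, "
        "return the ids relevant to the question as a JSON list.\n"
        f"Memories: {json.dumps(memories)}\n"
        f"Question: {question}"
    )

def fake_llm(prompt):
    # Stand-in for a Gemini API call: crudely match memory topics
    # against the question text.
    question = prompt.split("Question: ")[-1].lower()
    ids = [m["id"] for m in MEMORY if m["topic"] in question]
    return json.dumps(ids)

answer = json.loads(fake_llm(build_retrieval_prompt("What is the pricing?", MEMORY)))
print(answer)  # [1]
```

The sketch also shows where Iffy's critique (below) bites: once `MEMORY` outgrows the context window, some form of chunking or indexing has to come back.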
## Flash‑Lite gives the always‑on model some economic logic
- Pricing (Google):
  - $0.25 per 1M input tokens
  - $1.50 per 1M output tokens
- Speed: 2.5× faster time‑to‑first‑token than Gemini 2.5 Flash, with 45% higher output speed.
- Benchmarks:
  - Elo 1432 on Arena.ai
  - 86.9% on GPQA Diamond
  - 76.8% on MMMU Pro
Google positions Flash‑Lite for high‑frequency tasks (translation, moderation, UI generation, simulation). These characteristics make it a sensible match for a 24/7 background‑memory agent, where predictable latency and low inference cost are essential to keep “always‑on” affordable.
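A back-of-envelope calculation shows why the pricing matters for an always-on agent. The per-token prices come from the figures quoted above; the traffic assumptions (tokens re-read and rewritten per consolidation pass) are illustrative guesses, not numbers from the repo.

```python
# Flash-Lite prices quoted in the article.
INPUT_PER_M = 0.25   # $ per 1M input tokens
OUTPUT_PER_M = 1.50  # $ per 1M output tokens

# Assumed workload: one consolidation pass every 30 minutes,
# each re-reading ~20k tokens of memory and writing ~2k back.
passes_per_day = 48
input_tokens_per_pass = 20_000
output_tokens_per_pass = 2_000

daily_cost = passes_per_day * (
    input_tokens_per_pass / 1e6 * INPUT_PER_M
    + output_tokens_per_pass / 1e6 * OUTPUT_PER_M
)
print(f"${daily_cost:.2f}/day, ${daily_cost * 30:.2f}/month")  # $0.38/day, $11.52/month
```

Under these assumptions the background loop costs on the order of ten dollars a month; at frontier-model prices the same loop would be an order of magnitude more expensive, which is the economic logic behind pairing Flash-Lite with a 24/7 agent.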
## ADK context
- Model‑agnostic and deployment‑agnostic.
- Supports workflow agents, multi‑agent systems, tools, evaluation, and targets such as Cloud Run and Vertex AI Agent Engine.
Thus, the memory agent feels less like a one‑off demo and more like a reference point for a broader agent‑runtime strategy.
## The enterprise debate is about governance, not just capability
Public reaction highlights that enterprise adoption of persistent memory hinges on governance as much as on speed or cost.
| Commenter | Key Concern |
|---|---|
| Franck Abe (X) | “Brilliant leaps for continuous autonomy, but an agent that dreams and cross‑pollinates memories without deterministic boundaries becomes a compliance nightmare.” |
| ELED | The main cost isn’t tokens but drift and loops—agents may diverge over time. |
| Iffy | Challenges the “no embeddings” claim: the system still needs to chunk, index, and retrieve structured memory. Works for small‑context agents but may break down as memory stores grow. |
These critiques target the operational burden of persistent systems:
- Who can write memory?
- What gets merged?
- Retention policies & deletion
- Auditability of learned knowledge
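These governance questions map onto concrete mechanisms. The sketch below is not from the repo; it illustrates, under invented names, what write permissions, an append-only audit trail, and a retention sweep could look like on the same SQLite store the agent already uses.

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, "
    "writer TEXT, created_at REAL)"
)
db.execute(
    "CREATE TABLE audit_log (ts REAL, actor TEXT, action TEXT, memory_id INTEGER)"
)

def write_memory(content, writer, allowed_writers=frozenset({"ingest-agent"})):
    # Who can write memory? Enforce an allow-list at the write path.
    if writer not in allowed_writers:
        raise PermissionError(writer)
    cur = db.execute(
        "INSERT INTO memories (content, writer, created_at) VALUES (?, ?, ?)",
        (content, writer, time.time()),
    )
    # Auditability: every write leaves an append-only trail.
    db.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
        (time.time(), writer, "write", cur.lastrowid),
    )
    return cur.lastrowid

def apply_retention(max_age_s):
    # Retention & deletion: drop memories older than the policy window.
    cutoff = time.time() - max_age_s
    db.execute("DELETE FROM memories WHERE created_at < ?", (cutoff,))

mid = write_memory("user prefers dark mode", "ingest-agent")
apply_retention(max_age_s=7 * 24 * 3600)
print(db.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0])  # 1
```

What gets merged is the harder problem: consolidation is where an LLM rewrites memories, so any real deployment would need the merge pass itself to be logged and bounded, not just the raw writes.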
For developers, the trade‑off is less ideological and more about fit:
- A lighter stack (no vector DB) can be attractive for low‑cost, bounded‑memory agents.
- Larger‑scale deployments may still require explicit retrieval controls, stricter governance, and more robust storage solutions.
The “Always On Memory Agent” thus serves as both a technical showcase and a catalyst for deeper conversations around the future of autonomous AI agents and the governance frameworks needed to keep them trustworthy.
## ADK broadens the story beyond a single demo
Other commenters focused on developer workflow. One asked for the ADK repo and documentation and wanted to know whether the runtime is serverless or long‑running, and whether tool‑calling and evaluation hooks are available out of the box.
Based on the supplied materials, the answer is effectively **both**: the `memory‑agent` example itself is structured like a long‑running service, while ADK more broadly supports multiple deployment patterns and includes tools and evaluation capabilities.
The always‑on memory agent is interesting on its own, but the larger message is that **Saboo is trying to make agents feel like deployable software systems rather than isolated prompts**. In that framing, memory becomes part of the runtime layer, not just an add‑on feature.
---
## What Saboo has shown — and what he has not
### Shown
- A functional, always‑on memory agent that can be run as a long‑living service.
- ADK’s flexibility to support both serverless and long‑running deployment patterns.
- Built‑in tool‑calling and evaluation hooks.
### Not shown (yet)
- A direct **Flash‑Lite vs. Anthropic Claude Haiku** benchmark for agent loops in production use.
- Enterprise‑grade compliance controls specific to this memory agent, such as:
- Deterministic policy boundaries
- Retention guarantees
- Segregation rules
- Formal audit workflows
- Clear evidence that persistent memory can be **shared across multiple independent agents** (the repo uses multiple specialist agents internally, but the larger claim remains unproven).
> **Bottom line:** The repository reads as a compelling engineering template rather than a complete enterprise memory platform.
---
## Why this matters now
- Enterprise AI teams are moving beyond single‑turn assistants toward systems that **remember preferences, preserve project context, and operate across longer horizons**.
- Saboo’s open‑source memory agent offers a concrete starting point for that next layer of infrastructure, and Flash‑Lite gives the economics some credibility.
**Key takeaway:** Continuous memory will be judged on **governance as much as capability**. The real enterprise question behind Saboo’s demo isn’t just *whether an agent can remember*, but *whether it can remember in ways that stay bounded, inspectable, and safe enough to trust in production*.