Google PM open-sources Always On Memory Agent, ditching vector databases for LLM-driven persistent memory
Source: VentureBeat
Author: Shubham Saboo – Senior AI Product Manager, Google
Release: Open‑sourced on the official Google Cloud Platform GitHub organization (MIT License)
Built With:
- Google’s Agent Development Kit (ADK) – introduced Spring 2025
- Gemini 3.1 Flash‑Lite – low‑cost, high‑throughput model launched Mar 3 2026
The repo provides a practical reference implementation for a persistent‑memory agent that can:
- Continuously ingest information (files, API input, etc.)
- Consolidate memories in the background
- Retrieve stored data later without a conventional vector database
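The ingest → consolidate → retrieve cycle can be sketched in a few lines. This is an illustrative reconstruction, not the repo's actual API: the class and method names (`MemoryAgent`, `ingest`, `consolidate`, `retrieve`) are invented here, and the consolidation pass is reduced to simple deduplication where the real agent would invoke an LLM.

```python
import sqlite3

class MemoryAgent:
    """Hypothetical sketch of a persistent-memory agent backed by SQLite."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, topic TEXT, content TEXT)"
        )

    def ingest(self, topic, content):
        # In the real agent an LLM would extract structure from raw
        # input (files, API payloads) before this write happens.
        self.db.execute(
            "INSERT INTO memories (topic, content) VALUES (?, ?)",
            (topic, content),
        )

    def consolidate(self):
        # Stand-in for the LLM-driven merge pass: here we only
        # deduplicate identical rows per topic.
        self.db.execute(
            "DELETE FROM memories WHERE id NOT IN ("
            "SELECT MIN(id) FROM memories GROUP BY topic, content)"
        )

    def retrieve(self, topic):
        rows = self.db.execute(
            "SELECT content FROM memories WHERE topic = ?", (topic,)
        ).fetchall()
        return [r[0] for r in rows]

agent = MemoryAgent()
agent.ingest("pricing", "Flash-Lite input tokens cost $0.25 per 1M")
agent.ingest("pricing", "Flash-Lite input tokens cost $0.25 per 1M")
agent.consolidate()
print(agent.retrieve("pricing"))  # one entry after deduplication
```

The point of the sketch is the shape of the loop: writes are cheap and continuous, while consolidation runs as a separate, periodic pass over the store.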
For enterprise developers, the release is less a product launch than a signal about the future direction of agent infrastructure. It showcases a view of long‑running autonomy useful for support systems, research assistants, internal copilots, and workflow automation—while also sharpening governance concerns once memory is no longer session‑bound.
## What the repo appears to do, and what it does not clearly claim
- The repository uses a multi‑agent internal architecture with specialist components for ingestion, consolidation, and querying.
- However, it does not explicitly claim to be a shared‑memory framework for multiple independent agents.
- This distinction matters: while ADK supports multi‑agent systems, this repo is best described as an always‑on memory agent (or memory layer) built with specialist sub‑agents and persistent storage.
- Even at this narrower scope, it tackles a core infrastructure problem many teams are actively solving.
## The architecture favors simplicity over a traditional retrieval stack
- Continuous operation: The agent runs nonstop, ingesting text, image, audio, video, and PDF inputs.
- Storage: Structured memories are stored in SQLite.
- Consolidation: By default, memory consolidation runs every 30 minutes.
- Interfaces: Includes a local HTTP API and a Streamlit dashboard.
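The 30‑minute consolidation default is easy to picture as a self‑rescheduling background timer. This is a minimal sketch, not the repo's implementation; the function names are invented, and only the interval value comes from the article.

```python
import threading
import time

CONSOLIDATION_INTERVAL_S = 30 * 60  # the repo's reported default

def start_consolidation_loop(consolidate, interval_s=CONSOLIDATION_INTERVAL_S):
    """Run `consolidate()` every `interval_s` seconds in the background."""
    def tick():
        consolidate()
        start_consolidation_loop(consolidate, interval_s)  # reschedule

    timer = threading.Timer(interval_s, tick)
    timer.daemon = True  # don't block process exit
    timer.start()
    return timer

# Demo with a short interval so the effect is visible immediately.
runs = []
start_consolidation_loop(lambda: runs.append(time.time()), interval_s=0.05)
time.sleep(0.2)
print(f"consolidation ran {len(runs)} time(s)")
```

A production service would likely use a proper scheduler or an async event loop, but the structural idea is the same: consolidation is decoupled from ingestion and runs on its own cadence.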
> **Provocative claim:** “No vector database. No embeddings. Just an LLM that reads, thinks, and writes structured memory.”
## Why this matters
- Traditional retrieval stacks require separate embedding pipelines, vector stores, indexing logic, and synchronization work.
- Saboo’s design leans on the LLM to organize and update memory directly, simplifying prototypes and reducing infrastructure sprawl—especially for smaller or medium‑memory agents.
- The performance trade‑off shifts from vector‑search overhead to model latency, memory‑compaction logic, and long‑run behavioral stability.
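To make the "no embeddings" trade-off concrete, here is a hedged sketch of what LLM-driven retrieval looks like: the whole structured store is serialized into the prompt and the model picks the relevant entries. `fake_llm` is a stand-in for a real Gemini call (it just keyword-matches the question), and the prompt format is invented for illustration.

```python
import json

MEMORY = [
    {"id": 1, "topic": "pricing", "content": "$0.25 per 1M input tokens"},
    {"id": 2, "topic": "speed", "content": "45% higher output speed"},
]

def build_retrieval_prompt(question, memories):
    # The whole store rides along in the prompt: no embeddings, no index.
    # This is exactly why the trade-off shifts from vector-search overhead
    # to model latency and context-window limits as the store grows.
    return (
        "You maintain structured memory. Given the memories below, "
        "return the ids relevant to the question as a JSON list.\n"
        f"Memories: {json.dumps(memories)}\n"
        f"Question: {question}"
    )

def fake_llm(prompt):
    # Stand-in for a Gemini API call: crudely match memory topics
    # against the question text.
    question = prompt.split("Question: ")[-1].lower()
    ids = [m["id"] for m in MEMORY if m["topic"] in question]
    return json.dumps(ids)

answer = json.loads(fake_llm(build_retrieval_prompt("What is the pricing?", MEMORY)))
print(answer)  # [1]
```

The sketch also shows where Iffy's critique (below) bites: once `MEMORY` outgrows the context window, some form of chunking or indexing has to come back.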
## Flash‑Lite gives the always‑on model some economic logic
- Pricing (Google):
  - $0.25 per 1M input tokens
  - $1.50 per 1M output tokens
- Speed: 2.5× faster time‑to‑first‑token than Gemini 2.5 Flash, with 45% higher output speed.
- Benchmarks:
  - Elo 1432 on Arena.ai
  - 86.9% on GPQA Diamond
  - 76.8% on MMMU Pro
Google positions Flash‑Lite for high‑frequency tasks (translation, moderation, UI generation, simulation). These characteristics make it a sensible match for a 24/7 background‑memory agent, where predictable latency and low inference cost are essential to keep “always‑on” affordable.
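A back-of-envelope calculation shows why the pricing matters for an always-on agent. The per-token prices come from the figures quoted above; the traffic assumptions (tokens re-read and rewritten per consolidation pass) are illustrative guesses, not numbers from the repo.

```python
# Flash-Lite prices quoted in the article.
INPUT_PER_M = 0.25   # $ per 1M input tokens
OUTPUT_PER_M = 1.50  # $ per 1M output tokens

# Assumed workload: one consolidation pass every 30 minutes,
# each re-reading ~20k tokens of memory and writing ~2k back.
passes_per_day = 48
input_tokens_per_pass = 20_000
output_tokens_per_pass = 2_000

daily_cost = passes_per_day * (
    input_tokens_per_pass / 1e6 * INPUT_PER_M
    + output_tokens_per_pass / 1e6 * OUTPUT_PER_M
)
print(f"${daily_cost:.2f}/day, ${daily_cost * 30:.2f}/month")  # $0.38/day, $11.52/month
```

Under these assumptions the background loop costs on the order of ten dollars a month; at frontier-model prices the same loop would be an order of magnitude more expensive, which is the economic logic behind pairing Flash-Lite with a 24/7 agent.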
## ADK context
- Model‑agnostic and deployment‑agnostic.
- Supports workflow agents, multi‑agent systems, tools, evaluation, and targets such as Cloud Run and Vertex AI Agent Engine.
Thus, the memory agent feels less like a one‑off demo and more like a reference point for a broader agent‑runtime strategy.
## The enterprise debate is about governance, not just capability
Public reaction highlights that enterprise adoption of persistent memory hinges on governance as much as on speed or cost.
| Commenter | Key Concern |
|---|---|
| Franck Abe (X) | “Brilliant leaps for continuous autonomy, but an agent that dreams and cross‑pollinates memories without deterministic boundaries becomes a compliance nightmare.” |
| ELED | The main cost isn’t tokens but drift and loops—agents may diverge over time. |
| Iffy | Challenges the “no embeddings” claim: the system still needs to chunk, index, and retrieve structured memory. Works for small‑context agents but may break down as memory stores grow. |
These critiques target the operational burden of persistent systems:
- Who can write memory?
- What gets merged?
- Retention policies & deletion
- Auditability of learned knowledge
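These governance questions map onto concrete mechanisms. The sketch below is not from the repo; it illustrates, under invented names, what write permissions, an append-only audit trail, and a retention sweep could look like on the same SQLite store the agent already uses.

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, "
    "writer TEXT, created_at REAL)"
)
db.execute(
    "CREATE TABLE audit_log (ts REAL, actor TEXT, action TEXT, memory_id INTEGER)"
)

def write_memory(content, writer, allowed_writers=frozenset({"ingest-agent"})):
    # Who can write memory? Enforce an allow-list at the write path.
    if writer not in allowed_writers:
        raise PermissionError(writer)
    cur = db.execute(
        "INSERT INTO memories (content, writer, created_at) VALUES (?, ?, ?)",
        (content, writer, time.time()),
    )
    # Auditability: every write leaves an append-only trail.
    db.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
        (time.time(), writer, "write", cur.lastrowid),
    )
    return cur.lastrowid

def apply_retention(max_age_s):
    # Retention & deletion: drop memories older than the policy window.
    cutoff = time.time() - max_age_s
    db.execute("DELETE FROM memories WHERE created_at < ?", (cutoff,))

mid = write_memory("user prefers dark mode", "ingest-agent")
apply_retention(max_age_s=7 * 24 * 3600)
print(db.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0])  # 1
```

What gets merged is the harder problem: consolidation is where an LLM rewrites memories, so any real deployment would need the merge pass itself to be logged and bounded, not just the raw writes.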
For developers, the trade‑off is less ideological and more about fit:
- A lighter stack (no vector DB) can be attractive for low‑cost, bounded‑memory agents.
- Larger‑scale deployments may still require explicit retrieval controls, stricter governance, and more robust storage solutions.
The “Always On Memory Agent” thus serves as both a technical showcase and a catalyst for deeper conversations around the future of autonomous AI agents and the governance frameworks needed to keep them trustworthy.
## ADK broadens the story beyond a single demo
Other commenters focused on developer workflow. One asked for the ADK repo and documentation and wanted to know whether the runtime is serverless or long‑running, and whether tool‑calling and evaluation hooks are available out of the box.
Based on the supplied materials, the answer is effectively **both**: the `memory‑agent` example itself is structured like a long‑running service, while ADK more broadly supports multiple deployment patterns and includes tools and evaluation capabilities.
The always‑on memory agent is interesting on its own, but the larger message is that **Saboo is trying to make agents feel like deployable software systems rather than isolated prompts**. In that framing, memory becomes part of the runtime layer, not just an add‑on feature.
---
## What Saboo has shown — and what he has not
### Shown
- A functional, always‑on memory agent that can be run as a long‑living service.
- ADK’s flexibility to support both serverless and long‑running deployment patterns.
- Built‑in tool‑calling and evaluation hooks.
### Not shown (yet)
- A direct **Flash‑Lite vs. Anthropic Claude Haiku** benchmark for agent loops in production use.
- Enterprise‑grade compliance controls specific to this memory agent, such as:
- Deterministic policy boundaries
- Retention guarantees
- Segregation rules
- Formal audit workflows
- Clear evidence that persistent memory can be **shared across multiple independent agents** (the repo uses multiple specialist agents internally, but the larger claim remains unproven).
> **Bottom line:** The repository reads as a compelling engineering template rather than a complete enterprise memory platform.
---
## Why this matters now
- Enterprise AI teams are moving beyond single‑turn assistants toward systems that **remember preferences, preserve project context, and operate across longer horizons**.
- Saboo’s open‑source memory agent offers a concrete starting point for that next layer of infrastructure, and Flash‑Lite gives the economics some credibility.
**Key takeaway:** Continuous memory will be judged on **governance as much as capability**. The real enterprise question behind Saboo’s demo isn’t just *whether an agent can remember*, but *whether it can remember in ways that stay bounded, inspectable, and safe enough to trust in production*.