How Agentic Memory Enables Durable, Reliable AI Agents Across Millions of Enterprise Users

Published: February 9, 2026 at 01:46 PM EST

9 min read

Source: Salesforce Engineering
By Makarand Bhonsle, Christina Abraham, and Jayesh Govindarajan.

In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today’s discussion features Makarand Bhonsle, a software‑engineering architect at Salesforce, whose team is developing Agentic Memory within Agentforce to provide durable, governable memory for enterprise agents at massive scale.

Explore how the team addressed the inherent limits of stateless agents with small context windows by introducing Agentic Memory as a durable, structured data layer, and how they tackled the formidable challenge of ensuring its accuracy, governability, and reliability at enterprise scale through confidence scoring, write/read gates, and hybrid semantic validation.

What is your team’s mission in addressing the limitations of stateless AI agents within enterprise workflows?

The fundamental objective is to elevate agents beyond fleeting, stateless exchanges, transforming them into dependable collaborators over extended periods. Across the industry, most AI‑agent architectures operate within a restricted working space, treating each interaction in isolation. This design severely curtails their capacity to retain user context, past decisions, and crucial enterprise constraints across various business workflows. Consequently, applying these architectures reliably becomes increasingly difficult beyond basic, single‑turn interactions.

To overcome this limitation, the team prioritizes equipping agents with a robust, durable memory foundation. This memory persists across interactions, yet remains governable and transparent. Agentic Memory is a core platform capability that allows agents to use relevant information in the chat without referring back to chat history and other large consumer datasets.

Short‑term context remains tethered to the active session, enabling agents to reason effectively in the immediate moment, while long‑term memory is linked to a persistent profile graph. This graph endures across sessions and distinct communication channels.

This strategic approach ensures continuity without compromising trust, auditability, or enterprise control. The profile graph refers to an individual profile within Salesforce.

The Agent Memory Platform powered by Data 360

What constraints of small context windows and stateless execution prevent today’s agents from operating reliably over time?

In stateless agent designs, agents operate with a severely restricted view of information. Older chats, emails, and CRM records simply vanish from their scope as conversations evolve. Moreover, this execution model consistently resets an agent’s working context with each interaction, even when the user and the task remain constant. During extended interactions, these limitations often lead to:

Repetitive questioning
Inconsistent behavior
Noticeable gaps in retained context

Industry attempts to mitigate this by injecting vast quantities of raw historical data into prompts only introduce further latency, increase costs, and generate unnecessary noise. Obsolete or irrelevant information distracts the model, diminishing its reasoning capabilities. Without explicit memory records, updating or removing facts as individuals change roles, preferences, or circumstances becomes a formidable challenge, allowing outdated information to resurface persistently.

Agentic Memory directly confronts these constraints by externalizing memory into structured records with explicit lifecycle control. This design enables agents to:

Retain stable facts
Adapt their understanding as conditions shift
Discard information that no longer holds relevance
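As a rough illustration of what "structured records with explicit lifecycle control" could look like, the sketch below models the three behaviors listed above (retain, adapt, discard) on a toy in-memory store. The class names, field names, and the expiry mechanism are assumptions made for this example and are not taken from the Agentforce implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple


@dataclass
class MemoryRecord:
    """Hypothetical record shape; every field name here is illustrative."""
    profile_id: str
    key: str
    value: str
    updated_at: datetime
    expires_at: Optional[datetime] = None  # None means no scheduled expiry


class MemoryStore:
    """Toy in-memory store keyed by (profile_id, key)."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], MemoryRecord] = {}

    def retain(self, record: MemoryRecord) -> None:
        # Keep a stable fact.
        self._records[(record.profile_id, record.key)] = record

    def adapt(self, profile_id: str, key: str, value: str, now: datetime) -> None:
        # Update an existing fact when conditions change.
        record = self._records[(profile_id, key)]
        record.value = value
        record.updated_at = now

    def discard_expired(self, now: datetime) -> int:
        # Drop records whose expiry has passed; return how many were removed.
        expired = [k for k, r in self._records.items()
                   if r.expires_at is not None and r.expires_at <= now]
        for k in expired:
            del self._records[k]
        return len(expired)

    def get(self, profile_id: str, key: str) -> Optional[str]:
        record = self._records.get((profile_id, key))
        return record.value if record else None


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
store = MemoryStore()
store.retain(MemoryRecord("p1", "role", "analyst", now))
store.retain(MemoryRecord("p1", "travel_note", "in Berlin this week",
                          now, expires_at=now + timedelta(days=7)))
store.adapt("p1", "role", "manager", now + timedelta(days=30))
removed = store.discard_expired(now + timedelta(days=30))
```

Because each fact is a separate record with its own timestamps, a role change becomes a single update and a one-time note ages out on its own, which is hard to guarantee when the same information only lives inside prompt text.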

Why does treating memory as prompt text break down at enterprise scale, and what architectural shift was required to overcome that?

Prompt‑based memory approaches frequently fail at enterprise scale due to their inherent lack of structure, governance, and explainability. When memory exists solely as transient prompt text, auditing an agent’s knowledge, enforcing access controls, or explaining decision paths becomes increasingly difficult. At the enterprise level, these deficiencies severely limit the ability to meet trust, governance, and compliance expectations.

To rectify this, the team re‑conceptualized memory as a core platform capability, rather than a mere prompt‑side technique. Memory now resides in a real‑time data layer, distinct from prompts, with explicit structure and lifecycle controls. Key aspects of the new architecture:

Short‑term session context is isolated from long‑term memory, which anchors to a profile graph.
Raw signals pass through a pipeline that determines whether to add, update, delete, or disregard each memory candidate.

This fundamental shift makes memory inspectable, governable, and explainable and allows seamless integration with retrieval, planning, and tool execution across various agents.
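The add, update, delete, or disregard decision described above can be sketched as a small function. This is a minimal illustration under stated assumptions: a retraction is represented as a candidate whose value is None, the 0.6 confidence threshold is arbitrary, and the names Action, decide, and apply_candidate are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    ADD = "add"
    UPDATE = "update"
    DELETE = "delete"
    IGNORE = "ignore"


def decide(existing: Dict[str, str], key: str, value: Optional[str],
           confidence: float, min_confidence: float = 0.6) -> Action:
    """Classify one memory candidate against the current store."""
    if confidence < min_confidence:
        return Action.IGNORE          # too uncertain to act on
    if value is None:                 # assumption: None signals a retraction
        return Action.DELETE if key in existing else Action.IGNORE
    if key not in existing:
        return Action.ADD
    if existing[key] == value:
        return Action.IGNORE          # nothing new to record
    return Action.UPDATE


def apply_candidate(existing: Dict[str, str], key: str, value: Optional[str],
                    confidence: float) -> Action:
    """Decide, then mutate the store accordingly."""
    action = decide(existing, key, value, confidence)
    if action in (Action.ADD, Action.UPDATE):
        existing[key] = value
    elif action is Action.DELETE:
        del existing[key]
    return action


memory: Dict[str, str] = {}
history = [
    apply_candidate(memory, "preferred_channel", "email", 0.9),  # new fact
    apply_candidate(memory, "preferred_channel", "email", 0.9),  # repeat
    apply_candidate(memory, "preferred_channel", "sms", 0.9),    # changed
    apply_candidate(memory, "preferred_channel", "phone", 0.2),  # low confidence
]
```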

How does adaptive context and session‑level tracing help make agent behavior governable and explainable at enterprise scale?

Salesforce serves as the authoritative record for enterprise data. However, effective agent automation needs context that evolves with interactions. Adaptive Context allows agents to dynamically refine, prioritize, and prune information in real time, moving beyond static inputs. This helps agents highlight the most relevant signals from conversations, documents, and enterprise systems as the interaction progresses, while session‑level tracing records the provenance of each piece of information used. Together, they provide:

Governability: Administrators can set policies that dictate which signals may be retained, how long they persist, and who may access them.
Explainability: When an agent makes a recommendation, the system can surface the exact memory records and their timestamps that influenced the decision, satisfying audit and compliance requirements.

By coupling adaptive context with detailed tracing, the platform ensures that agents remain both responsive to the latest user intent and accountable to enterprise governance standards.
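To make the governability and explainability points concrete, here is a small sketch of a policy check plus a provenance listing. It assumes a policy is just a retention period and a set of permitted roles; RetentionPolicy, StoredSignal, may_read, and explain are hypothetical names, not platform APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class RetentionPolicy:
    retain_days: int               # how long a signal may persist
    allowed_roles: FrozenSet[str]  # who may access it


@dataclass(frozen=True)
class StoredSignal:
    record_id: str
    stored_at: datetime


def may_read(policy: RetentionPolicy, signal: StoredSignal,
             role: str, now: datetime) -> bool:
    """Allow access only inside the retention window and for permitted roles."""
    within_retention = now - signal.stored_at <= timedelta(days=policy.retain_days)
    return within_retention and role in policy.allowed_roles


def explain(used: List[StoredSignal]) -> List[Tuple[str, str]]:
    """List the records (and their timestamps) that fed a recommendation."""
    return [(s.record_id, s.stored_at.isoformat()) for s in used]


policy = RetentionPolicy(retain_days=90, allowed_roles=frozenset({"service_agent"}))
signal = StoredSignal("mem-1", datetime(2026, 1, 1, tzinfo=timezone.utc))
feb = datetime(2026, 2, 1, tzinfo=timezone.utc)
jun = datetime(2026, 6, 1, tzinfo=timezone.utc)
```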

During execution, agents create structured reasoning and decision traces. These traces capture:

How choices are evaluated
Which tools or actions are selected
The data sources consulted

By persisting these traces, the platform provides an auditable record that can be inspected to understand why a particular decision was made, satisfying governance and compliance requirements while also enabling developers to debug and improve agent behavior.
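One possible shape for such a trace is sketched below, assuming each step stores the options considered, the action chosen, and the data sources consulted. The SessionTrace and TraceStep classes and the source labels are illustrative only; the actual session-trace model is not described in this article.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TraceStep:
    options_considered: List[str]  # how choices were evaluated
    chosen_action: str             # which tool or action was selected
    data_sources: List[str]        # which data sources were consulted


@dataclass
class SessionTrace:
    session_id: str
    steps: List[TraceStep] = field(default_factory=list)

    def record(self, options: List[str], chosen: str, sources: List[str]) -> None:
        self.steps.append(TraceStep(list(options), chosen, list(sources)))

    def sources_consulted(self) -> List[str]:
        """Distinct data sources in first-seen order, for audit review."""
        seen: List[str] = []
        for step in self.steps:
            for source in step.data_sources:
                if source not in seen:
                    seen.append(source)
        return seen

    def to_json(self) -> str:
        """Serialize the whole trace so it can be persisted and inspected."""
        return json.dumps(asdict(self), indent=2)


trace = SessionTrace("s-1")
trace.record(["refund", "escalate"], "refund", ["crm:case", "memory:pref"])
trace.record(["send_email"], "send_email", ["memory:pref"])
```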

Enterprise‑Grade Agentic Memory

Why a traceable, evidence‑backed history matters

An agent’s actions need a clear, auditable record. A standardized session‑trace model organizes activity, capturing the agent’s complete journey. Over time, these traces form a relational history that links decisions to enterprise outcomes. By referencing successful past sessions in similar contexts, agents can base future behavior on proven patterns—while remaining fully inspectable, auditable, and aligned with organizational policies.

What makes enterprise‑grade Agentic Memory especially difficult to build correctly at scale?

The most challenging aspect is determining what information merits retention and ensuring its accuracy over time.

Core Challenges

Challenge	Why It Matters
Data volume vs. signal	Storing too much creates noise; storing too little limits utility.
Episodic complexity	Order and timing are crucial; agents must preserve the precise sequence of events to reason accurately.
Conflicting sources	Enterprise systems may contradict conversational signals. Memory must represent uncertainty rather than false certainty.
Context leakage	Mixing short‑term context with long‑term memory can cause private or one‑time information to persist incorrectly across sessions.

Mitigations

Strict write and read gates – Validate data before it enters memory and enforce access policies on retrieval.
Confidence scoring – Attach a probability or quality metric to each stored fact; decay or re‑evaluate low‑confidence items over time.
Memory‑compaction processes – Periodically summarize, prune, or merge entries to keep the store lean and coherent.
Comprehensive source tracking – Record provenance (system, timestamp, author) so contradictions can be resolved and audits performed.
Hybrid matching – Combine similarity search with semantic checks to prevent duplication and drift as the memory evolves.
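The write gate and confidence decay ideas can be sketched in a few lines. The article does not say how confidence is decayed, so the exponential half-life below is an assumption chosen for simplicity, as are the thresholds and the function names.

```python
import math


def decayed_confidence(initial: float, age_days: float,
                       half_life_days: float = 90.0) -> float:
    """Assumed decay model: confidence halves every `half_life_days`."""
    return initial * 0.5 ** (age_days / half_life_days)


def passes_write_gate(confidence: float, has_source: bool,
                      min_confidence: float = 0.7) -> bool:
    """Admit a candidate only if it is sourced and confident enough."""
    return has_source and confidence >= min_confidence


def needs_reevaluation(initial: float, age_days: float,
                       floor: float = 0.3,
                       half_life_days: float = 90.0) -> bool:
    """Flag a stored fact once its decayed confidence drops below `floor`."""
    return decayed_confidence(initial, age_days, half_life_days) < floor


after_one_half_life = decayed_confidence(0.8, 90)  # 0.8 halves to 0.4
```

Under these assumed numbers, a fact stored at 0.8 confidence is still usable after 90 days (0.4) but is flagged for re-evaluation after 180 days (0.2).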

What performance and cost constraints shaped how Agentic Memory retrieval was designed?

Constraints

Latency – Each agent turn must feel responsive.
Cost – Frequent large‑model invocations quickly become unsustainable at scale.

Design choices

Compact, structured memory records
- Pre‑computed embeddings enable rapid similarity search.
Selective retrieval
- Pull only a small, task‑relevant subset per interaction.
- Cache active‑session results for reuse.
Model tiering
- Use smaller, inexpensive models for steps such as candidate extraction and validation.
- Reserve larger models for complex reasoning only when necessary.

This approach lets agents leverage long‑term memory without sacrificing responsiveness or efficiency in real‑time enterprise environments.
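The selective-retrieval and caching choices can be illustrated with a toy retriever. It assumes embeddings are pre-computed elsewhere and passed in as plain lists of floats, uses cosine similarity for ranking, and caches results per session and query key. The Retriever class and its parameters are hypothetical, and a production system would use a vector index rather than a full sort.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Retriever:
    """Returns only the top-k records and caches results within a session."""

    def __init__(self, records: List[Tuple[str, List[float]]], k: int = 2) -> None:
        self.records = records  # (text, pre-computed embedding) pairs
        self.k = k
        self.searches = 0       # counts real similarity searches
        self._cache: Dict[Tuple[str, str], List[str]] = {}

    def retrieve(self, session_id: str, query_key: str,
                 query_embedding: Sequence[float]) -> List[str]:
        cache_key = (session_id, query_key)
        if cache_key in self._cache:
            return self._cache[cache_key]  # reuse active-session result
        self.searches += 1
        ranked = sorted(self.records,
                        key=lambda r: cosine(query_embedding, r[1]),
                        reverse=True)
        result = [text for text, _ in ranked[: self.k]]
        self._cache[cache_key] = result
        return result


retriever = Retriever([
    ("prefers email", [1.0, 0.0]),
    ("renewal due in March", [0.0, 1.0]),
    ("works in finance", [0.9, 0.1]),
])
first = retriever.retrieve("s-1", "contact preference", [1.0, 0.0])
second = retriever.retrieve("s-1", "contact preference", [1.0, 0.0])
```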

Scientific and Engineering Challenges in Modeling Memory for Agents

Challenge	Description
Temporal ordering	Episodic memory must preserve the exact sequence of events and relate outcomes across time.
Context separation	Short‑term context must stay distinct from long‑term memory to prevent transient or sensitive data from persisting.
Uncertainty handling	Sources can disagree; the system should represent confidence levels instead of assuming correctness.
Quality measurement	Evaluation must consider correctness, freshness, helpfulness, and safety—not a single metric.

Proposed Solutions

Time‑bounded episodic chunks – Store recent events in bounded windows to keep ordering while limiting growth.
Confidence‑first design – Attach a confidence score to every memory entry and propagate it through downstream reasoning.
Replay‑based evaluation – Periodically replay stored memories against new queries to ensure they remain useful without becoming rigid or unsafe.
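A minimal sketch of time-bounded episodic chunks follows, assuming events carry a numeric timestamp in seconds and that "bounded" means fixed-width windows with a cap on how many windows are kept. The function name and both parameters are illustrative.

```python
from typing import Dict, List, Tuple


def chunk_events(events: List[Tuple[int, str]], window_seconds: int,
                 max_chunks: int) -> List[Tuple[int, List[str]]]:
    """Group (timestamp, text) events into fixed windows, oldest first.

    Ordering inside each window follows the timestamps, and only the most
    recent `max_chunks` windows are kept so the store cannot grow unbounded.
    """
    if window_seconds < 1 or max_chunks < 1:
        raise ValueError("window_seconds and max_chunks must be >= 1")
    chunks: Dict[int, List[str]] = {}
    for timestamp, text in sorted(events, key=lambda e: e[0]):
        window_start = timestamp - (timestamp % window_seconds)
        chunks.setdefault(window_start, []).append(text)
    return sorted(chunks.items())[-max_chunks:]


events = [
    (5, "asked about invoice"),
    (65, "agent issued refund"),
    (10, "shared order number"),
    (130, "confirmed resolution"),
]
all_chunks = chunk_events(events, window_seconds=60, max_chunks=10)
recent_only = chunk_events(events, window_seconds=60, max_chunks=2)
```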

Data Sources Beyond Conversations

Source	Description
Human agent chats (Service Cloud)	`livechattranscript` Salesforce object.
Einstein Bot conversations	Bot Conversations Data model.
Zero‑copy connector imports	Direct ingestion from external systems.

These sources are defined as metadata and fed into the derivation pipeline, mirroring industry‑grade memory solutions that extract memories or insights from an actor‑text‑blob tuple or actor‑text‑actor triplet.

The text blob may originate from conversations, documents (Excel, PDF), or any connector in the Data 360 ecosystem.
Decoupling sources makes the pipeline a flexible, extensible enterprise‑memory platform, storing derived memories in a standard memory object.
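The idea of declaring sources as metadata and normalizing them into an actor-text-blob tuple might look like the sketch below. The source names and field names (contact, body, owner, extracted_text) are placeholders invented for the example; they are not the real fields of any Salesforce object or connector.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SourceConfig:
    """Metadata describing where the actor and the text live in a source row."""
    name: str
    actor_field: str
    text_field: str


def to_actor_text(source: SourceConfig, row: Dict[str, str]) -> Tuple[str, str]:
    """Normalize one source row into an (actor, text blob) tuple."""
    return (row[source.actor_field], row[source.text_field])


def derive_inputs(sources: Dict[str, SourceConfig],
                  rows: List[Tuple[str, Dict[str, str]]]) -> List[Tuple[str, str]]:
    """Turn rows from any registered source into one uniform input shape."""
    return [to_actor_text(sources[name], row) for name, row in rows]


# Hypothetical source registry; adding a source means adding metadata only.
sources = {
    "chat_transcript": SourceConfig("chat_transcript", "contact", "body"),
    "pdf_upload": SourceConfig("pdf_upload", "owner", "extracted_text"),
}
inputs = derive_inputs(sources, [
    ("chat_transcript", {"contact": "c-42", "body": "Please email me, not call."}),
    ("pdf_upload", {"owner": "c-42", "extracted_text": "Contract renews in March."}),
])
```

The downstream derivation step only ever sees the uniform tuple, which is what lets a new connector be added without touching the pipeline logic.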

What early R&D approaches show the most promise for keeping agentic memory reliable and governable over time?

Clean, structured data model – Explicit fields for type, time, source, confidence, and lifecycle controls.
Write gates – Only high‑quality candidates become memory.
Read gates – Retrieval limited to task‑relevant records.
Hybrid validation – Combines vector similarity with meaning checks to prevent duplication and drift.
Episodic summarization – Periodic summarization preserves signal while reducing noise.
Source prioritization – Trusted enterprise records take precedence over casual conversational signals.
Replay testing – Evaluates correctness, freshness, and safety.
Cost‑aware model selection – Balances performance with expense.

Together, these techniques support the development of long‑running, enterprise‑grade agentic memory without sacrificing trust or reliability.
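Hybrid validation, as listed above, can be sketched as a two-stage duplicate check: a vector-similarity filter followed by a meaning check. In this illustration the meaning check is a deliberately crude stand-in (case-insensitive token-set equality); the article suggests a model-based check would fill this role in practice. The threshold and all names are assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def same_meaning(a: str, b: str) -> bool:
    """Stand-in meaning check: identical token sets, ignoring case."""
    return frozenset(a.lower().split()) == frozenset(b.lower().split())


def is_duplicate(candidate: Dict, existing: List[Dict],
                 meaning_check: Callable[[str, str], bool] = same_meaning,
                 threshold: float = 0.9) -> bool:
    """A candidate is a duplicate only if BOTH checks agree with some record."""
    for record in existing:
        similar = cosine(candidate["embedding"], record["embedding"]) >= threshold
        if similar and meaning_check(candidate["text"], record["text"]):
            return True
    return False


stored = [{"text": "Prefers email", "embedding": [1.0, 0.0]}]
restated = {"text": "prefers EMAIL", "embedding": [0.99, 0.05]}
near_miss = {"text": "prefers phone", "embedding": [0.98, 0.10]}
```

The near_miss candidate shows why similarity alone is not enough: its vector is close to the stored record, yet it states a different fact, so it should be treated as an update rather than dropped as a duplicate.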

Learn more

Stay connected: join our Talent Community.

Additional resources

ws.beamery.com/salesforce/eng-social-2023
Check out our Technology and Product teams to learn how you can get involved.