How Agentic Memory Enables Durable, Reliable AI Agents Across Millions of Enterprise Users
Source: Salesforce Engineering
By Makarand Bhonsle, Christina Abraham, and Jayesh Govindarajan.
In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today’s discussion features Makarand Bhonsle, a software‑engineering architect at Salesforce, whose team is developing Agentic Memory within Agentforce to provide durable, governable memory for enterprise agents at massive scale.
Explore how the team addressed the inherent limits of stateless agents with small context windows by introducing Agentic Memory as a durable, structured data layer, and how they tackled the formidable challenge of ensuring its accuracy, governability, and reliability at enterprise scale through confidence scoring, write/read gates, and hybrid semantic validation.
What is your team’s mission in addressing the limitations of stateless AI agents within enterprise workflows?
The fundamental objective is to elevate agents beyond fleeting, stateless exchanges, transforming them into dependable collaborators over extended periods. Across the industry, most AI‑agent architectures operate within a restricted working space, treating each interaction in isolation. This design severely curtails their capacity to retain user context, past decisions, and crucial enterprise constraints across various business workflows. Consequently, applying these architectures reliably becomes increasingly difficult beyond basic, single‑turn interactions.
To overcome this limitation, the team prioritizes equipping agents with a robust, durable memory foundation. This memory persists across interactions, yet remains governable and transparent. Agentic Memory is a core platform capability that allows agents to use relevant information in the chat without referring back to chat history and other large consumer datasets.
The design separates two kinds of memory:
- Short‑term context remains tethered to the active session, enabling agents to reason effectively in the immediate moment.
- Long‑term memory is linked to a persistent profile graph, which endures across sessions and distinct communication channels.
This strategic approach ensures continuity without compromising trust, auditability, or enterprise control. The profile graph refers to an individual profile within Salesforce.

What constraints of small context windows and stateless execution prevent today’s agents from operating reliably over time?
In stateless agent designs, agents operate with a severely restricted view of information. Older chats, emails, and CRM records simply vanish from their scope as conversations evolve. Moreover, this execution model consistently resets an agent’s working context with each interaction, even when the user and the task remain constant. During extended interactions, these limitations often lead to:
- Repetitive questioning
- Inconsistent behavior
- Noticeable gaps in retained context
A common industry mitigation is to inject vast quantities of raw historical data into prompts, but this only adds latency, increases cost, and generates unnecessary noise. Obsolete or irrelevant information distracts the model, diminishing its reasoning capabilities. Without explicit memory records, updating or removing facts as individuals change roles, preferences, or circumstances becomes a formidable challenge, allowing outdated information to resurface persistently.
Agentic Memory directly confronts these constraints by externalizing memory into structured records with explicit lifecycle control. This design enables agents to:
- Retain stable facts
- Adapt their understanding as conditions shift
- Discard information that no longer holds relevance
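
As a rough illustration of what "structured records with explicit lifecycle control" could look like, the sketch below models a memory record and a tiny in-memory store that can retain, replace, and discard facts. The names `MemoryRecord`, `MemoryStore`, and all field names are hypothetical and chosen for illustration; they are not taken from Agentforce.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class MemoryRecord:
    """Hypothetical structured memory record; field names are illustrative."""
    profile_id: str                        # profile the fact belongs to
    key: str                               # what the fact is about, e.g. "job_title"
    value: str
    source: str                            # where the fact came from
    created_at: datetime
    expires_at: Optional[datetime] = None  # None means no scheduled expiry

    def is_active(self, now: datetime) -> bool:
        return self.expires_at is None or now < self.expires_at


class MemoryStore:
    """In-memory stand-in for a durable store, keyed by (profile_id, key)."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], MemoryRecord] = {}

    def retain(self, record: MemoryRecord) -> None:
        # Writing to an existing key replaces the older fact (adapt).
        self._records[(record.profile_id, record.key)] = record

    def discard(self, profile_id: str, key: str) -> None:
        # Explicit removal of a fact that is no longer relevant.
        self._records.pop((profile_id, key), None)

    def get(self, profile_id: str, key: str) -> Optional[MemoryRecord]:
        return self._records.get((profile_id, key))

    def active(self, profile_id: str, now: datetime) -> List[MemoryRecord]:
        # Expired records are skipped on read rather than returned.
        return [r for r in self._records.values()
                if r.profile_id == profile_id and r.is_active(now)]


# Usage: a stable fact is replaced when a role changes; a temporary fact expires.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
store = MemoryStore()
store.retain(MemoryRecord("p1", "job_title", "Analyst", "crm", now))
store.retain(MemoryRecord("p1", "job_title", "Manager", "chat", now))
store.retain(MemoryRecord("p1", "trial_status", "active", "chat", now,
                          expires_at=now + timedelta(days=1)))
```

Because each fact is an addressable record instead of a span of prompt text, updating or deleting it is a single keyed operation.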
Why does treating memory as prompt text break down at enterprise scale, and what architectural shift was required to overcome that?
Prompt‑based memory approaches frequently fail at enterprise scale due to their inherent lack of structure, governance, and explainability. When memory exists solely as transient prompt text, auditing an agent’s knowledge, enforcing access controls, or explaining decision paths becomes increasingly difficult. At the enterprise level, these deficiencies severely limit the ability to meet trust, governance, and compliance expectations.
To rectify this, the team re‑conceptualized memory as a core platform capability, rather than a mere prompt‑side technique. Memory now resides in a real‑time data layer, distinct from prompts, with explicit structure and lifecycle controls. Key aspects of the new architecture:
- Short‑term session context is isolated from long‑term memory, which anchors to a profile graph.
- Raw signals pass through a pipeline that determines whether to add, update, delete, or disregard each memory candidate.
This fundamental shift makes memory inspectable, governable, and explainable and allows seamless integration with retrieval, planning, and tool execution across various agents.
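The add, update, delete, or disregard decision described above can be sketched as a small pure function. This is only an assumption about how such a step might be shaped: the `Action` enum, the `decide` function, and the 0.7 confidence threshold are hypothetical, and the article does not describe the actual decision logic.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ADD = "add"
    UPDATE = "update"
    DELETE = "delete"
    IGNORE = "ignore"


def decide(existing_value: Optional[str],
           candidate_value: Optional[str],
           confidence: float,
           min_confidence: float = 0.7) -> Action:
    """Choose what to do with one memory candidate.

    existing_value:  value currently stored for this fact, or None if absent.
    candidate_value: value extracted from the raw signal; None means the
                     signal retracts the fact.
    confidence:      extraction confidence in [0, 1] (assumed to be provided
                     by an upstream step).
    """
    if confidence < min_confidence:
        return Action.IGNORE              # too uncertain to touch memory
    if candidate_value is None:
        # A retraction only matters if there is something to remove.
        return Action.DELETE if existing_value is not None else Action.IGNORE
    if existing_value is None:
        return Action.ADD                 # new fact
    if existing_value == candidate_value:
        return Action.IGNORE              # nothing changed
    return Action.UPDATE                  # fact changed
```

Keeping the decision separate from storage makes each outcome easy to log and audit alongside the signal that triggered it.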
How does adaptive context and session‑level tracing help make agent behavior governable and explainable at enterprise scale?
Salesforce serves as the authoritative record for enterprise data. However, effective agent automation needs context that evolves with interactions. Adaptive Context allows agents to dynamically refine, prioritize, and prune information in real time, moving beyond static inputs. This helps agents highlight the most relevant signals from conversations, documents, and enterprise systems as the interaction progresses, while session‑level tracing records the provenance of each piece of information used. Together, they provide:
- Governability: Administrators can set policies that dictate which signals may be retained, how long they persist, and who may access them.
- Explainability: When an agent makes a recommendation, the system can surface the exact memory records and their timestamps that influenced the decision, satisfying audit and compliance requirements.
By coupling adaptive context with detailed tracing, the platform ensures that agents remain both responsive to the latest user intent and accountable to enterprise governance standards.
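To make the governability idea concrete, here is a minimal sketch of a policy that limits how long a signal persists and who may read it. `RetentionPolicy`, `may_read`, the role names, and the 30-day window are all hypothetical; the article does not specify how administrators express these policies.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical admin-defined policy for one kind of signal."""
    signal_type: str
    max_age: timedelta             # how long the signal may persist
    allowed_roles: FrozenSet[str]  # who may access it


def may_read(policy: RetentionPolicy, role: str,
             stored_at: datetime, now: datetime) -> bool:
    """Allow a read only for permitted roles and unexpired records."""
    within_retention = now - stored_at <= policy.max_age
    return role in policy.allowed_roles and within_retention


# Usage: contact preferences kept for 30 days, readable by service agents only.
policy = RetentionPolicy("contact_preference", timedelta(days=30),
                         frozenset({"service_agent"}))
stored_at = datetime(2025, 1, 1, tzinfo=timezone.utc)
```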
During execution, agents create structured reasoning and decision traces. These traces capture:
- How choices are evaluated
- Which tools or actions are selected
- The data sources consulted
By persisting these traces, the platform provides an auditable record that can be inspected to understand why a particular decision was made, satisfying governance and compliance requirements while also enabling developers to debug and improve agent behavior.
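A persisted trace of this kind might be modeled roughly as follows. The `TraceStep` and `SessionTrace` classes and their fields are assumptions made for illustration; they mirror the three items listed above (choices evaluated, action selected, data sources consulted) but are not the platform's actual trace schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TraceStep:
    """One decision point in a session (hypothetical shape)."""
    options_considered: List[str]  # how choices were evaluated
    selected_action: str           # which tool or action was chosen
    data_sources: List[str]        # data sources consulted
    rationale: str


@dataclass
class SessionTrace:
    session_id: str
    steps: List[TraceStep] = field(default_factory=list)

    def record(self, step: TraceStep) -> None:
        self.steps.append(step)

    def sources_for(self, action: str) -> List[str]:
        """Return every data source consulted for a given action, in order."""
        return [src for step in self.steps
                if step.selected_action == action
                for src in step.data_sources]

    def to_json(self) -> str:
        # Serialized form that could be persisted for later audit.
        return json.dumps(asdict(self), indent=2)


# Usage: record one step, then ask which sources informed it.
trace = SessionTrace("s-1")
trace.record(TraceStep(
    options_considered=["create_case", "escalate"],
    selected_action="create_case",
    data_sources=["Case", "Knowledge"],
    rationale="No open case matched the reported issue.",
))
```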
Enterprise‑Grade Agentic Memory
Why a traceable, evidence‑backed history matters
An agent’s actions need a clear, auditable record. A standardized session‑trace model organizes activity, capturing the agent’s complete journey. Over time, these traces form a relational history that links decisions to enterprise outcomes. By referencing successful past sessions in similar contexts, agents can base future behavior on proven patterns—while remaining fully inspectable, auditable, and aligned with organizational policies.
What makes enterprise‑grade Agentic Memory especially difficult to build correctly at scale?
The most challenging aspect is determining what information merits retention and ensuring its accuracy over time.
Core Challenges
| Challenge | Why It Matters |
|---|---|
| Data volume vs. signal | Storing too much creates noise; storing too little limits utility. |
| Episodic complexity | Order and timing are crucial; agents must preserve the precise sequence of events to reason accurately. |
| Conflicting sources | Enterprise systems may contradict conversational signals. Memory must represent uncertainty rather than false certainty. |
| Context leakage | Mixing short‑term context with long‑term memory can cause private or one‑time information to persist incorrectly across sessions. |
Mitigations
- Strict write and read gates – Validate data before it enters memory and enforce access policies on retrieval.
- Confidence scoring – Attach a probability or quality metric to each stored fact; decay or re‑evaluate low‑confidence items over time.
- Memory‑compaction processes – Periodically summarize, prune, or merge entries to keep the store lean and coherent.
- Comprehensive source tracking – Record provenance (system, timestamp, author) so contradictions can be resolved and audits performed.
- Hybrid matching – Combine similarity search with semantic checks to prevent duplication and drift as the memory evolves.
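
The confidence-scoring mitigation mentions decaying low-confidence items over time. One simple way to do that is exponential decay with a half-life, sketched below. The decay model, the 90-day half-life, and the 0.3 review threshold are assumptions for illustration only; the article does not say which scoring or decay scheme is used.

```python
def decayed_confidence(initial: float, age_days: float,
                       half_life_days: float = 90.0) -> float:
    """Halve the confidence every `half_life_days` (assumed decay model)."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return initial * 0.5 ** (age_days / half_life_days)


def needs_review(initial: float, age_days: float,
                 threshold: float = 0.3,
                 half_life_days: float = 90.0) -> bool:
    """Flag a fact for re-evaluation once its decayed confidence is too low."""
    return decayed_confidence(initial, age_days, half_life_days) < threshold
```

With these assumed defaults, a fact stored at confidence 0.8 falls to 0.4 after 90 days and to 0.2 after 180 days, at which point it would be flagged for re-evaluation.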
What performance and cost constraints shaped how Agentic Memory retrieval was designed?
Constraints
- Latency – Each agent turn must feel responsive.
- Cost – Frequent large‑model invocations quickly become unsustainable at scale.
Design choices
- Compact, structured memory records
  - Pre‑computed embeddings enable rapid similarity search.
- Selective retrieval
  - Pull only a small, task‑relevant subset per interaction.
  - Cache active‑session results for reuse.
- Model tiering
  - Use smaller, inexpensive models for steps such as candidate extraction and validation.
  - Reserve larger models for complex reasoning only when necessary.
This approach lets agents leverage long‑term memory without sacrificing responsiveness or efficiency in real‑time enterprise environments.
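Selective retrieval over pre-computed embeddings, with a per-session cache, can be sketched in a few lines. This is a toy version under stated assumptions: the `SessionRetriever` class, the record IDs, and the two-dimensional vectors are invented for illustration, and a production system would use a vector index instead of sorting every record.

```python
from __future__ import annotations

import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SessionRetriever:
    """Returns the top-k most similar record IDs and caches them per task."""

    def __init__(self, embeddings: Dict[str, Sequence[float]],
                 top_k: int = 3) -> None:
        self._embeddings = embeddings          # record ID -> pre-computed vector
        self._top_k = top_k
        self._cache: Dict[str, List[str]] = {}
        self.cache_hits = 0

    def retrieve(self, task_key: str, query_vec: Sequence[float]) -> List[str]:
        if task_key in self._cache:            # reuse within the active session
            self.cache_hits += 1
            return self._cache[task_key]
        ranked = sorted(
            self._embeddings,
            key=lambda rid: cosine(query_vec, self._embeddings[rid]),
            reverse=True,
        )
        top = ranked[: self._top_k]            # only a small subset is pulled
        self._cache[task_key] = top
        return top


# Usage: the second call for the same task is served from the cache.
retriever = SessionRetriever(
    {"billing_pref": [1.0, 0.0],
     "shipping_addr": [0.0, 1.0],
     "invoice_email": [0.9, 0.1]},
    top_k=2,
)
first = retriever.retrieve("billing-question", [1.0, 0.0])
second = retriever.retrieve("billing-question", [1.0, 0.0])
```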
Scientific and Engineering Challenges in Modeling Memory for Agents
| Challenge | Description |
|---|---|
| Temporal ordering | Episodic memory must preserve the exact sequence of events and relate outcomes across time. |
| Context separation | Short‑term context must stay distinct from long‑term memory to prevent transient or sensitive data from persisting. |
| Uncertainty handling | Sources can disagree; the system should represent confidence levels instead of assuming correctness. |
| Quality measurement | Evaluation must consider correctness, freshness, helpfulness, and safety—not a single metric. |
Proposed Solutions
- Time‑bounded episodic chunks – Store recent events in bounded windows to keep ordering while limiting growth.
- Confidence‑first design – Attach a confidence score to every memory entry and propagate it through downstream reasoning.
- Replay‑based evaluation – Periodically replay stored memories against new queries to ensure they remain useful without becoming rigid or unsafe.
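
One possible reading of "time-bounded episodic chunks" is to group events into fixed time windows in chronological order and keep only the most recent windows. The sketch below does that. The function name `chunk_events`, the window size, and the retention limit are hypothetical, and the article does not describe the actual chunking rule.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def chunk_events(events: List[Tuple[datetime, str]],
                 window: timedelta,
                 max_chunks: int) -> List[List[str]]:
    """Group events into fixed windows, oldest first, keeping the newest chunks.

    Ordering is preserved inside and across chunks; growth is bounded by
    dropping the oldest chunks once there are more than `max_chunks`.
    """
    if max_chunks < 1:
        raise ValueError("max_chunks must be at least 1")
    if not events:
        return []
    ordered = sorted(events, key=lambda e: e[0])
    start = ordered[0][0]
    chunks: Dict[int, List[str]] = {}
    for ts, text in ordered:
        index = int((ts - start) / window)  # which window the event falls in
        chunks.setdefault(index, []).append(text)
    return [chunks[i] for i in sorted(chunks)][-max_chunks:]


# Usage: four events, one-hour windows, keep the two most recent chunks.
t0 = datetime(2025, 1, 1, 9, 0)
episodes = chunk_events(
    [(t0, "opened case"),
     (t0 + timedelta(minutes=10), "sent logs"),
     (t0 + timedelta(minutes=70), "agent replied"),
     (t0 + timedelta(minutes=130), "case resolved")],
    window=timedelta(hours=1),
    max_chunks=2,
)
```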
Data Sources Beyond Conversations
| Source | Description |
|---|---|
| Human agent chats (Service Cloud) | LiveChatTranscript Salesforce object. |
| Einstein Bot conversations | Bot Conversations data model. |
| Zero‑copy connector imports | Direct ingestion from external systems. |
These sources are defined as metadata and fed into the derivation pipeline, mirroring industry‑grade memory solutions that extract memories or insights from an actor‑text‑blob tuple or actor‑text‑actor triplet.
- The text blob may originate from conversations, documents (Excel, PDF), or any connector in the Data 360 ecosystem.
- Decoupling sources makes the pipeline a flexible, extensible enterprise‑memory platform, storing derived memories in a standard memory object.
What early R&D approaches show the most promise for keeping agentic memory reliable and governable over time?
- Clean, structured data model – Explicit fields for type, time, source, confidence, and lifecycle controls.
- Write gates – Only high‑quality candidates become memory.
- Read gates – Retrieval limited to task‑relevant records.
- Hybrid validation – Combines vector similarity with meaning checks to prevent duplication and drift.
- Episodic summarization – Periodic summarization preserves signal while reducing noise.
- Source prioritization – Trusted enterprise records take precedence over casual conversational signals.
- Replay testing – Evaluates correctness, freshness, and safety.
- Cost‑aware model selection – Balances performance with expense.
Together, these techniques support the development of long‑running, enterprise‑grade agentic memory without sacrificing trust or reliability.
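Hybrid validation, as described in the list above, pairs vector similarity with a meaning check. A minimal sketch follows, assuming the meaning check is a pluggable function. Here `is_duplicate`, the 0.9 threshold, and `toy_same_meaning` are hypothetical; the toy check only compares negation and stands in for whatever semantic validation a real system would run.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence

NEGATIONS = {"not", "no", "never"}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_same_meaning(a: str, b: str) -> bool:
    """Toy stand-in for a semantic check: both texts must agree on negation."""
    def negated(text: str) -> bool:
        return bool(NEGATIONS & set(text.lower().split()))
    return negated(a) == negated(b)


def is_duplicate(cand_vec: Sequence[float], cand_text: str,
                 existing_vec: Sequence[float], existing_text: str,
                 same_meaning: Callable[[str, str], bool],
                 threshold: float = 0.9) -> bool:
    """Treat a candidate as a duplicate only if both checks agree."""
    if cosine(cand_vec, existing_vec) < threshold:
        return False                      # not similar enough to compare
    return same_meaning(cand_text, existing_text)
```

The second check matters because two statements can sit close together in embedding space while saying opposite things, such as "prefers email" and "does not prefer email".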
Learn more
- Stay connected: join our Talent Community.
Additional resources
- ws.beamery.com/salesforce/eng-social-2023
- Check out our Technology and Product teams to learn how you can get involved.