Enterprise AI agents keep failing because they forget what they learned

Published: May 20, 2026 at 02:43 PM EDT

Source: VentureBeat

RAG architectures are good at one thing: surfacing semantically relevant documents. That’s also where they stop.

A framework called a decision‑context graph addresses that gap by giving agents:

Structured memory
Time‑aware reasoning
Explicit decision logic

Rippletide, a startup in the Neo4j ecosystem, has built one. Its key capability: agents that are non‑regressive, able to freeze validated sequences of actions and compound on them over time.

“The key point you want is non‑regressivity: How do you make sure that, when the agent will generate something new, you can compound on the previous discoveries?” – Yann Bilien, Rippletide’s co‑founder and chief scientific officer

Why RAG doesn’t go far enough

Enterprise context is sprawled across ERP tools, logs, databases, vector stores, and policy documents. Generative‑AI tools can retrieve from all of it—through keyword search, SQL queries, or full RAG pipelines—but retrieval has a ceiling.

Relevance gap – Retrieved data may not be relevant to the decision at hand, leading to hallucinations.
Lack of guidance – Even when the right data is pulled, agents often lack the rationale needed to make sound decisions.

“Everyone starts with RAG: Pull relevant docs, stuff them in the prompt, let the model figure it out.” – Wyatt Mayham, Northwest AI Consulting

While that works for chatbots, it “breaks immediately” for agents that must decide and act, Mayham notes.

“The biggest thing builders struggle with is the gap between retrieval and applicability.” – Mayham

A retrieved document doesn’t tell the agent whether it still applies, whether it’s been superseded, or whether a conflicting rule takes priority. In construction, for example, the agent must know whether:

A pricing exception has expired.
A safety policy applies only in certain jurisdictions.
A standard operating procedure was updated a month ago.

“Miss any of that, and the agent confidently does the wrong thing.” – Mayham

Without structured decision context, agents:

Combine incompatible rules.
Invent constraints to fill gaps.
Rely on “probabilistic guesses over unbounded data.”

These errors are hard to reproduce because builders can’t trace why a particular choice was made.

The compounding error problem is real, Mayham notes: a small per‑step miss rate becomes catastrophic across a multi‑step workflow, and that is “the main reason most enterprise agents never leave the pilot phase.”
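A back-of-envelope sketch makes the compounding concrete. It assumes every step succeeds independently with the same probability, which is a simplification the article does not state, and the function name is purely illustrative.

```python
def workflow_success_rate(per_step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


if __name__ == "__main__":
    # Compare two per-step success rates over short and long workflows.
    for rate in (0.99, 0.95):
        for steps in (5, 20):
            end_to_end = workflow_success_rate(rate, steps)
            print(f"{rate:.0%} per step over {steps} steps: {end_to_end:.1%} end to end")
```

Under that independence assumption, a 95% per-step success rate leaves roughly a 36% chance that a 20-step workflow finishes without a single miss.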

How decision‑context graphs get to the relevant answer

A decision‑context graph encodes a structured map of:

What is applicable.
What the rules are.
When they apply.

The framework is optimized for one question:

“Given this situation, which context applies right now?”

Time is treated as a first‑class dimension; every rule, decision, and exception is scoped to its validity period.

“The goal is to explicitly address missing, incoherent, or contradictory data when building the graph to avoid probabilistic [errors] once the agent is running.” – Bilien

Core Principles

Applicability – Logic is explicitly encoded so the agent knows which rules to remember and apply in a given situation. Context is returned only when it is relevant.
Time‑aware memory – Every rule, decision, and exception is time‑scoped, enabling the agent to reason about “what was true then versus what is true now” and to reproduce or explain its decisions.
Decision paths – The system can explain how it got from A to B and the “why” behind its rationale (e.g., why one piece of context was included and another was not). Agents receive “decision‑path” examples of how similar cases were handled before.
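The three principles can be illustrated with a small sketch. This is not Rippletide's implementation: the `Rule` fields, the `applicable_rules` function, and the sample rules are all invented for illustration, and a flat list stands in for the graph.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule node: what it says, where and when it applies."""
    rule_id: str
    text: str
    jurisdiction: Optional[str]  # None means the rule applies everywhere
    valid_from: date
    valid_to: Optional[date]     # None means still in force; end date is exclusive


def applicable_rules(rules, jurisdiction, on):
    """Return the rules in force on a date, plus a path explaining each choice."""
    selected, path = [], []
    for rule in rules:
        if rule.jurisdiction not in (None, jurisdiction):
            path.append(f"{rule.rule_id}: excluded, scoped to {rule.jurisdiction}")
        elif on < rule.valid_from:
            path.append(f"{rule.rule_id}: excluded, not in force until {rule.valid_from}")
        elif rule.valid_to is not None and on >= rule.valid_to:
            path.append(f"{rule.rule_id}: excluded, expired on {rule.valid_to}")
        else:
            selected.append(rule)
            path.append(f"{rule.rule_id}: included, valid on {on}")
    return selected, path


# Invented example data, loosely modeled on the construction scenario.
RULES = [
    Rule("price-exception-7", "10% discount on steel", None, date(2025, 1, 1), date(2025, 7, 1)),
    Rule("safety-12", "Harness required above 2 m", "CA", date(2024, 1, 1), None),
    Rule("sop-3-v1", "Inspect scaffolding weekly", None, date(2023, 1, 1), date(2025, 9, 1)),
    Rule("sop-3-v2", "Inspect scaffolding daily", None, date(2025, 9, 1), None),
]

# "What was true then" versus "what is true now" for the same jurisdiction.
then, _ = applicable_rules(RULES, "TX", date(2025, 6, 1))
now, why = applicable_rules(RULES, "TX", date(2025, 10, 1))
print([r.rule_id for r in then])
print([r.rule_id for r in now])
print("\n".join(why))
```

The same query run with two dates returns different rule sets, and the `why` list records the reason each rule was included or left out, which is the kind of trace a decision path is meant to provide.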

At setup, unstructured data is ingested and structured into an ontology (entities, rules, exceptions). Neuro‑symbolic AI handles pattern recognition and encodes formal, machine‑readable logic. Over time, the system refines its knowledge base as new decisions are made.

“Neuro‑symbolic brings two parts: a neuronal part giving large autonomy to agents and a symbolic part to reduce the amount of data needed and bring control.” – Bilien

The agent is tested pre‑production to validate behavior or pinpoint improvements, reducing risk and inference‑time computation.

Agents learning, rather than regressing

Non‑regression hinges on compounding both intelligence (models) and knowledge (shared between agents).

Agents can explore when they don’t know how to accomplish a task, trying different possibilities in a controlled environment or simulation (e.g., a support bot testing multiple response patterns).
Once a solution is evaluated as satisfactory, the graph freezes that sequence of actions. Future exploration starts from this “stable base of validated behaviors,” preventing newly‑acquired skills from overwriting previously learned good behavior.
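One way to picture freezing is a store that never overwrites a validated sequence and rejects any new policy that changes one. The sketch below is an assumption-laden toy, not the product's mechanism: the class `FrozenBehaviors`, its methods, and the sample tasks are all hypothetical.

```python
class FrozenBehaviors:
    """Hypothetical store of validated action sequences, keyed by task."""

    def __init__(self):
        self._frozen = {}

    def freeze(self, task, actions):
        """Record a validated sequence; an existing entry is never overwritten."""
        if task in self._frozen:
            raise ValueError(f"{task!r} is already frozen")
        self._frozen[task] = tuple(actions)

    def plan(self, task, explore):
        """Reuse the frozen sequence if one exists; otherwise fall back to exploring."""
        if task in self._frozen:
            return list(self._frozen[task])
        return list(explore(task))

    def accepts(self, candidate_policy):
        """A candidate is non-regressive if it reproduces every frozen sequence."""
        return all(
            tuple(candidate_policy(task)) == actions
            for task, actions in self._frozen.items()
        )


store = FrozenBehaviors()
store.freeze("reset password", ["verify identity", "send reset link"])


def new_policy(task):
    # Invented example: gains a refund skill but drops a step on an old task.
    known = {
        "refund order": ["check eligibility", "issue refund"],
        "reset password": ["send reset link"],
    }
    return known.get(task, [])


print(store.accepts(new_policy))  # False: the frozen password flow would regress
```

Here the candidate policy is rejected because it no longer verifies identity before sending a reset link, even though it learned something new elsewhere.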

Before acting on a customer request, an agent checks against the graph:

Is it violating a rule?
Is it hallucinating?
Is it staying within constraints?
Can it generalize the solution across similar cases?

At a macro level, the system assesses outcomes:

Did the behavior improve long‑term performance?
Did it generalize across similar contexts?
Did it preserve previous capabilities?

“This determinism is key for agents to run reliably at scale.” – Bilien

The result is behavior that is more consistent, predictable, explainable, and auditable, enabling stronger control.

“You want your agents to be able to learn by themselves when they face something they don’t know. You want them to be able to explore and find new solutions.” – Bilien

Getting Beyond “Episodic” Memory

While the team initially assumed it would deploy reinforcement learning (RL) everywhere, that proved very difficult in an enterprise setting, Bilien said. “Data are scarce for some specific use cases and messy for others.”

Typically, using raw data for reliable predictions has been a manual and time‑consuming challenge, but “now with agents we entered a new era where building ontologies is possible automatically,” Bilien noted.

Classic supervised fine‑tuning methods can lead to oscillations: models forget the last skill they learned while acquiring the next one. Overall, learning is not compounded, compression is “dramatic,” and models improve “episodically” rather than continuously, causing them to repeatedly fail on new or unseen tasks.

“You will never have a fully self‑learning model if you are regressing every time.” – Bilien

In enterprise use cases such as banking, where millions of transactions are processed daily, a high level of reliability is critical. “One question I ask all customers: Is 95% enough? In a lot of use cases, it’s not. You need 99.999%. 1% off is way too much,” he emphasized.
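The gap between those percentages is easier to see as a count. The sketch below assumes a volume of one million transactions a day, a round number chosen only for illustration, and the function name is hypothetical.

```python
def expected_failures(transactions: int, reliability: float) -> int:
    """Expected number of failed transactions, rounded to the nearest whole one."""
    return round(transactions * (1.0 - reliability))


DAILY_TRANSACTIONS = 1_000_000  # assumed volume, for illustration only

for reliability in (0.95, 0.99, 0.99999):
    failures = expected_failures(DAILY_TRANSACTIONS, reliability)
    print(f"{reliability:.3%} reliable: {failures:,} failures per day")
```

At that assumed volume, 95% reliability means about 50,000 failed transactions a day and 99% still means about 10,000, while 99.999% brings the figure down to about 10.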

Decision‑Context Graphs

Decision‑context graphs can close that gap, he contends. When the same customer‑support question is asked repeatedly, the agent will return a “satisfactory” answer predictably and without regression, all while retaining autonomy.

Encoding applicability and temporal validity into a structured graph—rather than relying on an LLM to infer it—is a “sound approach” to a real limitation in existing retrieval frameworks, Mayham said.

“The open question is whether the automatic ontology generation holds up against the messy, diverse data that enterprises actually have,” Mayham said. “That’s always the hard part.”
