[Paper] Predictive Associative Memory: Retrieval Beyond Similarity Through Temporal Co-occurrence

Published: February 11, 2026

Source: arXiv - 2602.11322v1

Overview

The paper introduces Predictive Associative Memory (PAM), a neural architecture that retrieves memories based on temporal co‑occurrence rather than pure similarity. By training a JEPA‑style predictor on continuous streams of experience, PAM can “jump” to items that have historically appeared together, mimicking how biological memory links events that happen close in time.

Key Contributions

  • Temporal‑association retrieval: Shows that a predictor trained on sequential data can retrieve the correct past state even when its embedding is unrelated in cosine space.
  • Inward JEPA: Proposes a novel “inward” predictor that operates on stored experiences to navigate backward through the associative graph, complementing the classic “outward” predictor that forecasts the future.
  • Benchmark & metrics: Introduces an associative‑recall evaluation suite (Association Precision@k, Recall@k, discrimination AUC) that focuses on faithful recall of experienced pairs rather than generalisation to unseen pairs.
  • Empirical evidence: Demonstrates >97 % top‑1 precision on a synthetic benchmark and strong cross‑boundary recall where similarity scores are zero, confirming that the model captures true temporal structure.
  • Robustness checks: Includes a temporal‑shuffle control that collapses performance, proving the signal comes from order information, not embedding geometry.
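The associative-recall metrics above can be made concrete with a small sketch. The paper does not publish reference code, so the definitions below are an assumption about how Association Precision@k and Recall@k are computed: precision measures how many of the top-k retrieved items were truly experienced alongside the query, and recall measures how many of the query's true temporal associates were surfaced.

```python
def association_precision_at_k(retrieved, true_associates, k):
    """Hypothetical Precision@k: fraction of the top-k retrieved items
    that were actually experienced together with the query."""
    hits = sum(1 for item in retrieved[:k] if item in true_associates)
    return hits / k

def association_recall_at_k(retrieved, true_associates, k):
    """Hypothetical Recall@k: fraction of the query's true temporal
    associates that appear anywhere in the top-k retrieved list."""
    hits = sum(1 for item in retrieved[:k] if item in true_associates)
    return hits / max(len(true_associates), 1)
```

Both take a ranked retrieval list and the set of ground-truth co-occurring items; the discrimination AUC would additionally require scores for never-experienced pairs.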

Methodology

  1. Embedding space: Raw observations (e.g., image frames, sensor readings) are encoded into a continuous latent space using a standard encoder network.
  2. JEPA framework:
    • Outward JEPA predicts the future latent state given the current one (the usual predictive coding setup).
    • Inward JEPA does the opposite: given a query latent vector, it predicts a past latent vector that is associatively reachable—i.e., a state that historically co‑occurred with the query.
  3. Training objective: Both predictors are trained with a contrastive loss that rewards accurate prediction of the true temporal neighbor and penalises mismatched samples drawn from the experience buffer.
  4. Recall procedure: At inference time, a query vector is fed to the Inward JEPA, which outputs a candidate past vector. The nearest stored experience (by cosine similarity to the output) is returned as the retrieved memory.
  5. Evaluation: Instead of measuring generalisation to novel pairs, the authors test faithfulness: does the retrieved item belong to the same temporal episode as the query? Metrics such as Association Precision@1 and Recall@20 capture this.
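Step 4, the recall procedure, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `recall_memory` and the identity "predictor" in the usage example are hypothetical stand-ins, assuming a trained Inward JEPA is available as a callable and stored experiences live in a flat buffer of latent vectors.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity with a small epsilon for numerical safety."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def recall_memory(query, inward_predictor, buffer):
    """Sketch of the recall step: the inward predictor maps the query
    latent to a candidate past latent, and the stored experience nearest
    to that candidate (by cosine similarity) is returned."""
    candidate = inward_predictor(query)  # predicted past latent vector
    return max(buffer, key=lambda stored: cosine(candidate, stored))

# Toy usage: an identity "predictor" over a two-vector buffer.
buffer = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
retrieved = recall_memory(np.array([0.9, 0.1]), lambda z: z, buffer)
```

At scale, the final `max` over the buffer would be replaced by an approximate nearest-neighbor index, a point the paper itself flags under limitations.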

Results & Findings

| Metric | PAM (Inward JEPA) | Cosine‑Similarity Baseline |
| --- | --- | --- |
| Association Precision@1 | 0.970 | 0.321 |
| Recall@20 (cross‑boundary) | 0.421 | 0.000 |
| Discrimination AUC (experienced vs. never‑experienced) | 0.916 | 0.789 |
| Cross‑room AUC (where similarity is uninformative) | 0.849 | 0.503 |
| Temporal‑shuffle control (Recall@20) | 0.042 (−90 % drop) | n/a |

Interpretation: The Inward JEPA reliably surfaces the correct temporal associate even when the raw embeddings are orthogonal. The large gap between PAM and a naïve cosine similarity baseline demonstrates that the model learns a genuine associative graph rather than exploiting static geometry.

Practical Implications

  • Robust episodic retrieval: Systems such as personal assistants, robotics, or game AI can recall when two events occurred together, not just whether they look alike. This enables more context‑aware behavior (e.g., “the last time I opened a file, I also received a network alert”).
  • Improved replay buffers: Reinforcement‑learning pipelines could replace random sampling with temporally‑aware recall, yielding richer training batches that respect the natural causal structure of the environment.
  • Memory‑augmented models: Large language or vision models can be equipped with a PAM‑style module to fetch relevant past contexts that are linked by time, potentially reducing hallucinations caused by similarity‑only retrieval.
  • Anomaly detection: Because PAM learns the normal temporal co‑occurrence graph, deviations (e.g., a query that has no strong associative neighbor) can flag out‑of‑distribution events in monitoring or security applications.
  • Cross‑modal linking: The approach is modality‑agnostic; developers can store embeddings from audio, video, sensor streams, etc., and let PAM discover cross‑modal associations that traditional similarity search would miss.
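The replay-buffer idea above can be illustrated with a simple sampler. This is a rough sketch, not anything from the paper: `temporal_replay_batch` and its `window` parameter are hypothetical, standing in for the general idea of filling a batch from an anchor's temporal neighborhood rather than sampling uniformly at random.

```python
import random

def temporal_replay_batch(buffer, batch_size, window=3, seed=None):
    """Sketch of temporally-aware replay: pick a random anchor index,
    then fill the rest of the batch from its temporal neighborhood
    instead of sampling the whole buffer uniformly."""
    rng = random.Random(seed)
    anchor = rng.randrange(len(buffer))
    lo = max(0, anchor - window)
    hi = min(len(buffer), anchor + window + 1)
    neighborhood = [i for i in range(lo, hi) if i != anchor]
    picks = rng.sample(neighborhood, min(batch_size - 1, len(neighborhood)))
    return [buffer[i] for i in [anchor] + picks][:batch_size]
```

A PAM-style module could go further by sampling associates from the learned co-occurrence graph rather than a fixed index window, but the fixed window already preserves local temporal structure.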

Limitations & Future Work

  • Synthetic benchmark: All experiments are on a controlled synthetic dataset; real‑world sensory streams (e.g., video, logs) may present noise, non‑stationarity, and scale challenges.
  • Memory footprint: The method assumes a buffer of stored embeddings that can be queried; scaling to billions of experiences will require efficient indexing or hierarchical memory structures.
  • Temporal granularity: The current formulation treats each timestep equally; future work could incorporate variable time gaps or hierarchical time scales (seconds vs. days).
  • Integration with downstream tasks: The paper focuses on recall fidelity; evaluating how PAM‑augmented retrieval improves downstream performance (e.g., RL sample efficiency, QA accuracy) remains an open question.

Bottom line: Predictive Associative Memory offers a fresh, biologically inspired way to retrieve memories based on “what happened together” rather than “what looks alike.” For developers building systems that need to reason over sequences of events, PAM opens a promising avenue to more context‑rich, temporally aware AI.

Authors

  • Jason Dury

Paper Information

  • arXiv ID: 2602.11322v1
  • Categories: cs.LG, cs.AI, cs.NE
  • Published: February 11, 2026
  • PDF: Download PDF