I Gave My AI Assistant Permanent Memory — Here's Exactly How

Published: (March 11, 2026 at 01:47 PM EDT)
Source: Dev.to

My AI assistant woke up every morning with no idea who I was.
I’d been running the same assistant for months. It knew my stack, my projects, my preferences — but only within a session. The next day? Blank slate. Every conversation started with context‑dumping: “Here’s what we’re building. Here’s where we left off. Here’s what matters.”

I got tired of it, so I built the thing that fixes it.

The Memory Problem

Most people solve AI memory by stuffing everything into the system prompt: project docs, previous decisions, preferences — all of it, every session.

That works until it doesn’t. Context windows have limits, and not all memory is equal. You don’t need to know everything; you need the right things at the right time.

It’s a retrieval problem, not a storage problem.

Engram – Persistent, Queryable Memory

@cartisien/engram is a SQLite‑backed, TypeScript‑first library that provides permanent, queryable memory for AI assistants with zero configuration.

Core API

import { Engram } from '@cartisien/engram';

const memory = new Engram({ dbPath: './assistant.db' });

// Store something
await memory.remember(
  sessionId,
  'User is building a federal contracting app in React 19',
  'user'
);

// Retrieve what’s relevant
const context = await memory.recall(sessionId, 'what are we building?', 5);

Drop it into any agent loop, chat handler, or LLM integration.
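As a rough sketch of what dropping it into a chat handler might look like, the handler below recalls context before calling the model and stores both sides of the exchange afterward. The `MemoryStore` interface only mirrors the two calls shown above; the return type of `recall`, the `'assistant'` role, the `callModel` parameter, and the `InMemoryStore` stub are assumptions for illustration, not Engram's actual types.

```typescript
// Hypothetical shape mirroring the remember/recall calls shown above.
// The recall return type is an assumption, not Engram's documented type.
interface MemoryStore {
  remember(sessionId: string, content: string, role: string): Promise<void>;
  recall(sessionId: string, query: string, limit: number): Promise<{ content: string }[]>;
}

// Stand-in for an LLM call; any function from prompt to reply works here.
type ModelCall = (prompt: string) => Promise<string>;

async function handleMessage(
  memory: MemoryStore,
  callModel: ModelCall,
  sessionId: string,
  userMessage: string
): Promise<string> {
  // Pull only the memories relevant to this message, not the whole history.
  const recalled = await memory.recall(sessionId, userMessage, 5);
  const contextBlock = recalled.map((m) => `- ${m.content}`).join('\n');

  const prompt = `Relevant memory:\n${contextBlock}\n\nUser: ${userMessage}`;
  const reply = await callModel(prompt);

  // Store both sides so later sessions can recall them.
  await memory.remember(sessionId, userMessage, 'user');
  await memory.remember(sessionId, reply, 'assistant');
  return reply;
}

// Tiny in-memory stub so the sketch runs without a database.
class InMemoryStore implements MemoryStore {
  private rows: { sessionId: string; content: string; role: string }[] = [];

  async remember(sessionId: string, content: string, role: string): Promise<void> {
    this.rows.push({ sessionId, content, role });
  }

  async recall(sessionId: string, _query: string, limit: number) {
    // The stub ignores the query and returns the most recent rows.
    return this.rows.filter((r) => r.sessionId === sessionId).slice(-limit);
  }
}
```

Swapping `InMemoryStore` for a real Engram instance should only require that the instance satisfies the same two method shapes.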

The first version used a simple SQLite table with indexes on session + timestamp and LIKE‑based keyword matching on recall. It worked, but keyword search only finds what you literally asked for.

Example: querying “What are we building?” would not surface a memory stored as “working on GovScout, a federal contracting app.”
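To make the miss concrete, here is a small sketch that mimics LIKE-style substring matching in plain TypeScript. The `keywordRecall` helper and its stop-word list are hypothetical and are not Engram's v0.1 code; they only illustrate why literal matching fails on this query.

```typescript
// Hypothetical stand-in for LIKE '%term%' matching; not Engram's actual query.
const STOP_WORDS = new Set(['what', 'are', 'we', 'is', 'the', 'a', 'on']);

function keywordRecall(memories: string[], query: string): string[] {
  // Lowercase, strip punctuation, and drop stop words to get search terms.
  const terms = query
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '')
    .split(/\s+/)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t));

  // A memory matches if it contains any term as a substring (like SQL LIKE).
  return memories.filter((m) => {
    const haystack = m.toLowerCase();
    return terms.some((t) => haystack.includes(t));
  });
}

const stored = ['working on GovScout, a federal contracting app'];

// "building" never appears in the stored text, so nothing comes back.
const missed = keywordRecall(stored, 'What are we building?');

// A query that happens to share a literal word does match.
const found = keywordRecall(stored, 'federal contracts');
```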

This week I shipped v0.2 with semantic search, without any external API or managed vector database.

Local Embedding with Ollama

I’m running an RTX 5090 with Ollama locally. The nomic-embed-text model is already pulled, so the embedding call is a local HTTP request:

const response = await fetch('http://localhost:11434/api/embeddings', {
  method: 'POST',
  body: JSON.stringify({ model: 'nomic-embed-text', prompt: text })
});
const { embedding } = await response.json(); // 768-dim float array

  • On remember(), the content is embedded and the vector is stored as JSON alongside the memory.
  • On recall(), the query is embedded, cosine similarity is computed against every stored vector, and the top‑k results are returned.

private cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  // Guard against zero-magnitude vectors, which would otherwise yield NaN.
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  return denom === 0 ? 0 : dot / denom;
}

No sqlite‑vss extension, no pgvector, no Pinecone—just math on JSON arrays.
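Putting the two bullets and the similarity function together, a brute-force recall might look roughly like the sketch below. `StoredMemory`, `Embedder`, `rememberInto`, and `recallTopK` are hypothetical names, and the embedder is passed in so the example runs without Ollama; Engram's real internals may differ.

```typescript
// Hypothetical row shape: the vector is kept as a JSON string, as described above.
interface StoredMemory {
  content: string;
  embeddingJson: string;
}

// Any function that turns text into a vector; in practice this would call Ollama.
type Embedder = (text: string) => Promise<number[]>;

function cosine(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  return denom === 0 ? 0 : dot / denom;
}

// Embed the content and keep the vector as JSON next to the text.
async function rememberInto(rows: StoredMemory[], embed: Embedder, content: string): Promise<void> {
  const vector = await embed(content);
  rows.push({ content, embeddingJson: JSON.stringify(vector) });
}

// Embed the query, score every stored row, and return the k best matches.
async function recallTopK(rows: StoredMemory[], embed: Embedder, query: string, k: number) {
  const queryVector = await embed(query);
  return rows
    .map((row) => ({
      content: row.content,
      similarity: cosine(queryVector, JSON.parse(row.embeddingJson) as number[]),
    }))
    .sort((x, y) => y.similarity - x.similarity) // highest similarity first
    .slice(0, k);
}
```

The scan is linear in the number of stored memories, which is the trade-off the post accepts in exchange for having no vector extension to install.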

For the scale Engram targets (one assistant, thousands of memories, not millions), this is plenty fast: scoring a few thousand 768-dimensional vectors costs only a few million multiply-adds per query. If Ollama is unreachable, Engram automatically falls back to keyword search, so recall keeps working without crashing and without extra configuration.
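One way such a fallback could be wired, as a sketch only: attempt the semantic path and switch to keyword search if it throws. The 2-second timeout and the `semanticSearch` and `keywordSearch` parameters are assumptions, and the code assumes Node 18+ for the global `fetch` and `AbortSignal.timeout`; the post does not show how Engram implements this.

```typescript
type SearchFn = (query: string, limit: number) => Promise<string[]>;

// Embedding call against the local Ollama endpoint shown earlier.
// The timeout value is an assumption, not an Engram default.
async function embedWithTimeout(text: string, timeoutMs = 2000): Promise<number[]> {
  const response = await fetch('http://localhost:11434/api/embeddings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'nomic-embed-text', prompt: text }),
    signal: AbortSignal.timeout(timeoutMs), // abort if Ollama does not answer in time
  });
  if (!response.ok) throw new Error(`Ollama returned HTTP ${response.status}`);
  const { embedding } = (await response.json()) as { embedding: number[] };
  return embedding;
}

// Try semantic search first; on any failure (connection refused, timeout,
// bad status) fall back to keyword search instead of surfacing the error.
async function recallWithFallback(
  semanticSearch: SearchFn,
  keywordSearch: SearchFn,
  query: string,
  limit: number
): Promise<{ mode: 'semantic' | 'keyword'; results: string[] }> {
  try {
    return { mode: 'semantic', results: await semanticSearch(query, limit) };
  } catch {
    return { mode: 'keyword', results: await keywordSearch(query, limit) };
  }
}
```

Returning the `mode` alongside the results makes it easy to log how often the fallback actually fires.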

Running Engram in Practice

I run Engram as my own assistant’s memory store. Every significant memory is posted to a local API server (managed by PM2, listening on port 3470), alongside the markdown files I already use.

Example Semantic Query

curl "http://localhost:3470/memory/charli?query=what+projects+is+jeff+working+on&limit=5"

Result

[
  { "content": "Jeff is building GovScout, a federal contracting app...", "similarity": 0.525 },
  { "content": "Engram v0.2 ships semantic search via nomic-embed-text...", "similarity": 0.396 }
]

The query “What projects is Jeff working on?” ranked the GovScout memory (similarity 0.525) above the Engram memory (0.396), even though the query and the GovScout memory share almost no keywords beyond the name “Jeff”. That is exactly the right answer.
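For calling the same endpoint from code rather than curl, a small helper might look like this. The URL shape and the `content` and `similarity` fields are taken from the example above; `buildRecallUrl`, `queryMemory`, and the reading of the path segment as an agent name are assumptions.

```typescript
interface RecallHit {
  content: string;
  similarity: number;
}

// Builds the same URL as the curl example; URLSearchParams handles the encoding.
function buildRecallUrl(baseUrl: string, agent: string, query: string, limit: number): string {
  const params = new URLSearchParams({ query, limit: String(limit) });
  return `${baseUrl}/memory/${encodeURIComponent(agent)}?${params.toString()}`;
}

// Fetches the top matches from the local memory server (Node 18+ global fetch).
async function queryMemory(agent: string, query: string, limit = 5): Promise<RecallHit[]> {
  const response = await fetch(buildRecallUrl('http://localhost:3470', agent, query, limit));
  if (!response.ok) throw new Error(`Memory server returned HTTP ${response.status}`);
  return (await response.json()) as RecallHit[];
}
```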

The Cartisien Memory Suite

Engram is part of a larger framework I call the Cartisien Memory Suite:

| Package | Role |
| --- | --- |
| @cartisien/engram | Persistent memory (this library) |
| @cartisien/extensa | Vector infrastructure layer |
| @cartisien/cogito | Agent identity and wake/sleep lifecycle |

The framing comes from Descartes: res cogitans (thinking substance) and res extensa (extended substance) — mind and body. cogito is the agent’s sense of self, extensa is the vector layer it thinks through, and engram is where experience accumulates.

Thesis: agents need more than a context window; they need a substrate of self.

Installation

npm install @cartisien/engram

v0.2.0 is live. See the repository for details:

github.com/Cartisien/engram

I’m still testing semantic search in production before publishing to npm: watching for edge cases, handling Ollama timeouts, and checking that the brute-force cosine scan holds up as the store grows.

Call to Action

If you’re building agents and hitting the memory problem, I’d love to hear what you’re doing about it. The space is wide open.
