The Model Doesn't Remember. You Do

Published: June 18, 2026 at 01:02 PM EDT

4 min read

Source: Dev.to

Introduction

Before I dug into how an LLM works, I assumed each chat stored its memory or context on its own. The moment I realized it was just an array with all the messages appended gave me a sense of control. I wish I had known this sooner.

None of this is visible in a chat session; Claude and OpenAI pull a lot of threads behind the scenes to produce a context-accurate response. To learn what those threads are, I first needed to call an LLM API with raw fetch, no SDK, and understand the request/response cycle.

We want to build strong fundamentals, so not using the Anthropic SDK frees us from abstractions we may not notice. The SDK provides idiomatic interfaces, type safety, and built-in support for streaming, retries, and error handling. Without the SDK, nothing is abstracted away. Every decision is visible, which is exactly the point.

Normally, to call the API with the SDK, you'd write a script like this one:

    import Anthropic from "@anthropic-ai/sdk";

    // With no arguments, the client reads the API key from the ANTHROPIC_API_KEY environment variable.
    const client = new Anthropic();

    const message = await client.messages.create({
      model: "claude-opus-4-8",
      max_tokens: 1000,
      messages: [
        {
          role: "user",
          content: "What should I search for to find the latest developments in renewable energy?"
        }
      ]
    });

    // message.content is an array of content blocks, not a plain string.
    console.log(message.content);

And for a raw fetch, you'd need to manage the headers and body yourself:

    // Messages API endpoint.
    const API_URL = 'https://api.anthropic.com/v1/messages';

    const res = await fetch(API_URL, {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        // The API key comes from the environment, never from the source code.
        'x-api-key': process.env.ANTHROPIC_API_KEY,
        'anthropic-version': '2023-06-01',
      },
      body: JSON.stringify({
        model: 'claude-sonnet-4-5',
        max_tokens: 1024,
        messages: [
          {
            role: 'user',
            content: 'Hello Claude',
          },
        ],
      }),
    });

    // Parse the JSON body and print the text of the first content block.
    const data = await res.json();

    console.log(data.content[0].text);
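With raw fetch, the error handling that the SDK would normally provide becomes your job. The sketch below is one possible way to check the status and read the reply, not a definitive implementation. `buildRequest`, `extractText`, and `sendMessages` are hypothetical helper names, and the response is assumed to have the shape implied by the `data.content[0].text` access above: a `content` array of blocks with `type` and `text` fields.

```javascript
// Hypothetical helpers; the names are illustrative and not part of any SDK.

// Build the fetch options for one Messages API call.
function buildRequest(apiKey, messages) {
  return {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01',
    },
    body: JSON.stringify({ model: 'claude-sonnet-4-5', max_tokens: 1024, messages }),
  };
}

// Join the text of every text block in a parsed response body.
// Assumes the body has a `content` array of { type, text } blocks.
function extractText(data) {
  if (!data || !Array.isArray(data.content)) {
    throw new Error('Unexpected response shape: missing content array');
  }
  return data.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text)
    .join('');
}

// Send one request and fail loudly on a non-2xx status instead of
// trying to read `content` from an error body.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function sendMessages(apiKey, messages, fetchImpl = fetch) {
  const res = await fetchImpl(
    'https://api.anthropic.com/v1/messages',
    buildRequest(apiKey, messages)
  );
  if (!res.ok) {
    throw new Error(`API request failed with status ${res.status}: ${await res.text()}`);
  }
  return extractText(await res.json());
}
```

Retries and streaming are deliberately left out; the point is only that a failed request surfaces as an error rather than as `undefined` when reading the content.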

Surprisingly, there is little documentation if you want to take this path; it's obvious why, but still intriguing. And this only covers the basic request and response dynamic: you send a query, get a response from the LLM, and that's it. We want multiple conversational turns, and the Messages API is stateless, so you always need to send back the full conversation history with every request.

Let's stop for a moment to think about this "history" we need to manage. This is where you learn the most important concept in LLM development: the model has no memory. You are responsible for keeping the history and sending it back every time. The model is only aware of what we send to it. Everything else is forgotten.

While developing the loop, I found out that our "memory" is just an array with our previous messages, along with the latest query. Yes, that's how the model's context is managed. This hit me hard because I thought the model was managing this on its own, and being able to control this array at such a fine-grained level was a nice surprise. Our "memory" after a second query would look like the snippet below.

    messages: [
      { role: "user", content: "Hello, Claude" },
      { role: "assistant", content: "Hello! How can I help you today?" },
      { role: "user", content: "Can you describe LLMs to me?" }
    ]
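To make the array idea concrete, here is a minimal sketch of the bookkeeping involved. `addTurn` is a hypothetical helper name, and the assistant reply is hard-coded where a real program would use the text returned by the API.

```javascript
// The whole "memory" of the conversation: a plain array of messages.
const messages = [];

// Hypothetical helper: append one turn and return the array for convenience.
function addTurn(history, role, content) {
  history.push({ role, content });
  return history;
}

// First exchange: the user's query, then the model's reply (hard-coded here).
addTurn(messages, 'user', 'Hello, Claude');
addTurn(messages, 'assistant', 'Hello! How can I help you today?');

// Second query: it is appended to the same array.
addTurn(messages, 'user', 'Can you describe LLMs to me?');

// The full array, not just the latest message, goes into the next request body.
const body = JSON.stringify({ model: 'claude-sonnet-4-5', max_tokens: 1024, messages });

console.log(messages.length); // 3
```

Because the array is ordinary data, anything done to it before the next request (dropping, editing, or reordering entries) directly changes what the model "remembers".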

What if we want a real back-and-forth conversation with the model? First, we need these requirements:

1. Read user input from the terminal.
2. Append the new message to the previous ones and pass them all to the model.
3. Print the response.
4. Go back to step 1.

As a nice touch, we also want an exit option. If you want to check the full implementation of a basic chat loop, check the script in raw-claude-chat, where this stage is added.

This simple array is the seed for many context strategies, like sliding window, RAG, and semantic search, that will be necessary later for a really functional chat that "remembers".

When interacting with a chat, we may want not just to message it but to tell it to do something. This leads to tool use: being able to execute what the model is instructed to run, run one task after another, and choose correctly which tool to run when it needs to. We have already built a tool from the server perspective, gitstoria. Now we are going to complement this knowledge by understanding the counterpart, the client side.
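Going back to the chat loop requirements listed earlier, one way to sketch them is shown below. This is not the raw-claude-chat script, only an illustration under stated assumptions: `chatLoop` and its `ask`, `send`, and `print` parameters are hypothetical names, injected so the loop has no direct terminal or network dependency, and typing "exit" is assumed as the exit option.

```javascript
// A sketch of the chat loop. All names here are hypothetical.
//   ask(prompt)    -> Promise<string>  reads one line of user input
//   send(messages) -> Promise<string>  sends the full history, returns the reply text
//   print(line)                        shows a line to the user
async function chatLoop({ ask, send, print }) {
  const messages = []; // the conversation history, i.e. the "memory"

  while (true) {
    // Read user input.
    const input = (await ask('You: ')).trim();

    // Exit option: typing "exit" ends the conversation.
    if (input.toLowerCase() === 'exit') break;
    if (input === '') continue; // skip empty lines so no blank message is sent

    // Append the new message to the previous ones and send everything.
    messages.push({ role: 'user', content: input });
    const reply = await send(messages);

    // Keep the model's reply in the history so the next turn includes it.
    messages.push({ role: 'assistant', content: reply });

    // Print the response; the `while` then goes back to reading input.
    print(`Claude: ${reply}`);
  }

  return messages;
}
```

In a terminal program, `ask` could wrap the `question` method of a Node `readline/promises` interface, `send` could wrap a raw fetch call to the Messages API, and `print` could simply be `console.log`.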
