How to Add Memory to a Python AI Agent

Published: March 14, 2026 at 05:01 PM EDT
3 min read
Source: Dev.to

Your AI agent forgets everything the moment it responds. Ask it a follow‑up question and it has zero context. Without memory, every interaction starts from scratch.

Here’s how to fix that in under 40 lines of Python – no LangChain, no frameworks, just the standard library and the OpenAI SDK.

The Code

import json
from pathlib import Path
from openai import OpenAI

MEMORY_FILE = "agent_memory.json"
client = OpenAI()  # uses OPENAI_API_KEY env var

def load_memory() -> list[dict]:
    """Load conversation history from disk."""
    if Path(MEMORY_FILE).exists():
        with open(MEMORY_FILE, "r") as f:
            return json.load(f)
    return []

def save_memory(messages: list[dict]) -> None:
    """Persist conversation history to disk."""
    with open(MEMORY_FILE, "w") as f:
        json.dump(messages, f, indent=2)

def chat(user_input: str, messages: list[dict]) -> str:
    """Send a message with full conversation history."""
    messages.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            *messages
        ],
    )

    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    save_memory(messages)
    return reply

if __name__ == "__main__":
    history = load_memory()
    print("Agent ready. Type 'quit' to exit.\n")

    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue  # skip blank lines instead of sending an empty message
        if user_input.lower() == "quit":
            break
        print(f"Agent: {chat(user_input, history)}\n")

Save this as agent.py, set your OPENAI_API_KEY, and run it:

pip install openai
export OPENAI_API_KEY="sk-..."
python agent.py

How It Works

  • load_memory() checks for a local JSON file and loads any previous conversation. If the file doesn’t exist, it starts fresh with an empty list. This is your agent’s long‑term memory – it survives restarts.
  • save_memory() writes the full message list to disk after every exchange. The format matches OpenAI’s message schema exactly, so there’s no translation step.
  • chat() appends the user’s message to the history, sends the entire conversation to the model, then appends the response. The model sees every previous turn, so it can reference earlier context naturally.
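  • The *messages unpacking in the API call places your history after the system prompt, keeping the system instruction separate from the conversation flow. The system prompt is never written to the JSON file, so you can change it without touching saved history.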
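To make the unpacking step concrete, here is a minimal sketch that builds the request list without calling the API. The helper name build_request_messages is hypothetical and not part of the script above; it only illustrates that the stored history is copied into a new list and left unchanged.

```python
def build_request_messages(history: list[dict], system_prompt: str) -> list[dict]:
    """Return a new list: the system prompt followed by the stored history.

    Hypothetical helper for illustration; the script above does this inline.
    """
    return [{"role": "system", "content": system_prompt}, *history]


history = [
    {"role": "user", "content": "My name is Sarah."},
    {"role": "assistant", "content": "Nice to meet you, Sarah!"},
]

request = build_request_messages(history, "You are a helpful assistant.")

print(len(request))        # 3: one system message plus two history entries
print(request[0]["role"])  # system
print(len(history))        # 2: the stored history is not modified
```

Because the system message lives only in the request, it never ends up in the saved conversation.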

What You’ll See

You: My name is Sarah and I'm building a CLI tool in Rust.
Agent: Nice to meet you, Sarah! A CLI tool in Rust is a great
       choice. What does it do?

You: What language am I using?
Agent: You're using Rust for your CLI tool.

# Restart the script...

You: What's my name?
Agent: Your name is Sarah!

The agent remembers across messages and across sessions because the JSON file persists.
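You can check the persistence half of this without spending any API calls. The sketch below is an assumption-laden variant of the two helpers: unlike the originals, these versions take a path argument so the round trip can run against a temporary file instead of agent_memory.json.

```python
import json
import tempfile
from pathlib import Path


def save_memory(messages: list[dict], path: Path) -> None:
    """Write the message list to `path` as JSON (path argument is an addition)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(messages, f, indent=2)


def load_memory(path: Path) -> list[dict]:
    """Read the message list from `path`, or return [] if the file is missing."""
    if path.exists():
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    return []


with tempfile.TemporaryDirectory() as tmp:
    memory_path = Path(tmp) / "agent_memory.json"

    # First "session": no file yet, so memory starts empty.
    before = load_memory(memory_path)

    session = [
        {"role": "user", "content": "My name is Sarah."},
        {"role": "assistant", "content": "Nice to meet you, Sarah!"},
    ]
    save_memory(session, memory_path)

    # Second "session": a fresh load sees what the first one saved.
    restored = load_memory(memory_path)

print(before)               # []
print(restored == session)  # True
```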

When This Breaks Down

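  • Token overflow. The full history is sent to the model on every turn, so each request is larger than the last. A long enough conversation will exceed the model’s context window; how many exchanges that takes depends on the model and on message length.
    Fix: Trim messages to the last N entries before the API call, or summarize older messages.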

  • No semantic search. The agent remembers everything linearly but can’t search its memory by topic. For that you’d add an embedding store – but that’s a different tutorial.
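The trimming fix for token overflow could look something like this. It is a rough sketch, not a tuned solution: trim_history and max_messages are hypothetical names, and the cap counts messages rather than tokens, so it only approximates a real token budget.

```python
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Return at most the last `max_messages` entries, starting on a user turn."""
    if max_messages <= 0:
        return []  # guard: messages[-0:] would return the whole list
    recent = messages[-max_messages:]
    # Drop leading assistant replies so the window does not open mid-exchange.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [
    {"role": "user", "content": "u1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "u3"},
    {"role": "assistant", "content": "a3"},
]

window = trim_history(history, max_messages=3)
print([m["content"] for m in window])  # ['u3', 'a3']
print(len(history))                    # 6: the full history is untouched
```

In chat(), you would pass *trim_history(messages) in place of *messages. The full list still gets saved to disk, so nothing is lost; the model just sees a shorter window.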

For most prototypes and personal tools, this flat‑file approach works surprisingly well. You get persistent, contextual conversations with zero dependencies beyond the OpenAI SDK.

Next Steps

  • Add a max_history parameter to cap token usage.
  • Store timestamps with each message for time‑aware recall.
  • Split into short‑term (RAM) and long‑term (disk) memory layers.
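As a starting point for the timestamp idea, here is one possible shape. The names make_record and to_api_messages are hypothetical, and the sketch assumes the safest option is to send only role and content to the API, keeping the extra timestamp key in the file alone.

```python
from datetime import datetime, timezone


def make_record(role: str, content: str) -> dict:
    """Build a stored message with a UTC ISO 8601 timestamp (hypothetical helper)."""
    return {
        "role": role,
        "content": content,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def to_api_messages(records: list[dict]) -> list[dict]:
    """Strip storage-only keys so the request carries just role and content."""
    return [{"role": r["role"], "content": r["content"]} for r in records]


records = [
    make_record("user", "My name is Sarah."),
    make_record("assistant", "Nice to meet you, Sarah!"),
]

print(sorted(records[0].keys()))          # ['content', 'role', 'timestamp']
print(to_api_messages(records)[0])        # {'role': 'user', 'content': 'My name is Sarah.'}
```

The records are still plain dicts, so save_memory() and load_memory() work on them unchanged.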

Check out the other posts in the AI Agent Quick Tips series for more patterns like retry logic, structured outputs, and human approval gates.

Building agents that need memory, tools, and orchestration out of the box? Nebula handles the infrastructure so you can focus on the logic.
