Orchestrating AI Agents to create Memes
What even is an AI agent?
At this point, you’ve probably used some kind of LLM‑powered application either directly or indirectly. In simple words, an AI agent is just an application that uses an LLM as its brain.
If you compare it to a human, the LLM is the thinking part of the mind. It takes whatever you see, hear, or feel, makes sense of it, and decides what to do next. In a human body, your eyes and ears gather information, your nerves send that information to the brain, the brain interprets it, and then your muscles carry out the action.
An AI agent follows the same pattern:
- Input – data from the user or environment.
- Pre‑processing – code that cleans and structures the input.
- LLM – interprets the input and decides what to do next.
- Functions/tools – act on the decision.
The brain (LLM) is clever, but without senses (input) and muscles (tools) it can’t interact with the world. Early LLMs hit this limitation: they could reason, but they couldn’t fetch real‑time data like the current weather because they lacked the necessary “senses” and “muscles”.
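To make that pattern concrete, here is a minimal sketch of the loop in TypeScript. Everything in it (callLLM, the tools map, the weather tool) is a placeholder for illustration, not a real library API:

```ts
// A single pass through the agent loop: input -> pre-processing -> LLM -> tool.
type ToolCall = { name: string; args: Record<string, string> };

// Placeholder for a real LLM call: the model either answers directly
// or asks for a tool to be run.
declare function callLLM(input: string): Promise<{ answer: string } | ToolCall>;

// The "muscles": plain functions the agent is allowed to invoke.
const tools: Record<string, (args: Record<string, string>) => Promise<string>> = {
  get_weather: async ({ city }) => `It's sunny in ${city} right now`,
};

async function runAgent(userInput: string): Promise<string> {
  const cleaned = userInput.trim();                 // pre-processing
  const decision = await callLLM(cleaned);          // the "brain" decides
  if ("answer" in decision) return decision.answer; // no tool needed
  return tools[decision.name](decision.args);       // act on the decision
}
```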
Enter tools
Tools are the missing pieces that enable the LLM to actually do things. Think of the LLM as the brain and tools as the hands, sensors, and external abilities it never had. Once you connect tools to the brain, it can reach out, fetch data, take actions, and handle tasks it previously couldn’t attempt.
As tools became more common, a standard way for agents and LLMs to communicate with them was needed. Anthropic introduced the Model Context Protocol (MCP), providing a unified schema for defining and using tools.
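To give a feel for what that standard schema looks like, here is a sketch of a tool definition in the shape MCP uses: a name, a description, and a JSON Schema for the inputs. The weather tool itself is just an illustration, not part of the meme project:

```ts
// What the LLM "sees": a name, a human-readable description, and a
// JSON Schema telling it exactly which arguments the tool expects.
const getWeatherTool = {
  name: "get_weather",
  description: "Fetch the current weather for a city",
  inputSchema: {
    type: "object",
    properties: {
      city: { type: "string", description: "City name, e.g. 'Kochi'" },
    },
    required: ["city"],
  },
};
```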
Creating the Meme MCP Server
I built an MCP server that wraps ImgFlip’s caption_image API to generate memes. The server exposes a single tool that my Meme agent can call. It’s published on npm as imgflip-meme-mcp and provides a generate_meme tool that takes a template ID, captions, and API credentials.
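For context, a server like that can be put together with the official MCP TypeScript SDK. The sketch below is not the published package's source, just an assumed shape: it registers a generate_meme tool and forwards the arguments to ImgFlip's caption_image endpoint.

```ts
// Sketch of an MCP server exposing generate_meme, which wraps ImgFlip's
// caption_image API. Illustrative only, not the imgflip-meme-mcp source.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "imgflip-meme", version: "0.1.0" });

server.tool(
  "generate_meme",
  {
    template_id: z.string().describe("ImgFlip meme template ID"),
    text0: z.string().describe("Top caption"),
    text1: z.string().describe("Bottom caption"),
    username: z.string().describe("ImgFlip API username"),
    password: z.string().describe("ImgFlip API password"),
  },
  async ({ template_id, text0, text1, username, password }) => {
    // ImgFlip expects form-encoded POST parameters.
    const res = await fetch("https://api.imgflip.com/caption_image", {
      method: "POST",
      body: new URLSearchParams({ template_id, text0, text1, username, password }),
    });
    const json = await res.json();
    if (!json.success) throw new Error(json.error_message);
    // Hand the generated meme's URL back to the caller.
    return { content: [{ type: "text", text: json.data.url }] };
  }
);

await server.connect(new StdioServerTransport());
```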
The 3 Agent squad

The end user only talks to the Supervisor agent. They never interact with the Emotion or Meme agents directly, keeping the experience clean while the coordination happens behind the scenes.
I could have bundled everything into one giant agent with two tools:
- Summarise the user’s emotions
- Generate a meme
Instead, I split them for three solid reasons:
- Too many tools in one agent actually make it worse (the model has a harder time picking the right tool).
- Specialized agents are easier to tune and scale.
- I can mix and match cheap and expensive models depending on the job (e.g., a lightweight model for emotion summarisation vs. a more powerful model for creative meme generation).
Building the 3 Agents
Meme agent
The Meme agent accesses the remote MCP server to call the meme generator tool:
```ts
// The generate_meme tool comes from the remote imgflip-meme-mcp server.
const memeAgent = createAgent({
  model: "GPT-5",
  tools: [generateMemeTool],
  systemPrompt: "Create a funny meme",
});
```
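One way generateMemeTool could be wired up, assuming the MCP TypeScript client SDK, a remote Streamable HTTP endpoint (the URL below is a placeholder), and a framework-specific tool shape that is entirely hypothetical:

```ts
// Hypothetical wiring: connect to the remote meme MCP server and wrap its
// generate_meme tool in whatever tool shape the agent framework expects.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const client = new Client({ name: "meme-agent", version: "0.1.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("https://example.com/mcp")) // placeholder URL
);

// Assumed tool shape; adapt to your agent framework.
const generateMemeTool = {
  name: "generate_meme",
  description: "Create a meme from an ImgFlip template and captions",
  run: (args: Record<string, unknown>) =>
    client.callTool({ name: "generate_meme", arguments: args }),
};
```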
Emotion agent
The Emotion agent analyses the user’s feelings:
```ts
// A lightweight model is enough for summarising emotions.
const emotionAgent = createAgent({
  model: "GPT-3.5",
  systemPrompt: "Analyze what the user is feeling",
});
```
Supervisor agent
The Supervisor agent doesn’t generate memes or analyse emotions itself. Instead, it wraps the above agents as tools:
```ts
// The sub-agents are exposed to the Supervisor as plain tools.
const supervisorAgent = createAgent({
  model: "Gemini-3",
  tools: [summarizeEmotionTool, generateMemeTool],
  systemPrompt:
    "You are a Supervisor that is tasked with creating a meme based on the emotions of the user",
});
```
The Supervisor only sees high‑level tools like summarizeEmotionTool and generateMemeTool; it isn’t aware of the low‑level implementation details inside the MCP server. This modular design makes debugging and scaling much easier.
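On the Supervisor side, the sub-agents can be wrapped in the same kind of tool shape. Again, run and the wrapper objects are hypothetical stand-ins for whatever your framework provides; note that this generateMemeTool delegates to the whole Meme agent, not the raw MCP tool of the same name:

```ts
// Hypothetical: each sub-agent becomes a plain tool the Supervisor can call.
const summarizeEmotionTool = {
  name: "summarize_emotion",
  description: "Summarise what the user is currently feeling",
  run: (input: string) => emotionAgent.run(input), // delegate to the Emotion agent
};

const generateMemeTool = {
  name: "generate_meme",
  description: "Create a meme that matches a given emotion",
  run: (input: string) => memeAgent.run(input),    // delegate to the Meme agent
};
```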
Try out the Meme agent on my blog: https://mathewdony.com/blog/orchestrating-ai-agents-to-create-memes
If these agents become self‑aware, at least they’ll have a sense of humor.