How I Built Swarm DJ: A Multi-Agent AI System Performing Live Electronic Music 🎧
Source: Dev.to

Overview
What happens when you give local Large Language Models (LLMs) the keys to a DJ booth?
That was the question that sparked Swarm DJ. I wanted to explore whether autonomous AI agents could collaborate in real‑time to generate music, argue over creative directions, and actually make crowds dance—without any human intervention.
The result is a distributed, multi‑agent AI system powered by Ollama, MQTT, and real‑time DSP audio generation, turning AI agents into a collective, autonomous DJ.
System Architecture
Building an autonomous DJ meant bridging the gap between slow, token‑by‑token text generation and hard real‑time audio constraints (where a missed buffer means an audible click).
To achieve this, the architecture separates the “thinking” from the “playing”, using an MQTT broker as the central nervous system.
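The split between slow agents and the hard real-time audio path can be sketched with a thread-safe queue standing in for the MQTT broker (all names here are illustrative, not the repo's): agents publish parameter changes whenever their LLM finishes thinking, and the audio thread drains the queue without ever blocking.

```python
import queue

# Hypothetical sketch: a queue.Queue stands in for the MQTT broker.
# Agents push parameter changes whenever they finish "thinking";
# the audio thread drains the queue non-blockingly once per buffer,
# so slow LLM inference can never stall the real-time audio path.

param_bus = queue.Queue()                      # broker stand-in
current_params = {"bpm": 128, "cutoff": 1.0}   # live engine state

def agent_publish(change):
    """Called from a slow LLM thread at any time; never touches audio."""
    param_bus.put(change)

def audio_callback():
    """Called once per audio buffer; must never block."""
    while True:
        try:
            change = param_bus.get_nowait()    # non-blocking drain
        except queue.Empty:
            break
        current_params.update(change)
    return dict(current_params)                # snapshot the engine renders with

agent_publish({"bpm": 132})
snapshot = audio_callback()
```

The key property is that the audio side only ever does a non-blocking read, so a missed LLM cycle degrades the music's evolution, not its playback.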

The Core Components
- The Audio Engine – Built with pure NumPy for DSP synthesis (generating kicks, acid loops, and pads) and Spotify's `pedalboard` for real‑time effects (reverb, delay, filters). It runs in an isolated, high‑priority thread to prevent audio dropouts.
- The MIDI Clock – Emits a `clock/bar_complete` event strictly synchronized to the BPM, keeping the LLM voting cycles perfectly matched to the music.
- The AI Agents – Three distinct personas powered by local Llama 3.2 models:
- The Architect – Focuses on structure, manipulating BPM and drop phases.
- The Ghost – A moody, atmospheric agent controlling reverb and low‑pass filters.
- The Prankster – An agent born to disrupt, adding delays and literal vinyl‑tape‑stop chaos.
- The Council – A Python orchestrator that runs the voting logic.
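To give a feel for the pure-NumPy DSP style, here is a minimal kick-drum sketch (the envelope and pitch-sweep values are assumptions for illustration, not the repo's actual parameters): a decaying sine whose frequency sweeps exponentially downward.

```python
import numpy as np

SR = 44100  # sample rate in Hz

def kick(length=0.25, f_start=150.0, f_end=50.0, sr=SR):
    """Illustrative kick: exponentially decaying sine with a pitch drop."""
    t = np.linspace(0, length, int(sr * length), endpoint=False)
    freq = f_start * (f_end / f_start) ** (t / length)  # exponential pitch sweep
    phase = 2 * np.pi * np.cumsum(freq) / sr            # integrate frequency -> phase
    env = np.exp(-t * 18.0)                             # amplitude decay envelope
    return np.sin(phase) * env

buf = kick()  # one kick hit as a float64 sample buffer
```

A buffer like this can then be handed to `pedalboard` for effects processing before it reaches the output device.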
“Dictatorship by Confidence” Protocol
Initially I built a fully democratic voting system: every 8 bars the agents would deliberate, propose parameter changes, and vote Yes/No/Abstain on each other’s ideas.
Result: democratic gridlock. The Prankster would propose chaos, and the Architect would vote it down.
To make the music evolve dynamically (and cut cycle times from 15 seconds to 5 seconds), I replaced democracy with a “Dictatorship by Confidence.”
Every 4 bars, each agent generates a proposal with a self‑assigned confidence score (0.0 – 1.0). The orchestrator listens, and the highest‑confidence proposal instantly wins, ensuring fast, opinionated musical shifts.
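The selection rule itself is tiny. A minimal sketch of one confidence round (the dict field names are assumed for illustration): every agent submits one proposal, and the orchestrator applies whichever carries the highest self-assigned confidence.

```python
def run_round(proposals):
    """Apply 'Dictatorship by Confidence': highest-confidence proposal wins.

    proposals: list of dicts like
        {"agent": str, "change": dict, "confidence": float in [0.0, 1.0]}
    """
    if not proposals:
        return None  # no agent spoke up this cycle; keep current parameters
    return max(proposals, key=lambda p: p["confidence"])

round_proposals = [
    {"agent": "architect", "change": {"bpm": 140},       "confidence": 0.60},
    {"agent": "ghost",     "change": {"reverb": 0.9},    "confidence": 0.85},
    {"agent": "prankster", "change": {"tape_stop": True}, "confidence": 0.40},
]
winner = run_round(round_proposals)  # the Ghost's reverb wash wins this round
```

Because there is no deliberation step, a round costs only one generation per agent plus a `max()`, which is what makes the 5-second cycle time possible.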
🤯 Emergency Veto Powers
Each agent is granted one Emergency Veto per session. If an agent feels its vision is being completely ignored, it can bypass the voting cycle entirely:
- Architect – Tempo Lock: freezes the BPM for 32 bars.
- Ghost – Ambient Wash: floods the track with reverb and mutes the bass.
- Prankster – Glitch Storm: randomizes audio parameters for a chaotic 8‑bar drop.
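The bookkeeping behind "one veto per session" is a small ledger; this sketch (class and method names are illustrative) shows the consume-once semantics that keeps vetoes rare:

```python
class VetoLedger:
    """Track each agent's single Emergency Veto for the session."""

    def __init__(self, agents):
        self.available = {a: True for a in agents}

    def try_veto(self, agent):
        """Consume the agent's veto if unspent; return whether it fired."""
        if self.available.get(agent):
            self.available[agent] = False
            return True
        return False

ledger = VetoLedger(["architect", "ghost", "prankster"])
first = ledger.try_veto("prankster")   # veto granted: Glitch Storm fires
second = ledger.try_veto("prankster")  # already spent: request ignored
```

A fired veto would then bypass `run_round` entirely and push its effect straight onto the parameter bus.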

🚀 What Can You Build With This Paradigm?
The Swarm DJ architecture (Real‑time Engine + MQTT + Autonomous Agents) is extremely adaptable. Using this codebase, you could build:
- AI Video Game Directors – Replace the synthesizer with an Unreal/Unity integration; agents control enemy spawn rates, weather, and lighting based on player health.
- Autonomous Lighting Techs – Connect MQTT output to DMX fixtures; LLMs “listen” to a Spotify stream and argue over stage‑light colors and strobe speeds.
- AI Stock Trading Ensembles – Swap the MIDI clock for a market data feed; conservative, aggressive, and contrarian agents debate and allocate portfolio percentages in real‑time.
- Interactive Storytellers – Have agents control smart‑home IoT devices (Hue lights, speakers, locks) while running a live D&D‑style audio feed in a haunted‑house attraction.
Building LLM tools that exist purely in chat interfaces is yesterday’s news. Swarm DJ proves we can break agents out of the chatbox and let them physically orchestrate the real world in real‑time.
Want to run your own AI rave? Check out the repo and build your own agents!