We Built a Full-Stack AI Music Agent with Next.js — Here's What We Learned

Published: February 14, 2026 at 05:59 PM EST
8 min read
Source: Dev.to

The Stack

  • Framework: Next.js 16 (App Router)
  • Auth: Clerk
  • Payments: Stripe
  • Audio: Web Audio API + WaveSurfer.js
  • AI: Custom agent orchestrating multiple music AI providers
  • i18n: next-intl (32 languages)
  • State: Zustand + TanStack Query
  • UI: Radix primitives + Tailwind
  • Hosting: Vercel + S3‑compatible object storage

Lesson 1: Streaming AI Responses Requires Rethinking Your Data Flow

When a user says “make me a lo‑fi beat with jazz piano,” the AI agent doesn’t just return text — it generates a song, creates cover art, extracts metadata, and streams progress updates back to the UI, all in a single conversation turn.

The naive approach is to wait for the entire response and then render. But music generation takes 30–120 seconds. You need to stream.

What we learned

  • Server‑Sent Events (SSE) over fetch – not WebSockets. For a conversational AI interface, SSE is simpler and works perfectly with Vercel’s serverless model. WebSockets would require a persistent connection and a separate infrastructure layer.
// Simplified streaming pattern
const response = await fetch('/api/agent', {
  method: 'POST',
  body: JSON.stringify({ message: userInput }),
});

if (!response.body) throw new Error('No response body');
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value);
  // Parse SSE events: text deltas, resource creation, progress updates
  processStreamEvents(chunk);
}
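The `processStreamEvents` helper above is our own name; for reference, here is a stdlib-only sketch of what that parsing step can look like, assuming each event arrives as a `data: <json>` line and events are separated by a blank line:

```typescript
// Minimal SSE chunk parser (a sketch; assumes one `data: <json>` line
// per event, with events separated by a blank line).
type StreamEvent = { type: string; [key: string]: unknown };

export function parseSseChunk(chunk: string): StreamEvent[] {
  const events: StreamEvent[] = [];
  for (const block of chunk.split('\n\n')) {
    for (const line of block.split('\n')) {
      if (!line.startsWith('data: ')) continue; // skip comments/other fields
      const payload = line.slice('data: '.length);
      if (payload === '[DONE]') continue;       // common end-of-stream marker
      try {
        events.push(JSON.parse(payload) as StreamEvent);
      } catch {
        // Partial JSON across a chunk boundary: real code should buffer it.
      }
    }
  }
  return events;
}
```

In production you also have to buffer incomplete lines that straddle chunk boundaries, since `read()` makes no promises about where chunks split.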
  • State management during a stream – when the agent creates a new audio resource mid‑stream you must:

    1. Update the chat message (append text)
    2. Add the new resource to the resource panel
    3. Trigger a waveform render for the new audio
    4. Update the credit balance

    All of this needs to happen smoothly without re‑renders that cause audio playback glitches.
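Our stores use Zustand, but the key idea is framework-agnostic: collect every consequence of a stream event and commit them in a single state update, so subscribers re-render once per event instead of four times. A minimal hand-rolled sketch of that batching pattern (all names are illustrative, not from our codebase):

```typescript
// A tiny store that applies all four updates from one stream event in a
// single commit, so subscribers get one notification (sketch; our real
// app uses Zustand, but the batching idea is identical).
type Resource = { id: string; audioUrl: string };
type AgentState = {
  messageText: string;
  resources: Resource[];
  pendingWaveforms: string[]; // resource ids awaiting a waveform render
  credits: number;
};
type Listener = (state: AgentState) => void;

export function createAgentStore(initial: AgentState) {
  let state = initial;
  const listeners = new Set<Listener>();

  return {
    getState: () => state,
    subscribe(fn: Listener) {
      listeners.add(fn);
      return () => listeners.delete(fn);
    },
    // One event -> one commit -> one notification.
    applyResourceCreated(textDelta: string, resource: Resource, cost: number) {
      state = {
        messageText: state.messageText + textDelta,
        resources: [...state.resources, resource],
        pendingWaveforms: [...state.pendingWaveforms, resource.id],
        credits: state.credits - cost,
      };
      listeners.forEach(fn => fn(state));
    },
  };
}
```

Refs come in for anything the audio element reads on every frame, so playback never depends on a React render completing.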

What we’d do differently: Design your state management around streaming from day one. We started with simple useState and had to refactor to Zustand stores + refs to avoid cascade re‑renders during active streams.

Lesson 2: Browser Audio Processing Is Harder Than You Think

The studio includes a real‑time mastering chain — EQ, compression, stereo width, limiter — all running in the browser via the Web Audio API. Users can tweak mastering settings and hear changes in real time, then export the mastered MP3.

Real‑time vs. offline rendering

Goal: Real‑time playback and offline rendering must produce identical output.

// The mastering pipeline (simplified)
async function renderMasteredBuffer(
  audioUrl: string,
  settings: MasteringSettings
): Promise<AudioBuffer> {
  // Decode the source audio first so we know its length and sample rate
  const arrayBuffer = await fetch(audioUrl).then(r => r.arrayBuffer());
  const decodeCtx = new AudioContext();
  const audioBuffer = await decodeCtx.decodeAudioData(arrayBuffer);

  const offlineCtx = new OfflineAudioContext(
    2,                                           // stereo
    audioBuffer.sampleRate * audioBuffer.duration,
    audioBuffer.sampleRate
  );

  // Build the same effect chain used in real‑time playback
  const source = offlineCtx.createBufferSource();
  source.buffer = audioBuffer;
  const eq = createParametricEQ(offlineCtx, settings.eq);
  const compressor = createCompressor(offlineCtx, settings.compression);
  const limiter = createLimiter(offlineCtx, settings.limiter);

  source.connect(eq).connect(compressor).connect(limiter).connect(offlineCtx.destination);
  source.start(0);

  return offlineCtx.startRendering();
}

Gotcha: OfflineAudioContext and a regular AudioContext can produce subtly different results if filter frequencies or parameter ramps aren’t identical. We extracted all shared constants into a single TypeScript file to guarantee bit‑perfect parity.
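One way to enforce that parity is to keep every parameter both contexts consume in a single module, so neither code path can drift. A sketch of what that shared file can look like (the specific values below are illustrative, not our production settings):

```typescript
// shared/mastering-constants.ts — single source of truth consumed by
// both the real-time AudioContext chain and the OfflineAudioContext
// renderer (sketch; the values are illustrative).
export const MASTERING = {
  sampleRate: 44100,
  eq: {
    lowShelfHz: 100,
    midPeakHz: 1000,
    highShelfHz: 8000,
    defaultQ: 0.707,
  },
  compressor: {
    thresholdDb: -18,
    ratio: 3,
    attackSec: 0.003,
    releaseSec: 0.25,
  },
  limiter: {
    ceilingDb: -0.3,
  },
  // Parameter ramp times must match too, or the two renders diverge.
  paramRampSec: 0.05,
} as const;
```

Both `createParametricEQ` variants then read frequencies, Q values, and ramp times from this object instead of hard-coding them locally.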

MP3 encoding in the browser

We use lamejs (a JavaScript LAME port) to encode AudioBuffers to MP3 client‑side, avoiding a round‑trip to the server. However, lamejs is CPU‑intensive — encoding a 3‑minute song can block the main thread for 2–3 seconds.

Fix: Process in chunks and yield back to the event loop.

function floatTo16BitPcm(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

async function encodeToMp3(audioBuffer: AudioBuffer): Promise<Blob> {
  const mp3encoder = new lamejs.Mp3Encoder(2, audioBuffer.sampleRate, 192);
  const chunks: Int8Array[] = [];
  const blockSize = 1152; // samples per MP3 frame

  // lamejs expects 16-bit PCM, so convert the Float32 channel data
  const left = floatTo16BitPcm(audioBuffer.getChannelData(0));
  const right = floatTo16BitPcm(audioBuffer.getChannelData(1));

  for (let i = 0; i < left.length; i += blockSize) {
    const mp3buf = mp3encoder.encodeBuffer(
      left.subarray(i, i + blockSize),
      right.subarray(i, i + blockSize)
    );
    if (mp3buf.length > 0) chunks.push(mp3buf);

    // Yield to prevent UI freeze
    if (i % (blockSize * 100) === 0) {
      await new Promise(resolve => setTimeout(resolve, 0));
    }
  }

  const end = mp3encoder.flush();
  if (end.length > 0) chunks.push(end);

  return new Blob(chunks, { type: 'audio/mp3' });
}

Lesson 3: File Uploads on Vercel Have a Hidden Limit

Vercel serverless functions impose a 4.5 MB body size limit. That sounds fine until you realize a single mastered audio file is easily 5–10 MB.

Our first approach was client → Next.js API route → object storage. This broke immediately for any real audio file.

The solution: direct client‑to‑storage uploads with pre‑signed URLs


  1. Client requests a signed upload URL from an API route.
  2. Client uploads the file directly to the storage service (S3, Cloudflare R2, etc.).
  3. Client sends the resulting public URL back to the API (tiny JSON payload).
  4. API updates the database with the file metadata.

All steps stay well under the 4.5 MB limit; the heavy file transfer bypasses Vercel entirely.

// Upload flow that bypasses Vercel's body limit
export async function uploadFileToStorageFromClient({
  file,
  filename,
  key,
}: {
  file: Blob;
  filename: string;
  key: string;
}): Promise<{ url: string }> {
  // Step 1: Get signed URL (tiny request)
  const tokenResp = await fetch('/api/upload/token', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ key, filename, contentType: file.type }),
  });
  const { uploadUrl, publicUrl } = await tokenResp.json();

  // Step 2: Upload directly to object storage (no Vercel in the middle)
  await fetch(uploadUrl, {
    method: 'PUT',
    body: file,
    headers: { 'Content-Type': file.type },
  });

  return { url: publicUrl };
}

This pattern is essential for any media‑heavy app on Vercel.
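The token route itself is mostly your storage SDK's presigner (for S3-compatible stores, typically `getSignedUrl` with a `PutObjectCommand` from the AWS SDK). The part worth sketching is the validation you wrap around it, since the client controls the request body. All names and limits below are illustrative, not from the original app:

```typescript
// Validation a signed-upload token route should perform before
// presigning anything (sketch; the allowed types and key pattern
// are illustrative).
const ALLOWED_TYPES = new Set(['audio/mpeg', 'audio/wav', 'image/png']);
const KEY_PATTERN = /^[a-zA-Z0-9/_-]+\.[a-z0-9]+$/;

export function validateUploadRequest(req: {
  key: string;
  contentType: string;
}): { ok: true } | { ok: false; error: string } {
  if (!ALLOWED_TYPES.has(req.contentType)) {
    return { ok: false, error: `unsupported content type: ${req.contentType}` };
  }
  // Reject path traversal and anything outside a safe character set
  if (req.key.includes('..') || !KEY_PATTERN.test(req.key)) {
    return { ok: false, error: 'invalid storage key' };
  }
  return { ok: true };
}
```

Scoping the signed URL to a single key, content type, and short expiry means a leaked URL can only overwrite the one object it was issued for.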

Lesson 4 – i18n at Scale Is a Product Decision, Not a Technical One

Gliss supports 32 languages, not the usual three or five. Here is the i18n setup:

// routing.ts
import { defineRouting } from 'next-intl/routing';

export const routing = defineRouting({
  locales: SUPPORTED_LOCALE_CODES, // 32 locales
  defaultLocale: 'en',
  localePrefix: 'as-needed', // No /en prefix for English
});

Setting localePrefix: 'as-needed' eliminated a ~790 ms redirect from / to /en, a measurable Lighthouse win.
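The 'as-needed' behavior boils down to a simple mapping. This is a sketch of what next-intl does for us, not its actual implementation:

```typescript
// What localePrefix: 'as-needed' means for URL construction (sketch of
// the observable behavior, not next-intl's internals).
export function localizedPath(
  path: string,
  locale: string,
  defaultLocale = 'en'
): string {
  // The default locale gets no prefix, so / serves English directly
  // instead of redirecting to /en.
  return locale === defaultLocale ? path : `/${locale}${path}`;
}
```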

Practical lessons

  • Use AI for the initial translation pass, then have native speakers review. Pure AI translation makes embarrassing mistakes with music terminology.
  • Keep English terms for industry jargon (e.g., “mastering,” “stems,” “BPM,” “MIDI”). Musicians worldwide use these terms.
  • RTL languages (Arabic, Hebrew, Urdu, Persian) need layout testing, not just translation. Flex layouts can break; test thoroughly.
  • Don’t translate dynamically. Load all translations at build time. next-intl’s server components avoid shipping translation bundles to the client unnecessarily.

Lesson 5 – Content Security Policy Will Break Everything You Love

Adding a proper CSP header inevitably starts a day of “whack‑a‑mole.” Every external script, font, analytics pixel, and auth widget needs explicit permission:

value: [
  "default-src 'self'",
  "script-src 'self' 'unsafe-eval' 'unsafe-inline' https://your-auth-provider.com https://*.yourdomain.com",
  "connect-src 'self' https://*.yourdomain.com https: blob: data: wss:",
  "style-src 'self' 'unsafe-inline' https://fonts.googleapis.com",
  "font-src 'self' data: https://fonts.gstatic.com",
  "media-src 'self' https: blob: data:",
  "worker-src 'self' blob:",
].join('; ')

The blob: and data: entries in media-src are crucial for audio apps — the Web Audio API creates blob URLs for playback, and OfflineAudioContext renders to data URIs.

Do it anyway. CSP is non‑negotiable for production apps handling payments and user data.
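For context, a directive array like the one above is typically attached to every route via the headers() hook in next.config. A minimal sketch of that standard Next.js mechanism:

```javascript
// next.config.mjs (sketch) — attach the CSP value to every route
const csp = [
  "default-src 'self'",
  "media-src 'self' https: blob: data:",
  // ...the rest of the directives shown above
].join('; ');

/** @type {import('next').NextConfig} */
const nextConfig = {
  async headers() {
    return [
      {
        source: '/(.*)',
        headers: [{ key: 'Content-Security-Policy', value: csp }],
      },
    ];
  },
};

export default nextConfig;
```

Start with the policy in Content-Security-Policy-Report-Only mode so violations are logged instead of breaking the app while you enumerate every third-party origin.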

Lesson 6 – Optimizing Bundle Size With Next.js

Our initial bundle shipped the entirety of react‑icons, which is massive. Enabling Next.js’s optimizePackageImports gave us a big win:

experimental: {
  optimizePackageImports: [
    'react-icons/si',
    'react-icons/fa6',
    'react-icons/md',
    'react-icons/lu',
    'lucide-react',
    '@clerk/nextjs',
  ],
},

This tells Next.js to tree‑shake these packages more aggressively. For react-icons alone it cut ~200 KB from the bundle.

Other wins

  • inlineCss: true – eliminates the separate CSS request, reducing time‑to‑first‑paint.
  • Lazy‑load heavy viewers (MIDI viewer, waveform renderer) with next/dynamic.

What We’d Do Differently

  • Start with a streaming architecture. Retrofitting streaming into a request‑response mental model is painful.
  • Use S3‑compatible direct uploads from day 1. Don’t route binary files through your API layer.
  • Set up CSP on day 1. Adding it later means debugging every third‑party integration you’ve already embedded.
  • Invest in i18n infrastructure early. Adding a 32nd language is easy when your pipeline is automated; adding a 2nd language with hard‑coded strings everywhere is a nightmare.
  • Build your audio pipeline with OfflineAudioContext first, then port to real‑time. Getting offline rendering right guarantees your real‑time version will be correct.

Try It

If you want to see all of this in action, check out Gliss. You can generate a song from a text description, master it in your browser, and export — no account required for your first few creations.

The music‑AI space is moving incredibly fast. If you’re building anything with audio in the browser, we hope these lessons save you the debugging time we spent.

What’s the hardest technical challenge you’ve hit building with audio in the browser? We’d love to hear about it in the comments.
