Appends for AI apps: Stream into a single message with Ably AI Transport

Published: February 26, 2026
Source: Dev.to
Introduction

Streaming tokens is easy. Resuming cleanly is not. A user refreshes mid‑response, another client joins late, a mobile connection drops for 10 seconds, and suddenly your “one answer” is 600 tiny messages that your UI has to stitch back together. Message history turns into fragments. You start building a side store just to reconstruct “the response so far”.

This is not a model problem. It’s a delivery problem.

That’s why we developed message appends for Ably AI Transport. Appends let you stream AI output tokens into a single message as they are produced, so you get progressive rendering for live subscribers and a clean, compact response in history.

The failure mode we’re fixing

The usual implementation streams each token as a separate message. This works perfectly on a stable connection, but in production clients disconnect and resume mid‑stream: refreshes, mobile drop‑outs, backgrounded tabs, and late joins.

When real reconnects and refreshes happen, you inherit work you didn't plan for:

  • ordering
  • deduplication
  • buffering
  • “latest wins” logic
  • replay rules that make history and realtime agree

You can build it, but it quietly eats weeks of engineering time.
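To make that concrete, here is a sketch of the kind of reconstruction layer the token-per-message approach forces onto every client. The field names (`seq`, `text`) are purely illustrative, not part of any Ably API:

```javascript
// Hypothetical client-side reconstruction for a token-per-message stream.
// Handles the two problems above: duplicates (after replays) and
// out-of-order delivery (after reconnects).
function reconstructResponse(tokenMessages) {
  const seen = new Set();
  return tokenMessages
    .filter((m) => {
      if (seen.has(m.seq)) return false; // deduplication after replays
      seen.add(m.seq);
      return true;
    })
    .sort((a, b) => a.seq - b.seq)       // reordering after reconnects
    .map((m) => m.text)
    .join('');
}
```

And this sketch still ignores buffering, "latest wins" logic, and keeping history consistent with realtime.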

*[Diagram: the token‑per‑message problem]*

With appends you avoid that by changing the shape of the data. Instead of hundreds of token messages you have one response message whose content grows over time.

The pattern: create once, append many

In Ably AI Transport you publish an initial response message and capture its server‑assigned serial. That serial is what you append to.

```javascript
// Create the empty response message
const result = await channel.publish({ name: 'response', data: '' });
const { serials: [msgSerial] } = result;
```

Now, as your model yields tokens, you append each fragment to that same message:

```javascript
if (event.type === 'token') {
  channel.appendMessage({ serial: msgSerial, data: event.text });
}
```
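Putting the two snippets together, the producer side fits in one small function. This is a sketch under the assumptions shown above (`publish` resolves with `{ serials }`, `appendMessage` takes `{ serial, data }`); the `streamResponse` wrapper itself is hypothetical:

```javascript
// Hypothetical wrapper around the create-once, append-many pattern.
// Assumes publish() resolves with { serials } and appendMessage()
// takes { serial, data }, as in the snippets above.
async function streamResponse(channel, events) {
  // Create the empty response message and capture its server-assigned serial
  const { serials: [msgSerial] } =
    await channel.publish({ name: 'response', data: '' });

  // Append each token fragment to that same message as it is produced
  for await (const event of events) {
    if (event.type === 'token') {
      await channel.appendMessage({ serial: msgSerial, data: event.text });
    }
  }
  return msgSerial;
}
```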

What changes for clients

Subscribers still see progressive output, but they see it as actions on the same message serial. A response starts with a create, tokens arrive as appends, and occasionally clients receive a full‑state update to resynchronise (e.g., after a reconnection).

Most UIs already implement this shape; with appends it becomes boringly predictable:

```javascript
switch (message.action) {
  case 'message.append':
    renderAppend(message.serial, message.data);
    break;
  case 'message.update':
    renderReplace(message.serial, message.data);
    break;
}
```

The important difference is that history and realtime stop disagreeing, without extra client code. You render progressively for live users, and you still treat the response as one message for storage, retrieval, and rewind.
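As a minimal sketch of that shape, the switch above can back a tiny per-serial accumulator, with `renderAppend`/`renderReplace` reduced to state updates. This assumes messages carry `action`, `serial`, and `data` as shown:

```javascript
// Minimal per-serial accumulator for append/update actions.
// Assumes the { action, serial, data } message shape shown above.
function applyMessage(responses, message) {
  const current = responses.get(message.serial) ?? '';
  switch (message.action) {
    case 'message.append':
      responses.set(message.serial, current + message.data); // grow in place
      break;
    case 'message.update':
      responses.set(message.serial, message.data);           // full-state resync
      break;
  }
  return responses;
}
```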

Reconnects and refreshes stop being special cases

Short disconnects are one thing. Refreshes are painful because local state is gone, and streaming each token as a separate message forces you to replay fragments and hope the client reconstructs the same response.

With a message‑per‑response approach, hydration is straightforward because there is always a current accumulated version of the response message. Clients joining late or reloading can fetch the latest state as a single message and continue.

```javascript
const channel = realtime.channels.get('ai:chat', {
  params: { rewind: '2m' }   // Rewind 2 minutes
});
```

Now rewind and history become useful again because you are rewinding meaningful messages, not token confetti.
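Hydration after a full reload can also go through history. A sketch assuming the standard Ably `channel.history()` paginated API (newest message first) and that the accumulated response is a single message with a `serial`:

```javascript
// Sketch: hydrate the latest accumulated response after a reload.
// Assumes the standard Ably history API (PaginatedResult with items,
// newest first) and a single message-per-response, as described above.
async function hydrateLatestResponse(channel) {
  const page = await channel.history({ limit: 1 });
  const [latest] = page.items; // most recent message
  return latest ? { serial: latest.serial, data: latest.data } : null;
}
```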

Token rates without token‑rate pain

Models can emit tokens far faster than most realtime setups want to publish. Publishing a message per token leads to rate‑limit problems and forces you to batch in your code.

Appends are designed for high‑frequency workloads and include automatic roll‑ups. Subscribers still receive progressive updates, but Ably can roll up rapid appends under the hood so you don’t have to build your own throttling layer.

If you need to tune the trade‑off between smoothness and message rate, adjust `appendRollupWindow`:

  • Smaller windows → more responsive but higher message‑rate usage.
  • Larger windows → more aggressive batching, fewer messages.
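Where exactly `appendRollupWindow` is configured is not shown here; purely as an assumption-labelled illustration, if it were accepted as a channel param alongside `rewind`, tuning might look like:

```javascript
// ASSUMPTION: appendRollupWindow shown as a channel param purely for
// illustration; check the AI Transport docs for where it is actually set.
const channel = realtime.channels.get('ai:chat', {
  params: { appendRollupWindow: '100ms' } // smaller = smoother, more messages
});
```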

Enabling appends

Appends require the “Message annotations, updates, appends, and deletes” channel rule for the namespace you’re using. Enabling it also means messages are persisted, which affects usage and billing.

Why this is a better default for AI output

If you are shipping agentic AI apps, you eventually need three things:

  1. Progressive rendering for live users.
  2. A single, compact representation of the full response for storage and retrieval.
  3. Robust handling of reconnects, refreshes, and late joins without custom plumbing.

Message appends give you all three out of the box. 🎉 At the same time you get:

- streaming UX
- history that's usable
- recovery that does not depend on luck

Appends are how you get there without building your own "message reconstruction" subsystem. If you want the deeper mechanics (including the *message‑per‑response* pattern and rollup tuning), the [AI Transport docs](https://ably.com/docs/ai-transport) are the best place to start.