When 5 Minutes Isn't Enough: Moving AI Ingestion from Sync to Async (And Saving 99% Compute)

Published: February 12, 2026 at 10:38 PM EST
3 min read
Source: Dev.to

Background

In a previous post I introduced Synapse, the AI system I built for my wife that uses a Knowledge Graph to give her LLM a “deep memory.” Early demos showed the graph updating in about 50 seconds after a chat ended, but real‑world usage quickly exposed a fundamental flaw.

The Problem

During 45‑minute chat sessions with dozens of messages, the “End Session” button would spin for minutes and eventually crash. The root cause wasn’t a simple timeout bug—it was the architecture.

Initial Synchronous Implementation

  1. Convex (Orchestrator) → triggers an HTTP POST to my Python backend.
  2. FastAPI (Brain) → calls Graphiti + Gemini to process the text.
  3. FastAPI waits for the result and returns it.
  4. Convex saves the result to the database.

This is a classic synchronous request‑reply pattern.
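The failure mode can be reproduced with a minimal sketch. This is not the Synapse code — `process_session` is a hypothetical stand-in for the Graphiti + Gemini pipeline, and the action limit is scaled down to fractions of a second — but it shows why the blocking pattern breaks once the work outlasts the orchestrator's execution limit:

```python
import asyncio

ACTION_LIMIT_S = 0.1  # stands in for Convex's 5-10 minute action limit


async def process_session(duration_s: float) -> str:
    # Stand-in for the heavy Graphiti + Gemini ingestion work.
    await asyncio.sleep(duration_s)
    return "graph-updated"


async def end_session_sync(duration_s: float) -> str:
    # Synchronous request-reply: the orchestrator blocks on the full
    # result, so the entire pipeline must finish inside the limit.
    return await asyncio.wait_for(process_session(duration_s), timeout=ACTION_LIMIT_S)


# A short session fits inside the limit; a long one hits the timeout.
print(asyncio.run(end_session_sync(0.01)))  # graph-updated
try:
    asyncio.run(end_session_sync(0.5))  # "12-18 minute" session, scaled down
except asyncio.TimeoutError:
    print("timeout: action limit exceeded")
```

Short conversations succeed; anything longer than the limit raises a timeout no matter how healthy the backend is, which is exactly the behavior the traces later confirmed.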

Why it failed: Convex Actions have a hard execution limit (5–10 minutes depending on the plan). Short conversations finished in 1–2 minutes, but larger sessions required 12–18 minutes, far exceeding the limit.

The Cascade of Failures

  • Added exponential‑backoff retries on Convex actions.
  • Each retry started a new background process while the previous one kept running, doubling token usage and creating “zombie” jobs.
  • The user still saw an error, and the backend was overwhelmed.

Diagnosis

OpenTelemetry traces (sent to Axiom) showed that ingestion wasn’t failing—it was simply slow, consistently taking 12–18 minutes for large sessions.

Switching to an Async Polling Architecture

When a task exceeds the time a client or server is willing to wait, the request must be decoupled from the response.

New Flow

  1. Convex sends POST /ingest.
  2. FastAPI immediately returns 202 Accepted with a jobId (≈ 300 ms).
  3. FastAPI launches the heavy processing in a background task (asyncio.create_task).
  4. Convex sleeps, then polls the job status every few minutes.
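The decoupled flow can be sketched with plain asyncio and an in-memory job store. The names here (`ingest`, `get_status`, `JOBS`) are illustrative, not the actual Synapse API, and the timings are scaled down from minutes to fractions of a second:

```python
import asyncio
import uuid

JOBS: dict[str, dict] = {}  # jobId -> {"status": ..., "result": ...}


async def run_ingestion(job_id: str, text: str) -> None:
    # Stand-in for the 12-18 minute Graphiti + Gemini pipeline.
    await asyncio.sleep(0.05)
    JOBS[job_id] = {"status": "done", "result": f"graph updated from {len(text)} chars"}


async def ingest(text: str) -> str:
    # Returns immediately (the "202 Accepted" step) and launches the
    # heavy work as a background task.
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "processing", "result": None}
    asyncio.create_task(run_ingestion(job_id, text))
    return job_id


async def get_status(job_id: str) -> dict:
    # The cheap endpoint the orchestrator polls.
    return JOBS[job_id]


async def main() -> None:
    job_id = await ingest("45 minutes of chat transcript")
    print((await get_status(job_id))["status"])  # processing
    await asyncio.sleep(0.1)                     # orchestrator sleeps, then polls
    print((await get_status(job_id))["status"])  # done


asyncio.run(main())
```

The key property: no call ever blocks longer than a status lookup, so the orchestrator's execution limit stops mattering regardless of how long ingestion takes.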

Polling Strategy

  • Switched from exponential to linear backoff.
  • Schedule: check after 5 minutes, then after 10 minutes, then every 10 minutes thereafter.
  • Reduces unnecessary load and noise on the server.
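The schedule above can be expressed as a small helper. `poll_delays` is an illustrative function (not code from the project), reading the schedule as a list of waits between polls: 5 minutes before the first check, then 10 minutes before each subsequent one:

```python
def poll_delays(max_checks: int) -> list[int]:
    """Return the wait in minutes before each poll attempt:
    5 before the first check, then 10 between later checks."""
    return [5 if attempt == 0 else 10 for attempt in range(max_checks)]


print(poll_delays(4))  # [5, 10, 10, 10]
```

Compared with exponential backoff (5, 10, 20, 40, ...), the linear schedule stays aligned with the expected 12–18 minute job duration instead of drifting toward hour-long gaps.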

Resource Usage Comparison

| Scenario | Action Time | Total Billed Compute | Token Waste |
| --- | --- | --- | --- |
| Synchronous | 5 min (blocking) → timeout → retry (another 5 min) | ~10–15 min | High (duplicate processing) |
| Async Polling | Trigger ≈ 300 ms, Poll ≈ 300 ms, Final fetch ≈ 300 ms | < 2 seconds | Minimal |

We went from wasting ~10 minutes of compute per job to under 2 seconds of active execution time, while eliminating duplicate processing.
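A quick back-of-the-envelope check of the headline number, using the figures from the table (three ~300 ms actions versus roughly 10 minutes of blocked compute):

```python
# Rough savings estimate based on the figures reported above.
sync_compute_s = 10 * 60   # ~10 min of billed, blocking action time
async_compute_s = 3 * 0.3  # trigger + poll + final fetch, ~300 ms each

savings = 1 - async_compute_s / sync_compute_s
print(f"active compute reduced by {savings:.2%}")
```

That ratio is where the "saving 99% compute" in the title comes from.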

Lessons Learned

  • AI tasks are inherently slow. A “fast” LLM call can be 30 seconds; a “deep” knowledge‑graph update can be 15 minutes.
  • Don’t just increase timeouts. Decouple request and response to keep the system resilient and cost‑effective.
  • Linear backoff for polling matches the expected duration of long‑running jobs and reduces server chatter.

Code Repository

The implementation of this async request‑reply pattern is available in the following repositories:

Call for Feedback

I’m interested in how others handle long‑running LLM tasks. Feel free to reach out on X or LinkedIn.
