Launch HN: Spine Swarm (YC S23) – AI agents that collaborate on a visual canvas
Source: Hacker News
Hey HN!
We’re Ashwin and Akshay from Spine AI – https://www.getspine.ai
What is Spine Swarm?
Spine Swarm is a multi‑agent system that works on an infinite visual canvas to complete complex, non‑coding projects such as:
- Competitive analysis
- Financial modeling
- SEO audits
- Pitch decks
- Interactive prototypes
- …and many more
▶️ Demo video: https://youtu.be/R_2-ggpZz0Q
Our Story
- Friends for 13+ years – we met in an ML course at NTU, in a part of campus called North Spine, which is where the name comes from.
- Went through Y Combinator S23.
- Spent ~3 years iterating on Spine across many product versions.
The Core Insight
Chat is the wrong interface for complex AI work.
- Chat is a linear thread, but real projects are non‑linear.
- Relying on a chatbot to juggle context implicitly makes it impossible to:
  - See how pieces are connected.
  - Correct a single step without re‑running everything.
  - Branch off and explore multiple strategies side‑by‑side.
We needed a workspace where the structure of the work is explicit and user‑controllable, not hidden inside a context window.
The Canvas + Blocks Model
| Concept | Description |
|---|---|
| Infinite visual canvas | Think in blocks instead of threads. |
| Blocks | Abstractions on top of AI models (LLM calls, image generation, web browsing, apps, slides, spreadsheets, etc.). |
| Lego‑brick analogy | Each block does something specific; they can be snapped together and composed in countless ways. |
| Connections | Any block can connect to any other block, and context passes between them regardless of block type. |
| Model‑agnostic | A single workflow can mix OpenAI LLMs, Nano Banana Pro image generation, Claude for interactive apps, etc. |
| Fan‑out | Multiple blocks can branch from the same input, analyze it with different models, then feed results into a downstream synthesizer. |
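As a rough illustration of the blocks‑and‑connections idea, here is a toy sketch in Python. It is not Spine's actual implementation: `Block`, `connect_from`, and `execute` are hypothetical names, and the lambdas are stand‑ins for real model calls. It shows one input fanning out to two blocks whose outputs feed a downstream synthesizer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Block:
    """One unit of work on the canvas (hypothetical structure)."""
    name: str
    run: Callable[[List[str]], str]  # receives upstream outputs, returns its own
    inputs: List["Block"] = field(default_factory=list)
    output: Optional[str] = None

    def connect_from(self, *upstream: "Block") -> "Block":
        """Connect upstream blocks so their outputs arrive as context."""
        self.inputs.extend(upstream)
        return self

    def execute(self) -> str:
        # Run upstream blocks first (each runs once), then pass their
        # outputs to this block as context.
        if self.output is None:
            context = [block.execute() for block in self.inputs]
            self.output = self.run(context)
        return self.output


# Stand-ins for model calls; a real block would call an LLM, a browser, etc.
source = Block("brief", lambda ctx: "launch plan")
critic_a = Block("model_a", lambda ctx: f"A reviews {ctx[0]}").connect_from(source)
critic_b = Block("model_b", lambda ctx: f"B reviews {ctx[0]}").connect_from(source)
synth = Block("synth", lambda ctx: " | ".join(ctx)).connect_from(critic_a, critic_b)

result = synth.execute()
print(result)
```

Because each block caches its output, the shared `source` block runs once even though two blocks branch from it.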
Evolution of the UI
Manual canvas (v1) – Users entered prompts, chose models, ran blocks, and made connections themselves.
- Loved by founders & product managers: easy branching (prototype, PRD, competitive critique, pitch deck) from a shared upstream context.
Chat‑layer request – New users wanted a chat interface that would generate & connect blocks for them.
Autonomous agents – Building the chat layer revealed that agents could run hours‑long autonomous workflows, keeping context clean by delegating work to blocks and storing intermediate results on the canvas.
How It Works Now
- Task submission → a central orchestrator decomposes the task into subtasks.
- Persona agents (specialized per subtask) operate on canvas blocks:
  - Override default settings (model, prompt) as needed.
  - Pick the best model for each block; sometimes run the same block with multiple models for comparison.
- Parallel execution – Independent subtasks run concurrently; downstream agents automatically receive upstream context.
- Human‑in‑the‑loop – Any agent can pause and ask the user for clarification/feedback before continuing.
- Iterative refinement – After output, you can select a subset of blocks and iterate via chat without re‑running the whole workflow.
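The decompose‑then‑run‑in‑parallel flow above can be sketched with `asyncio`. This is a simplified illustration under stated assumptions, not the real orchestrator: the `PLAN` dictionary, `run_agent`, and `run_plan` are hypothetical, and the agent body only simulates work. Independent subtasks start concurrently, and each downstream subtask waits for its upstream outputs before running.

```python
import asyncio
from typing import Dict, List

# Hypothetical decomposition: subtask name -> subtasks it depends on.
PLAN: Dict[str, List[str]] = {
    "seo_audit": [],
    "competitor_scan": [],
    "roadmap": ["seo_audit", "competitor_scan"],
    "slide_deck": ["roadmap"],
}


async def run_agent(name: str, upstream: Dict[str, str]) -> str:
    """Stand-in for a persona agent; a real one would call models and tools."""
    await asyncio.sleep(0.01)  # simulate work
    used = ", ".join(sorted(upstream)) or "nothing"
    return f"{name} (built on: {used})"


async def run_plan(plan: Dict[str, List[str]]) -> Dict[str, str]:
    tasks: Dict[str, asyncio.Task] = {}

    async def run_subtask(name: str) -> str:
        # Wait for upstream subtasks, then hand their outputs to this agent.
        upstream = {dep: await tasks[dep] for dep in plan[name]}
        return await run_agent(name, upstream)

    # Schedule every subtask up front so independent ones run concurrently.
    for name in plan:
        tasks[name] = asyncio.create_task(run_subtask(name))
    return {name: await task for name, task in tasks.items()}


results = asyncio.run(run_plan(PLAN))
for name, output in results.items():
    print(output)
```

The sketch assumes the plan has no circular dependencies; a cycle would leave the affected subtasks waiting on each other forever.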
Why the Canvas Matters
- Provides a persistent, structured representation of the entire project that any agent can read/write at any point.
- Avoids the context degradation typical in multi‑agent pipelines (agents no longer need to hold everything in memory).
- Enables explicit handoffs between agents, improving efficiency and auditability.
- Every step is fully auditable, allowing you to trace exactly how each conclusion was reached.
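To make the auditability point concrete, here is a small sketch of a shared store where every result records which agent wrote it and which earlier results it drew on. The `Canvas` and `Entry` classes, their methods, and the sample entries are all hypothetical and only illustrate the idea; the real canvas is not described at this level of detail here.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Entry:
    """A result written to the shared canvas (hypothetical record shape)."""
    key: str
    agent: str
    content: str
    sources: Tuple[str, ...] = ()


class Canvas:
    """Append-only store that any agent can read from and write to."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def write(self, key: str, agent: str, content: str,
              sources: Sequence[str] = ()) -> None:
        if key in self._entries:
            raise ValueError(f"entry {key!r} already exists")
        missing = [s for s in sources if s not in self._entries]
        if missing:
            raise KeyError(f"unknown sources: {missing}")
        self._entries[key] = Entry(key, agent, content, tuple(sources))

    def read(self, key: str) -> str:
        return self._entries[key].content

    def trace(self, key: str) -> List[Entry]:
        """Return everything that fed into `key`, upstream first."""
        ordered: List[Entry] = []
        seen = set()

        def visit(current: str) -> None:
            if current in seen:
                return
            seen.add(current)
            for source in self._entries[current].sources:
                visit(source)
            ordered.append(self._entries[current])

        visit(key)
        return ordered


canvas = Canvas()
canvas.write("pricing_notes", "browser", "raw pricing notes")
canvas.write("market_notes", "researcher", "raw market notes")
canvas.write("summary", "analyst", "combined summary",
             sources=["pricing_notes", "market_notes"])
canvas.write("deck", "designer", "slide outline", sources=["summary"])

for entry in canvas.trace("deck"):
    print(f"{entry.key} <- written by {entry.agent}")
```

Keeping the store append‑only and requiring sources to exist before a write means the provenance graph cannot contain cycles, so `trace` always terminates.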
Benchmarks
| Benchmark | Scope | Result |
|---|---|---|
| DeepMind DeepSearchQA | 900 questions across 17 fields, each requiring a causal chain of steps. | 87.6 % accuracy with zero human intervention. |
| GAIA Level 3 | Older benchmark in which we found many ground‑truth errors. | #1, per the title of the linked write‑up (after accounting for the benchmark errors we found). |
- For DeepSearchQA we used only the relevant block types (LLM calls, web browsing, tables) and disabled human clarification, forcing agents to run fully independently.
- The auditability exposed actual errors in the older GAIA benchmark (wrong or ambiguous expected answers) – something a black‑box pipeline would miss.
Full methodology, architecture details, and benchmark error analysis are available in our write‑up:
https://blog.getspine.ai/spine-swarm-hits-1-on-gaia-level-3-…
We’re excited to keep pushing the limits of multi‑agent, canvas‑based AI workflows. Feedback and questions are welcome!
Beyond Benchmarks
Benchmarks measure accuracy on closed‑ended questions. It turns out the same architecture also produces better open‑ended outputs, such as decks, reports, and prototypes, with minimal supervision.
We’ve seen early users split into two camps:
- Live observers – watch the agents work and jump in to redirect mid‑flow.
- Task queue users – queue a task and return later to a finished deliverable.
Both approaches work because the canvas preserves the full chain of work, so you can audit or intervene whenever you want.
Quick Starter Task
Try this: give the system your website URL and ask for:
- A full SEO analysis
- Competitive landscape overview
- A prioritized growth roadmap with a slide deck
You’ll see multiple agents spin up on the canvas simultaneously.
Common Use Cases
- Fundraising pitch decks with financial models
- Prototyping features from screenshots and PRDs
- Competitive‑analysis reports
- Deep‑dive learning plans that research a topic from multiple angles and produce structured material you can explore further
Pricing
- Usage‑based credits tied to block usage and the underlying models used.
- Agents typically consume more credits than manual workflows because they’re tuned to get the best possible outcome (they pick the best blocks and do more work).
Details:
- Free tier available.
- Caveat: We sized the free tier to let you try a real task, but tasks vary in complexity. If you run out of credits before you’ve had a proper chance to explore, email us at founders@getspine.ai and we’ll work with you.
Feedback Request
We’d love your feedback on the experience:
- What worked?
- What didn’t?
- Where did it fall short?
We’re also curious how others here approach complex, multi‑step AI work beyond coding.
- What tools are you using?
- What breaks first?
We’ll be in the comments all day.