How We Built Platybot: An AI-Powered Analytics Assistant
Source: Pulumi Blog
Before Platybot, our #analytics Slack channel was a support queue.
Every day, people from every team would ask questions:
- “Which customers use feature X?”
- “What’s our ARR by plan type?”
- “Do we have a report for template usage?”
Our two‑person data team was a bottleneck.
*Our #analytics channel, before Platybot (dramatized).*
Why we didn’t just throw an LLM at Snowflake
Without guardrails, large language models generate SQL that may run but silently returns the wrong answer:
- Different join logic
- Wrong filters
- Missing snapshot handling
- Incorrect summarization
We needed something that could answer reliably for most queries; otherwise, we’d end up fixing LLM‑generated SQL by hand.
Introducing Platybot
Platybot (platypus + bot, named after our mascot) is an AI‑powered analytics assistant that any Pulumi employee can use to query our Data Warehouse in natural language.
- Interfaces: Web App, Slack bot, Model Context Protocol (MCP) server
- Deployment: Pulumi IaC (Infrastructure as Code)
The most important lesson we learned building it is that the AI was the easy part. The semantic layer is what makes it work.
The problem with throwing AI at your Data Warehouse
The naive solution is obvious: connect an LLM to your database and let it write SQL. In practice, this fails, and the failure mode is insidious.
Examples from our warehouse:
- **ARR is a snapshot metric.** If you query ARR (Annual Recurring Revenue) without filtering by end‑of‑period dates (last day of month or quarter), you get duplicate rows and wildly inflated numbers. An LLM doesn’t automatically know this.
- **Account queries need exclusions.** Most queries should exclude Pulumi’s own internal accounts and deleted ones. Without these filters, you’re counting test data alongside real customers.
- **User queries need employee filters.** Querying active users without excluding Pulumi employees inflates adoption metrics.
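As a sketch, the difference between a naive ARR query and a correct one is only a few predicates, but nothing in the schema forces an LLM to include them. The table and column names below are illustrative assumptions, not our actual warehouse schema:

```typescript
// Naive query an LLM might write: it runs fine, but silently
// double-counts because ARR rows are periodic snapshots.
const naiveArr = `
  SELECT SUM(arr) FROM fct_arr_snapshots
`;

// Corrected version: restrict to end-of-month snapshots and exclude
// internal and deleted accounts. All names here are hypothetical.
const correctArr = `
  SELECT SUM(arr)
  FROM fct_arr_snapshots
  WHERE snapshot_date = LAST_DAY(snapshot_date)
    AND NOT is_internal_account
    AND NOT is_deleted
`;
```

Both queries are syntactically valid, which is exactly why the failure mode is insidious: only one of them is right, and nothing in the database tells you which.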
The danger shows up when results feel decision‑ready before anyone has validated how the numbers were derived. A confidently wrong ARR figure presented to leadership is worse than no answer at all.
Many organizations run into the same constraint once data usage spreads:
- Dashboards answer yesterday’s questions, not today’s.
- Ad‑hoc LLM queries answer today’s questions, but incorrectly.
The gap between “I have a question” and “I have a trustworthy answer” is where data teams can get stuck.
Why we built a semantic layer first
Before writing a single line of AI code, we built a semantic layer using Cube (open source). This was the hardest, least glamorous, and most important part of the entire project.
What is a semantic layer?
A semantic layer is a shared, versioned definition of what your business metrics mean.
“Monthly active users” starts with `COUNT(DISTINCT user_id)`, but the aggregation is only the outer layer. It must be applied to the right table (`fct_pulumi_operations`), with the right filters (exclude Pulumi employees, exclude deleted organizations), scoped to a calendar month, and counting only users who performed real operations — not just previews.
The semantic layer encodes all of this once, and the AI can use it to build queries without needing to guess which tables are related, whether the relationship is one‑to‑one or many‑to‑many, etc.
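To make this concrete, here is a minimal sketch of the kind of structured query a semantic layer enables. Only `fct_pulumi_operations` comes from the cube shown later in this post; the `active_users` measure and `operation_time` dimension names are illustrative assumptions:

```typescript
// A Cube-style structured query: the caller selects pre-defined
// measures and dimensions instead of writing SQL.
interface CubeQuery {
  measures: string[];
  timeDimensions?: {
    dimension: string;
    granularity: "day" | "week" | "month";
  }[];
}

const monthlyActiveUsers: CubeQuery = {
  // One named measure replaces hand-written COUNT(DISTINCT ...) SQL,
  // with the employee and deleted-org filters baked into its definition.
  measures: ["fct_pulumi_operations.active_users"],
  timeDimensions: [
    { dimension: "fct_pulumi_operations.operation_time", granularity: "month" },
  ],
};
```

Everything that makes the metric correct, the joins, the exclusions, the snapshot handling, lives in the measure's definition, not in the query.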
Our domain organization
We organized our data into seven domains:
- Revenue
- Cloud
- Core
- Clickstream
- Community
- Support
- People
Each domain contains cubes (well‑defined, composable views) with explicit measures, dimensions, and joins.
Real example – Cloud domain (trimmed for readability)
cubes:
- name: fct_pulumi_operations
sql_table: CLOUD.FCT_PULUMI_OPERATIONS
description: >
Pulumi CLI operations (update, preview, destroy, refresh).
Each row is one operation with resource changes, duration,
and CLI environment.
joins:
- name: dim_organization
sql: '{CUBE}.ORGANIZATION_HK = {dim_organization}.ORGANIZATION_HK'
relationship: many_to_one
- name: dim_stacks
sql: '{CUBE}.STACK_PROGRAM_HK = {dim_stacks}.STACK_PROGRAM_HK'
relationship: many_to_one
- name: dim_user
sql: '{CUBE}.USER_HK = {dim_user}.USER_HK'
relationship: many_to_one
measures:
- name: count
type: count
- name: resource_count
sql: '{CUBE}.RESOURCE_COUNT'
type: sum
description: Number of resources active when the operation finished
- name: operations_succeeded
sql: "CASE WHEN {CUBE}.OPERATION_STATE = 'succeeded' THEN 1 ELSE 0 END"
type: sum
description: Count of operations that succeeded
# ... plus other measures
dimensions:
- name: operation_type
sql: '{CUBE}.OPERATION_TYPE'
type: string
description: Type of operation (Refresh, Update, Destroy, etc)
- name: operation_state
sql: '{CUBE}.OPERATION_STATE'
type: string
description: Last known state of this operation
# ... 20+ more dimensions
The key insight
The semantic layer makes the AI’s job tractable. Instead of generating arbitrary SQL from scratch—where the search space is “any possible SQL query against hundreds of tables”—the AI picks from a defined set of measures, dimensions, and joins. The search space shrinks from almost infinite to a well‑bounded set of valid combinations.
A semantic layer is to an AI data assistant what a type system is to a programming language. It doesn’t eliminate errors, but it makes entire categories of mistakes structurally impossible. The AI can’t calculate ARR wrong because it doesn’t calculate ARR at all; it references a pre‑defined measure that already encodes the correct logic.
Building the semantic layer was mainly data‑engineering work. We already had agreed‑upon metric definitions across the company. The challenge was encoding those definitions into Cube: specifying the correct joins, wiring up the right filters, and ensuring every measure matched the logic our dashboards already used. Tedious, but essential.
Adding the AI layer
With the semantic layer in place, the AI becomes a translation problem: convert natural‑language queries into selections of the pre‑defined measures, dimensions, and joins. The rest of the system (Web App, Slack bot, MCP server) handles the orchestration, execution, and result presentation.
Platybot Overview
Platybot translates natural‑language questions into Cube queries (or read‑only SQL when needed).
It works with three LLMs—Claude Opus 4.6, Claude Sonnet 4.5, and Gemini 3 Pro—letting users pick the model that best fits the task.
- Claude shines on structured‑data queries.
- Gemini excels on text‑heavy tasks (e.g., analyzing call transcriptions).
The system prompt gives the model full awareness of available cubes, their measures, dimensions, and joins. When a user asks, for example, “What’s the ARR breakdown by plan type?”, the model does not write SQL. Instead it builds a Cube query that:
- Selects the `total_arr` measure from the ARR table.
- Groups by the `sku` dimension from the subscriptions cube.
Cube then generates the underlying SQL, handles joins, and applies any filters.
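The query the model hands to Cube might look like this minimal sketch. The `total_arr` and `sku` names come from the text above; the cube names (`arr`, `subscriptions`) are assumptions:

```typescript
// Hedged sketch of the structured query for
// "What's the ARR breakdown by plan type?"
const arrByPlanType = {
  // Pre-defined measure: the end-of-period snapshot logic and
  // account exclusions are already encoded in its definition.
  measures: ["arr.total_arr"],
  // Plan type lives on the subscriptions cube.
  dimensions: ["subscriptions.sku"],
};
```

Cube resolves the join between the two cubes itself; the model never sees, and never writes, the JOIN clause.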
If the semantic layer can’t cover an edge case, the model can fall back to a read‑only SQL query against Snowflake (often already close to the final answer).
Query‑Generation Workflow
The workflow mirrors what a human analyst does:
- Discover domains – Identify the business areas that contain the data you need.
- Explore cubes – Examine the available semantic models (measures, dimensions, joins).
- Understand the schema – Grasp how tables relate and what each metric represents.
- Construct & execute the query – Build a Cube query (or fallback SQL) and run it.
This step‑by‑step approach ensures reliability: Platybot is a translator between intent and a well‑defined data model, not a generic data‑science engine. The creativity lies in interpreting the question, not in inventing SQL.
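The four steps above can be sketched in code. The catalog and helper functions are invented stand-ins for Platybot's internals; the cube data mirrors the Cloud-domain example earlier in this post:

```typescript
// Minimal sketch of the four-step workflow.
type Cube = { name: string; measures: string[]; dimensions: string[] };

const catalog: Record<string, Cube[]> = {
  Cloud: [{
    name: "fct_pulumi_operations",
    measures: ["count", "resource_count"],
    dimensions: ["operation_type", "operation_state"],
  }],
};

// 1. Discover domains
function listDomains(): string[] {
  return Object.keys(catalog);
}

// 2. Explore cubes within a domain
function listCubes(domain: string): Cube[] {
  return catalog[domain] ?? [];
}

// 3-4. Understand the schema, then construct a query by selecting
// from the cube's defined measures and dimensions, never free-form SQL.
function buildQuery(cube: Cube) {
  return {
    measures: [`${cube.name}.${cube.measures[0]}`],
    dimensions: [`${cube.name}.${cube.dimensions[0]}`],
  };
}

const [domain] = listDomains();
const [cube] = listCubes(domain);
const query = buildQuery(cube);
```

At every step the model chooses from an enumerated set, which is what keeps the search space bounded.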
Meeting Users Where They Are
Platybot is exposed through three interfaces, each tuned for a different workflow.
1. Web UI
- Built with React 19, TypeScript, Vite, and Tailwind.
- Conversational, multi‑turn UI for deep data exploration.
- Features:
- Table visualisation, data export, conversation history.
- Shareable reports (permanent link, company‑auth protected) that show the agent’s reasoning, every query, and a data sample.
- Direct “Run in Metabase” links for any query, enabling scheduling or extension.
- Highest adoption: users perform multi‑step analyses, follow‑up questions, and cross‑dimensional comparisons here.
Fun Easter egg – While the analysis iterates (discovering domains, exploring cubes, executing queries), a platypus‑themed runner game runs in the corner (think Chrome’s dinosaur game, but with a skate‑boarding platypus). It gives users something to do while waiting for longer iterations.
2. Slack Bot
- Invoke with `@Platybot` in any channel.
- Replies in a thread, keeping the channel tidy while making results visible to the whole team.
- Ideal for quick look‑ups (e.g., “What’s the ARR for account X?”) where the answer fits in a single message.
- Complements the Web UI by handling “in‑context” queries.
3. Model Context Protocol (MCP)
The MCP server (launched Feb 4 2026) turns Platybot into a capability that any MCP‑compatible client can consume.
| Tool | Description |
|---|---|
| `list_domains` | Discover available data domains |
| `list_cubes` | Enumerate cubes within a domain |
| `get_cube_details` | Retrieve measures, dimensions, and joins for a cube |
| `execute_cube_query` | Run a structured Cube query |
| `execute_sql_query` | Fallback read‑only SQL execution |
| `get_generated_sql` | Preview the SQL that would be run (no execution) |
Setup (single command)
claude mcp add --transport http platybot
Use‑case example: An engineer writing a post‑mortem can pull live metrics directly from the terminal; an analyst can embed live queries inside Claude Code. The data warehouse becomes ambient—always available, never in the way.
Security
- OAuth 2.0 Device Authorization Flow with PKCE (CLI‑friendly).
- Domain restriction to `@pulumi.com`.
- Read‑only enforcement, rate limiting, and full audit logging.
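To illustrate the PKCE half of that device flow, here is a minimal sketch of the standard S256 challenge derivation. This is generic OAuth mechanics, nothing Platybot-specific:

```typescript
import { createHash, randomBytes } from "node:crypto";

// PKCE sketch: the CLI keeps a random verifier secret, sends only the
// derived S256 challenge with the authorization request, and reveals
// the verifier later at token exchange so the server can match them.
function base64url(buf: Buffer): string {
  return buf
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

const verifier = base64url(randomBytes(32));
const challenge = base64url(createHash("sha256").update(verifier).digest());
```

Because the challenge is a one-way hash of the verifier, an attacker who intercepts the authorization response still cannot complete the token exchange.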
Meta note: We used the Platybot MCP server itself to query our Slack messages table to verify the usage numbers reported above.
Deploying with Pulumi
Platybot runs on AWS and is fully managed with Pulumi IaC. The stack includes:
- ECS Fargate – Application containers.
- Application Load Balancer – Traffic routing.
- EFS – Persistent storage.
- ECR – Container image registry.
- Route 53 – DNS.
The infrastructure is intentionally lightweight (small instance size) because it serves internal users only.
Representative Fargate Service Definition
const service = new awsx.ecs.FargateService("platybot", {
cluster: cluster.arn,
assignPublicIp: false,
taskDefinitionArgs: {
container: {
name: "platybot",
image: image.imageUri,
essential: true,
portMappings: [{
containerPort: 3000,
targetGroup: targetGroup,
}],
environment: [
{ name: "CUBE_API_URL", value: cubeApiUrl },
{ name: "NODE_ENV", value: "production" },
],
secrets: [
{ name: "ANTHROPIC_API_KEY", valueFrom: anthropicKeySecret.arn },
{ name: "SNOWFLAKE_PASSWORD", valueFrom: snowflakePasswordSecret.arn },
],
logConfiguration: {
logDriver: "awslogs",
options: {
"awslogs-group": logGroup.name,
"awslogs-region": aws.config.region!,
"awslogs-stream-prefix": "platybot",
},
},
},
},
});
TL;DR
- Platybot = natural‑language → Cube query (or read‑only SQL).
- Supports Claude and Gemini models, each with its own strengths.
- Three access points: Web UI, Slack bot, and MCP (programmatic).
- Secure, OAuth‑protected, read‑only, with full audit logging.
- Deployed on AWS via Pulumi, using a minimal Fargate service.
Dogfooding Pulumi for Our Internal Tools
Dogfooding Pulumi for our internal tools has real benefits beyond the obvious.
- Staging environments are trivial – spin up a full copy of the stack with different parameters.
- Secrets management is built‑in, so API keys and database credentials never touch a config file.
- When something needs to change, the diff‑and‑preview workflow (`pulumi preview`) catches mistakes before they hit production.
Results
Since launch in September 2025:
- **1,700+ questions** from 83 employees across every team.
- Usage grew steadily—from ~8 questions per day in the first month to 18 per day by January 2026, with 51 unique users that month alone.
- The queries were real production work, not experimentation:
  - Customer analysis for sales calls
  - Resource breakdowns for account managers
  - Blog performance metrics for marketing
  - Policy‑adoption research for product
  - ARR deep‑dives for leadership
The impact on the data team was immediate. Questions that used to land in the #analytics channel and wait for a human now get answered in seconds. The data team shifted from answering routine queries to building better models and improving data quality. We went from being a help desk to being a platform team.
Accuracy is harder to quantify, but the semantic layer gives us confidence. Because Platybot uses pre‑defined measures instead of generating arbitrary SQL, entire categories of errors (wrong joins, missing filters, incorrect aggregations) are structurally less likely. When the model does get something wrong, it’s usually in interpreting the question—not in the data—and users can verify that through the report’s reasoning trace.
What We Learned
- **The semantic layer matters more than the model.** We spent more time defining metrics in Cube than we did on any AI work. That investment pays for itself: swap the model, and the answers stay correct. Swap the semantic layer, and nothing works.
- **Meet users where they already are.** Different UIs handle different cognitive loads:
  - Slack catches quick questions.
  - The web UI handles deep exploration.
  - MCP makes data ambient for AI‑native workflows.

  Each interface serves a different mode of thinking.
- **Transparency builds trust.** Showing the AI’s reasoning, the queries it ran, and linking to Metabase for verification turned skeptics into regular users. People don’t trust a black box, but they’ll trust a tool that shows its work.