How We Built Platybot: An AI-Powered Analytics Assistant

Published: February 10, 2026 at 07:00 PM EST
11 min read

Source: Pulumi Blog

Before Platybot, our #analytics Slack channel was a support queue

Every day, people from every team would ask questions:

  • “Which customers use feature X?”
  • “What’s our ARR by plan type?”
  • “Do we have a report for template usage?”

Our two‑person data team was a bottleneck.

Our #analytics channel, before Platybot (dramatized).


Why we didn’t just throw an LLM at Snowflake

Without guardrails, large language models generate SQL that runs without error but silently returns the wrong answer:

  • Different join logic
  • Wrong filters
  • Missing snapshot handling
  • Incorrect summarization

We needed something that could answer reliably for most queries; otherwise, we’d spend our time manually fixing LLM‑generated SQL.


Introducing Platybot

Platybot (platypus + bot, named after our mascot) is an AI‑powered analytics assistant that any Pulumi employee can use to query our Data Warehouse in natural language.

  • Interfaces: Web App, Slack bot, Model Context Protocol (MCP) server
  • Deployment: Pulumi IaC (Infrastructure as Code)

The most important lesson we learned building it is that the AI was the easy part. The semantic layer is what makes it work.


The problem with throwing AI at your Data Warehouse

The naive solution is obvious: connect an LLM to your database and let it write SQL. In practice, this fails, and the failure mode is insidious.

Examples from our warehouse:

  1. ARR is a snapshot metric.
    If you query ARR (Annual Recurring Revenue) without filtering by end‑of‑period dates (last day of month or quarter), you get duplicate rows and wildly inflated numbers. An LLM doesn’t automatically know this.

  2. Account queries need exclusions.
    Most queries should exclude Pulumi’s own internal accounts and deleted ones. Without these filters, you’re counting test data alongside real customers.

  3. User queries need employee filters.
    Querying active users without excluding Pulumi employees inflates adoption metrics.

The danger shows up when results feel decision‑ready before anyone has validated how the numbers were derived. A confidently wrong ARR figure presented to leadership is worse than no answer at all.

Many organizations run into the same constraint once data usage spreads:

  • Dashboards answer yesterday’s questions, not today’s.
  • Ad‑hoc LLM queries answer today’s questions, but incorrectly.

The gap between “I have a question” and “I have a trustworthy answer” is where data teams can get stuck.


Why we built a semantic layer first

Before writing a single line of AI code, we built a semantic layer using Cube (open source). This was the hardest, least glamorous, and most important part of the entire project.

What is a semantic layer?

A semantic layer is a shared, versioned definition of what your business metrics mean.

“Monthly active users” starts with COUNT(DISTINCT user_id), but the aggregation is only the outer layer. It must be applied to the right table (fct_pulumi_operations), with the right filters (exclude Pulumi employees, exclude deleted organizations), scoped to a calendar month, and counting only users who performed real operations — not just previews.

The semantic layer encodes all of this once, and the AI can use it to build queries without needing to guess which tables are related, whether the relationship is one‑to‑one or many‑to‑many, etc.
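Encoded in Cube, the “monthly active users” definition above might look roughly like the following sketch. This is illustrative, not our actual schema: the filter columns (`IS_EMPLOYEE`, `IS_DELETED`) and the exact preview check are assumptions, but measure‑level `filters` are the Cube mechanism that bakes exclusions in once:

```yaml
cubes:
  - name: fct_pulumi_operations
    # ... joins and other members omitted ...
    measures:
      - name: monthly_active_users
        sql: '{CUBE}.USER_HK'
        type: count_distinct
        description: Distinct users who ran real operations (previews excluded)
        filters:
          - sql: "{CUBE}.OPERATION_TYPE <> 'Preview'"
          - sql: '{CUBE}.IS_EMPLOYEE = FALSE'
          - sql: '{CUBE}.IS_DELETED = FALSE'
```

The calendar‑month scoping comes from querying this measure with a month‑granularity time dimension, so the definition itself stays time‑agnostic.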

Our domain organization

We organized our data into seven domains:

  1. Revenue
  2. Cloud
  3. Core
  4. Clickstream
  5. Community
  6. Support
  7. People

Each domain contains cubes (well‑defined, composable views) with explicit measures, dimensions, and joins.

Real example – Cloud domain (trimmed for readability)

cubes:
  - name: fct_pulumi_operations
    sql_table: CLOUD.FCT_PULUMI_OPERATIONS
    description: >
      Pulumi CLI operations (update, preview, destroy, refresh).
      Each row is one operation with resource changes, duration,
      and CLI environment.

    joins:
      - name: dim_organization
        sql: '{CUBE}.ORGANIZATION_HK = {dim_organization}.ORGANIZATION_HK'
        relationship: many_to_one
      - name: dim_stacks
        sql: '{CUBE}.STACK_PROGRAM_HK = {dim_stacks}.STACK_PROGRAM_HK'
        relationship: many_to_one
      - name: dim_user
        sql: '{CUBE}.USER_HK = {dim_user}.USER_HK'
        relationship: many_to_one

    measures:
      - name: count
        type: count
      - name: resource_count
        sql: '{CUBE}.RESOURCE_COUNT'
        type: sum
        description: Number of resources active when the operation finished
      - name: operations_succeeded
        sql: "CASE WHEN {CUBE}.OPERATION_STATE = 'succeeded' THEN 1 ELSE 0 END"
        type: sum
        description: Count of operations that succeeded
      # ... plus other measures

    dimensions:
      - name: operation_type
        sql: '{CUBE}.OPERATION_TYPE'
        type: string
        description: Type of operation (Refresh, Update, Destroy, etc)
      - name: operation_state
        sql: '{CUBE}.OPERATION_STATE'
        type: string
        description: Last known state of this operation
      # ... 20+ more dimensions

The key insight

The semantic layer makes the AI’s job tractable. Instead of generating arbitrary SQL from scratch—where the search space is “any possible SQL query against hundreds of tables”—the AI picks from a defined set of measures, dimensions, and joins. The search space shrinks from almost infinite to a well‑bounded set of valid combinations.

A semantic layer is to an AI data assistant what a type system is to a programming language. It doesn’t eliminate errors, but it makes entire categories of mistakes structurally impossible. The AI can’t calculate ARR wrong because it doesn’t calculate ARR at all; it references a pre‑defined measure that already encodes the correct logic.

Building the semantic layer was mainly data‑engineering work. We already had agreed‑upon metric definitions across the company. The challenge was encoding those definitions into Cube: specifying the correct joins, wiring up the right filters, and ensuring every measure matched the logic our dashboards already used. Tedious, but essential.


Adding the AI layer

With the semantic layer in place, the AI becomes a translation problem: convert natural‑language queries into selections of the pre‑defined measures, dimensions, and joins. The rest of the system (Web App, Slack bot, MCP server) handles the orchestration, execution, and result presentation.



Platybot Overview

Platybot translates natural‑language questions into Cube queries (or read‑only SQL when needed).
It works with three LLMs—Claude Opus 4.6, Claude Sonnet 4.5, and Gemini 3 Pro—letting users pick the model that best fits the task.

  • Claude shines on structured‑data queries.
  • Gemini excels on text‑heavy tasks (e.g., analyzing call transcriptions).

The system prompt gives the model full awareness of available cubes, their measures, dimensions, and joins. When a user asks, for example, “What’s the ARR breakdown by plan type?”, the model does not write SQL. Instead it builds a Cube query that:

  1. Selects the total_arr measure from the ARR table.
  2. Groups by the sku dimension from the subscriptions cube.

Cube then generates the underlying SQL, handles joins, and applies any filters.
If the semantic layer can’t cover an edge case, the model can fall back to a read‑only SQL query against Snowflake (often already close to the final answer).
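In Cube’s JSON query format, the example above translates to a request along these lines. The `arr` cube name is an assumption; `total_arr` and `subscriptions.sku` are the members mentioned above:

```json
{
  "measures": ["arr.total_arr"],
  "dimensions": ["subscriptions.sku"]
}
```

Cube resolves the join between the two cubes and emits the SQL itself; the model never touches join logic.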


Query‑Generation Workflow

The workflow mirrors what a human analyst does:

  1. Discover domains – Identify the business areas that contain the data you need.
  2. Explore cubes – Examine the available semantic models (measures, dimensions, joins).
  3. Understand the schema – Grasp how tables relate and what each metric represents.
  4. Construct & execute the query – Build a Cube query (or fallback SQL) and run it.

This step‑by‑step approach ensures reliability: Platybot is a translator between intent and a well‑defined data model, not a generic data‑science engine. The creativity lies in interpreting the question, not in inventing SQL.
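As a sketch of why this is tractable, here is the workflow with the semantic‑layer catalog stubbed as in‑memory data (all names beyond `fct_pulumi_operations` and its members are illustrative). The point is that steps 1–4 reduce to selecting validated members, not writing SQL:

```typescript
type Cube = { name: string; measures: string[]; dimensions: string[] };

// Stand-in for the semantic-layer catalog (steps 1 and 2 read from it).
const catalog: Record<string, Cube[]> = {
  cloud: [{
    name: "fct_pulumi_operations",
    measures: ["count", "operations_succeeded"],
    dimensions: ["operation_type", "operation_state"],
  }],
};

// Step 1: discover domains.
function listDomains(): string[] {
  return Object.keys(catalog);
}

// Step 2: explore cubes within a domain.
function listCubes(domain: string): Cube[] {
  return catalog[domain] ?? [];
}

// Steps 3 and 4: construct a structured Cube query from validated names.
// Anything outside the catalog is rejected up front instead of producing SQL.
function buildQuery(cube: Cube, measure: string, dimension: string) {
  if (!cube.measures.includes(measure)) throw new Error(`unknown measure: ${measure}`);
  if (!cube.dimensions.includes(dimension)) throw new Error(`unknown dimension: ${dimension}`);
  return {
    measures: [`${cube.name}.${measure}`],
    dimensions: [`${cube.name}.${dimension}`],
  };
}

const cube = listCubes(listDomains()[0])[0];
const query = buildQuery(cube, "count", "operation_type");
console.log(JSON.stringify(query));
```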


Meeting Users Where They Are

Platybot is exposed through three interfaces, each tuned for a different workflow.

1. Web UI

  • Built with React 19, TypeScript, Vite, and Tailwind.
  • Conversational, multi‑turn UI for deep data exploration.
  • Features:
    • Table visualization, data export, conversation history.
    • Shareable reports (permanent link, company‑auth protected) that show the agent’s reasoning, every query, and a data sample.
    • Direct “Run in Metabase” links for any query, enabling scheduling or extension.
  • Highest adoption: users perform multi‑step analyses, follow‑up questions, and cross‑dimensional comparisons here.

Fun Easter egg – While the analysis iterates (discovering domains, exploring cubes, executing queries), a platypus‑themed runner game appears in the corner (think Chrome’s dinosaur game, but with a skateboarding platypus). It gives users something to do while waiting for longer iterations.

2. Slack Bot

  • Invoke with @Platybot in any channel.
  • Replies in a thread, keeping the channel tidy while making results visible to the whole team.
  • Ideal for quick look‑ups (e.g., “What’s the ARR for account X?”) where the answer fits in a single message.
  • Complements the Web UI by handling “in‑context” queries.

3. Model Context Protocol (MCP)

The MCP server (launched Feb 4 2026) turns Platybot into a capability that any MCP‑compatible client can consume.

| Tool | Description |
| --- | --- |
| `list_domains` | Discover available data domains |
| `list_cubes` | Enumerate cubes within a domain |
| `get_cube_details` | Retrieve measures, dimensions, and joins for a cube |
| `execute_cube_query` | Run a structured Cube query |
| `execute_sql_query` | Fallback read‑only SQL execution |
| `get_generated_sql` | Preview the SQL that would be run (no execution) |
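Under MCP, each of these surfaces as a tool that clients invoke with a standard `tools/call` JSON‑RPC request. A call to `get_cube_details` would look roughly like this (the argument name is an assumption):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_cube_details",
    "arguments": { "cube": "fct_pulumi_operations" }
  }
}
```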

Setup (single command)

claude mcp add --transport http platybot 

Use‑case example: An engineer writing a post‑mortem can pull live metrics directly from the terminal; an analyst can embed live queries inside Claude Code. The data warehouse becomes ambient—always available, never in the way.

Security

  • OAuth 2.0 Device Authorization Flow with PKCE (CLI‑friendly).
  • Domain restriction to @pulumi.com.
  • Read‑only enforcement, rate limiting, and full audit logging.
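PKCE is what makes the device flow safe from a CLI: the client generates a one‑time `code_verifier` and sends only its hash, the `code_challenge`, in the initial request. A minimal sketch of that derivation using Node’s built‑in `crypto` (the RFC 7636 `S256` method; the OAuth endpoints themselves are omitted):

```typescript
import { createHash, randomBytes } from "crypto";

// code_verifier: a high-entropy random string, base64url-encoded.
function makeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// code_challenge = BASE64URL(SHA256(code_verifier)), unpadded.
function makeChallenge(verifier: string): string {
  return createHash("sha256").update(verifier).digest("base64url");
}

const verifier = makeVerifier();
const challenge = makeChallenge(verifier);
console.log({ verifier, challenge });
```

Only the challenge travels in the authorization request; the verifier is revealed at token exchange, so an intercepted device code is useless on its own.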

Meta note: We used the Platybot MCP server itself to query our Slack messages table to verify the usage numbers reported in this post.


Deploying with Pulumi

Platybot runs on AWS and is fully managed with Pulumi IaC. The stack includes:

  • ECS Fargate – Application containers.
  • Application Load Balancer – Traffic routing.
  • EFS – Persistent storage.
  • ECR – Container image registry.
  • Route 53 – DNS.

The infrastructure is intentionally lightweight (small instance size) because it serves internal users only.

Representative Fargate Service Definition

const service = new awsx.ecs.FargateService("platybot", {
  cluster: cluster.arn,
  assignPublicIp: false,
  taskDefinitionArgs: {
    container: {
      name: "platybot",
      image: image.imageUri,
      essential: true,
      portMappings: [{
        containerPort: 3000,
        targetGroup: targetGroup,
      }],
      environment: [
        { name: "CUBE_API_URL", value: cubeApiUrl },
        { name: "NODE_ENV", value: "production" },
      ],
      secrets: [
        { name: "ANTHROPIC_API_KEY", valueFrom: anthropicKeySecret.arn },
        { name: "SNOWFLAKE_PASSWORD", valueFrom: snowflakePasswordSecret.arn },
      ],
      logConfiguration: {
        logDriver: "awslogs",
        options: {
          "awslogs-group": logGroup.name,
          "awslogs-region": aws.config.region!,
          "awslogs-stream-prefix": "platybot",
        },
      },
    },
  },
});

TL;DR

  • Platybot = natural‑language → Cube query (or read‑only SQL).
  • Supports Claude and Gemini models, each with its own strengths.
  • Three access points: Web UI, Slack bot, and MCP (programmatic).
  • Secure, OAuth‑protected, read‑only, with full audit logging.
  • Deployed on AWS via Pulumi, using a minimal Fargate service.


"platybot",
},
},
},
},
},
});

Dogfooding Pulumi for Our Internal Tools

Managing our own internal tools with Pulumi has real benefits beyond the obvious.

  • Staging environments are trivial – spin up a full copy of the stack with different parameters.
  • Secrets management is built‑in, so API keys and database credentials never touch a config file.
  • When something needs to change, the diff‑and‑preview workflow (pulumi preview) catches mistakes before they hit production.
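The secrets point deserves a concrete illustration. In the Pulumi program, an encrypted config value (set once with `pulumi config set --secret`) is read back as a secret output; the key name here is illustrative:

```typescript
import * as pulumi from "@pulumi/pulumi";

const config = new pulumi.Config();

// requireSecret returns an Output<string> flagged as secret: Pulumi
// encrypts it in the state file and masks it in CLI output, so the
// plaintext never lands in a config file or a diff.
const anthropicApiKey = config.requireSecret("anthropicApiKey");
```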

Results

Since launch in September 2025:

  • 1,700+ questions from 83 employees across every team.

  • Usage grew steadily—from ~8 questions per day in the first month to 18 per day by January 2026, with 51 unique users that month alone.

  • The queries were real production work, not experimentation:

    • Customer analysis for sales calls
    • Resource breakdowns for account managers
    • Blog performance metrics for marketing
    • Policy‑adoption research for product
    • ARR deep‑dives for leadership

The impact on the data team was immediate. Questions that used to land in the #analytics channel and wait for a human now get answered in seconds. The data team shifted from answering routine queries to building better models and improving data quality. We went from being a help desk to being a platform team.

Accuracy is harder to quantify, but the semantic layer gives us confidence. Because Platybot uses pre‑defined measures instead of generating arbitrary SQL, entire categories of errors (wrong joins, missing filters, incorrect aggregations) are structurally less likely. When the model does get something wrong, it’s usually in interpreting the question—not in the data—and users can verify that through the report’s reasoning trace.


What We Learned

  1. The semantic layer matters more than the model.
    We spent more time defining metrics in Cube than we did on any AI work. That investment pays for itself: swap the model, and the answers stay correct. Swap the semantic layer, and nothing works.

  2. Meet users where they already are.
    Different UIs handle different cognitive loads.

    • Slack catches quick questions.
    • The web UI handles deep exploration.
    • MCP makes data ambient for AI‑native workflows.

    Each interface serves a different mode of thinking.

  3. Transparency builds trust.
    Showing the AI’s reasoning, the queries it ran, and linking to Metabase for verification turned skeptics into regular users. People don’t trust a black box, but they’ll trust a tool that shows its work.


Try Pulumi for Free

Get started with Pulumi today →
