OpenAI's AI data agent, built by two engineers, now serves 4,000 employees — and the company says anyone can replicate it

Published: March 3, 2026, 09:00 AM EST

Source: VentureBeat

When an OpenAI finance analyst needed to compare revenue across geographies and customer cohorts last year, it took hours of work — hunting through 70,000 datasets, writing SQL queries, verifying table schemas.
Today, the same analyst types a plain‑English question into Slack and gets a finished chart in minutes.

  • Built by two engineers in three months.
  • 70% of its code was written by AI.
  • Used daily by more than 4,000 of OpenAI’s ~5,000 employees, making it one of the most aggressive deployments of an AI data agent inside any company.

In an exclusive interview with VentureBeat, Emma Tang, head of data infrastructure at OpenAI, offered a rare look inside the system — how it works, how it fails, and what it signals about the future of enterprise data. The conversation, paired with the company’s blog post announcing the tool, paints a picture of a company that turned its own AI on itself and discovered something that every enterprise will soon confront:

The bottleneck to smarter organizations isn’t better models. It’s better data.
“The agent is used for any kind of analysis,” Tang said. “Almost every team in the company uses it.”

A Plain‑English Interface to 600 PB of Corporate Data

  • Data platform size: > 600 petabytes
  • Number of datasets: ≈ 70,000
  • Employees at OpenAI: ≈ 5,000
  • Daily users of data tools: > 4,000

Tang’s Data Platform team (under infrastructure, overseeing big‑data systems, streaming, and the data‑tooling layer) serves this massive internal user base.

How the agent is accessed

  • Slack
  • Web interface
  • IDEs
  • Codex CLI
  • OpenAI’s internal ChatGPT app

The agent, built on GPT‑5.2, accepts plain‑English questions and returns charts, dashboards, and long‑form analytical reports. In follow‑up responses shared with VentureBeat, the team estimated it saves 2–4 hours of work per query. Tang emphasized a larger, harder‑to‑measure win:

“The agent gives people access to analysis they simply couldn’t have done before, regardless of how much time they had.”
“Engineers, growth, product, as well as non‑technical teams, who may not know all the ins and outs of the company data systems and table schemas can now pull sophisticated insights on their own.”

From Revenue Breakdowns to Latency Debugging – One Agent Does It All

  • Finance – Revenue comparisons across geographies and customer cohorts.
  • Product Management – Understanding feature adoption.
  • Engineering – Diagnosing performance regressions (e.g., “Is a specific ChatGPT component slower than yesterday? Which latency components explain the change?”).
  • Cross‑departmental analysis – Combining sales data, engineering metrics, and product analytics in a single query, something most enterprise AI agents cannot do because they are siloed.

“We launched department by department, curating specific memory and context for each group, but at some point it’s all in the same database.” – Tang

How Codex Solved the Hardest Problem in Enterprise Data

The hardest technical challenge: Finding the right table among 70,000 datasets.

Codex (OpenAI’s AI coding agent) plays three roles:

  1. Front‑end gateway – Users access the data agent through Codex via MCP (Model Context Protocol).
  2. Code generator – Codex generated > 70 % of the agent’s code, enabling two engineers to ship the product in three months.
  3. Daily asynchronous enrichment process – Codex examines important data tables, parses the underlying pipeline code, and determines each table’s:
    • Upstream & downstream dependencies
    • Ownership
    • Granularity
    • Join keys
    • Similar tables

“We give it a prompt, have Codex look at the code and respond with what we need, and then persist that to the database.” – Tang
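The loop Tang describes (prompt, parse, persist) is easy to picture as a batch job. A minimal sketch in Python, where `ask_codex`, the metadata fields, and the SQLite store are all illustrative stand‑ins, not OpenAI’s actual implementation:

```python
import json
import sqlite3

# Hypothetical metadata the coding agent is asked to produce for each table.
ENRICHMENT_FIELDS = [
    "upstream", "downstream", "owner", "granularity", "join_keys", "similar_tables",
]

PROMPT_TEMPLATE = (
    "Read the pipeline code below and return JSON with keys "
    + ", ".join(ENRICHMENT_FIELDS)
    + " describing the table it produces.\n\n{code}"
)

def enrich_table(table_name, pipeline_code, ask_codex, db):
    """Ask the coding agent to describe one table, then persist the answer."""
    raw = ask_codex(PROMPT_TEMPLATE.format(code=pipeline_code))
    record = json.loads(raw)
    db.execute(
        "INSERT OR REPLACE INTO table_enrichment (name, metadata) VALUES (?, ?)",
        (table_name, json.dumps(record)),
    )
    db.commit()
    return record
```

Run nightly over the most important tables, a job like this turns pipeline source code into the structured catalog entries the agent later retrieves.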

When a user asks about revenue, the agent searches a vector database to find which tables Codex has already mapped to that concept.
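Under the hood, that lookup is standard embedding retrieval: embed the question’s key phrase and rank the pre‑computed table descriptions by similarity. A toy sketch using plain cosine similarity over an in‑memory index (a real deployment would use an embedding model and a vector store):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def find_tables(query_vec, index, top_k=3):
    """index: (table_name, vector) pairs built from the enriched catalog.
    Returns the top_k table names most similar to the query embedding."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [name for name, _ in scored[:top_k]]
```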

Six Context Layers the Agent Uses

  1. Schema metadata – basic table and column definitions
  2. Curated expert descriptions – human‑written annotations
  3. Institutional knowledge – extracted from Slack, Google Docs, and Notion
  4. Learning memory – stores corrections from prior conversations
  5. Codex enrichment – table‑level lineage, ownership, and join keys
  6. Live warehouse queries – fallback when no prior information exists
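The layers behave like an ordered chain: cheap, curated sources are consulted first, and the live warehouse probe fires only as a fallback when nothing else matches. A minimal sketch of that ordering, with hypothetical resolver functions standing in for the real lookups:

```python
def build_context(table, curated_layers, live_query):
    """curated_layers: ordered (name, resolver) pairs; each resolver
    returns context text for the table, or None if it knows nothing.
    live_query runs an actual warehouse probe and is only used last."""
    context = []
    for name, resolver in curated_layers:
        found = resolver(table)
        if found:
            context.append((name, found))
    if not context:  # final fallback: query the warehouse directly
        context.append(("live_warehouse", live_query(table)))
    return context
```

Keeping the live probe last matches Tang’s point that curated, accurate context beats dumping everything in.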

The team also tiers historical query patterns. “All query history is everybody’s ‘select star, limit 10.’ It’s not really helpful,” Tang noted.

Takeaways

  • Speed: Queries that once took hours now finish in minutes.
  • Accessibility: Non‑technical staff can perform sophisticated analyses without deep knowledge of data schemas.
  • Cross‑functional insight: A single agent can blend data across traditionally siloed departments.
  • AI‑generated infrastructure: Codex not only wrote most of the agent’s code but also continuously enriches the data catalog, turning a massive, unwieldy data lake into a searchable knowledge base.

OpenAI’s AI data agent demonstrates that the next frontier for enterprises is not bigger models, but smarter, AI‑augmented data pipelines that make the right data instantly reachable.

Canonical Dashboards and Executive Reports

“Canonical dashboards and executive reports — where analysts invested significant effort determining the correct representation — get flagged as ‘source of truth.’ Everything else gets deprioritized.”
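Ranking by provenance rather than raw frequency is a one‑line rule: anything analysts have blessed as a source of truth outranks ad‑hoc history. A toy sketch (the field names are assumptions, not OpenAI’s schema):

```python
def rank_examples(query_history):
    """Sort prior queries so 'source of truth' dashboards surface first,
    then by how often each query was run; throwaway probes sink."""
    return sorted(
        query_history,
        key=lambda q: (q.get("source_of_truth", False), q.get("run_count", 0)),
        reverse=True,
    )
```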

The Prompt That Forces the AI to Slow Down and Think

Even with six context layers, Tang was remarkably candid about the agent’s biggest behavioral flaw: over‑confidence.

“It’s a really big problem, because what the model often does is feel overconfident,” Tang said. “It’ll say, ‘This is the right table,’ and just go forth and start doing analysis. That’s actually the wrong approach.”

The Fix: Prompt Engineering

The solution came through a prompt that forces the agent to linger in a discovery phase. The prompt reads almost like coaching a junior analyst:

“Before you run ahead with this, I really want you to do more validation on whether this is the right table. So please check more sources before you go and create actual data.”
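Operationally, this is just an instruction prepended to every task before the model sees the user’s question. A sketch, with wording adapted from the quote above and a function name of our own invention:

```python
DISCOVERY_INSTRUCTION = (
    "Before you run ahead with this, do more validation on whether you have "
    "the right table. Check multiple sources -- schema metadata, curated "
    "descriptions, prior corrections -- before you create any actual data."
)

def build_task_prompt(user_question):
    """Prefix every analysis task with the forced discovery phase."""
    return f"{DISCOVERY_INSTRUCTION}\n\nUser question: {user_question}"
```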

Key take‑aways from Tang’s evaluation

  • More time in discovery → better results.
  • Less context can produce better results.

“It’s very easy to dump everything in and just expect it to do better,” Tang said. “From our evals, we actually found the opposite. The fewer things you give it, and the more curated and accurate the context is, the better the results.”

Building Trust

  • The agent streams its intermediate reasoning to users in real time.
  • It exposes which tables it selected and why, linking directly to underlying query results.
  • Users can interrupt the agent mid‑analysis to redirect it.
  • The system checkpoints its progress, enabling it to resume after failures.
  • At the end of every task, the model evaluates its own performance:

“We ask the model, ‘how did you think that went? Was that good or bad?’” Tang said. “And it’s actually fairly good at evaluating how well it’s doing.”
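Checkpointing a multi‑step analysis can be as simple as recording each completed step so that a restart skips work already done. A minimal illustrative sketch (the step structure is an assumption; OpenAI has not described its checkpoint format):

```python
def run_with_checkpoints(steps, state):
    """steps: ordered (name, fn) pairs; state: dict of completed results,
    persisted between runs. After a crash-and-restart, steps already in
    state are skipped and their results reused."""
    for name, fn in steps:
        if name in state:
            continue  # finished before the failure; don't redo it
        state[name] = fn(state)
    return state
```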

Guardrails That Are Deliberately Simple — and Surprisingly Effective

When it comes to safety, Tang took a pragmatic approach that may surprise enterprises expecting sophisticated AI‑alignment techniques.

“I think you just have to have even more dumb guardrails,” she said.

Core Guardrails

  1. Strong access control – the agent always uses the requesting user’s personal token, so it can only read what that user can.
  2. Interface‑layer only – it inherits the permissions that already govern OpenAI’s data and never appears in public channels; it lives only in private channels or a user’s own interface.
  3. Write restrictions – write access is limited to a temporary test schema that is wiped periodically and cannot be shared.
  4. No arbitrary writes – outside that schema, the agent never writes to other systems.
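Those “dumb guardrails” reduce to two checks that run before any statement executes: the query carries the requesting user’s own token, and writes may only touch the scratch schema. A hedged sketch (the schema name and token plumbing are our assumptions):

```python
TEMP_SCHEMA = "agent_scratch"  # hypothetical: wiped periodically, never shareable

def authorize(sql, user_token):
    """Return execution params, or raise if the statement breaks a guardrail."""
    statement = sql.strip().lower()
    is_write = statement.startswith(("insert", "update", "delete", "create", "drop"))
    if is_write and TEMP_SCHEMA not in statement:
        raise PermissionError("writes are confined to the temporary test schema")
    # Reads are scoped by the warehouse itself via the user's own token,
    # so the agent can never see more than the user could.
    return {"sql": sql, "token": user_token}
```

The deliberate simplicity is the point: no model judgment is involved, so the guardrail cannot be talked out of its rules.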

Feedback Loop

  • Employees flag incorrect results directly, prompting the team to investigate.
  • The model’s self‑evaluation provides an additional safety check.

Future Direction

Tang noted that the long‑term plan is to move toward a multi‑agent architecture where specialized agents monitor and assist each other:

“We’re moving towards that eventually, but right now, even as it is, we’ve gotten pretty far.”

Why OpenAI Won’t Sell This Tool — but Wants You to Build Your Own

Despite obvious commercial potential, OpenAI told VentureBeat that it has no plans to productize its internal data agent. Instead, the strategy is to provide building blocks so enterprises can construct their own solutions.

“We use all the same APIs that are available externally,” Tang said. “The Responses API, the Evals API. We don’t have a fine‑tuned model. We just use 5.2. So you can definitely build this.”

Supporting Ecosystem

  • OpenAI Frontier (launched early February) – an end‑to‑end platform for enterprises to build and manage AI agents.
  • Partnerships with McKinsey, BCG, Accenture, and Capgemini to sell and implement the platform.
  • AWS + OpenAI are jointly developing a Stateful Runtime Environment for Amazon Bedrock, mirroring some persistent‑context capabilities of OpenAI’s data agent.
  • Apple recently integrated Codex directly into Xcode.

Codex Adoption

  • Used by 95% of OpenAI engineers; Codex reviews every pull request before it merges.
  • Global weekly active users have tripled since the start of the year, surpassing one million.
  • Overall usage has grown more than fivefold.

“Codex isn’t even a coding tool anymore. It’s much more than that,” Tang said. “I see non‑technical teams use it to organize thoughts, create slides, and produce daily summaries.”

One engineering manager has Codex review her notes each morning, identify the most important tasks, pull in Slack messages and DMs, and draft responses. “It’s really operating on her behalf in a lot of ways,” Tang added.

The Unsexy Prerequisite That Will Determine Who Wins the AI‑Agent Race

When asked what other enterprises should take away from OpenAI’s experience, Tang didn’t point to model capabilities or clever prompt engineering. She highlighted something far more mundane:

“This is not sexy, but data governance is really important for data agents to work well,” she said. “Your data needs to be clean enough and annotated enough, and there needs to be a source of truth somewhere for the agent to crawl.”

Key Points

  • The underlying infrastructure—storage, compute, orchestration, and business‑intelligence layers—has not been replaced by the agent.
  • Those tools are still required for the agent to do its job.
  • The agent serves as a fundamentally new entry point for data intelligence, offering a more autonomous and accessible interface than anything that came before.

Tang closed the interview with a prediction:

“Companies that adopt this are going to see the benefits very rapidly,” she said.
“And companies that don’t are going to fall behind. It’s going to pull apart. The companies who use it are going to advance very, very quickly.”

Asked whether that acceleration worried her own colleagues — especially after a wave of recent layoffs at companies like Block — Tang paused.

“How much we’re able to do as a company has accelerated,” she said, “but it still doesn’t match our ambitions, not even one bit.”
