OpenAI's AI data agent, built by two engineers, now serves 4,000 employees — and the company says anyone can replicate it

Published: March 3, 2026, 09:00 AM EST

Source: VentureBeat

When an OpenAI finance analyst needed to compare revenue across geographies and customer cohorts last year, it took hours of work — hunting through 70,000 datasets, writing SQL queries, verifying table schemas.
Today, the same analyst types a plain‑English question into Slack and gets a finished chart in minutes.

  • Built by two engineers in three months.
  • 70% of its code was written by AI.
  • Used daily by more than 4,000 of OpenAI’s ~5,000 employees, making it one of the most aggressive deployments of an AI data agent inside any company.

In an exclusive interview with VentureBeat, Emma Tang, head of data infrastructure at OpenAI, offered a rare look inside the system — how it works, how it fails, and what it signals about the future of enterprise data. The conversation, paired with the company’s blog post announcing the tool, paints a picture of a company that turned its own AI on itself and discovered something that every enterprise will soon confront:

The bottleneck to smarter organizations isn’t better models. It’s better data.
“The agent is used for any kind of analysis,” Tang said. “Almost every team in the company uses it.”

A Plain‑English Interface to 600 PB of Corporate Data

  • Data platform size: > 600 petabytes
  • Number of datasets: ≈ 70,000
  • Employees at OpenAI: ≈ 5,000
  • Daily users of data tools: > 4,000

Tang’s Data Platform team (under infrastructure, overseeing big‑data systems, streaming, and the data‑tooling layer) serves this massive internal user base.

How the agent is accessed

  • Slack
  • Web interface
  • IDEs
  • Codex CLI
  • OpenAI’s internal ChatGPT app

The agent, built on GPT‑5.2, accepts plain‑English questions and returns charts, dashboards, and long‑form analytical reports. In follow‑up responses shared with VentureBeat, the team estimated it saves 2–4 hours of work per query. Tang emphasized a larger, harder‑to‑measure win:

“The agent gives people access to analysis they simply couldn’t have done before, regardless of how much time they had.”
“Engineers, growth, product, as well as non‑technical teams, who may not know all the ins and outs of the company data systems and table schemas can now pull sophisticated insights on their own.”

From Revenue Breakdowns to Latency Debugging – One Agent Does It All

  • Finance – Revenue comparisons across geographies and customer cohorts.
  • Product Management – Understanding feature adoption.
  • Engineering – Diagnosing performance regressions (e.g., “Is a specific ChatGPT component slower than yesterday? Which latency components explain the change?”).
  • Cross‑departmental analysis – Combining sales data, engineering metrics, and product analytics in a single query, something most enterprise AI agents cannot do because they are siloed.

“We launched department by department, curating specific memory and context for each group, but at some point it’s all in the same database.” – Tang

How Codex Solved the Hardest Problem in Enterprise Data

The hardest technical challenge: Finding the right table among 70,000 datasets.

Codex (OpenAI’s AI coding agent) plays three roles:

  1. Front‑end gateway – Users access the data agent through Codex via MCP (Model Context Protocol).
  2. Code generator – Codex generated > 70 % of the agent’s code, enabling two engineers to ship the product in three months.
  3. Daily asynchronous enrichment process – Codex examines important data tables, parses the underlying pipeline code, and determines each table’s:
    • Upstream & downstream dependencies
    • Ownership
    • Granularity
    • Join keys
    • Similar tables

“We give it a prompt, have Codex look at the code and respond with what we need, and then persist that to the database.” – Tang
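The loop Tang describes (prompt, parse, persist) is easy to picture as a batch job. A minimal sketch in Python, where `ask_codex`, the metadata fields, and the SQLite store are all illustrative stand‑ins, not OpenAI’s actual implementation:

```python
import json
import sqlite3

# Hypothetical metadata the coding agent is asked to produce for each table.
ENRICHMENT_FIELDS = [
    "upstream", "downstream", "owner", "granularity", "join_keys", "similar_tables",
]

PROMPT_TEMPLATE = (
    "Read the pipeline code below and return JSON with keys "
    + ", ".join(ENRICHMENT_FIELDS)
    + " describing the table it produces.\n\n{code}"
)

def enrich_table(table_name, pipeline_code, ask_codex, db):
    """Ask the coding agent to describe one table, then persist the answer."""
    raw = ask_codex(PROMPT_TEMPLATE.format(code=pipeline_code))
    record = json.loads(raw)
    db.execute(
        "INSERT OR REPLACE INTO table_enrichment (name, metadata) VALUES (?, ?)",
        (table_name, json.dumps(record)),
    )
    db.commit()
    return record
```

Run nightly over the most important tables, a job like this turns pipeline source code into the structured catalog entries the agent later retrieves.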

When a user asks about revenue, the agent searches a vector database to find which tables Codex has already mapped to that concept.
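Under the hood, that lookup is standard embedding retrieval: embed the question’s key phrase and rank the pre‑computed table descriptions by similarity. A toy sketch using plain cosine similarity over an in‑memory index (a real deployment would use an embedding model and a vector store):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def find_tables(query_vec, index, top_k=3):
    """index: (table_name, vector) pairs built from the enriched catalog.
    Returns the top_k table names most similar to the query embedding."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [name for name, _ in scored[:top_k]]
```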

Six Context Layers the Agent Uses

  1. Schema metadata – basic table and column definitions
  2. Curated expert descriptions – human‑written annotations
  3. Institutional knowledge – extracted from Slack, Google Docs, and Notion
  4. Learning memory – stores corrections from prior conversations
  5. Codex enrichment – table‑level lineage, ownership, and join keys
  6. Live warehouse queries – fallback when no prior information exists
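The layers behave like an ordered chain: cheap, curated sources are consulted first, and the live warehouse probe fires only as a fallback when nothing else matches. A minimal sketch of that ordering, with hypothetical resolver functions standing in for the real lookups:

```python
def build_context(table, curated_layers, live_query):
    """curated_layers: ordered (name, resolver) pairs; each resolver
    returns context text for the table, or None if it knows nothing.
    live_query runs an actual warehouse probe and is only used last."""
    context = []
    for name, resolver in curated_layers:
        found = resolver(table)
        if found:
            context.append((name, found))
    if not context:  # final fallback: query the warehouse directly
        context.append(("live_warehouse", live_query(table)))
    return context
```

Keeping the live probe last matches Tang’s point that curated, accurate context beats dumping everything in.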

The team also tiers historical query patterns. “All query history is everybody’s ‘select star, limit 10.’ It’s not really helpful,” Tang noted.

Takeaways

  • Speed: Queries that once took hours now finish in minutes.
  • Accessibility: Non‑technical staff can perform sophisticated analyses without deep knowledge of data schemas.
  • Cross‑functional insight: A single agent can blend data across traditionally siloed departments.
  • AI‑generated infrastructure: Codex not only wrote most of the agent’s code but also continuously enriches the data catalog, turning a massive, unwieldy data lake into a searchable knowledge base.

OpenAI’s AI data agent demonstrates that the next frontier for enterprises is not bigger models, but smarter, AI‑augmented data pipelines that make the right data instantly reachable.

Canonical Dashboards and Executive Reports

“Canonical dashboards and executive reports — where analysts invested significant effort determining the correct representation — get flagged as ‘source of truth.’ Everything else gets deprioritized.”
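Ranking by provenance rather than raw frequency is a one‑line rule: anything analysts have blessed as a source of truth outranks ad‑hoc history. A toy sketch (the field names are assumptions, not OpenAI’s schema):

```python
def rank_examples(query_history):
    """Sort prior queries so 'source of truth' dashboards surface first,
    then by how often each query was run; throwaway probes sink."""
    return sorted(
        query_history,
        key=lambda q: (q.get("source_of_truth", False), q.get("run_count", 0)),
        reverse=True,
    )
```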

The Prompt That Forces the AI to Slow Down and Think

Even with six context layers, Tang was remarkably candid about the agent’s biggest behavioral flaw: over‑confidence.

“It’s a really big problem, because what the model often does is feel overconfident,” Tang said. “It’ll say, ‘This is the right table,’ and just go forth and start doing analysis. That’s actually the wrong approach.”

The Fix: Prompt Engineering

The solution came through a prompt that forces the agent to linger in a discovery phase. The prompt reads almost like coaching a junior analyst:

“Before you run ahead with this, I really want you to do more validation on whether this is the right table. So please check more sources before you go and create actual data.”
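Operationally, this is just an instruction prepended to every task before the model sees the user’s question. A sketch, with wording adapted from the quote above and a function name of our own invention:

```python
DISCOVERY_INSTRUCTION = (
    "Before you run ahead with this, do more validation on whether you have "
    "the right table. Check multiple sources -- schema metadata, curated "
    "descriptions, prior corrections -- before you create any actual data."
)

def build_task_prompt(user_question):
    """Prefix every analysis task with the forced discovery phase."""
    return f"{DISCOVERY_INSTRUCTION}\n\nUser question: {user_question}"
```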

Key take‑aways from Tang’s evaluation

  • More time in discovery → better results.
  • Less context can produce better results.

“It’s very easy to dump everything in and just expect it to do better,” Tang said. “From our evals, we actually found the opposite. The fewer things you give it, and the more curated and accurate the context is, the better the results.”

Building Trust

  • The agent streams its intermediate reasoning to users in real time.
  • It exposes which tables it selected and why, linking directly to underlying query results.
  • Users can interrupt the agent mid‑analysis to redirect it.
  • The system checkpoints its progress, enabling it to resume after failures.
  • At the end of every task, the model evaluates its own performance:

“We ask the model, ‘how did you think that went? Was that good or bad?’” Tang said. “And it’s actually fairly good at evaluating how well it’s doing.”
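Checkpointing a multi‑step analysis can be as simple as recording each completed step so that a restart skips work already done. A minimal illustrative sketch (the step structure is an assumption; OpenAI has not described its checkpoint format):

```python
def run_with_checkpoints(steps, state):
    """steps: ordered (name, fn) pairs; state: dict of completed results,
    persisted between runs. After a crash-and-restart, steps already in
    state are skipped and their results reused."""
    for name, fn in steps:
        if name in state:
            continue  # finished before the failure; don't redo it
        state[name] = fn(state)
    return state
```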

Guardrails That Are Deliberately Simple — and Surprisingly Effective

When it comes to safety, Tang took a pragmatic approach that may surprise enterprises expecting sophisticated AI‑alignment techniques.

“I think you just have to have even more dumb guardrails,” she said.

Core Guardrails

  1. Strong access control – the agent always uses the requesting user’s personal token, so it can only read what that user can.
  2. Interface‑layer only – it inherits the permissions that already govern OpenAI’s data and never appears in public channels; it lives only in private channels or a user’s own interface.
  3. Write restrictions – write access is limited to a temporary test schema that is wiped periodically and cannot be shared.
  4. No arbitrary writes – outside that schema, the agent never writes to other systems.
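Those “dumb guardrails” reduce to two checks that run before any statement executes: the query carries the requesting user’s own token, and writes may only touch the scratch schema. A hedged sketch (the schema name and token plumbing are our assumptions):

```python
TEMP_SCHEMA = "agent_scratch"  # hypothetical: wiped periodically, never shareable

def authorize(sql, user_token):
    """Return execution params, or raise if the statement breaks a guardrail."""
    statement = sql.strip().lower()
    is_write = statement.startswith(("insert", "update", "delete", "create", "drop"))
    if is_write and TEMP_SCHEMA not in statement:
        raise PermissionError("writes are confined to the temporary test schema")
    # Reads are scoped by the warehouse itself via the user's own token,
    # so the agent can never see more than the user could.
    return {"sql": sql, "token": user_token}
```

The deliberate simplicity is the point: no model judgment is involved, so the guardrail cannot be talked out of its rules.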

Feedback Loop

  • Employees flag incorrect results directly, prompting the team to investigate.
  • The model’s self‑evaluation provides an additional safety check.

Future Direction

Tang noted that the long‑term plan is to move toward a multi‑agent architecture where specialized agents monitor and assist each other:

“We’re moving towards that eventually, but right now, even as it is, we’ve gotten pretty far.”

Why OpenAI Won’t Sell This Tool — but Wants You to Build Your Own

Despite obvious commercial potential, OpenAI told VentureBeat that it has no plans to productize its internal data agent. Instead, the strategy is to provide building blocks so enterprises can construct their own solutions.

“We use all the same APIs that are available externally,” Tang said. “The Responses API, the Evals API. We don’t have a fine‑tuned model. We just use 5.2. So you can definitely build this.”

Supporting Ecosystem

  • OpenAI Frontier (launched early February) – an end‑to‑end platform for enterprises to build and manage AI agents.
  • Partnerships with McKinsey, BCG, Accenture, and Capgemini to sell and implement the platform.
  • AWS + OpenAI are jointly developing a Stateful Runtime Environment for Amazon Bedrock, mirroring some persistent‑context capabilities of OpenAI’s data agent.
  • Apple recently integrated Codex directly into Xcode.

Codex Adoption

  • Used by 95% of OpenAI engineers; Codex reviews every pull request before it merges.
  • Global weekly active users have tripled since the start of the year, surpassing one million.
  • Overall usage has grown more than fivefold.

“Codex isn’t even a coding tool anymore. It’s much more than that,” Tang said. “I see non‑technical teams use it to organize thoughts, create slides, and produce daily summaries.”

One engineering manager has Codex review her notes each morning, identify the most important tasks, pull in Slack messages and DMs, and draft responses. “It’s really operating on her behalf in a lot of ways,” Tang added.

The Unsexy Prerequisite That Will Determine Who Wins the AI‑Agent Race

When asked what other enterprises should take away from OpenAI’s experience, Tang didn’t point to model capabilities or clever prompt engineering. She highlighted something far more mundane:

“This is not sexy, but data governance is really important for data agents to work well,” she said. “Your data needs to be clean enough and annotated enough, and there needs to be a source of truth somewhere for the agent to crawl.”

Key Points

  • The underlying infrastructure—storage, compute, orchestration, and business‑intelligence layers—has not been replaced by the agent.
  • Those tools are still required for the agent to do its job.
  • The agent serves as a fundamentally new entry point for data intelligence, offering a more autonomous and accessible interface than anything that came before.

Tang closed the interview with a prediction:

“Companies that adopt this are going to see the benefits very rapidly,” she said.
“And companies that don’t are going to fall behind. It’s going to pull apart. The companies who use it are going to advance very, very quickly.”

Asked whether that acceleration worried her own colleagues — especially after a wave of recent layoffs at companies like Block — Tang paused.

“How much we’re able to do as a company has accelerated,” she said, “but it still doesn’t match our ambitions, not even one bit.”
