OpenAI's AI data agent, built by two engineers, now serves thousands of employees — and the company says anyone can replicate it

Published: March 3, 2026 at 09:00 AM EST
10 min read

Source: VentureBeat

OpenAI’s AI‑Powered Data Agent: From Hours‑Long Queries to Instant Insights

When an OpenAI finance analyst needed to compare revenue across geographies and customer cohorts last year, it took hours of work—hunting through 70,000 datasets, writing SQL queries, and verifying table schemas.
Today, the same analyst types a plain‑English question into Slack and gets a finished chart in minutes.

  • The tool behind that transformation was built by two engineers in three months.
  • 70 % of its code was written by AI.
  • It is now used by thousands of OpenAI employees every day, making it one of the most aggressive deployments of an AI data agent inside any company.

In an exclusive interview with VentureBeat, Emma Tang, head of data infrastructure at OpenAI, gave a rare look inside the system—how it works, how it fails, and what it signals about the future of enterprise data. Paired with the company’s blog post announcing the tool, the conversation paints a picture of a company that turned its own AI on itself and discovered something every enterprise will soon confront:

The bottleneck to smarter organizations isn’t better models. It’s better data.

“The agent is used for any kind of analysis,” Tang said. “Almost every team in the company uses it.”


A Plain‑English Interface to 600 PB of Corporate Data

To understand why OpenAI built this system, consider the scale of the problem:

  • Data platform size: more than 600 petabytes
  • Number of datasets: ~70,000
  • Internal user base: ~5,000 employees (4,000+ use data tools provided by Tang’s team)

The agent, built on GPT‑5.2, is accessible wherever employees already work:

  • Slack
  • Web interface
  • IDEs
  • Codex CLI
  • OpenAI’s internal ChatGPT app

It accepts plain‑English questions and returns charts, dashboards, and long‑form analytical reports. In follow‑up responses shared with VentureBeat, the team estimated it saves 2–4 hours of work per query. Tang emphasized that the larger win is harder to measure:

“The agent gives people access to analysis they simply couldn’t have done before, regardless of how much time they had.”

“Engineers, growth, product, as well as non‑technical teams, who may not know all the ins and outs of the company data systems and table schemas can now pull sophisticated insights on their own.”


From Revenue Breakdowns to Latency Debugging: One Agent Does It All

Concrete use cases Tang highlighted:

  1. Finance – Revenue comparisons across geographies and customer cohorts.

    “You can literally send the agent a query in plain text, and it will respond with charts and dashboards.”

  2. Discrepancy investigation – A user spotted mismatches between two dashboards tracking Plus subscriber growth.

    • The agent stack‑ranked the differences, revealing five distinct factors in minutes (a task that would take a human hours or days).
  3. Product Management – Understanding feature adoption.

  4. Engineering – Diagnosing performance regressions, e.g., “Is a specific ChatGPT component slower than yesterday? If so, which latency components explain the change?”

  5. Cross‑departmental analysis – Senior leaders can combine sales data, engineering metrics, and product analytics in a single query.

“Most enterprise AI agents today are siloed within departments. OpenAI’s cuts horizontally across the company. We launched department by department, curating specific memory and context for each group, but at some point it’s all in the same database.” – Emma Tang


How Codex Solved the Hardest Problem in Enterprise Data

The toughest technical challenge: Finding the right table among 70,000 datasets.

Codex—OpenAI’s AI coding agent—plays three pivotal roles:

  1. Code generation – Produced more than 70% of the data agent’s code, enabling two engineers to ship the product in three months.

  2. User access layer – Users reach the data agent through Codex via MCP.

  3. Metadata enrichment – The most fascinating role: a daily asynchronous process in which Codex:

    • Examines important data tables
    • Analyzes underlying pipeline code
    • Determines upstream/downstream dependencies, ownership, granularity, join keys, and similar tables
    • Persists this information to a vector database

When a user asks about “revenue,” the agent queries the vector store to find tables Codex has already mapped to that concept.
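The article doesn’t publish the retrieval code, but the pattern it describes is standard embedding search over an enriched table catalog. A minimal sketch, with invented table names and a bag-of-words stand-in for a real embedding model:

```python
from collections import Counter
from math import sqrt

# Hypothetical catalog entries, shaped like the descriptions a Codex-style
# enrichment pass might persist. Table names are invented for illustration.
CATALOG = {
    "finance.revenue_daily": "daily revenue by geography and customer cohort",
    "product.feature_adoption": "feature adoption events by product surface",
    "infra.request_latency": "request latency percentiles per chatgpt component",
}

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words term vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def find_tables(question: str, k: int = 2) -> list[str]:
    # Rank catalog entries by similarity to the user's question.
    q = embed(question)
    scored = sorted(CATALOG, key=lambda t: cosine(q, embed(CATALOG[t])), reverse=True)
    return scored[:k]
```

In a production system, `embed` would call an embedding model and the catalog would live in a dedicated vector database rather than an in-memory dict, but the lookup flow is the same: a question about “revenue” surfaces the tables already mapped to that concept.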

Six Context Layers the Agent Uses

  1. Basic schema metadata – Table names, column types, etc.
  2. Curated expert descriptions – Human‑written summaries.
  3. Institutional knowledge – Extracted from Slack, Google Docs, Notion.
  4. Codex enrichment – Automated mapping of dependencies and semantics.
  5. Learning memory – Stores corrections from prior conversations.
  6. Live warehouse queries – Fallback when no prior information exists.

The team also tiers historical query patterns. Generic “SELECT * LIMIT 10” queries are deemed unhelpful, so they focus on more purposeful patterns.
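A tiering step like this can be as simple as filtering out rote exploration queries before they enter the context; the heuristic pattern below is invented for illustration, not OpenAI’s actual filter:

```python
import re

# Discard trivial exploration queries ("SELECT * ... LIMIT 10") and keep
# purposeful ones that project, aggregate, or filter -- those encode
# real analyst intent worth learning from.
GENERIC = re.compile(
    r"^\s*select\s+\*\s+from\s+\S+\s*(limit\s+\d+)?\s*;?\s*$",
    re.IGNORECASE,
)

def is_purposeful(sql: str) -> bool:
    return not GENERIC.match(sql)
```

Only queries that pass the filter would be retained as historical patterns for the agent to imitate.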


Takeaways

  • Speed & accessibility: Plain‑English prompts turn weeks of data‑engineering effort into minutes of insight.
  • Horizontal reach: One unified agent serves finance, product, engineering, growth, and non‑technical teams alike.
  • AI‑generated infrastructure: Codex not only wrote most of the agent’s code but also continuously enriches the data catalog, solving the “find the right table” problem.
  • Future of enterprise data: The real competitive edge will come from AI‑augmented data discovery and orchestration, not just larger language models.

OpenAI’s internal AI data agent demonstrates that when a company turns its own AI on itself, it can unlock a new level of organizational intelligence—one that many enterprises will soon be forced to emulate.

Over‑confidence in Data Agents

Even with six context layers, Tang was remarkably candid about the agent’s biggest behavioral flaw: over‑confidence. It’s a problem anyone who has worked with large language models will recognize. Source tiering helps: dashboards and executive reports, where analysts invest significant effort determining the correct representation, get flagged as “source of truth,” while everything else gets deprioritized. Even so, the model can still commit to the wrong table and run with it.

“It’s a really big problem, because what the model often does is feel overconfident,” Tang said. “It’ll say, ‘This is the right table,’ and just go forth and start doing analysis. That’s actually the wrong approach.”

The prompt that forces the AI to slow down and think

The fix came through prompt engineering that forces the agent to linger in a discovery phase. Tang explained:

“We found that the more time it spends gathering possible scenarios and comparing which table to use — just spending more time in the discovery phase — the better the results.”

The prompt reads almost like coaching a junior analyst:

“Before you run ahead with this, I really want you to do more validation on whether this is the right table. So please check more sources before you go and create actual data.”

The team also learned, through rigorous evaluation, that less context can produce better results.

“It’s very easy to dump everything in and just expect it to do better,” Tang said. “From our evals, we actually found the opposite. The fewer things you give it, and the more curated and accurate the context is, the better the results.”
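The discovery-phase behavior Tang describes can be approximated with a simple ambiguity check: commit to a table only when one candidate clearly wins. The margin threshold and scoring scale below are invented for illustration:

```python
# Sketch of the "slow down" behavior: stay in discovery until one
# candidate table clearly wins, rather than committing to the first guess.
def choose_table(candidates: dict[str, float], margin: float = 0.15):
    """candidates maps table name -> relevance score in [0, 1]."""
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]
    best, runner_up = ranked[0], ranked[1]
    if best[1] - runner_up[1] >= margin:
        return best[0]   # confident: proceed to analysis
    return None          # ambiguous: validate against more sources first
```

When the function returns `None`, the agent would keep gathering evidence (schema metadata, curated descriptions, live queries) instead of starting the analysis.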

To build trust, the agent:

  • Streams its intermediate reasoning to users in real time.
  • Exposes which tables it selected and why, linking directly to underlying query results.
  • Allows users to interrupt the agent mid‑analysis to redirect it.
  • Checkpoints its progress, enabling resume after failures.
  • Evaluates its own performance at the end of every task:

“We ask the model, ‘how did you think that went? Was that good or bad?’” Tang said. “And it’s actually fairly good at evaluating how well it’s doing.”


Guardrails that are deliberately simple — and surprisingly effective

When it comes to safety, Tang took a pragmatic approach that may surprise enterprises expecting sophisticated AI‑alignment techniques.

“I think you just have to have even more dumb guardrails,” she said. “We have really strong access control. It’s always using your personal token, so whatever you have access to is only what you have access to.”

Key guardrails:

  • The agent operates purely as an interface layer, inheriting the same permissions that govern OpenAI’s data.
  • It never appears in public channels — only in private channels or a user’s own interface.
  • Write access is restricted to a temporary test schema that gets wiped periodically and can’t be shared.
  • Arbitrary writes to other systems are disallowed.
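A “dumb guardrail” of this kind can be a short allow-list check sitting in front of the warehouse. The schema name and the simplified SQL parsing below are illustrative assumptions, not OpenAI’s implementation:

```python
import re

# Hypothetical guardrail: statements may read whatever the user's token
# permits, but may only write into a temporary sandbox schema.
WRITE = re.compile(
    r"^\s*(insert\s+into|update|delete\s+from|create\s+table|drop\s+table)\s+(\S+)",
    re.IGNORECASE,
)

def allowed(sql: str, sandbox: str = "tmp_sandbox") -> bool:
    m = WRITE.match(sql)
    if m is None:
        return True  # reads pass through normal access control
    target = m.group(2).lower().rstrip("(")
    return target.startswith(sandbox + ".")
```

Real deployments would use a proper SQL parser and enforce the rule at the database-permission layer as well, but the principle matches Tang’s description: simple, restrictive defaults rather than sophisticated alignment machinery.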

User feedback closes the loop: employees flag incorrect results directly, and the team investigates. The model’s self‑evaluation adds another check.

Long‑term, Tang said the plan is to move toward a multi‑agent architecture where specialized agents monitor and assist each other.

“We’re moving towards that eventually,” she said, “but right now, even as it is, we’ve gotten pretty far.”


Why OpenAI won’t sell this tool — but wants you to build your own

Despite obvious commercial potential, OpenAI told VentureBeat that it has no plans to productize its internal data agent. Instead, the strategy is to provide building blocks so enterprises can construct their own.

“We use all the same APIs that are available externally,” Tang said. “The Responses API, the Evals API. We don’t have a fine‑tuned model. We just use 5.2. So you can definitely build this.”

  • OpenAI Frontier – launched in early February as an end‑to‑end platform for enterprises to build and manage AI agents.
  • Partnerships with McKinsey, Boston Consulting Group, Accenture, and Capgemini to sell and implement the platform.
  • AWS + OpenAI – jointly developing a Stateful Runtime Environment for Amazon Bedrock that mirrors some persistent‑context capabilities of OpenAI’s data agent.
  • Apple – recently integrated Codex directly into Xcode.

According to information shared with VentureBeat:

  • Codex is now used by 95 % of engineers at OpenAI and reviews all pull requests before they’re merged.
  • Its global weekly active user base has tripled since the start of the year, surpassing one million.
  • Overall usage has grown more than fivefold.

Tang described a shift in how employees use Codex that transcends coding entirely:

“Codex isn’t even a coding tool anymore. It’s much more than that,” she said. “I see non‑technical teams use it to organize thoughts and create slides and to create daily summaries.”

One engineering manager has Codex review her notes each morning, identify the most important tasks, pull in Slack messages and DMs, and draft responses.

“It’s really operating on her behalf in a lot of ways,” Tang added.


The unsexy prerequisite that will determine who wins the AI‑agent race

When asked what other enterprises should take away from OpenAI’s experience, Tang didn’t point to model capabilities or clever prompt engineering. She highlighted something far more mundane:

“This is not sexy, but data governance is really important for data agents to work well,” she said. “Your data needs to be clean enough and annotated enough, and there needs to be a source of truth somewhere for the agent to crawl.”

The underlying infrastructure — storage, compute, orchestration, and business‑intelligence layers — hasn’t been replaced by the agent. Those tools are still required for the agent to do its job, but the agent serves as a fundamentally new entry point for data intelligence, one that is more autonomous and accessible than anything that came before it.

Tang closed the interview with a warning:

“Companies that adopt this are going to see the benefits very rapidly,” she said. “And companies that don’t are going to fall behind. It’s going to pull apart. The companies who use it are going to advance very, very quickly.”

Asked whether that acceleration worried her own colleagues — especially after a wave of recent layoffs at companies like Block — Tang paused.

“How much we’re able to do as a company has accelerated,” she said, “but it still doesn’t match our ambitions, not even one bit.”