8 billion tokens a day forced AT&T to rethink AI orchestration — and cut costs by 90%

Published: February 25, 2026 at 07:20 PM EST

Source: VentureBeat

Scaling Agentic AI at AT&T

When your average usage is 8 billion tokens a day, you have a massive scale problem.

This was the case at AT&T, where chief data officer Andy Markus and his team recognized that it simply wasn’t feasible (or economical) to push everything through large reasoning models.

The solution: a multi‑agent stack

AT&T re‑architected the orchestration layer using LangChain.
The team built “super agents” (large language models) that direct smaller, purpose‑driven worker agents.

“I believe the future of agentic AI is many, many, many small language models (SLMs). We find small language models to be just about as accurate, if not as accurate, as a large language model on a given domain area.” – Andy Markus
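
The super agent and worker agent split described above can be sketched in a few lines of plain Python. This is a minimal illustration, not AT&T's implementation: the `SuperAgent`, `WorkerAgent`, and `classify` names are hypothetical, the routing is a simple keyword match standing in for a large model's decision, and each worker's `handle` function stands in for a call to a small language model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class WorkerAgent:
    """A small, purpose-driven agent (hypothetical stand-in for an SLM call)."""
    name: str
    handle: Callable[[str], str]


def classify(task: str) -> str:
    """Stand-in for the super agent's routing decision (keyword match only)."""
    text = task.lower()
    if "sql" in text or "query" in text:
        return "nl_to_sql"
    if "document" in text or "pdf" in text:
        return "document_processing"
    return "general"


class SuperAgent:
    """Directs each task to the worker registered for its category."""

    def __init__(self, workers: Dict[str, WorkerAgent], fallback: WorkerAgent):
        self.workers = workers
        self.fallback = fallback

    def run(self, task: str) -> str:
        # Unknown categories go to the fallback worker.
        worker = self.workers.get(classify(task), self.fallback)
        return worker.handle(task)


workers = {
    "nl_to_sql": WorkerAgent("nl_to_sql", lambda t: f"[sql-worker] {t}"),
    "document_processing": WorkerAgent(
        "document_processing", lambda t: f"[doc-worker] {t}"
    ),
}
fallback = WorkerAgent("general", lambda t: f"[general-worker] {t}")
agent = SuperAgent(workers, fallback)

print(agent.run("Write a SQL query for monthly churn"))
```

The point of the design is that only the routing step needs a general model; each worker can be swapped for a smaller model tuned to one domain.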

Benefits reported

Metric	Result
Latency / speed	Dramatically improved
Cost	Up to 90% savings
Response time	Faster, more consistent
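
To see how routing work to smaller models could add up to savings on that order, the arithmetic can be sketched as follows. The inputs are purely hypothetical and are not AT&T figures: `small_share` is an assumed fraction of tokens handled by small models, and `cost_ratio` is an assumed per-token cost of a small model relative to a large one.

```python
def blended_savings(small_share: float, cost_ratio: float) -> float:
    """Fraction of spend saved versus sending every token to the large model.

    small_share: fraction of tokens handled by small models (0 to 1).
    cost_ratio:  small-model cost per token divided by large-model cost per token.
    """
    if not 0.0 <= small_share <= 1.0:
        raise ValueError("small_share must be between 0 and 1")
    if cost_ratio < 0.0:
        raise ValueError("cost_ratio must be non-negative")
    # Relative cost after routing is (1 - small_share) + small_share * cost_ratio,
    # so the saving is 1 minus that, which simplifies to the expression below.
    return small_share * (1.0 - cost_ratio)


# Hypothetical inputs chosen only to illustrate the formula.
savings = blended_savings(small_share=0.95, cost_ratio=0.05)
print(f"Estimated savings: {savings:.2%}")
```

Under these assumptions the saving depends on both factors at once: a cheap small model helps little if most traffic still reaches the large one.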

Ask AT&T Workflows

A graphical drag‑and‑drop agent builder for employees to automate tasks.
Agents pull from a suite of proprietary AT&T tools (document processing, NL‑to‑SQL conversion, image analysis).

“As the workflow is executed, it’s AT&T’s data that’s really driving the decisions.” – Markus

Human‑in‑the‑loop: all actions are logged, data is isolated, and role‑based access is enforced.

“Things do happen autonomously, but the human on the loop still provides a check and balance of the entire process.” – Markus
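
A minimal sketch of the logging and role‑based access controls mentioned above might look like this. The role names, tool names, and the `invoke_tool` helper are all hypothetical; the article does not describe how AT&T implements these checks.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical mapping of roles to the tools they may call.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "network_engineer": {"telemetry_lookup", "open_ticket"},
    "analyst": {"nl_to_sql"},
}


@dataclass
class AuditLog:
    """Append-only record of every attempted action, allowed or not."""
    entries: List[dict] = field(default_factory=list)

    def record(self, user: str, role: str, tool: str, allowed: bool) -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "tool": tool,
            "allowed": allowed,
        })


def invoke_tool(user: str, role: str, tool: str, log: AuditLog) -> bool:
    """Check the role's permissions, log the attempt, and report the decision."""
    allowed = tool in ROLE_PERMISSIONS.get(role, set())
    log.record(user, role, tool, allowed)
    return allowed


log = AuditLog()
print(invoke_tool("jdoe", "analyst", "nl_to_sql", log))
print(invoke_tool("jdoe", "analyst", "open_ticket", log))
print(len(log.entries))
```

Denied attempts are logged as well as permitted ones, which is what lets a human reviewer audit the whole process afterward.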

Not over‑building: interchangeable & selectable models

AT&T avoids a “build everything from scratch” mindset.
Relies on interchangeable, selectable models and never rebuilds a commodity.
Rapid iteration: “things change every week, sometimes multiple times a week.”
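
One common way to keep models interchangeable is to code against a small interface and look models up by task, so a model can be replaced without touching its callers. The sketch below assumes that pattern; `TextModel`, `EchoModel`, and `ModelRegistry` are illustrative names, not AT&T components.

```python
from typing import Dict, Protocol


class TextModel(Protocol):
    """Anything with a generate() method can be plugged in."""

    def generate(self, prompt: str) -> str: ...


class EchoModel:
    """Toy model used here in place of a real model client."""

    def __init__(self, label: str) -> None:
        self.label = label

    def generate(self, prompt: str) -> str:
        return f"{self.label}: {prompt}"


class ModelRegistry:
    """Maps a task name to whichever model is currently selected for it."""

    def __init__(self) -> None:
        self._models: Dict[str, TextModel] = {}

    def register(self, task: str, model: TextModel) -> None:
        self._models[task] = model  # re-registering swaps the model

    def get(self, task: str) -> TextModel:
        return self._models[task]


registry = ModelRegistry()
registry.register("summarize", EchoModel("model-a"))
print(registry.get("summarize").generate("weekly outage report"))

# Swapping in a different model requires no change to calling code.
registry.register("summarize", EchoModel("model-b"))
print(registry.get("summarize").generate("weekly outage report"))
```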

Evaluation highlights

Ask Data with Relational Knowledge Graph – topped the Spider 2.0 text‑to‑SQL accuracy leaderboard.
Other tools scored highly on the BIRD SQL benchmark.
Core framework: LangChain, combined with fine‑tuned models, retrieval‑augmented generation (RAG), and in‑house algorithms.
Partnership with Microsoft Azure for vector‑store search functionality.

Guiding principles

Accuracy – aim for the highest possible within constraints.
Cost – keep spend proportional to value.
Tool responsiveness – ensure low latency and reliability.

“Sometimes we over‑complicate things… Sometimes I’ve seen a solution over‑engineered.” – Markus

“Builders should ask whether a given tool actually needs to be agentic. What accuracy could be achieved with a simpler, single‑turn generative solution? How could we break it down into smaller pieces that can be delivered way more accurately?” – Markus

Adoption at Scale

100,000+ employees have access to Ask AT&T Workflows.
More than 50% use it daily.
Reported productivity gains of up to 90%.

Two user journeys

Journey	Description
Pro‑code	Users write Python behind the scenes to dictate agent rules.
No‑code	Drag‑and‑drop visual interface for a “pretty light user experience.”

“Even proficient users gravitated toward the low‑code option at a recent hackathon; more than half chose it despite being strong programmers.” – Markus

Real‑world examples

Network engineer workflow:
1. Agent 1 – correlates telemetry, identifies issue, pulls change logs, opens a trouble ticket.
2. Agent 2 – proposes solutions, writes patch code.
3. Agent 3 – generates a post‑mortem summary with preventative measures.

Human engineer monitors the entire chain, ensuring correct actions.
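
The three‑agent chain above, with a human checking the process, can be sketched as a simple pipeline. Everything here is an assumption made for illustration: the step functions return canned strings rather than calling models, and the `approve` callback stands in for the human engineer's review.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Incident:
    telemetry: str
    notes: List[str] = field(default_factory=list)


Step = Callable[[Incident], str]


def triage(incident: Incident) -> str:
    """Agent 1 stand-in: identify the issue and open a ticket."""
    return f"ticket opened for: {incident.telemetry}"


def remediate(incident: Incident) -> str:
    """Agent 2 stand-in: propose a fix."""
    return "proposed patch drafted"


def post_mortem(incident: Incident) -> str:
    """Agent 3 stand-in: summarize what happened so far."""
    return f"summary of {len(incident.notes)} prior steps"


def run_chain(
    incident: Incident,
    steps: List[Step],
    approve: Callable[[str, str], bool],
) -> Incident:
    """Run steps in order; stop as soon as the reviewer rejects a result."""
    for step in steps:
        result = step(incident)
        if not approve(step.__name__, result):
            incident.notes.append(f"{step.__name__}: halted by reviewer")
            break
        incident.notes.append(f"{step.__name__}: {result}")
    return incident


done = run_chain(
    Incident("packet loss on core router"),
    [triage, remediate, post_mortem],
    approve=lambda name, result: True,  # reviewer accepts every step
)
for note in done.notes:
    print(note)
```

Passing a stricter `approve` function, for example one that rejects the `remediate` step, halts the chain before any later agent runs.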

AI‑Fueled Coding: The Future

AT&T is applying the same “small, purpose‑built” philosophy to code generation, calling it AI‑fueled coding.
The approach mirrors retrieval‑augmented generation (RAG): developers work in an IDE with function‑specific build archetypes that dictate how code should interact.
The output is structured, production‑ready code, not loose snippets.

AI‑Fueled Coding: A Game Changer

Markus said the approach gets code “to production grade,” and could reach that quality in one turn. “We’ve all worked with vibe coding, where we have an agentic kind of code editor,” he noted. But AI‑fueled coding “eliminates a lot of the back and forth iterations that you might see in vibe coding.”

He sees this coding technique as “tangibly redefining” the software development cycle, ultimately shortening development timelines and increasing output of production‑grade code. Non‑technical teams can also get in on the action, using plain‑language prompts to build software prototypes.

His team built an internal curated data product in 20 minutes; without AI, it would have taken six weeks.

“We develop software with it, modify software with it, do data science with it, do data analytics with it, do data engineering with it,” Markus said. “So it’s a game changer.”

All quotes and data are attributed to Andy Markus and AT&T as reported in VentureBeat.
