8 billion tokens a day forced AT&T to rethink AI orchestration — and cut costs by 90%

Published: February 25, 2026 at 07:20 PM EST

Source: VentureBeat

Scaling Agentic AI at AT&T

When you average 8 billion tokens a day, you have a massive scale problem.

This was the case at AT&T, where chief data officer Andy Markus and his team recognized that it simply wasn’t feasible (or economical) to push everything through large reasoning models.

The solution: a multi‑agent stack

  • Re‑architected the orchestration layer using LangChain.
  • Built “super agents” (large language models) that direct smaller, purpose‑driven worker agents.
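
The routing pattern described above can be sketched roughly as follows. This is only an illustration, not AT&T's implementation and not LangChain code: the `SuperAgent` and `WorkerAgent` classes, the keyword classifier, and the canned worker replies are all hypothetical stand-ins for real model calls, chosen so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class WorkerAgent:
    """A small, purpose-built agent; `handle` stands in for an SLM call."""
    name: str
    handle: Callable[[str], str]


class SuperAgent:
    """Decides which worker should answer, then delegates the request."""

    def __init__(self, workers: Dict[str, WorkerAgent],
                 classify: Callable[[str], str]) -> None:
        self.workers = workers
        self.classify = classify  # assumption: a large model would do this step

    def run(self, request: str) -> str:
        domain = self.classify(request)
        worker = self.workers.get(domain)
        if worker is None:
            raise KeyError(f"no worker registered for domain {domain!r}")
        return f"[{worker.name}] {worker.handle(request)}"


def keyword_classifier(request: str) -> str:
    # Hypothetical stand-in for the super agent's routing decision.
    return "sql" if "how many" in request.lower() else "docs"


workers = {
    "sql": WorkerAgent("nl-to-sql", lambda r: "SELECT COUNT(*) FROM tickets"),
    "docs": WorkerAgent("doc-processing", lambda r: f"summary of: {r}"),
}
super_agent = SuperAgent(workers, keyword_classifier)
print(super_agent.run("How many tickets were opened today?"))
```

The point of the design is that the expensive model only makes the routing decision, while the bulk of the tokens flow through the cheaper workers.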

“I believe the future of agentic AI is many, many, many small language models (SLMs). We find small language models to be just about as accurate, if not as accurate, as a large language model on a given domain area.” – Andy Markus

Benefits reported

Metric            Result
Latency / speed   Dramatically improved
Cost              Up to 90% savings
Response time     Faster, more consistent

Ask AT&T Workflows

  • A graphical drag‑and‑drop agent builder for employees to automate tasks.
  • Agents pull from a suite of proprietary AT&T tools (document processing, NL‑to‑SQL conversion, image analysis).

“As the workflow is executed, it’s AT&T’s data that’s really driving the decisions.” – Markus

  • Human‑in‑the‑loop: all actions are logged, data is isolated, and role‑based access is enforced.

“Things do happen autonomously, but the human on the loop still provides a check and balance of the entire process.” – Markus
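
One way to picture the logging and role-based access described above is a thin wrapper around every agent action. The sketch below is an assumption-laden illustration: the `ROLE_PERMISSIONS` table, the role and action names, and the `run_action` helper are hypothetical and do not come from AT&T's system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Set


@dataclass
class AuditLog:
    """In-memory audit trail; a real system would persist this."""
    entries: List[dict] = field(default_factory=list)

    def record(self, user: str, action: str, allowed: bool) -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "allowed": allowed,
        })


# Hypothetical role table, invented for this example.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "network_engineer": {"read_telemetry", "open_ticket"},
    "analyst": {"read_telemetry"},
}


def run_action(user: str, role: str, action: str,
               fn: Callable[[], str], log: AuditLog) -> str:
    """Check the role, log the attempt either way, then run the action."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    log.record(user, action, allowed)
    if not allowed:
        raise PermissionError(f"role {role!r} may not perform {action!r}")
    return fn()


log = AuditLog()
print(run_action("ana", "network_engineer", "open_ticket", lambda: "ticket opened", log))
```

Because denied attempts are logged as well as successful ones, a reviewer can see what an agent tried to do, not only what it did.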

Not over‑building: interchangeable & selectable models

  • AT&T avoids a “build everything from scratch” mindset.
  • Relies on interchangeable, selectable models and never rebuilds a commodity.
  • Rapid iteration: “things change every week, sometimes multiple times a week.”

Evaluation highlights

  • Ask Data with Relational Knowledge Graph – topped the Spider 2.0 text‑to‑SQL accuracy leaderboard.
  • Other tools scored highly on the BIRD SQL benchmark.
  • Core framework: LangChain, combined with fine‑tuned models, RAG, and in‑house algorithms.
  • Partnership with Microsoft Azure for vector‑store search functionality.

Guiding principles

  1. Accuracy – aim for the highest possible within constraints.
  2. Cost – keep spend proportional to value.
  3. Tool responsiveness – ensure low latency and reliability.
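
Taken together, the three principles suggest a simple selection rule: among interchangeable models that are accurate and responsive enough, pick the cheapest. The sketch below shows that rule under stated assumptions. The model names and every number in `CANDIDATES` are invented for illustration and are not figures reported by AT&T.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    accuracy: float            # fraction correct on your own evaluation set
    cost_per_1k_tokens: float  # in whatever currency you budget in
    p95_latency_ms: float


def pick_model(options: List[ModelOption], min_accuracy: float,
               max_latency_ms: float) -> Optional[ModelOption]:
    """Return the cheapest model meeting both thresholds, or None."""
    eligible = [
        m for m in options
        if m.accuracy >= min_accuracy and m.p95_latency_ms <= max_latency_ms
    ]
    return min(eligible, key=lambda m: m.cost_per_1k_tokens, default=None)


# Hypothetical candidates; all values are made up for the example.
CANDIDATES = [
    ModelOption("large-general", 0.93, 0.0100, 1800),
    ModelOption("small-domain-tuned", 0.92, 0.0010, 300),
    ModelOption("tiny", 0.80, 0.0002, 120),
]

choice = pick_model(CANDIDATES, min_accuracy=0.90, max_latency_ms=1000)
print(choice.name if choice else "no model meets the constraints")
```

Keeping the candidates in a plain list is what makes the models swappable: re-running the evaluation and updating the numbers is enough to change the choice, with no change to calling code.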

“Sometimes we over‑complicate things… Sometimes I’ve seen a solution over‑engineered.” – Markus

“Builders should ask whether a given tool actually needs to be agentic. What accuracy could be achieved with a simpler, single‑turn generative solution? How could we break it down into smaller pieces that can be delivered way more accurately?” – Markus

Adoption at Scale

  • 100,000+ employees have access to Ask AT&T Workflows.
  • More than 50% use it daily.
  • Reported productivity gains up to 90%.

Two user journeys

Journey    Description
Pro‑code   Users write Python behind the scenes to dictate agent rules.
No‑code    Drag‑and‑drop visual interface for a “pretty light user experience.”

“Even proficient users gravitated toward the low‑code option at a recent hackathon; more than half chose it despite being strong programmers.” – Markus

Real‑world examples

  • Network engineer workflow:
    1. Agent 1 – correlates telemetry, identifies issue, pulls change logs, opens a trouble ticket.
    2. Agent 2 – proposes solutions, writes patch code.
    3. Agent 3 – generates a post‑mortem summary with preventative measures.

Human engineer monitors the entire chain, ensuring correct actions.
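
The three-agent chain above can be sketched as a sequence of steps that share one incident record, with a reviewer callback consulted after each step. This is a minimal illustration only: the agent functions return placeholder strings rather than calling models, and the `approve` hook that can halt the chain is an assumption about how monitoring might work, not a detail from the source.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Incident:
    telemetry: List[str]
    ticket_id: str = ""
    proposed_fix: str = ""
    post_mortem: str = ""
    trail: List[str] = field(default_factory=list)  # what each step did


def triage_agent(inc: Incident) -> Incident:
    # Agent 1: correlate telemetry and open a ticket (placeholder ID).
    inc.ticket_id = "TICKET-0001"
    inc.trail.append(f"triage: {len(inc.telemetry)} signals correlated")
    return inc


def remediation_agent(inc: Incident) -> Incident:
    # Agent 2: propose a fix (placeholder text instead of generated code).
    inc.proposed_fix = "roll back last config change"
    inc.trail.append("remediation: fix proposed")
    return inc


def post_mortem_agent(inc: Incident) -> Incident:
    # Agent 3: summarize the incident and the fix.
    inc.post_mortem = f"{inc.ticket_id}: {inc.proposed_fix}"
    inc.trail.append("post-mortem: summary written")
    return inc


def run_chain(inc: Incident, steps: List[Callable[[Incident], Incident]],
              approve: Callable[[str, Incident], bool]) -> Incident:
    """Run steps in order; stop as soon as the reviewer rejects one."""
    for step in steps:
        inc = step(inc)
        if not approve(step.__name__, inc):
            inc.trail.append(f"halted by reviewer after {step.__name__}")
            break
    return inc


STEPS = [triage_agent, remediation_agent, post_mortem_agent]
result = run_chain(Incident(["link flap", "cpu spike"]), STEPS, lambda name, inc: True)
print(result.post_mortem)
```

Passing a stricter `approve` function, for example one that rejects any proposed fix touching production, stops the chain before the later agents run.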

AI‑Fueled Coding: The Future

  • AT&T is applying the same “small, purpose‑built” philosophy to code generation, calling it AI‑fueled coding.
  • Mirrors Retrieval‑Augmented Generation (RAG): developers work in an IDE with function‑specific build archetypes that dictate how code should interact.
  • The output is structured, production‑ready code, not loose snippets.

AI‑Fueled Coding: A Game Changer

“to production grade,” and could reach that quality in one turn. “We’ve all worked with vibe coding, where we have an agentic kind of code editor,” Markus noted. But AI‑fueled coding “eliminates a lot of the back and forth iterations that you might see in vibe coding.”

He sees this coding technique as “tangibly redefining” the software development cycle, ultimately shortening development timelines and increasing output of production‑grade code. Non‑technical teams can also get in on the action, using plain‑language prompts to build software prototypes.

  • His team built an internal curated data product in 20 minutes; without AI, it would have taken six weeks.

“We develop software with it, modify software with it, do data science with it, do data analytics with it, do data engineering with it,” Markus said. “So it’s a game changer.”

All quotes and data are attributed to Andy Markus and AT&T as reported in VentureBeat.
