Why AI Analytics Tools Are Solving the Wrong Problem
Source: Dev.to
TLDR: The AI analytics industry is obsessed with building better query engines—using LLMs to turn natural language into SQL. But that’s only 20% of the real challenge. The other 80%? Capturing and maintaining the massive amount of business context that exists in people’s heads, undocumented meetings, and scattered wikis across five layers of your organization. Until we solve this unglamorous documentation problem, AI‑powered analytics will remain impressive demos that struggle in production.
Every “chat with your data” demo looks the same. Someone types “show me sales by region last month” into a sleek interface. An LLM generates a SQL query. Results appear. Everyone nods approvingly.
Then you try to deploy it at your company.
Suddenly, questions that should be simple become impossible. “What’s our revenue from premium customers?” sounds straightforward until you realize three different teams define “premium” differently, and “revenue” means something else to finance than it does to operations.
The demo worked because the demo had clean, simple data. Your reality is messier.
What Everyone Is Building
Open any AI analytics product and you’ll find roughly the same architecture under the hood.
- They connect to your database and pull the schema—tables, columns, data types, foreign keys.
- They use retrieval‑augmented generation (RAG) to find relevant metadata when you ask a question.
- An LLM takes that context and generates SQL.
- They execute the query, format the results, and maybe generate some insights.
For simple questions against well‑designed databases, this works. “Total orders last week” or “top 10 customers by spend”—no problem.
This is the 20% of analytics that everyone’s racing to perfect.
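In code, that whole loop fits in a screenful. Here’s a minimal sketch, assuming SQLite and a generic `llm` completion callable—both stand-ins, not any particular product’s API:

```python
import sqlite3

def fetch_schema(conn: sqlite3.Connection) -> str:
    """Step 1: pull table DDL straight from the database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return "\n".join(r[0] for r in rows if r[0])

def answer(question: str, conn: sqlite3.Connection, llm) -> list:
    # Step 2: a real system would use RAG to retrieve only the relevant
    # metadata; this sketch naively hands over the whole schema.
    schema = fetch_schema(conn)
    # Step 3: the LLM turns question + schema into SQL. `llm` is a
    # stand-in for whatever completion API you actually call.
    sql = llm(f"Schema:\n{schema}\n\nWrite one SQLite query answering: {question}")
    # Step 4: execute and return raw rows for formatting downstream.
    return conn.execute(sql).fetchall()
```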
The 80% That Nobody Talks About
The real work begins when you step outside the database schema and into the messy world of business meaning.

Layer 1: Business Definitions That Live Nowhere
Your database has a customers table with 2 million rows. Great. But which ones are “premium customers”?
- Is it customers who spend over $10K annually?
- The VIP tier from your loyalty program?
- Anyone with a dedicated account manager?
Different teams give different answers.
What about “high‑volume stores”? Is that the top 10% by revenue? By transaction count? By square footage? The answer exists somewhere—maybe in a strategy deck from 18 months ago, maybe in the head of someone who’s been here for five years.
“Peak hours” sounds objective until you learn that retail defines it as 10 am–2 pm and 5 pm–8 pm, while the warehouse team uses 7 am–11 am and 3 pm–7 pm.
None of this lives in your database. It’s business knowledge that needs to be documented in a structured way before any LLM can use it.
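What “documented in a structured way” might look like: a hypothetical glossary entry that pins one definition of “premium customer” to an explicit, owned predicate. All names and thresholds here are illustrative:

```python
# A hypothetical machine-readable glossary entry. The point: the choice
# between competing definitions is made once, explicitly, and owned,
# instead of living in someone's head.
GLOSSARY = {
    "premium_customer": {
        "definition": "Customer with over $10K in completed orders "
                      "in the trailing 12 months.",
        "sql_predicate": (
            "customer_id IN (SELECT customer_id FROM orders "
            "WHERE status = 'completed' "
            "AND order_date >= date('now', '-12 months') "
            "GROUP BY customer_id HAVING SUM(total) > 10000)"
        ),
        "owner": "revenue-ops",
        "last_reviewed": "2025-01-15",
    },
}
```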
Layer 2: Metrics and Their Hidden Complexity
Ask five people what “revenue” means and you might get five different answers.
- Does revenue include pending orders?
- What about returns?
- Is it before or after discounts?
- Do you count the shipping fee?
- What about tax?
- Is it when the order is placed, when it ships, or when payment clears?
Each question has an answer somewhere in your organization—often in multiple places with multiple versions, some contradictory.
Your analytics team might calculate “Monthly Recurring Revenue” one way. Finance calculates it differently for the board. The sales dashboard shows a third number because it excludes trials.
Each metric needs a single source of truth—not just a plain‑English definition, but the actual business logic: conditions, exclusions, edge cases, all documented and maintained.
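As a sketch, that single source of truth could be as simple as a metric object that answers every one of those questions in one place. The `Metric` shape and the `net_revenue` logic below are assumptions for illustration, not a standard:

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    description: str   # plain-English definition for humans
    sql: str           # the one canonical calculation for machines
    exclusions: list[str] = field(default_factory=list)

# Every bullet above gets answered exactly once, in writing.
NET_REVENUE = Metric(
    name="net_revenue",
    description="Recognized at shipment; after discounts and returns; "
                "excludes shipping fees and tax.",
    sql="""
        SELECT SUM(item_total - discount - COALESCE(refund_total, 0))
        FROM order_lines
        WHERE shipped_at IS NOT NULL
    """,
    exclusions=["pending orders", "shipping fees", "tax"],
)
```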
Layer 3: Domain‑Specific Business Rules
Now add the decisions each business unit makes to solve its specific problems.
- Marketing runs a campaign and excludes customers who purchased in the last 30 days.
- Operations has special handling for orders over $5,000.
- Customer service treats warranty claims differently than regular support tickets.
- Finance has revenue‑recognition rules that vary by product type.
These rules are implemented to solve today’s problems. The people making these decisions aren’t thinking about downstream analytics, nor are they documenting for future AI systems. Yet every decision affects what the data means and how it should be interpreted.
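A sketch of what capturing one of these rules could look like, using marketing’s 30‑day exclusion as the example (all names and SQL are hypothetical):

```python
# Marketing's 30-day exclusion, written down where analytics (and an
# LLM) can see it, instead of living only inside the campaign tool.
RECENT_BUYER_EXCLUSION = {
    "rule": "exclude_recent_buyers",
    "applies_to": "marketing_campaigns",
    "sql_filter": (
        "customer_id NOT IN (SELECT customer_id FROM orders "
        "WHERE order_date >= date('now', '-30 days'))"
    ),
    "rationale": "Don't discount customers who just paid full price.",
}
```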
Layer 4: Technical Implementation Decisions
The business requirements land with engineering, and developers make their own choices.
- They build microservices, each owning its own data.
- They choose data structures that make sense for their use case.
- They optimize for performance, API contracts, deployment constraints.
Questions that arise:
- Is a customer ID a string or an integer?
- Are addresses stored as structured fields or free‑form text?
- Is the timestamp in UTC or local time?
Different services make different choices. These are pragmatic engineering decisions, not “wrong” ones. But data becomes a byproduct of operations, not a first‑class concern, and most of these decisions aren’t documented anywhere a data system can access.
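A toy example of how two services’ reasonable choices collide at analytics time. Both record shapes are invented for illustration:

```python
from datetime import datetime, timezone

# Service A: integer IDs, timezone-aware UTC timestamps.
order_from_service_a = {
    "customer_id": 42,
    "placed_at": datetime(2025, 3, 1, 14, 0, tzinfo=timezone.utc),
}

# Service B: zero-padded string IDs, naive local time.
event_from_service_b = {
    "customer_id": "0042",
    "occurred_at": datetime(2025, 3, 1, 9, 0),
}

# A naive join matches nothing: 42 != "0042", and ordering comparisons
# between aware and naive datetimes raise TypeError. Every fix below
# encodes an undocumented assumption about what each service meant.
assert str(order_from_service_a["customer_id"]).zfill(4) == \
       event_from_service_b["customer_id"]
```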
Layer 5: The Data Platform Transformation Layer
Finally, the data team pulls everything together. They extract data from dozens of sources, cleanse it, standardize it, transform it.
- They create `dim_customer` by joining six different customer tables.
- They build `fact_orders` by combining order data with returns, refunds, and adjustments.
- They calculate derived metrics like `customer_lifetime_value` using complex logic.
Every table, every transformation, every derived field represents decisions. What business logic is embedded in this ETL job? Why was this data transformed this way? What assumptions were made? What edge cases are handled?
Without documentation, this knowledge lives in the data engineer’s head or lies buried in hundreds of lines of SQL.
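A sketch of the alternative: transformation code that carries its own decision log, so the assumptions travel with the table. The function and the rules it records are hypothetical:

```python
def build_dim_customer(crm_rows: list[dict], billing_rows: list[dict]) -> list[dict]:
    """Merge customer records from CRM and billing into dim_customer.

    Decisions embedded here, normally invisible inside an ETL job:
      - CRM wins on name conflicts.
      - Emails are lower-cased before matching.
      - Billing-only customers are kept and flagged, not dropped.
    """
    by_email = {r["email"].lower(): {**r, "source": "crm"} for r in crm_rows}
    for r in billing_rows:
        key = r["email"].lower()
        if key in by_email:
            by_email[key]["billing_id"] = r.get("billing_id")  # enrich; CRM wins
        else:
            by_email[key] = {**r, "source": "billing_only"}    # keep and flag
    return list(by_email.values())
```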

The Documentation Debt Crisis
Add it up:
- Business definitions for every domain term
- Precise logic for every metric and KPI
- Rules and exclusions from every business unit
- Technical decisions from every engineering team
- Transformation logic from every data pipeline
This is the context an LLM needs to generate correct queries for real business questions. Almost none of it is documented in a way machines can understand. This documentation problem is the 80%.
Why This Is So Hard
Documentation is manual work—unglamorous, time‑consuming, never‑ending. Business definitions change; a “premium customer” today might be redefined next quarter. Metrics evolve as the business grows. Rules get updated when regulations change. The data platform refactors tables and schemas.
Static documentation becomes stale the moment it’s written. You need a living system that evolves with the business.
But who owns this? The business teams are focused on business problems. Engineering is shipping features. The data team is drowning in pipeline maintenance. Nobody has…