The Deterministic Problem with Probabilistic AI Analytics
Source: Dev.to
TL;DR
AI‑powered analytics tools rely on probabilistic systems (LLMs, semantic search, RAG) to answer business questions that require deterministic accuracy. Slight wording changes can produce different SQL queries and different answers. This isn’t a minor bug—it’s an architectural mismatch. The solution isn’t “better AI”; it’s rethinking how we match business questions to data, using exact matching for core concepts instead of fuzzy semantic search.
The Problem with Probabilistic AI Analytics
Imagine asking your CFO, “How many claims did we deny in Q3?”
They pull up a dashboard and say, “approximately 1,247, give or take.”
When an auditor asks for the exact number, "approximately" isn't acceptable: the figure either is 1,247 or it isn't.
Now ask the same CFO the next day, phrased slightly differently: “What’s the count of rejected claims last quarter?”
If the answer changes to 1,189, confidence in the system erodes.
This is exactly how most AI‑powered analytics tools operate today.
Modern AI Analytics Stack
- User asks a question in natural language.
- Semantic search finds relevant tables and columns from metadata.
- RAG (retrieval‑augmented generation) pulls additional context about those data elements.
- An LLM generates a SQL query based on the retrieved information.
- The query is executed and results are returned.
Every step in this pipeline is probabilistic:
- Semantic search uses embeddings; different phrasings produce different vector representations, retrieving different metadata chunks.
- RAG ranks results by relevance scores; small query changes can reshuffle the ranking, altering the context that reaches the LLM.
- LLM generation is inherently non‑deterministic. Even with temperature = 0, the same prompt can yield variations in SQL structure, joins, or filters.
Probabilistic pipelines work well for creative tasks—marketing copy, brainstorming, drafting emails—where multiple valid outputs are acceptable. Business analytics, however, demands precision.
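The retrieval instability described above can be reproduced even in a toy model. The sketch below uses a crude bag-of-words similarity (real systems use dense vector embeddings) over hypothetical column descriptions; it shows two paraphrases of the same business question ranking different columns first.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (real systems use dense vectors)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical metadata descriptions a semantic search might index.
columns = {
    "customer_tier": "customer tier flag premium standard",
    "customer_segment": "customer segment gold silver top tier",
}

def top_match(question: str) -> str:
    """Return the column whose description is most similar to the question."""
    return max(columns, key=lambda c: cosine(embed(question), embed(columns[c])))

q1 = "average order value for premium customers"
q2 = "mean order amount for top tier customers"
```

Here `top_match(q1)` ranks `customer_tier` first while the rephrased `top_match(q2)` ranks `customer_segment` first, mirroring the column flip in the example below.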
Real‑World Example
Question: “What’s the average order value for premium customers last quarter?”
First Attempt
Semantic search returns:
| Item | Relevance |
|---|---|
| fact_orders table | 0.89 |
| customer_tier column | 0.87 |
| order_total column | 0.85 |
LLM generates:

```sql
SELECT AVG(order_total)
FROM fact_orders
WHERE customer_tier = 'premium'
  AND order_date >= '2024-07-01';
```
Result: $247.83
Second Attempt (rephrased)
User asks: “What’s the mean order amount for our top‑tier customers in Q3?”
Semantic search now returns:
| Item | Relevance |
|---|---|
| fact_orders table | 0.88 |
| customer_segment column | 0.86 |
| order_value column | 0.84 |
LLM generates:

```sql
SELECT AVG(order_value)
FROM fact_orders
WHERE customer_segment = 'gold'
  AND order_date >= '2024-07-01';
```
Result: $231.56
The intent of the question is unchanged, yet the system selects different columns and produces a different answer. The root cause isn't a "wrong" AI; it's the lack of a ground‑truth enforcement mechanism. The system guesses which semantic mapping is correct, and tiny wording shifts tip the balance.
How Human Analysts Approach the Same Problem
Given a complex request such as:
“What’s the average order value and customer satisfaction rating for premium customers who made at least 3 purchases in high‑volume stores during peak hours last quarter? Break it down by product and region.”
An analyst decomposes the request into structured components:
- Metrics: average order value, customer satisfaction rating
- Filters: premium customers, at least 3 purchases, high‑volume stores, peak hours, last quarter
- Dimensions: product, region
They then perform exact lookups:
- Search the metrics catalog for “average order value” → exact match or none.
- Check the business glossary for “premium customers” → exact definition.
- Look up “high‑volume stores” → exact criteria.
If any concept lacks a definition, the analyst pauses and asks for clarification. This deterministic workflow ensures that the same question always yields the same underlying metadata and, consequently, the same SQL query.
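The analyst's lookup-or-ask behavior can be sketched as a small function. The catalog entries and the `ClarificationNeeded` exception are illustrative, not a real implementation; the point is that an unmatched term halts the pipeline rather than triggering a guess.

```python
# Hypothetical catalog contents, for illustration only.
METRICS_CATALOG = {
    "average order value": "AVG(fact_orders.order_total)",
}
BUSINESS_GLOSSARY = {
    "premium customers": "dim_customer.customer_tier = 'premium'",
}

class ClarificationNeeded(Exception):
    """Raised instead of guessing when a concept has no exact definition."""

def lookup(term: str, catalog: dict) -> str:
    """Exact (case-insensitive) lookup: return the definition or halt."""
    key = term.strip().lower()
    if key not in catalog:
        raise ClarificationNeeded(f"No exact definition for: {term!r}")
    return catalog[key]
```

Because the lookup is an exact dictionary match rather than a similarity ranking, the same term always resolves to the same definition.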
Proposed Deterministic Architecture
1. Extract Business Concepts (Probabilistic Step)

Use an LLM to parse the natural‑language question into a structured JSON payload:

```json
{
  "metrics": ["average order value", "customer satisfaction rating"],
  "filters": [
    "premium customers",
    "at least 3 purchases",
    "high volume stores",
    "peak hours",
    "last quarter"
  ],
  "dimensions": ["product", "region"]
}
```

This extraction can be nondeterministic; the goal is simply to identify the concepts the user intends.

2. Exact Matching Against Catalogs (Deterministic Step)

- Metrics catalog → map "average order value" → metrics.avg_order_value (found).
- Business glossary → map each filter to a precise definition (e.g., customer_tier = 'premium').
- Dimension mapping → resolve "product" → dim_product.product_name, etc.

If any concept is missing, halt and request clarification.

3. Assemble Concrete Metadata

Retrieve full definitions for each component (tables, columns, calculation logic, required joins).

4. Generate SQL from Predefined Building Blocks (Deterministic)

Use an LLM only to stitch together the known pieces into a final query:

```sql
SELECT
  p.product_name,
  g.region_name,
  AVG(o.order_total) AS avg_order_value,
  AVG(o.satisfaction_score) AS avg_csat
FROM fact_orders o
JOIN dim_customer c ON o.customer_id = c.customer_id
JOIN dim_product p ON o.product_id = p.product_id
JOIN dim_geography g ON o.store_id = g.store_id
WHERE c.customer_tier = 'premium'
  AND o.order_date >= DATE_TRUNC('quarter', CURRENT_DATE - INTERVAL '3 months')
  AND o.store_id IN (
    SELECT store_id FROM dim_stores WHERE annual_revenue > 5000000
  )
  AND EXTRACT(HOUR FROM o.order_time) IN (10, 11, 12, 13, 17, 18, 19, 20)
GROUP BY p.product_name, g.region_name;
```

Because the same metadata is retrieved each time, the generated query is identical for identical questions: the pipeline is deterministic.
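The end-to-end flow can be sketched in code. Everything below is illustrative: the catalogs hold hypothetical entries, and `extract_concepts` is a hard-coded stand-in for the (probabilistic) LLM extraction step; what matters is that every downstream step is a deterministic lookup or string assembly.

```python
# Illustrative catalogs: metric -> (SQL expression, alias), filter -> predicate,
# dimension -> column. A real system would load these from governed metadata.
METRICS = {"average order value": ("AVG(o.order_total)", "avg_order_value")}
FILTERS = {"premium customers": "c.customer_tier = 'premium'"}
DIMENSIONS = {"product": "p.product_name"}

def extract_concepts(question: str) -> dict:
    # Stand-in for the LLM extraction step; in practice this call is
    # probabilistic, but its output is validated by the lookups below.
    return {"metrics": ["average order value"],
            "filters": ["premium customers"],
            "dimensions": ["product"]}

def build_sql(concepts: dict) -> str:
    """Deterministically assemble SQL from exact catalog matches."""
    known = {**METRICS, **FILTERS, **DIMENSIONS}
    missing = [t for group in concepts.values() for t in group if t not in known]
    if missing:
        raise ValueError(f"Ask the user to clarify: {missing}")
    selects = [DIMENSIONS[d] for d in concepts["dimensions"]]
    selects += [f"{expr} AS {alias}"
                for expr, alias in (METRICS[m] for m in concepts["metrics"])]
    where = " AND ".join(FILTERS[f] for f in concepts["filters"])
    group_by = ", ".join(DIMENSIONS[d] for d in concepts["dimensions"])
    return (f"SELECT {', '.join(selects)}\n"
            f"FROM fact_orders o\n"
            f"JOIN dim_customer c ON o.customer_id = c.customer_id\n"
            f"JOIN dim_product p ON o.product_id = p.product_id\n"
            f"WHERE {where}\nGROUP BY {group_by};")
```

Calling `build_sql` twice with the same extracted concepts yields byte-identical SQL, which is exactly the repeatability property the architecture is after.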
Prerequisites for the Deterministic Approach
- Metrics Catalog – Every KPI, metric, and measure defined with exact calculation logic and data lineage.
- Business Glossary – Precise definitions for all business terms, segments, and filter conditions, including the underlying SQL snippets.
- Dimension Mapping – Clear relationships between business concepts and data model entities (tables, columns).
These assets constitute the "80% solved" foundation (see the previous article). Once they are in place, the deterministic pipeline can reliably translate natural‑language questions into accurate, repeatable SQL.
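A metrics-catalog entry needs, at minimum, the calculation logic and lineage listed above. One possible shape, assuming nothing about any particular catalog tool, is an immutable record like this:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricDefinition:
    """One metrics-catalog entry: exact calculation logic plus lineage.

    Field names and the example values are illustrative assumptions.
    """
    name: str
    sql_expression: str          # exact calculation logic
    source_tables: tuple         # data lineage
    description: str = ""

avg_order_value = MetricDefinition(
    name="average order value",
    sql_expression="AVG(fact_orders.order_total)",
    source_tables=("fact_orders",),
    description="Mean order_total across all orders in scope.",
)
```

Freezing the dataclass makes the point that catalog definitions are governed artifacts: code that consumes them reads exact logic and cannot silently redefine it.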
Handling Synonyms and Ambiguities
Exact matching is rigid; users may employ synonyms (“VIP customers” vs. “premium customers”). Strategies to address this include:
- Synonym tables in the business glossary that map alternative terms to canonical definitions.
- User prompts that request clarification when a term lacks a direct match.
By explicitly managing synonyms, the system retains determinism while offering a user‑friendly experience.
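The two strategies above combine naturally: a synonym table is applied before the exact glossary lookup, and anything still unmatched triggers a clarification prompt. A minimal sketch, with hypothetical table contents:

```python
# Synonym table: alternative phrasings -> one canonical glossary term.
SYNONYMS = {
    "vip customers": "premium customers",
    "top-tier customers": "premium customers",
}
# Canonical glossary: term -> exact filter definition (illustrative).
GLOSSARY = {"premium customers": "customer_tier = 'premium'"}

def resolve(term: str) -> str:
    """Map a user term to its canonical definition, or ask for clarification."""
    key = term.strip().lower()
    canonical = SYNONYMS.get(key, key)   # deterministic rewrite, not a guess
    if canonical not in GLOSSARY:
        raise KeyError(f"Unknown term, ask the user: {term!r}")
    return GLOSSARY[canonical]
```

Since the synonym rewrite is itself an exact lookup, "VIP customers" and "premium customers" always resolve to the same definition, and determinism is preserved.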