Why Your AI Initiatives Fail Without a Semantic Layer

Published: February 24, 2026 at 04:55 PM EST
4 min read
Source: Dev.to

Natural‑language analytics

Business users want to ask questions in plain English and get accurate answers—no SQL, no tickets, no waiting. Large language models can generate SQL from natural language with impressive syntactic accuracy, but syntax ≠ semantics. An LLM can write grammatically correct SQL that returns the wrong answer because it doesn’t understand your business definitions.

A semantic layer provides those definitions. Without one, AI analytics is a demo that works in a meeting but fails in production.

Common failure modes and semantic‑layer fixes

  Failure mode              | Semantic-layer fix
  Metric hallucination      | Virtual datasets with canonical formulas
  Join confusion            | Pre-defined join relationships
  Column misinterpretation  | Wiki descriptions on every field
  Security bypass           | Access policies enforced at the view level
  Inconsistent results      | Deterministic definitions (same question → same SQL)

Examples

  • Metric hallucination – The LLM decides that Revenue = SUM(amount) from the transactions table, but the true definition is SUM(order_total) WHERE status = 'completed' AND refunded = FALSE from the orders table. The AI’s number looks plausible yet is off by 15 %.
    Fix: Store the canonical metric definition in a virtual dataset; the AI references the view instead of inventing its own formula.

  • Join confusion – There are three paths from orders to customers: via customer_id, billing_address_id, and shipping_address_id. For revenue analysis you need the customer_id path, but the LLM picks billing_address_id. The resulting numbers are close enough to slip through review.
    Fix: Define approved join relationships in the semantic model; the AI follows them.

  • Column misinterpretation – A column named date exists in orders. Is it the order date, ship date, or invoice date? The LLM assumes order date, but it’s actually ship date, shifting every time‑based query by 2–5 days.
    Fix: Add wiki‑style descriptions to every column; the semantic layer tells the AI that date is ShipDate and that OrderDate should be used for revenue analysis.

  • Security bypass – Your BI dashboard applies row‑level security so regional managers only see their region’s data. The AI agent queries the raw table directly, bypassing the BI layer, and a manager sees the entire company’s numbers.
    Fix: Enforce fine‑grained access control at the semantic layer; the AI queries views, not raw tables, and security policies travel with the data.

  • Inconsistent results – The same question asked twice generates different SQL because the LLM’s output is probabilistic (e.g., Monday’s answer: $4.2 M; Wednesday’s answer: $4.5 M). Neither matches Finance’s number.
    Fix: Use deterministic definitions in the semantic layer so the same question always resolves to the same view, formula, and result.
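The metric-hallucination fix above can be sketched in a few lines. This is a minimal illustration using an in-memory SQLite database and a hypothetical orders schema (the table and column names are illustrative, not from any real warehouse): the canonical revenue definition lives in a view, and the AI queries the view instead of inventing its own formula.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_total REAL, status TEXT, refunded INTEGER);
INSERT INTO orders VALUES (100, 'completed', 0), (50, 'completed', 1), (75, 'pending', 0);

-- The canonical definition lives in the semantic layer as a view,
-- so every consumer (human or AI) resolves "revenue" the same way.
CREATE VIEW revenue AS
  SELECT SUM(order_total) AS total
  FROM orders
  WHERE status = 'completed' AND refunded = 0;
""")

# A naive LLM-invented formula also counts pending and refunded orders.
naive = conn.execute("SELECT SUM(order_total) FROM orders").fetchone()[0]

# The governed view returns the business-correct number.
canonical = conn.execute("SELECT total FROM revenue").fetchone()[0]

print(naive, canonical)  # 225.0 100.0
```

Both queries are syntactically valid SQL; only the view encodes the business meaning of "revenue".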

Why platforms that take AI analytics seriously embed the semantic layer

Dremio’s approach combines virtual datasets, wikis, labels, and fine‑grained access control into a single layer that both humans and AI agents consume. The AI doesn’t just generate SQL; it consults the semantic layer to understand:

  • What the data means
  • Which formulas to apply
  • What the querying user is allowed to see
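One way to picture what the agent consults is a single record that bundles all three kinds of context. The field names below are purely illustrative (this is not Dremio's actual API), but they show how meaning, formula, and access policy can travel together and resolve deterministically:

```python
# Hypothetical semantic-model entry; field names are illustrative only.
semantic_model = {
    "revenue": {
        "meaning": "Completed, non-refunded order value",   # what the data means
        "formula": "SUM(order_total) WHERE status = 'completed' AND refunded = FALSE",
        "source_view": "finance.revenue",                   # governed view to query
        "row_policy": "region = current_user_region()",     # what the user may see
    }
}

def resolve_metric(name: str) -> str:
    """Deterministically map a metric name to its governed view."""
    return semantic_model[name]["source_view"]

print(resolve_metric("revenue"))  # finance.revenue
```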

Building an AI‑ready data platform

  1. Semantic layer – Defines metrics, documents columns, and enforces security.
  2. AI agent – Reads the semantic layer to grasp business context.
  3. Query engine – Executes the AI‑generated SQL with full optimization (caching, reflections, push‑downs).
  4. Result delivery – Returns answers in business terms through the same interface humans use.
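The four steps above can be sketched as a toy end-to-end flow. Every component here is a stub (nothing is a real Dremio API): the agent grounds the question in the semantic layer instead of free-form SQL generation, the engine executes only the governed query, and the answer comes back in business terms.

```python
import sqlite3

# 1. Semantic layer: metric name -> governed SQL over a view (deterministic).
SEMANTIC_LAYER = {"revenue": "SELECT total FROM revenue_view"}

# 2. AI agent: resolves the question against the semantic layer
#    rather than inventing its own formula.
def agent(question: str) -> str:
    for metric, sql in SEMANTIC_LAYER.items():
        if metric in question.lower():
            return sql
    raise ValueError("No governed metric matches the question")

# 3. Query engine: executes the governed SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_total REAL, status TEXT, refunded INTEGER);
INSERT INTO orders VALUES (100, 'completed', 0), (50, 'completed', 1);
CREATE VIEW revenue_view AS
  SELECT SUM(order_total) AS total FROM orders
  WHERE status = 'completed' AND refunded = 0;
""")

# 4. Result delivery: the answer in business terms.
sql = agent("What was revenue last quarter?")
total = conn.execute(sql).fetchone()[0]
print(f"Revenue: ${total:,.2f}")  # Revenue: $100.00
```

Asking the same question twice always resolves to the same view and the same number, which is the deterministic behavior step 1 exists to guarantee.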

Without step 1, the AI is merely a SQL autocomplete tool with no business understanding. It writes syntactically valid queries that produce semantically wrong answers. The semantic layer is the difference between a toy demo and a production‑grade AI analytics system.

Takeaway

If your AI analytics initiative is producing unreliable results, don’t upgrade the model. Audit the context the model has access to:

  • Can it read your metric definitions?
  • Does it know column descriptions?
  • Are security policies enforced?

If the answer is no, the fix isn’t a better LLM—it’s a semantic layer.

Try Dremio Cloud free for 30 days.
