The Role of the Semantic Layer in Data Governance
Source: Dev.to
Most organizations have a data‑governance policy. It lives in a Confluence page. It defines who owns what data, what terms mean, and who should have access. And almost nobody follows it, because it isn’t enforced where queries actually run.
A semantic layer changes that. It moves governance from a document into the query path, where every rule is applied automatically, for every user, through every tool.
Governance on Paper vs. Governance in Practice
Data governance fails when it depends on people doing the right thing manually.
- A policy says “Revenue means completed orders minus refunds.”
- An analyst writes a slightly different formula.
- A dashboard uses the wrong table.
- An AI agent invents its own definition.
The policy exists, but each of these consumers works around it, and the organization ends up making decisions on inconsistent data.
The root cause isn’t carelessness; it’s that governance is separated from the systems people actually use to query data. Enforcement happens in a side channel — documentation, review processes, audit logs — not in the query itself.
Centralized Definitions Eliminate Conflicting Metrics
A semantic layer solves the definition problem by turning the governance policy into code.
CREATE VIEW business.revenue AS
SELECT
    OrderDate,
    Region,
    SUM(OrderTotal) AS Revenue
FROM silver.orders_enriched
WHERE Status = 'completed' AND Refunded = FALSE
GROUP BY OrderDate, Region;
Every dashboard, notebook, and AI agent that needs Revenue queries this view. There’s no alternative formula to use. The semantic layer is the governance for this metric.
When the definition changes (e.g., a new refund category is added), the view is updated once, and every consumer gets the new logic automatically. There is no rollout, no migration, and no “did everyone update their dashboard?” follow-up.
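To make the "update once" claim concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for the warehouse. The table contents, the `Chargeback` column, and the helper names are assumptions for illustration only. SQLite has no schemas by default, so the view is named `revenue` rather than `business.revenue`.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_enriched (
    OrderDate TEXT, Region TEXT, OrderTotal REAL,
    Status TEXT, Refunded INTEGER, Chargeback INTEGER
);
INSERT INTO orders_enriched VALUES
    ('2024-01-01', 'EMEA', 100.0, 'completed', 0, 0),
    ('2024-01-01', 'EMEA',  40.0, 'completed', 1, 0),
    ('2024-01-01', 'APAC',  70.0, 'pending',   0, 0),
    ('2024-01-01', 'APAC',  30.0, 'completed', 0, 1),
    ('2024-01-01', 'APAC',  50.0, 'completed', 0, 0);
""")

def define_revenue(extra_filter=""):
    """(Re)create the single governed definition of Revenue."""
    conn.executescript(f"""
    DROP VIEW IF EXISTS revenue;
    CREATE VIEW revenue AS
    SELECT OrderDate, Region, SUM(OrderTotal) AS Revenue
    FROM orders_enriched
    WHERE Status = 'completed' AND Refunded = 0 {extra_filter}
    GROUP BY OrderDate, Region;
    """)

def dashboard_total():
    # Consumer 1: a dashboard tile showing total revenue.
    return conn.execute("SELECT SUM(Revenue) FROM revenue").fetchone()[0]

def notebook_by_region():
    # Consumer 2: a notebook breaking revenue down by region.
    rows = conn.execute("SELECT Region, Revenue FROM revenue").fetchall()
    return dict(rows)

define_revenue()
before = (dashboard_total(), notebook_by_region())

# Hypothetical policy change: chargebacks now count as refunds.
# Only the view is redefined; neither consumer function is touched.
define_revenue("AND Chargeback = 0")
after = (dashboard_total(), notebook_by_region())

print(before)
print(after)
```

Both consumer functions keep the same SQL before and after the change, yet both pick up the new rule, because the only place the formula lives is the view.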
Access Policies Enforced at Query Time

The second governance gap is access control. Most organizations enforce security at the BI‑tool level (Tableau, Power BI, etc.). If someone opens a SQL client and queries the underlying table directly, those filters don’t apply.
A semantic layer enforces policies at a lower level, so they apply to every query path:
| Query Path | BI‑Level Security | Semantic‑Layer Security |
|---|---|---|
| Dashboard | Enforced | Enforced |
| SQL notebook | Not enforced | Enforced |
| AI agent | Not enforced | Enforced |
| API / programmatic access | Not enforced | Enforced |
Dremio implements this through Fine‑Grained Access Control (FGAC): policies are defined as UDFs that filter rows and mask columns based on the querying user’s role. These policies are applied at the virtual‑dataset (view) level.
Example: a regional manager queries business.revenue and sees only their region; a data engineer sees all regions. Same view, same SQL, different results based on identity.
This eliminates the “security gap” that appears when users bypass BI tools, provided consumers are granted access only to the views and not to the underlying tables. Every route to the data then flows through the semantic layer and inherits its policies.
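The "same view, same SQL, different results" behavior can be sketched with Python's sqlite3 module. This is not Dremio's FGAC syntax. SQLite has no users or roles, so a one-row `session` table stands in for the engine's notion of the current user, and the `entitlements` table, user names, and `query_as` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revenue_base (Region TEXT, Revenue REAL);
INSERT INTO revenue_base VALUES ('EMEA', 100.0), ('APAC', 50.0), ('AMER', 75.0);

-- Hypothetical entitlement table: a NULL Region means "all regions".
CREATE TABLE entitlements (UserName TEXT, Role TEXT, Region TEXT);
INSERT INTO entitlements VALUES
    ('maria', 'regional_manager', 'EMEA'),
    ('dev',   'data_engineer',    NULL);

-- One-row table standing in for the engine's "current user".
CREATE TABLE session (UserName TEXT);
INSERT INTO session VALUES (NULL);

-- The row filter lives inside the view, so every query path inherits it.
CREATE VIEW revenue AS
SELECT r.Region AS Region, r.Revenue AS Revenue
FROM revenue_base AS r
CROSS JOIN session AS s
JOIN entitlements AS e ON e.UserName = s.UserName
WHERE e.Region IS NULL OR e.Region = r.Region;
""")

def query_as(user, sql):
    """Run sql against the view as the given (simulated) user."""
    conn.execute("UPDATE session SET UserName = ?", (user,))
    return conn.execute(sql).fetchall()

SQL = "SELECT Region, Revenue FROM revenue ORDER BY Region"

print(query_as("maria", SQL))   # regional manager: one region
print(query_as("dev", SQL))     # data engineer: all regions
print(query_as("nobody", SQL))  # no entitlement: no rows
```

The design point is that the filter is part of the view definition rather than the client, so a user with no entitlement row gets nothing back regardless of which tool issued the query.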
Lineage and Accountability Through Views
The layered view architecture (Bronze → Silver → Gold) that a semantic layer uses is inherently traceable. Every Gold metric traces back to its Silver business logic, which traces back to the Bronze source mapping, which traces back to raw data.
When an auditor asks, “Where does your Revenue number come from?” you don’t hunt through dashboards and notebooks—you follow the view chain:
gold.monthly_revenue_by_region
    → references silver.orders_enriched
silver.orders_enriched
    → joins bronze.orders_raw with bronze.customers_raw
bronze.orders_raw
    → maps to production.public.orders in PostgreSQL
Every step is documented, every transformation is visible. The lineage isn’t reconstructed after the fact—it’s built into the structure.
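Following the view chain is a small graph walk. The sketch below hard-codes the chain above as a Python dictionary; in a real system the dependency map would come from the catalog or from parsing view definitions, and the `DEPENDS_ON`, `trace`, and `raw_sources` names are hypothetical.

```python
# Hypothetical dependency map mirroring the view chain above.
DEPENDS_ON = {
    "gold.monthly_revenue_by_region": ["silver.orders_enriched"],
    "silver.orders_enriched": ["bronze.orders_raw", "bronze.customers_raw"],
    "bronze.orders_raw": ["production.public.orders"],
}

def trace(asset, depth=0):
    """Return (depth, name) pairs for the asset and everything upstream of it."""
    rows = [(depth, asset)]
    for parent in DEPENDS_ON.get(asset, []):
        rows.extend(trace(parent, depth + 1))
    return rows

def raw_sources(asset):
    """Assets with no recorded upstream: the end of each lineage path."""
    return sorted({name for _, name in trace(asset) if name not in DEPENDS_ON})

for depth, name in trace("gold.monthly_revenue_by_region"):
    print("  " * depth + name)
```

Note that `bronze.customers_raw` shows up as a leaf here only because the chain above does not list its source mapping. A leaf that is not a raw source is itself a useful signal that the lineage is incomplete.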
Documentation as a Governance Tool

Governance is also about discoverability. Can someone find the right dataset without messaging five people? Can they tell whether a view is production‑ready or experimental?
Two mechanisms handle this in a semantic layer:
- Wikis – attach human‑readable (and AI‑readable) descriptions to tables, columns, and views.
- Tags / Labels – classify assets (e.g., PII, financial, experimental) so that users and automated tools can filter or enforce additional policies.
Together, these make the data catalog a living part of governance rather than a static document.
Wikis
Wikis explain what the data represents, where it comes from, and any caveats. For example, a column named cltv gets a description:
Customer Lifetime Value, calculated as total revenue from first purchase to current date, excluding refunds.
Labels
Labels categorize data for governance workflows.
- PII – Triggers automatic column masking.
- Certified – Indicates the view has been reviewed and approved for production use.
- Deprecated – Warns consumers to migrate to the replacement.
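How a label can drive behavior, rather than just describe an asset, can be illustrated with a small Python sketch. The column names, the label assignments, and the masking rule (keep the first character) are assumptions for illustration and do not reflect any particular product's masking function.

```python
# Hypothetical column labels; in practice these would come from the catalog.
COLUMN_LABELS = {
    "customer_name": {"PII"},
    "email": {"PII"},
    "cltv": {"Certified"},
}

def mask(value):
    """Keep the first character and replace the rest with asterisks."""
    text = str(value)
    return text[:1] + "*" * (len(text) - 1)

def apply_masking(row, can_see_pii):
    """Return a copy of row with PII-labeled columns masked unless permitted."""
    masked = {}
    for col, val in row.items():
        is_pii = "PII" in COLUMN_LABELS.get(col, set())
        masked[col] = mask(val) if is_pii and not can_see_pii else val
    return masked

row = {"customer_name": "Ada", "email": "ada@example.com", "cltv": 1250.0}

print(apply_masking(row, can_see_pii=False))
print(apply_masking(row, can_see_pii=True))
```

Because the decision is keyed on the label and not on the column name, tagging a new column as PII is enough to bring it under the same rule.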
For organizations with thousands of datasets, manual documentation is impractical. Dremio’s generative AI auto-generates Wiki descriptions by sampling table data and suggests labels based on column content. This bootstraps documentation to roughly 70% coverage automatically, and the data team then fills in what the AI misses.
Certification and Change Management
Not all views are equal. A semantic layer should distinguish between views that are experimental, under review, and production‑ready.
A Practical Certification Workflow
| Stage | Description | Label |
|---|---|---|
| Draft | New view created by an analyst. Not yet reviewed. | Draft |
| Reviewed | View reviewed by the data team. Business logic validated. Documentation complete. | Reviewed |
| Certified | View approved for production use. Available in production dashboards and to AI agents. | Certified |
- Each Certified view should have a documented owner—the person accountable for its accuracy and freshness.
- When business requirements change, the owner updates the view and its documentation together.
- Changes are reviewed before the Certified label is reapplied.
This workflow doesn’t require advanced tooling. Labels, Wikis, and a team agreement on the process are sufficient. What matters is that governance is visible inside the semantic layer, not tracked in a separate system.
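The stages in the table above amount to a small state machine, sketched here in Python. The `ViewRecord` class and its fields are hypothetical, and resetting an edited view all the way back to Draft is an assumption; the text only says changes are reviewed before the Certified label is reapplied.

```python
from dataclasses import dataclass, field

# Allowed promotions, taken from the stage table above.
NEXT_STAGE = {"Draft": "Reviewed", "Reviewed": "Certified"}

@dataclass
class ViewRecord:
    name: str
    owner: str          # the person accountable for accuracy and freshness
    definition: str
    label: str = "Draft"
    history: list = field(default_factory=list)

    def promote(self):
        """Move one stage forward: Draft -> Reviewed -> Certified."""
        if self.label not in NEXT_STAGE:
            raise ValueError(f"{self.name} is already {self.label}")
        self.label = NEXT_STAGE[self.label]
        self.history.append(self.label)

    def update(self, new_definition):
        """Change the definition; the label drops back to Draft (assumed)."""
        self.definition = new_definition
        self.label = "Draft"
        self.history.append(self.label)

view = ViewRecord("business.revenue", owner="finance-data", definition="v1")
view.promote()      # Draft -> Reviewed
view.promote()      # Reviewed -> Certified
view.update("v2")   # any change sends the view back for review
print(view.label, view.history)
```

The same rules can be kept with nothing more than labels and a team agreement, as the text says. The code only makes the allowed transitions explicit.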
What to Do Next
- Audit your top 10 business metrics.
- For each metric, ask three questions:
  - Is the formula defined in one place?
  - Is access control enforced at the query level (not just the BI tool)?
  - Can you trace the number back to its raw source in under 60 seconds?
Every “no” reveals a governance gap that a semantic layer can close.
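The audit itself can be recorded as a simple checklist. The sketch below is one possible shape in Python; the question keys, the `governance_gaps` helper, and the sample answers are made up for illustration.

```python
# Short keys for the three audit questions above.
QUESTIONS = ("defined_once", "query_level_access", "traceable")

def governance_gaps(audit):
    """Map each metric with at least one 'no' to the questions it failed."""
    gaps = {}
    for metric, answers in audit.items():
        failed = [q for q in QUESTIONS if not answers.get(q, False)]
        if failed:
            gaps[metric] = failed
    return gaps

# Made-up answers for two metrics; an unanswered question counts as a "no".
audit = {
    "revenue": {"defined_once": True, "query_level_access": True, "traceable": True},
    "churn": {"defined_once": False, "query_level_access": True},
}

print(governance_gaps(audit))
```

Treating a missing answer as a "no" is a deliberate choice here: if nobody can answer the question, the gap is assumed to exist until shown otherwise.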
