How to Build a Semantic Layer: A Step-by-Step Guide

Published: February 24, 2026 at 03:52 PM EST

Source: Dev.to

Most teams start building a semantic layer the wrong way: they open their BI tool, create a few calculated fields, and call it done. Six months later, three dashboards define “churn” differently, nobody trusts the numbers, and the data team is debugging metric discrepancies instead of building new features.

A well‑built semantic layer prevents all of that. Here’s how to do it right.

Define Your Core Metrics

Before writing a single line of SQL, sit down with stakeholders from Sales, Finance, Marketing, and Product. Agree on the top 5‑10 business metrics your organization uses to make decisions.

For each metric, document:

Item	Example / Question
Calculation	`Revenue = SUM(order_total) WHERE status = 'completed' AND refunded = FALSE`
Owner	Who is accountable for this definition?
Grain	Daily? Monthly? Per customer?
Refresh cadence	Real‑time? Daily batch? Weekly?

This exercise is harder than it sounds. You will discover that “Monthly Active Users” has three competing definitions—that’s the point. The semantic layer can’t resolve disagreements that haven’t been surfaced yet.

Output: a metric glossary that becomes the source document for everything you build next.
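
One way to keep the glossary reviewable is to store each entry as a small record and flag the ones that are still incomplete. The following is a minimal sketch, not a prescribed format: the `MetricDefinition` class, its field names, and the example owner are assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MetricDefinition:
    """One glossary entry; the class and field names are hypothetical."""
    name: str
    calculation: str      # agreed formula, in SQL-like pseudocode
    owner: str            # team accountable for the definition
    grain: str            # e.g. "daily", "monthly", "per customer"
    refresh_cadence: str  # e.g. "real-time", "daily batch"


def missing_fields(metric: MetricDefinition) -> list[str]:
    """Return the names of fields that were left blank."""
    return [f.name for f in fields(metric) if not getattr(metric, f.name).strip()]


revenue = MetricDefinition(
    name="Revenue",
    calculation="SUM(order_total) WHERE status = 'completed' AND refunded = FALSE",
    owner="Finance",  # example value only
    grain="daily",
    refresh_cadence="daily batch",
)

draft_mau = MetricDefinition(
    name="Monthly Active Users",
    calculation="",  # definition still under discussion
    owner="",
    grain="monthly",
    refresh_cadence="daily batch",
)
```

Calling `missing_fields(draft_mau)` lists the fields stakeholders still need to agree on before the metric is built into a view.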

Inventory All Data Sources

Next, list every system that holds data your core metrics depend on, and note how you will access it.

Source Type	Examples	Access Pattern
Transactional databases	PostgreSQL, MySQL, SQL Server	Federated query (read‑only)
Cloud data lakes	S3 (Parquet/Iceberg), Azure Data Lake	Direct scan or catalog
SaaS platforms	Salesforce, HubSpot, Stripe	API extraction or replication
Spreadsheets	Google Sheets, Excel	One‑time import or scheduled sync

Not all sources need to be replicated into a central store. Federation lets you query data where it lives without the cost and complexity of ETL pipelines. Platforms like Dremio connect to dozens of sources and present them in a single namespace, so your semantic layer can span everything without data movement.

Medallion Architecture (Three Layers of SQL Views)

Bronze – Raw Views

Create one view per raw source table.
Apply no business logic—just make the data human‑readable:
- Rename cryptic columns (col_7 → OrderDate, cust_id → CustomerID).
- Cast types to standard formats (strings to dates, integers to decimals).
- Normalize timestamps to UTC.
- Avoid SQL reserved words as column names (use EventTimestamp, TransactionDate, UserRole instead of Timestamp, Date, Role).

Bronze views should be boring; their only job is to make raw data safe to work with.
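
As a rough illustration, here is what a Bronze view might look like when run against an in-memory SQLite database. The source table `src_orders` and its columns are made up for this sketch, and because SQLite has no `bronze.` schema without an attached database, the view uses a `bronze_` prefix instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical raw table with cryptic names and text-typed values.
    CREATE TABLE src_orders (id INTEGER, cust_id INTEGER, col_7 TEXT, amt TEXT, status TEXT);
    INSERT INTO src_orders VALUES (1, 10, '2026-01-15', '120.50', 'completed');

    -- Bronze: rename and cast only; no joins, filters, or business rules.
    CREATE VIEW bronze_orders_raw AS
    SELECT
        id                AS OrderID,
        cust_id           AS CustomerID,
        DATE(col_7)       AS OrderDate,
        CAST(amt AS REAL) AS Total,
        status            AS Status
    FROM src_orders;
""")

bronze_row = conn.execute(
    "SELECT OrderID, CustomerID, OrderDate, Total, Status FROM bronze_orders_raw"
).fetchone()
```

The view changes names and types but returns exactly one row per source row, which is what keeps Bronze "boring".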

Silver – Business Logic Views

Silver views join Bronze views, deduplicate records, filter invalid data, and apply business rules.

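-- silver.orders_enriched: completed, non-refunded orders joined to customer attributes
CREATE VIEW silver.orders_enriched AS
SELECT
    o.OrderID,
    o.OrderDate,
    o.Total AS OrderTotal,
    c.Region,
    c.Segment
FROM bronze.orders_raw o
JOIN bronze.customers_raw c
  ON o.CustomerID = c.CustomerID
WHERE o.Total > 0                -- drop zero and negative totals
  AND o.Status = 'completed'
  AND o.Refunded = FALSE;        -- assumes Bronze exposes a Refunded flag, matching the glossary's Revenue rule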

Each Silver view encodes exactly one business concept. “Revenue” is defined in one place, and every dashboard, notebook, or AI agent that needs revenue queries this view—no exceptions.
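
Deduplication is one of the Silver responsibilities listed above that the example view does not show. A minimal sketch of one approach, keeping the most recently loaded row per order, could look like this in SQLite. The `LoadedAt` column and the `silver_orders_latest` name are assumptions, and rows that tie on `LoadedAt` would all be kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for a Bronze view; LoadedAt is a hypothetical ingestion timestamp.
    CREATE TABLE bronze_orders_raw (OrderID INTEGER, Total REAL, Status TEXT, LoadedAt TEXT);
    INSERT INTO bronze_orders_raw VALUES
        (1, 100.0, 'pending',   '2026-01-01T08:00:00Z'),
        (1, 100.0, 'completed', '2026-01-02T08:00:00Z'),
        (2,  40.0, 'completed', '2026-01-01T09:00:00Z');

    -- Keep only the most recently loaded row for each OrderID.
    CREATE VIEW silver_orders_latest AS
    SELECT o.OrderID, o.Total, o.Status
    FROM bronze_orders_raw o
    WHERE o.LoadedAt = (
        SELECT MAX(i.LoadedAt) FROM bronze_orders_raw i WHERE i.OrderID = o.OrderID
    );
""")

latest = conn.execute(
    "SELECT OrderID, Total, Status FROM silver_orders_latest ORDER BY OrderID"
).fetchall()
```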

Gold – Presentation Views

Gold views are pre‑aggregated for specific consumers:

BI dashboard → monthly_revenue_by_region
AI agent → customer_360_summary
Finance report → quarterly_financial_summary

Gold views don’t add new business logic; they simply reshape and aggregate Silver views for performance and usability.
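
To make the "reshape and aggregate only" rule concrete, the sketch below builds a `monthly_revenue_by_region` view on top of a stand-in for the Silver view. The sample rows are invented, and the `gold_` and `silver_` prefixes replace schema names because the example runs in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the Silver view, with a few hypothetical rows.
    CREATE TABLE silver_orders_enriched (OrderID INTEGER, OrderDate TEXT, OrderTotal REAL, Region TEXT);
    INSERT INTO silver_orders_enriched VALUES
        (1, '2026-01-05', 100.0, 'East'),
        (2, '2026-01-20',  50.0, 'East'),
        (3, '2026-01-11',  70.0, 'West'),
        (4, '2026-02-02',  30.0, 'East');

    -- Gold: aggregate and reshape only; filters and joins stay in Silver.
    CREATE VIEW gold_monthly_revenue_by_region AS
    SELECT
        strftime('%Y-%m', OrderDate) AS OrderMonth,
        Region,
        SUM(OrderTotal)              AS Revenue
    FROM silver_orders_enriched
    GROUP BY strftime('%Y-%m', OrderDate), Region;
""")

monthly = conn.execute(
    "SELECT OrderMonth, Region, Revenue FROM gold_monthly_revenue_by_region "
    "ORDER BY OrderMonth, Region"
).fetchall()
```

Because the Gold view contains no `WHERE` clause of its own, any change to what counts as revenue is made once in Silver and flows through automatically.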

Documentation & Governance

An undocumented semantic layer is a semantic layer nobody uses. Every table and column should have a description that explains:

What the data represents
Where it comes from
Any known limitations or caveats

Modern platforms accelerate this with AI. For example, Dremio's generative AI can auto-generate Wiki descriptions by sampling table data and suggest labels (e.g., "PII", "Finance", "Certified") for governance and discoverability. Treat the output as a first draft that gets you roughly 70% of the way; the data team still has to fill in domain-specific context.

Rich, accurate descriptions serve both human analysts browsing the catalog and AI agents that need context to generate correct SQL.
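
A simple coverage check can keep documentation from drifting. The sketch below assumes you can export descriptions into a nested dictionary; that structure, the function name, and the sample descriptions are all hypothetical and not tied to any particular catalog.

```python
def undocumented_columns(catalog: dict[str, dict[str, str]]) -> list[str]:
    """Return 'view.column' names whose description is missing or blank."""
    return sorted(
        f"{view}.{column}"
        for view, columns in catalog.items()
        for column, description in columns.items()
        if not description.strip()
    )


# Hypothetical catalog export: view name -> {column name: description}.
catalog = {
    "silver.orders_enriched": {
        "OrderTotal": "Order value taken from the orders source.",
        "Region": "",
    },
    "gold.monthly_revenue_by_region": {
        "Revenue": "Sum of OrderTotal for completed orders in the month.",
        "OrderMonth": "   ",
    },
}

gaps = undocumented_columns(catalog)
```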

Security Embedded in the Layer

Enforce security once at the semantic layer rather than in each downstream tool.

Pattern	Description
Row‑Level Security	Filter rows based on user role (e.g., a regional manager sees only their region).
Column Masking	Mask sensitive columns (SSN, email, salary) for roles that don’t need them (e.g., `****@email.com`).

Every downstream query, whether it comes from a dashboard, a Python notebook, or an AI agent, inherits these rules automatically, as long as it goes through the semantic layer rather than reading the raw sources directly.
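
Platforms that support these patterns usually provide native policies for them. Purely to illustrate the logic, the sketch below emulates both patterns in SQLite with a hypothetical `user_access` entitlement table: the join limits rows to the regions a user may see, and the `CASE` expression masks email addresses for users without PII access. All table, column, and user names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE silver_customers (CustomerID INTEGER, Region TEXT, Email TEXT);
    INSERT INTO silver_customers VALUES
        (1, 'East', 'ana@example.com'),
        (2, 'West', 'bo@example.com');

    -- Hypothetical entitlements: regions each user may see, and PII access.
    CREATE TABLE user_access (UserName TEXT, Region TEXT, CanSeePII INTEGER);
    INSERT INTO user_access VALUES
        ('east_manager', 'East', 0),
        ('auditor',      'East', 1),
        ('auditor',      'West', 1);
""")

SECURE_CUSTOMERS_SQL = """
    SELECT
        c.CustomerID,
        c.Region,
        CASE WHEN a.CanSeePII = 1 THEN c.Email
             ELSE '****' || substr(c.Email, instr(c.Email, '@'))  -- column masking
        END AS Email
    FROM silver_customers c
    JOIN user_access a ON a.Region = c.Region                     -- row-level filter
    WHERE a.UserName = ?
    ORDER BY c.CustomerID
"""


def customers_for(user: str) -> list[tuple]:
    """Run the secured query on behalf of one user."""
    return conn.execute(SECURE_CUSTOMERS_SQL, (user,)).fetchall()
```

A user with no entry in `user_access` gets no rows at all, so the default is to deny rather than to expose.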

Incremental Build Approach

1. Start small
- 3-5 core metrics from your glossary
- The 2-3 source systems those metrics depend on
- One Bronze → Silver → Gold pipeline per metric

2. Validate
- Run the same question across two different tools (e.g., a BI dashboard and a SQL notebook).
- If both return the same number, the semantic layer is working.
- If not, fix the Silver view definition before adding more.

3. Expand incrementally
- Add new sources, new Silver views, new Gold views.
- Each addition is low-risk because the layered structure isolates changes.
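
The validation step can be partly automated. The following sketch compares metric values exported from two tools and reports any that disagree; the function name, the dictionary format, and the sample numbers are assumptions for illustration.

```python
import math


def find_discrepancies(tool_a: dict[str, float], tool_b: dict[str, float],
                       rel_tol: float = 1e-9) -> dict[str, tuple]:
    """Return metrics whose values differ between two tools.

    A metric missing from one side is reported with None for that side.
    """
    mismatches = {}
    for name in sorted(tool_a.keys() | tool_b.keys()):
        a, b = tool_a.get(name), tool_b.get(name)
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            mismatches[name] = (a, b)
    return mismatches


# Hypothetical numbers exported from a BI dashboard and a SQL notebook.
dashboard = {"revenue_2026_01": 220.0, "active_customers": 42.0}
notebook = {"revenue_2026_01": 220.0, "active_customers": 40.0}

discrepancies = find_discrepancies(dashboard, notebook)
```

An empty result means both tools agree; anything else points to a Silver definition worth revisiting before you add more views.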

Pick the metric your organization argues about the most. Define it explicitly in a Silver view, test it against current dashboards, and you’ll either confirm consistency or uncover the hidden discrepancy that’s eroding trust.

Try Dremio Cloud

Try Dremio Cloud free for 30 days.
