Part 3: Testing, Documentation & Deployment πŸš€

Published: (February 16, 2026 at 05:46 PM EST)
7 min read
Source: Dev.to

Source: Dev.to

Data Engineering Zoomcamp

#dbt #AnalyticsEngineering #DataModeling

Macros – Reusable SQL Functions πŸ”§

Without macros (repeated code)

-- ❌ Repeated everywhere
CASE 
    WHEN payment_type = 1 THEN 'Credit card'
    WHEN payment_type = 2 THEN 'Cash'
    WHEN payment_type = 3 THEN 'No charge'
    WHEN payment_type = 4 THEN 'Dispute'
    WHEN payment_type = 5 THEN 'Unknown'
    ELSE 'Unknown'
END AS payment_type_description

With macros (write once)

-- macros/get_payment_type_description.sql
{% macro get_payment_type_description(payment_type) %}
    CASE {{ payment_type }}
        WHEN 1 THEN 'Credit card'
        WHEN 2 THEN 'Cash'
        WHEN 3 THEN 'No charge'
        WHEN 4 THEN 'Dispute'
        WHEN 5 THEN 'Unknown'
        ELSE 'Unknown'
    END
{% endmacro %}

Use it in any model

-- models/staging/stg_green_tripdata.sql
SELECT
    payment_type,
    {{ get_payment_type_description('payment_type') }} AS payment_type_description
FROM {{ source('staging', 'green_tripdata') }}

Jinja Syntax in dbt

SyntaxPurposeExample
{{ }}Output expression{{ ref('my_model') }}
{% %}Logic / control flow{% if is_incremental() %}
{# #}Comment{# This is a comment #}

Packages – Re‑use macros & models built by others

PackageWhat it Does
dbt_utilsCommon SQL helpers (surrogate keys, pivot, etc.)
dbt_codegenAuto‑generate YAML and SQL
dbt_expectationsGreat Expectations‑style tests
dbt_audit_helperCompare model outputs when refactoring

Create packages.yml

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1

Install packages

dbt deps

Use a macro from a package

-- Using dbt_utils to generate surrogate keys
SELECT
    {{ dbt_utils.generate_surrogate_key(['vendorid', 'pickup_datetime']) }} AS trip_id,
    *
FROM {{ source('staging', 'green_tripdata') }}

Tests – Ensure data meets expectations

1️⃣ Generic Tests (most common)

Add them in a schema YAML file:

# models/staging/schema.yml
version: 2

models:
  - name: stg_green_tripdata
    columns:
      - name: trip_id
        tests:
          - unique       # No duplicate values
          - not_null     # No null values

      - name: payment_type
        tests:
          - accepted_values:
              values: [1, 2, 3, 4, 5, 6]  # Allowed values only

      - name: pickup_location_id
        tests:
          - relationships:               # Referential integrity
              to: ref('dim_zones')
              field: location_id
TestDescription
uniqueNo duplicate values in column
not_nullNo NULL values in column
accepted_valuesValues must be in the specified list
relationshipsValues must exist in another table

2️⃣ Singular (custom) Tests

Place a .sql file in the tests/ folder:

-- tests/assert_positive_fare_amount.sql
-- Test FAILS if any rows are returned
SELECT
    trip_id,
    fare_amount
FROM {{ ref('fct_trips') }}
WHERE fare_amount < 0

Fact table containing all taxi trips (yellow and green).
One row per trip with fare details and zone information.

Generate & serve docs

dbt docs generate   # Build the site
dbt docs serve      # Open in a browser

The site includes:

  • Model descriptions
  • Column definitions
  • Dependency graph (visual DAG)
  • Source information

Common dbt Commands

CommandWhat it Does
dbt runBuild all models (create views/tables)
dbt testRun all tests
dbt buildRun models and tests together (recommended)
dbt compileGenerate SQL without executing
dbt debugCheck connection & project configuration
dbt seedLoad seed CSV files
dbt depsInstall packages
dbt docs generateBuild documentation
dbt docs serveServe docs locally
dbt retryRetry failed models

Selecting specific models

# Single model
dbt run --select stg_green_tripdata

# Model + all upstream dependencies
dbt run --select +fct_trips

# Model + all downstream models
dbt run --select stg_green_tripdata+

# Both directions
dbt run --select +fct_trips+

# All models in a folder
dbt run --select staging.*

# Multiple models
dbt run --select stg_green_tripdata stg_yellow_tripdata

# Development (default target)
dbt run

# Production target
dbt run --target prod

Materializations – How dbt persists models

TypeWhat it CreatesTypical Use Case
viewSQL view (query stored, runs on access)Staging models, frequently changing logic
tablePhysical table (data stored)Final marts, large datasets, performance‑critical queries
incrementalAppends new data onlyVery large tables, event‑style data
ephemeralNot created (CTE in downstream)Helper models, intermediate steps

Set materialization in a model file

{{ config(materialized='table') }}

SELECT *
FROM {{ ref('stg_trips') }}

Or project‑wide in dbt_project.yml

models:
  my_project:
    staging:
      materialized: view
    marts:
      materialized: table

Quick Decision Helper

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚               Should I use a view or a table?             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Use a view when the underlying logic changes often or the dataset is small enough that recomputing on each query is cheap.
Use a table when you need persisted data for performance, downstream consumption, or when the dataset is large and expensive to recompute.

Decision Flow

      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Is the query expensive?  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚            β”‚
      Yes          No
       β”‚            β”‚
       β–Ό            β–Ό
 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 β”‚  TABLE  β”‚   β”‚  VIEW   β”‚
 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Use VIEW when

  • Staging models (simple transformations)
  • Logic changes frequently
  • Storage cost is a concern

Use TABLE when

  • Final marts are queried often
  • Complex joins / aggregations
  • Query performance matters

Project Overview – NYC Taxi Data

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      RAW DATA                                 β”‚
β”‚  green_tripdata (GCS/BigQuery) β”‚ yellow_tripdata (GCS/BigQuery)β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚                     β”‚
                    β–Ό                     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    STAGING LAYER                              β”‚
β”‚  stg_green_tripdata    β”‚    stg_yellow_tripdata               β”‚
β”‚  (cleaned, renamed)    β”‚    (cleaned, renamed)                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚                     β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  INTERMEDIATE LAYER                           β”‚
β”‚                   int_trips_unioned                           β”‚
β”‚            (green + yellow combined)                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚
                                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      MARTS LAYER                              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚ dim_zones   β”‚  β”‚   fct_trips   β”‚  β”‚ fct_monthly_zone_rev β”‚ β”‚
β”‚  β”‚ (dimension)β”‚  β”‚    (fact)     β”‚  β”‚      (report)        β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Model Catalog

ModelTypeDescription
stg_green_tripdataStagingCleaned green taxi data
stg_yellow_tripdataStagingCleaned yellow taxi data
int_trips_unionedIntermediateCombined yellowβ€―+β€―green trips
dim_zonesDimensionZone lookup table
fct_tripsFactOne row per trip
fct_monthly_zone_revenueReportMonthly revenue by zone

Local Development (DuckDB)

Pros: Free, no cloud account needed
Cons: Limited to your machine’s resources

# 1. Install dbt with the DuckDB adapter
pip install dbt-duckdb

# 2. Clone the project
git clone https://github.com/DataTalksClub/data-engineering-zoomcamp
cd data-engineering-zoomcamp/04-analytics-engineering/taxi_rides_ny

# 3. Create `profiles.yml` in `~/.dbt/`
# 4. Test the connection
dbt debug

# 5. Build the project
dbt build --target prod

Cloud Development (dbt Cloudβ€―+β€―BigQuery)

Pros: Powerful, team collaboration, scheduler
Cons: Requires a GCP account (free tier available)

  1. Create a dbt Cloud account (free).
  2. Connect it to your BigQuery project.
  3. Clone the repo in the dbt Cloud IDE.
  4. Run:
dbt build --target prod

Troubleshooting Common Issues πŸ”

β€œProfile not found”

  • Verify that the profile name in dbt_project.yml matches the one in profiles.yml.
  • Ensure profiles.yml resides in ~/.dbt/.

β€œSource not found”

  • Check database/schema names in sources.yml.
  • Confirm the data is loaded in the warehouse.
  • Look for typos in ref() calls and make sure the referenced model exists.

Optional memory tweak (profiles.yml):

settings:
  memory_limit: '2GB'

Key Concepts

  • Analytics Engineering bridges data engineering and data analysis.
  • dbt brings software‑engineering best practices to SQL transformations.
  • Dimensional modeling organizes data into facts (events) and dimensions (attributes).
  • Three layers: staging (raw copy), intermediate (transformations), marts (final consumable tables).
  • ref() and source() are the primary functions for building model dependencies.
  • Testing ensures data quality – use unique, not_null, accepted_values, relationships.
  • Documentation is auto‑generated from YAML descriptions.
  • dbt build runs and tests everything in dependency order.

Additional Resources πŸ“š

Happy modeling! πŸš€

0 views
Back to Blog

Related posts

Read more Β»