Part 3: Testing, Documentation & Deployment
Source: Dev.to
Data Engineering Zoomcamp
#dbt #AnalyticsEngineering #DataModeling
Macros – Reusable SQL Functions
Without macros (repeated code)
-- Repeated everywhere (anti-pattern)
CASE
WHEN payment_type = 1 THEN 'Credit card'
WHEN payment_type = 2 THEN 'Cash'
WHEN payment_type = 3 THEN 'No charge'
WHEN payment_type = 4 THEN 'Dispute'
WHEN payment_type = 5 THEN 'Unknown'
ELSE 'Unknown'
END AS payment_type_description
With macros (write once)
-- macros/get_payment_type_description.sql
{% macro get_payment_type_description(payment_type) %}
CASE {{ payment_type }}
WHEN 1 THEN 'Credit card'
WHEN 2 THEN 'Cash'
WHEN 3 THEN 'No charge'
WHEN 4 THEN 'Dispute'
WHEN 5 THEN 'Unknown'
ELSE 'Unknown'
END
{% endmacro %}
Use it in any model
-- models/staging/stg_green_tripdata.sql
SELECT
payment_type,
{{ get_payment_type_description('payment_type') }} AS payment_type_description
FROM {{ source('staging', 'green_tripdata') }}
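When dbt compiles the model, the macro body is inlined into the query. The compiled SQL in `target/compiled/` looks roughly like this (exact table quoting and project names depend on your warehouse; the ones below are placeholders):

```sql
-- Approximate compiled output; project/dataset names are placeholders
SELECT
    payment_type,
    CASE payment_type
        WHEN 1 THEN 'Credit card'
        WHEN 2 THEN 'Cash'
        WHEN 3 THEN 'No charge'
        WHEN 4 THEN 'Dispute'
        WHEN 5 THEN 'Unknown'
        ELSE 'Unknown'
    END AS payment_type_description
FROM `my-project`.`staging`.`green_tripdata`
```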
Jinja Syntax in dbt
| Syntax | Purpose | Example |
|---|---|---|
| `{{ }}` | Output expression | `{{ ref('my_model') }}` |
| `{% %}` | Logic / control flow | `{% if is_incremental() %}` |
| `{# #}` | Comment | `{# This is a comment #}` |
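As a sketch of how the three syntaxes combine, here is a hypothetical model that loops over the known payment types and generates one count column per type (the column names are made up for illustration):

```sql
{# One count column per payment type, generated at compile time #}
{% set payment_types = [1, 2, 3, 4, 5] %}
SELECT
    {% for pt in payment_types %}
    SUM(CASE WHEN payment_type = {{ pt }} THEN 1 ELSE 0 END)
        AS payment_type_{{ pt }}_count{% if not loop.last %},{% endif %}
    {% endfor %}
FROM {{ source('staging', 'green_tripdata') }}
```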
Packages – Reuse macros & models built by others
| Package | What it Does |
|---|---|
| dbt_utils | Common SQL helpers (surrogate keys, pivot, etc.) |
| dbt_codegen | Auto-generate YAML and SQL |
| dbt_expectations | Great Expectations-style tests |
| dbt_audit_helper | Compare model outputs when refactoring |
Create packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
Install packages
dbt deps
Use a macro from a package
-- Using dbt_utils to generate surrogate keys
SELECT
{{ dbt_utils.generate_surrogate_key(['vendorid', 'pickup_datetime']) }} AS trip_id,
*
FROM {{ source('staging', 'green_tripdata') }}
Tests – Ensure data meets expectations
1. Generic Tests (most common)
Add them in a schema YAML file:
# models/staging/schema.yml
version: 2
models:
  - name: stg_green_tripdata
    columns:
      - name: trip_id
        tests:
          - unique      # No duplicate values
          - not_null    # No null values
      - name: payment_type
        tests:
          - accepted_values:
              values: [1, 2, 3, 4, 5, 6]  # Allowed values only
      - name: pickup_location_id
        tests:
          - relationships:          # Referential integrity
              to: ref('dim_zones')
              field: location_id
| Test | Description |
|---|---|
| `unique` | No duplicate values in column |
| `not_null` | No NULL values in column |
| `accepted_values` | Values must be in the specified list |
| `relationships` | Values must exist in another table |
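You can also define your own generic tests as `{% test %}` macros and then reference them from YAML exactly like the built-ins. A minimal sketch (the test name `is_positive` is made up for illustration):

```sql
-- tests/generic/is_positive.sql (hypothetical file)
{% test is_positive(model, column_name) %}
-- Test fails if any rows are returned, i.e. if any value is negative
SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} < 0
{% endtest %}
```

In a schema YAML file you would then add `- is_positive` under a column's `tests:` list.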
2. Singular (custom) Tests
Place a .sql file in the tests/ folder:
-- tests/assert_positive_fare_amount.sql
-- Test FAILS if any rows are returned
SELECT
trip_id,
fare_amount
FROM {{ ref('fct_trips') }}
WHERE fare_amount < 0
Documentation – Describe models in YAML
Descriptions added in your schema YAML files show up on the generated docs site. Example description for fct_trips:
"Fact table containing all taxi trips (yellow and green). One row per trip with fare details and zone information."
Generate & serve docs
dbt docs generate # Build the site
dbt docs serve # Open in a browser
The site includes:
- Model descriptions
- Column definitions
- Dependency graph (visual DAG)
- Source information
Common dbt Commands
| Command | What it Does |
|---|---|
| `dbt run` | Build all models (create views/tables) |
| `dbt test` | Run all tests |
| `dbt build` | Run models and tests together (recommended) |
| `dbt compile` | Generate SQL without executing |
| `dbt debug` | Check connection & project configuration |
| `dbt seed` | Load seed CSV files |
| `dbt deps` | Install packages |
| `dbt docs generate` | Build documentation |
| `dbt docs serve` | Serve docs locally |
| `dbt retry` | Retry failed models |
Selecting specific models and targets
# Single model
dbt run --select stg_green_tripdata
# Model + all upstream dependencies
dbt run --select +fct_trips
# Model + all downstream models
dbt run --select stg_green_tripdata+
# Both directions
dbt run --select +fct_trips+
# All models in a folder
dbt run --select staging.*
# Multiple models
dbt run --select stg_green_tripdata stg_yellow_tripdata
# Development (default target)
dbt run
# Production target
dbt run --target prod
Materializations – How dbt persists models
| Type | What it Creates | Typical Use Case |
|---|---|---|
| `view` | SQL view (query stored, runs on access) | Staging models, frequently changing logic |
| `table` | Physical table (data stored) | Final marts, large datasets, performance-critical queries |
| `incremental` | Appends new data only | Very large tables, event-style data |
| `ephemeral` | Not created (inlined as a CTE downstream) | Helper models, intermediate steps |
Set materialization in a model file
{{ config(materialized='table') }}
SELECT *
FROM {{ ref('stg_trips') }}
Or project-wide in dbt_project.yml
models:
  my_project:
    staging:
      materialized: view
    marts:
      materialized: table
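For the incremental materialization, the common pattern is to filter to new rows only on incremental runs. A minimal sketch, assuming a `pickup_datetime` column exists in the staging model:

```sql
{{ config(materialized='incremental') }}

SELECT *
FROM {{ ref('stg_green_tripdata') }}
{% if is_incremental() %}
-- Only rows newer than what is already in this model's table;
-- {{ this }} refers to the existing target table
WHERE pickup_datetime > (SELECT MAX(pickup_datetime) FROM {{ this }})
{% endif %}
```

On the first run (or with `--full-refresh`) the `if is_incremental()` block is skipped and the whole table is built.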
Quick Decision Helper
"Should I use a view or a table?"

Use a view when the underlying logic changes often or the dataset is small enough that recomputing on each query is cheap.
Use a table when you need persisted data for performance, downstream consumption, or when the dataset is large and expensive to recompute.

Decision Flow

Is the query expensive?
├─ Yes → TABLE
└─ No  → VIEW
Use VIEW when
- Staging models (simple transformations)
- Logic changes frequently
- Storage cost is a concern
Use TABLE when
- Final marts are queried often
- Complex joins / aggregations
- Query performance matters
Project Overview – NYC Taxi Data

RAW DATA
  green_tripdata (GCS/BigQuery) · yellow_tripdata (GCS/BigQuery)
            │
            ▼
STAGING LAYER
  stg_green_tripdata · stg_yellow_tripdata (cleaned, renamed)
            │
            ▼
INTERMEDIATE LAYER
  int_trips_unioned (green + yellow combined)
            │
            ▼
MARTS LAYER
  dim_zones (dimension) · fct_trips (fact) · fct_monthly_zone_revenue (report)
Model Catalog
| Model | Type | Description |
|---|---|---|
| `stg_green_tripdata` | Staging | Cleaned green taxi data |
| `stg_yellow_tripdata` | Staging | Cleaned yellow taxi data |
| `int_trips_unioned` | Intermediate | Combined yellow + green trips |
| `dim_zones` | Dimension | Zone lookup table |
| `fct_trips` | Fact | One row per trip |
| `fct_monthly_zone_revenue` | Report | Monthly revenue by zone |
Local Development (DuckDB)
Pros: Free, no cloud account needed
Cons: Limited to your machine's resources
# 1. Install dbt with the DuckDB adapter
pip install dbt-duckdb
# 2. Clone the project
git clone https://github.com/DataTalksClub/data-engineering-zoomcamp
cd data-engineering-zoomcamp/04-analytics-engineering/taxi_rides_ny
# 3. Create `profiles.yml` in `~/.dbt/`
# 4. Test the connection
dbt debug
# 5. Build the project
dbt build --target prod
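For step 3, a minimal `profiles.yml` for the DuckDB adapter might look like the sketch below. The profile name and database file names are assumptions – the top-level key must match the `profile:` setting in your `dbt_project.yml`:

```yaml
# ~/.dbt/profiles.yml
taxi_rides_ny:              # must match `profile:` in dbt_project.yml (assumed name)
  target: dev
  outputs:
    dev:
      type: duckdb
      path: taxi_rides_dev.duckdb   # local database file (hypothetical name)
    prod:
      type: duckdb
      path: taxi_rides_prod.duckdb
```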
Cloud Development (dbt Cloud + BigQuery)
Pros: Powerful, team collaboration, scheduler
Cons: Requires a GCP account (free tier available)
- Create a dbt Cloud account (free).
- Connect it to your BigQuery project.
- Clone the repo in the dbt Cloud IDE.
- Run:
dbt build --target prod
Troubleshooting Common Issues
"Profile not found"
- Verify that the profile name in `dbt_project.yml` matches the one in `profiles.yml`.
- Ensure `profiles.yml` resides in `~/.dbt/`.
"Source not found"
- Check database/schema names in `sources.yml`.
- Confirm the data is loaded in the warehouse.
- Look for typos in `ref()` calls and make sure the referenced model exists.
Optional memory tweak (profiles.yml):
settings:
  memory_limit: '2GB'
Key Concepts
- Analytics Engineering bridges data engineering and data analysis.
- dbt brings software-engineering best practices to SQL transformations.
- Dimensional modeling organizes data into facts (events) and dimensions (attributes).
- Three layers: staging (raw copy), intermediate (transformations), marts (final consumable tables).
- `ref()` and `source()` are the primary functions for building model dependencies.
- Testing ensures data quality: use `unique`, `not_null`, `accepted_values`, `relationships`.
- Documentation is auto-generated from YAML descriptions.
- `dbt build` runs and tests everything in dependency order.
Additional Resources
- dbt Documentation
- dbt Fundamentals Course (free)
- SQL Refresher for Window Functions
- dbt Community Slack
Happy modeling!