ํŒŒํŠธ 3: Testing, Documentation & Deployment ๐Ÿš€

๋ฐœํ–‰: (2026๋…„ 2์›” 17์ผ ์˜ค์ „ 07:46 GMT+9)
11 ๋ถ„ ์†Œ์š”
์›๋ฌธ: Dev.to

Source: Dev.to

๋งคํฌ๋กœ โ€“ ์žฌ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ SQL ํ•จ์ˆ˜ ๐Ÿ”ง
๋งคํฌ๋กœ๋Š” ํŒŒ์ด์ฌ์˜ ํ•จ์ˆ˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค โ€“ ํ•œ ๋ฒˆ ์ž‘์„ฑํ•˜๋ฉด ์–ด๋””์„œ๋“  ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง ์คŒ์บ ํ”„

#dbt #AnalyticsEngineering #DataModeling

Source: โ€ฆ

๋งคํฌ๋กœ โ€“ ์žฌ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ SQL ํ•จ์ˆ˜ ๐Ÿ”ง

๋งคํฌ๋กœ ์—†์ด (์ค‘๋ณต ์ฝ”๋“œ)

-- โŒ Repeated everywhere
CASE 
    WHEN payment_type = 1 THEN 'Credit card'
    WHEN payment_type = 2 THEN 'Cash'
    WHEN payment_type = 3 THEN 'No charge'
    WHEN payment_type = 4 THEN 'Dispute'
    WHEN payment_type = 5 THEN 'Unknown'
    ELSE 'Unknown'
END AS payment_type_description

๋งคํฌ๋กœ์™€ ํ•จ๊ป˜ (ํ•œ ๋ฒˆ๋งŒ ์ž‘์„ฑ)

-- macros/get_payment_type_description.sql
{% macro get_payment_type_description(payment_type) %}
    CASE {{ payment_type }}
        WHEN 1 THEN 'Credit card'
        WHEN 2 THEN 'Cash'
        WHEN 3 THEN 'No charge'
        WHEN 4 THEN 'Dispute'
        WHEN 5 THEN 'Unknown'
        ELSE 'Unknown'
    END
{% endmacro %}

๋ชจ๋“  ๋ชจ๋ธ์—์„œ ์‚ฌ์šฉํ•˜๊ธฐ

-- models/staging/stg_green_tripdata.sql
SELECT
    payment_type,
    {{ get_payment_type_description('payment_type') }} AS payment_type_description
FROM {{ source('staging', 'green_tripdata') }}

dbt์˜ Jinja ๊ตฌ๋ฌธ

๊ตฌ๋ฌธ๋ชฉ์ ์˜ˆ์‹œ
{{ }}ํ‘œํ˜„์‹ ์ถœ๋ ฅ{{ ref('my_model') }}
{% %}๋…ผ๋ฆฌ / ์ œ์–ด ํ๋ฆ„{% if is_incremental() %}
{# #}์ฃผ์„{# This is a comment #}

Packages โ€“ ๋‹ค๋ฅธ ์‚ฌ๋žŒ์ด ๋งŒ๋“  ๋งคํฌ๋กœ์™€ ๋ชจ๋ธ ์žฌ์‚ฌ์šฉ

PackageWhat it Does
dbt_utils์ผ๋ฐ˜์ ์ธ SQL ํ—ฌํผ(๋Œ€๋ฆฌํ‚ค, ํ”ผ๋ฒ— ๋“ฑ)
dbt_codegenYAML ๋ฐ SQL ์ž๋™ ์ƒ์„ฑ
dbt_expectationsGreat Expectations ์Šคํƒ€์ผ ํ…Œ์ŠคํŠธ
dbt_audit_helper๋ฆฌํŒฉํ„ฐ๋ง ์‹œ ๋ชจ๋ธ ์ถœ๋ ฅ ๋น„๊ต

Create packages.yml

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1

Install packages

dbt deps

Use a macro from a package

-- Using dbt_utils to generate surrogate keys
SELECT
    {{ dbt_utils.generate_surrogate_key(['vendorid', 'pickup_datetime']) }} AS trip_id,
    *
FROM {{ source('staging', 'green_tripdata') }}

ํ…Œ์ŠคํŠธ โ€“ ๋ฐ์ดํ„ฐ๊ฐ€ ๊ธฐ๋Œ€์— ๋ถ€ํ•ฉํ•˜๋Š”์ง€ ํ™•์ธ

1๏ธโƒฃ ์ผ๋ฐ˜ ํ…Œ์ŠคํŠธ (๊ฐ€์žฅ ์ผ๋ฐ˜์ )

์Šคํ‚ค๋งˆ YAML ํŒŒ์ผ์— ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค:

# models/staging/schema.yml
version: 2

models:
  - name: stg_green_tripdata
    columns:
      - name: trip_id
        tests:
          - unique       # No duplicate values
          - not_null     # No null values

      - name: payment_type
        tests:
          - accepted_values:
              values: [1, 2, 3, 4, 5, 6]  # Allowed values only

      - name: pickup_location_id
        tests:
          - relationships:               # Referential integrity
              to: ref('dim_zones')
              field: location_id
ํ…Œ์ŠคํŠธ์„ค๋ช…
unique์ปฌ๋Ÿผ์— ์ค‘๋ณต ๊ฐ’์ด ์—†์Œ
not_null์ปฌ๋Ÿผ์— NULL ๊ฐ’์ด ์—†์Œ
accepted_values๊ฐ’์ด ์ง€์ •๋œ ๋ชฉ๋ก์— ํฌํ•จ๋˜์–ด์•ผ ํ•จ
relationships๊ฐ’์ด ๋‹ค๋ฅธ ํ…Œ์ด๋ธ”์— ์กด์žฌํ•ด์•ผ ํ•จ

2๏ธโƒฃ ๋‹จ์ผ(๋งž์ถค) ํ…Œ์ŠคํŠธ

tests/ ํด๋”์— .sql ํŒŒ์ผ์„ ๋ฐฐ์น˜ํ•ฉ๋‹ˆ๋‹ค:

-- tests/assert_positive_fare_amount.sql
-- Test FAILS if any rows are returned
SELECT
    trip_id,
    fare_amount
FROM {{ ref('fct_trips') }}
WHERE fare_amount < 0

๋ชจ๋“  ํƒ์‹œ ์—ฌํ–‰(๋…ธ๋ž€์ƒ‰ ๋ฐ ์ดˆ๋ก์ƒ‰)์„ ํฌํ•จํ•˜๋Š” ์‚ฌ์‹ค ํ…Œ์ด๋ธ”.
์—ฌํ–‰๋‹น ํ•˜๋‚˜์˜ ํ–‰์— ์š”๊ธˆ ์ƒ์„ธ ๋ฐ ๊ตฌ์—ญ ์ •๋ณด๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.

Generate & serve docs:

dbt docs generate   # Build the site
dbt docs serve      # Open in a browser

์‚ฌ์ดํŠธ์— ํฌํ•จ๋˜๋Š” ๋‚ด์šฉ:

  • ๋ชจ๋ธ ์„ค๋ช…
  • ์ปฌ๋Ÿผ ์ •์˜
  • ์ข…์†์„ฑ ๊ทธ๋ž˜ํ”„ (์‹œ๊ฐ์  DAG)
  • ์†Œ์Šค ์ •๋ณด

์ผ๋ฐ˜ dbt ๋ช…๋ น

๋ช…๋ น์ˆ˜ํ–‰ ๋‚ด์šฉ
dbt run๋ชจ๋“  ๋ชจ๋ธ์„ ๋นŒ๋“œ (๋ทฐ/ํ…Œ์ด๋ธ” ์ƒ์„ฑ)
dbt test๋ชจ๋“  ํ…Œ์ŠคํŠธ ์‹คํ–‰
dbt build๋ชจ๋ธ๊ณผ ํ…Œ์ŠคํŠธ๋ฅผ ํ•จ๊ป˜ ์‹คํ–‰ (์ถ”์ฒœ)
dbt compile์‹คํ–‰ ์—†์ด SQL ์ƒ์„ฑ
dbt debug์—ฐ๊ฒฐ ๋ฐ ํ”„๋กœ์ ํŠธ ์„ค์ • ํ™•์ธ
dbt seed์‹œ๋“œ CSV ํŒŒ์ผ ๋กœ๋“œ
dbt depsํŒจํ‚ค์ง€ ์„ค์น˜
dbt docs generate๋ฌธ์„œ ์ƒ์„ฑ
dbt docs serve๋กœ์ปฌ์—์„œ ๋ฌธ์„œ ์ œ๊ณต
dbt retry์‹คํŒจํ•œ ๋ชจ๋ธ ์žฌ์‹œ๋„

ํŠน์ • ๋ชจ๋ธ ์„ ํƒ

# ๋‹จ์ผ ๋ชจ๋ธ
dbt run --select stg_green_tripdata

# ๋ชจ๋ธ + ๋ชจ๋“  ์ƒ์œ„ ์ข…์†์„ฑ
dbt run --select +fct_trips

# ๋ชจ๋ธ + ๋ชจ๋“  ํ•˜์œ„ ๋ชจ๋ธ
dbt run --select stg_green_tripdata+

# ์–‘๋ฐฉํ–ฅ
dbt run --select +fct_trips+

# ํด๋” ๋‚ด ๋ชจ๋“  ๋ชจ๋ธ
dbt run --select staging.*

# ์—ฌ๋Ÿฌ ๋ชจ๋ธ
dbt run --select stg_green_tripdata stg_yellow_tripdata

# ๊ฐœ๋ฐœ (๊ธฐ๋ณธ ๋Œ€์ƒ)
dbt run

# ํ”„๋กœ๋•์…˜ ๋Œ€์ƒ
dbt run --target prod

Materializations โ€“ dbt๊ฐ€ ๋ชจ๋ธ์„ ์ €์žฅํ•˜๋Š” ๋ฐฉ์‹

์œ ํ˜•์ƒ์„ฑ๋˜๋Š” ๊ฒƒ์ผ๋ฐ˜์ ์ธ ์‚ฌ์šฉ ์‚ฌ๋ก€
viewSQL ๋ทฐ (์ฟผ๋ฆฌ๊ฐ€ ์ €์žฅ๋˜๊ณ  ์ ‘๊ทผ ์‹œ ์‹คํ–‰)์Šคํ…Œ์ด์ง• ๋ชจ๋ธ, ์ž์ฃผ ๋ณ€๊ฒฝ๋˜๋Š” ๋กœ์ง
table๋ฌผ๋ฆฌ ํ…Œ์ด๋ธ” (๋ฐ์ดํ„ฐ๊ฐ€ ์ €์žฅ๋จ)์ตœ์ข… ๋งˆํŠธ, ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ์…‹, ์„ฑ๋Šฅ์ด ์ค‘์š”ํ•œ ์ฟผ๋ฆฌ
incremental์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ๋งŒ ์ถ”๊ฐ€๋งค์šฐ ํฐ ํ…Œ์ด๋ธ”, ์ด๋ฒคํŠธํ˜• ๋ฐ์ดํ„ฐ
ephemeral์ƒ์„ฑ๋˜์ง€ ์•Š์Œ (๋‹ค์šด์ŠคํŠธ๋ฆผ์—์„œ CTE)ํ—ฌํผ ๋ชจ๋ธ, ์ค‘๊ฐ„ ๋‹จ๊ณ„

๋ชจ๋ธ ํŒŒ์ผ์—์„œ materialization ์„ค์ •ํ•˜๊ธฐ

{{ config(materialized='table') }}

SELECT *
FROM {{ ref('stg_trips') }}

๋˜๋Š” dbt_project.yml ์ „์ฒด์— ์ ์šฉํ•˜๊ธฐ

models:
  my_project:
    staging:
      materialized: view
    marts:
      materialized: table

๋น ๋ฅธ ๊ฒฐ์ • ๋„์šฐ๋ฏธ

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚               Should I use a view or a table?             โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๊ธฐ์ € ๋กœ์ง์ด ์ž์ฃผ ๋ณ€๊ฒฝ๋˜๊ฑฐ๋‚˜ ๋ฐ์ดํ„ฐ์…‹์ด ์ถฉ๋ถ„ํžˆ ์ž‘์•„ ๊ฐ ์ฟผ๋ฆฌ๋งˆ๋‹ค ์žฌ๊ณ„์‚ฐํ•˜๋Š” ๋น„์šฉ์ด ์ €๋ ดํ•  ๋•Œ ๋ทฐ๋ฅผ ์‚ฌ์šฉํ•˜์„ธ์š”.
์„ฑ๋Šฅ, ํ•˜์œ„ ์†Œ๋น„๋ฅผ ์œ„ํ•œ ์ง€์†์ ์ธ ๋ฐ์ดํ„ฐ๊ฐ€ ํ•„์š”ํ•˜๊ฑฐ๋‚˜ ๋ฐ์ดํ„ฐ์…‹์ด ํฌ๊ณ  ์žฌ๊ณ„์‚ฐ ๋น„์šฉ์ด ๋งŽ์ด ๋“ค ๋•Œ ํ…Œ์ด๋ธ”์„ ์‚ฌ์šฉํ•˜์„ธ์š”.

Decision Flow

      โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Is the query expensive?  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
       โ”‚            โ”‚
      Yes          No
       โ”‚            โ”‚
       โ–ผ            โ–ผ
 โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
 โ”‚  TABLE  โ”‚   โ”‚  VIEW   โ”‚
 โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Use VIEW when:

  • Staging models (simple transformations) โ†’ ์Šคํ…Œ์ด์ง• ๋ชจ๋ธ(๊ฐ„๋‹จํ•œ ๋ณ€ํ™˜)
  • Logic changes frequently โ†’ ๋กœ์ง์ด ์ž์ฃผ ๋ณ€๊ฒฝ๋˜๋Š” ๊ฒฝ์šฐ
  • Storage cost is a concern โ†’ ์Šคํ† ๋ฆฌ์ง€ ๋น„์šฉ์ด ์šฐ๋ ค๋˜๋Š” ๊ฒฝ์šฐ

Use TABLE when:

  • Final marts are queried often โ†’ ์ตœ์ข… ๋งˆํŠธ๊ฐ€ ์ž์ฃผ ์กฐํšŒ๋˜๋Š” ๊ฒฝ์šฐ
  • Complex joins / aggregations โ†’ ๋ณต์žกํ•œ ์กฐ์ธ/์ง‘๊ณ„
  • Query performance matters โ†’ ์ฟผ๋ฆฌ ์„ฑ๋Šฅ์ด ์ค‘์š”ํ•œ ๊ฒฝ์šฐ

ํ”„๋กœ์ ํŠธ ๊ฐœ์š” โ€“ NYC ํƒ์‹œ ๋ฐ์ดํ„ฐ

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                      RAW DATA                                 โ”‚
โ”‚  green_tripdata (GCS/BigQuery) โ”‚ yellow_tripdata (GCS/BigQuery)โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                    โ”‚                     โ”‚
                    โ–ผ                     โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    STAGING LAYER                              โ”‚
โ”‚  stg_green_tripdata    โ”‚    stg_yellow_tripdata               โ”‚
โ”‚  (cleaned, renamed)    โ”‚    (cleaned, renamed)                โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                    โ”‚                     โ”‚
                    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                               โ”‚
                               โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                  INTERMEDIATE LAYER                           โ”‚
โ”‚                   int_trips_unioned                           โ”‚
โ”‚            (green + yellow combined)                          โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                โ”‚
                                โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                      MARTS LAYER                              โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚  โ”‚ dim_zones   โ”‚  โ”‚   fct_trips   โ”‚  โ”‚ fct_monthly_zone_rev โ”‚ โ”‚
โ”‚  โ”‚ (dimension)โ”‚  โ”‚    (fact)     โ”‚  โ”‚      (report)        โ”‚ โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๋ชจ๋ธ ์นดํƒˆ๋กœ๊ทธ

๋ชจ๋ธ์œ ํ˜•์„ค๋ช…
stg_green_tripdataStaging์ •์ œ๋œ ๊ทธ๋ฆฐ ํƒ์‹œ ๋ฐ์ดํ„ฐ
stg_yellow_tripdataStaging์ •์ œ๋œ ์˜๋กœ์šฐ ํƒ์‹œ ๋ฐ์ดํ„ฐ
int_trips_unionedIntermediate์˜๋กœ์šฐโ€ฏ+โ€ฏ๊ทธ๋ฆฐ ์—ฌํ–‰ ๊ฒฐํ•ฉ
dim_zonesDimension๊ตฌ์—ญ ์กฐํšŒ ํ…Œ์ด๋ธ”
fct_tripsFact์—ฌํ–‰๋‹น ํ•œ ํ–‰
fct_monthly_zone_revenueReport๊ตฌ์—ญ๋ณ„ ์›”๋ณ„ ์ˆ˜์ต

๋กœ์ปฌ ๊ฐœ๋ฐœ (DuckDB)

์žฅ์ : ๋ฌด๋ฃŒ, ํด๋ผ์šฐ๋“œ ๊ณ„์ • ํ•„์š” ์—†์Œ
๋‹จ์ : ๋จธ์‹ ์˜ ๋ฆฌ์†Œ์Šค๋กœ ์ œํ•œ๋จ

# 1. Install dbt with the DuckDB adapter
pip install dbt-duckdb

# 2. Clone the project
git clone https://github.com/DataTalksClub/data-engineering-zoomcamp
cd data-engineering-zoomcamp/04-analytics-engineering/taxi_rides_ny

# 3. Create `profiles.yml` in `~/.dbt/`
# 4. Test the connection
dbt debug

# 5. Build the project
dbt build --target prod

ํด๋ผ์šฐ๋“œ ๊ฐœ๋ฐœ (dbt Cloudโ€ฏ+โ€ฏBigQuery)

Pros: ๊ฐ•๋ ฅํ•จ, ํŒ€ ํ˜‘์—…, ์Šค์ผ€์ค„๋Ÿฌ
Cons: GCP ๊ณ„์ •์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค (๋ฌด๋ฃŒ ํ‹ฐ์–ด ์ด์šฉ ๊ฐ€๋Šฅ)

  1. dbt Cloud ๊ณ„์ •์„ ์ƒ์„ฑํ•˜์„ธ์š” (๋ฌด๋ฃŒ).
  2. ์ด๋ฅผ BigQuery ํ”„๋กœ์ ํŠธ์— ์—ฐ๊ฒฐํ•ฉ๋‹ˆ๋‹ค.
  3. dbt Cloud IDE์—์„œ ๋ ˆํฌ์ง€ํ† ๋ฆฌ๋ฅผ ๋ณต์ œํ•ฉ๋‹ˆ๋‹ค.
  4. ๋‹ค์Œ ๋ช…๋ น์„ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค:
dbt build --target prod

์ผ๋ฐ˜์ ์ธ ๋ฌธ์ œ ํ•ด๊ฒฐ ๐Ÿ”

โ€œProfile not foundโ€

  • dbt_project.yml์— ์žˆ๋Š” profile ์ด๋ฆ„์ด profiles.yml์— ์žˆ๋Š” ์ด๋ฆ„๊ณผ ์ผ์น˜ํ•˜๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”.
  • profiles.yml์ด ~/.dbt/์— ์œ„์น˜ํ•˜๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”.

โ€œSource not foundโ€

  • sources.yml์— ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค/์Šคํ‚ค๋งˆ ์ด๋ฆ„์„ ํ™•์ธํ•˜์„ธ์š”.
  • ๋ฐ์ดํ„ฐ๊ฐ€ ์›จ์–ดํ•˜์šฐ์Šค์— ๋กœ๋“œ๋˜์—ˆ๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”.
  • ref() ํ˜ธ์ถœ์— ์˜คํƒ€๊ฐ€ ์—†๋Š”์ง€ ํ™•์ธํ•˜๊ณ , ์ฐธ์กฐ๋œ ๋ชจ๋ธ์ด ์กด์žฌํ•˜๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”.

์˜ต์…˜ ๋ฉ”๋ชจ๋ฆฌ ์กฐ์ • (profiles.yml):

settings:
  memory_limit: '2GB'

ํ•ต์‹ฌ ๊ฐœ๋…

  • Analytics Engineering์€ ๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง๊ณผ ๋ฐ์ดํ„ฐ ๋ถ„์„์„ ์—ฐ๊ฒฐํ•ฉ๋‹ˆ๋‹ค.
  • dbt๋Š” SQL ๋ณ€ํ™˜์— ์†Œํ”„ํŠธ์›จ์–ดโ€‘์—”์ง€๋‹ˆ์–ด๋ง ๋ชจ๋ฒ” ์‚ฌ๋ก€๋ฅผ ๋„์ž…ํ•ฉ๋‹ˆ๋‹ค.
  • Dimensional modeling์€ ๋ฐ์ดํ„ฐ๋ฅผ facts(์ด๋ฒคํŠธ)์™€ dimensions(์†์„ฑ)์œผ๋กœ ์กฐ์งํ•ฉ๋‹ˆ๋‹ค.
  • ์„ธ ๊ณ„์ธต: staging(์›์‹œ ๋ณต์‚ฌ), intermediate(๋ณ€ํ™˜), marts(์ตœ์ข… ์†Œ๋น„์šฉ ํ…Œ์ด๋ธ”).
  • ref()์™€ source()๋Š” ๋ชจ๋ธ ์˜์กด์„ฑ์„ ๊ตฌ์ถ•ํ•˜๋Š” ์ฃผ์š” ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค.
  • Testing์€ ๋ฐ์ดํ„ฐ ํ’ˆ์งˆ์„ ๋ณด์žฅํ•ฉ๋‹ˆ๋‹ค โ€“ unique, not_null, accepted_values, relationships๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • ๋ฌธ์„œ๋Š” YAML ์„ค๋ช…์œผ๋กœ๋ถ€ํ„ฐ ์ž๋™ ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค.
  • dbt build๋Š” ์˜์กด์„ฑ ์ˆœ์„œ๋Œ€๋กœ ๋ชจ๋“  ๊ฒƒ์„ ์‹คํ–‰ํ•˜๊ณ  ํ…Œ์ŠคํŠธํ•ฉ๋‹ˆ๋‹ค.

์ถ”๊ฐ€ ์ž๋ฃŒ ๐Ÿ“š

ํ–‰๋ณตํ•œ ๋ชจ๋ธ๋ง! ๐Ÿš€

0 ์กฐํšŒ
Back to Blog

๊ด€๋ จ ๊ธ€

๋” ๋ณด๊ธฐ ยป

์ฟผ๋ฆฌ ์ž‘์„ฑ ๋ฐฉ์‹์„ ๋ฐ”๊พธ๋Š” ๋‘ ๊ฐ€์ง€ ClickHouse ๋‚ด๋ถ€ ๊ตฌ์กฐ

๋Œ€๋ถ€๋ถ„์˜ ClickHouse ์‹ค์ˆ˜๋Š” SQL ๊ตฌ๋ฌธ์—์„œ ๋น„๋กฏ๋œ ๊ฒƒ์ด ์•„๋‹™๋‹ˆ๋‹ค. ๊ทธ๊ฒƒ์€ ์ž˜๋ชป๋œ ์‚ฌ๊ณ  ๋ชจ๋ธ์„ ์‚ฌ์šฉํ–ˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ClickHouse๋Š” ์นœ์ˆ™ํ•œ ๋ถ„์„์šฉ SQL ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์ฒ˜๋Ÿผ ๋ณด์ด์ง€๋งŒ...

ํ”„๋ ˆ์ž„์›Œํฌ ์—†์ด, ๊ณ ํ†ต ์—†์ด: Aether Slices ์ž‘์„ฑ

ํ”„๋ ˆ์ž„์›Œํฌ ์—†์ด, ๊ณ ํ†ต ์—†์ด: Aether ์Šฌ๋ผ์ด์Šค ์ž‘์„ฑ ์ด์ „ ๊ธฐ์‚ฌ์—์„œ๋Š” Aether์˜ ์ฒ ํ•™์„ ์†Œ๊ฐœํ–ˆ์Šต๋‹ˆ๋‹ค: Java๋ฅผ ๊ด€๋ฆฌ๋˜๋Š” ๋Ÿฐํƒ€์ž„์œผ๋กœ ๋˜๋Œ๋ฆฌ๊ณ , ๋Ÿฐํƒ€์ž„์ด ์ธํ”„๋ผ๋ฅผ ์ฒ˜๋ฆฌํ•˜๋„๋ก ํ•˜์„ธ์š”.

์‹œ์Šคํ…œ ๋””์ž์ธ ์ธํ„ฐ๋ทฐ๋Š” ์—‰๋ง์ด๋‹ค. ๊ทธ๋ž˜๋„ ํ†ต๊ณผํ•˜๋Š” ๋ฐฉ๋ฒ•

1. Generated Columns๊ฐ€ Application Code์—์„œ ๊ณ„์‚ฐ์„ ์ค‘๋‹จํ•จ users ํ…Œ์ด๋ธ”์— first_name๊ณผ last_name์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์ „์ฒด ์ด๋ฆ„์ด ํ•„์š”ํ•œ ๋ชจ๋“  ์ฟผ๋ฆฌ๋Š” ๋จผ์ €โ€ฆ