Module 3 Summary - Data Warehousing & BigQuery

Published: (February 9, 2026 at 06:07 PM EST)
4 min read
Source: Dev.to

Source: Dev.to

Understanding Data Warehouses & OLAP vs OLTP 🏒

Why Do We Need Data Warehouses? πŸ€”

Imagine you run an online store. Your website has a database that handles:

  • Customer sign‑ups
  • Product orders
  • Payment processing
  • Inventory updates

This database needs to be FAST because customers are waiting. Every millisecond counts!

Now, your boss asks: β€œWhat were our top‑selling products last year by region, and how did that compare to the year before?”

Running that query on your production database would:

  • Slow down your website (bad for customers!)
  • Take forever because the database isn’t designed for such complex analysis
  • Potentially crash things if the query is too heavy

This is exactly why data warehouses exist! They’re a separate place to store your data, specifically designed for answering complex analytical questions without affecting your live applications.

OLTP vs OLAP – The Two Worlds of Databases

AspectOLTP (Transactional)OLAP (Analytical)
What it’s forRunning your app – orders, logins, updatesAnswering business questions – reports, dashboards
Type of queriesSimple: β€œGet user #123’s info”Complex: β€œShow sales trends by region for 5 years”
SpeedSuper fast for small operationsCan take minutes for huge analyses
Data freshnessReal‑time, always up‑to‑dateUsually updated daily/hourly (batch)
Data organizationNormalized (many tables, no duplicates)Denormalized (fewer tables, some duplication OK)
Data sizeGigabytes (current data)Terabytes/Petabytes (years of history)
Who uses itYour application, customersData analysts, managers, executives
ExamplesMySQL, PostgreSQL for your website/appBigQuery, Snowflake, Amazon Redshift

Real-World Example πŸ›’

OLTP scenario (your app database)

-- A customer places an order – needs to be FAST
INSERT INTO orders (customer_id, product_id, quantity, price) 
VALUES (123, 456, 2, 29.99);

OLAP scenario (data warehouse)

-- Your CEO wants to know Q4 performance – can take a minute, that’s fine
SELECT 
    region,
    product_category,
    SUM(revenue) AS total_revenue,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM sales_data
WHERE order_date BETWEEN '2023-10-01' AND '2023-12-31'
GROUP BY region, product_category
ORDER BY total_revenue DESC;

πŸ’‘ Key insight: OLTP is like a cashier at a store – fast, handles one customer at a time. OLAP is like the accounting department – takes time to analyze all the receipts and produce reports.

What Exactly is a Data Warehouse? πŸ—οΈ

A data warehouse is a centralized repository where you collect data from all your different systems and store it in a way that’s optimized for analysis.

Think of it like this:

Imagine a company with multiple departments:

  • Sales team uses Salesforce
  • Marketing uses HubSpot
  • Website runs on PostgreSQL
  • Inventory managed in SAP

Each system has its own database, but your CEO wants a report combining data from all of them. This is where a data warehouse comes in!

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Salesforce β”‚   β”‚   HubSpot   β”‚   β”‚  PostgreSQL β”‚
β”‚   (Sales)   β”‚   β”‚ (Marketing) β”‚   β”‚  (Website)  β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚                 β”‚                 β”‚
       β”‚    ETL/ELT      β”‚                 β”‚
       β”‚   (Extract,     β”‚                 β”‚
       β”‚   Transform,   β”‚                 β”‚
       β”‚    Load)        β”‚                 β”‚
       β–Ό                 β–Ό                 β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              DATA WAREHOUSE (BigQuery)           β”‚
β”‚   All your data, cleaned, organized, ready       β”‚
β”‚   for analysis!                                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        β”‚
                        β–Ό
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚  Reports, Dashboards,    β”‚
         β”‚  Machine Learning, etc. β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key characteristics of a data warehouse

  • πŸ“Š Subject‑oriented – Organized by business topics (sales, customers, products)
  • πŸ”— Integrated – Data from multiple sources combined together
  • πŸ“… Time‑variant – Keeps historical data (years worth!)
  • πŸ”’ Non‑volatile – Data doesn’t change once loaded (it’s a historical record)

Modern Cloud Data Warehouses ☁️

Traditional data warehouses (like Oracle, Teradata) required:

  • Buying expensive hardware
  • Hiring DBAs to manage servers
  • Months of setup time
  • Huge upfront costs

Modern cloud data warehouses (BigQuery, Snowflake, Redshift) changed everything:

  • βœ… No servers to manage (serverless)
  • βœ… Pay only for what you use
  • βœ… Scales automatically
  • βœ… Set up in minutes
  • βœ… Access from anywhere
0 views
Back to Blog

Related posts

Read more Β»