Data modelling for Dummies: Because your data shouldn't live in a dumpster.
Source: Dev.to
If your spreadsheet looks like a teenager’s room with clothes scattered everywhere, you’re in the right place.
Designing how your data fits together before you build anything is the blueprint that prevents a messy pile. It translates complex business requirements into organized data structures, improving accuracy, efficiency, and data quality. The typical stages—conceptual, logical, and physical modeling—guide database design, analytics, and data management.
What Is a Schema?
A schema is essentially a blueprint for your data. It defines:
- Tables you have
- Columns in each table
- Data types for each column (e.g., text, number)
- Relationships between tables
Think of it as the ultimate dashboard‑friendly structure, designed for fast queries and easy analysis—perfect for tools like Power BI or Tableau.
Types of Schemas
Star Schema
- Fact table at the center (the foundation of a data warehouse)
- Surrounding dimension tables that describe the facts
Pros
- Simple, easy to understand
- Fast query performance (few joins)
Cons
- Some data duplication
Snowflake Schema
A snowflake schema is a star schema with extra normalization—dimension tables are further broken down into related tables.
Advantages
- Less duplication → saves storage space
- More accurate (single source of truth, e.g., “USA” → “United States”)
- Normalized (database nerds love it)
Drawbacks
- More complex (more tables → more confusion)
- Slower queries (more joins)
- Harder for business users to navigate
When to Use Which Schema
✅ Good for Snowflake
- Data warehouses where accuracy matters more than speed
- Complex hierarchies (Country → State → City → Store)
- Expensive storage (saving disk space)
- Frequent updates (update once, fix everywhere)
❌ Not Ideal for Snowflake
- Power BI / Tableau dashboards (they prefer star schemas)
- Self‑service analytics (too many tables can confuse users)
- Real‑time dashboards (joins can slow performance)
✅ Good for Star
- Fast, interactive dashboards
- Self‑service analytics where simplicity is key
- Scenarios where query speed outweighs storage concerns
Benefits of Good Data Modeling
- Clarity: Everyone understands the data layout.
- Speed: Optimized queries reduce wait times.
- Reliability: Consistent definitions prevent errors.
- Scalability: A solid model grows with your data needs.
Getting Started
- Identify the business question you want to answer.
- Define fact tables (the measurable events).
- Create dimension tables that describe those facts.
- Choose a schema (star for speed, snowflake for complex hierarchies).
- Document the model so the whole team can reference it.
For most teams—especially those using Power BI—the star schema is the go‑to choice. It’s built for speed, designed for clarity, and trusted by analysts worldwide. Start with a simple blueprint, organize around facts and dimensions, and watch messy spreadsheets transform into trustworthy answers. Your data shouldn’t just exist—it should work for you.