Data Engineering Skills Gap Nobody Fills — and the Side Project I Finally Finished to Fill It

Published: June 4, 2026 at 01:15 PM EDT
4 min read
Source: Dev.to

[![Petascale Labs](https://media2.dev.to/dynamic/image/width=50,height=50,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3962856%2Fd98bb0fa-6966-4446-bae3-69c6a1427f64.png)](https://dev.to/petascalelabs)

# GitHub “Finish-Up-A-Thon” Challenge Submission

*This is a submission for the [GitHub Finish-Up-A-Thon Challenge](https://dev.to/challenges/github-2026-05-21).*

---

## What I Built

**Petascale Labs** – a data‑engineering learning platform that teaches the stack **from the bytes up**.  
Most DE curricula show you *which* button to click. We teach you *why* it breaks in production and how to reason about it from first principles.

### What makes it ours

- **The Strata model** – the data platform as layers:  
  `storage & file formats → ingestion → open table formats → compute engines → orchestration → query engines/OLAP → semantic layer`.  
  A mental map for the whole stack.

- **Incident‑driven lessons** – every lesson is built around a real production failure and its fix, so you learn the way you actually grow at work.

- **Incident‑Response Arcade** – interactive, time‑pressured sims where you diagnose and resolve infra failures (phantom lag, shuffle spills, broken CDC) under a budget and a cluster‑health clock.  
  

- **Free, client‑side DE tools** – a Parquet Inspector, an SCD Playground, and a PII Masking Policy Generator that run entirely in your browser.  
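The Strata model is an ordered stack, so it encodes naturally as data. Below is a minimal TypeScript sketch. The layer names come from the list above, but `STRATA`, `depth`, and `layersBelow` are hypothetical names and do not reflect the platform's actual code. The sketch shows one way to use such a map: list every layer beneath the one where a symptom appears.

```typescript
// Sketch of the Strata model: layers ordered bottom (storage) to top (semantic).
// Layer names are from the post; the helpers below are hypothetical.
const STRATA = [
  "storage & file formats",
  "ingestion",
  "open table formats",
  "compute engines",
  "orchestration",
  "query engines/OLAP",
  "semantic layer",
] as const;

type Stratum = (typeof STRATA)[number];

// Position of a layer in the stack (0 = bottom).
function depth(layer: Stratum): number {
  return STRATA.indexOf(layer);
}

// All layers strictly beneath `layer`, bottom first: candidate places
// where a symptom observed at `layer` could originate.
function layersBelow(layer: Stratum): Stratum[] {
  return STRATA.slice(0, depth(layer));
}

// Usage: a slow dashboard surfaces at the query layer, but the cause
// may sit anywhere underneath it.
const suspects = layersBelow("query engines/OLAP");
console.log(suspects.join(" -> "));
```

The `as const` tuple makes the layer names a closed union type, so a typo in a layer name becomes a compile error rather than a silent lookup miss.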
  

---

## Demo

🔗 **Live:** 

![The Platform](https://media2.dev.to/dynamic/image/width=800,height=,fit=scale-down,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4ba7v3hl1vx3qvpzy32.png)

![Simulation Arcade](https://media2.dev.to/dynamic/image/width=800,height=,fit=scale-down,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwectiz064mul4j4z81u6.png)

![Free Tools](https://media2.dev.to/dynamic/image/width=800,height=,fit=scale-down,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faoush5yqnyfg9czd3tru.png)

![Arcade Access](https://media2.dev.to/dynamic/image/width=800,height=,fit=scale-down,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2qu5p0xivllgod04ni1r.png)

### Things to try

- **The Incident‑Response Arcade** – pick a scenario, work the terminal, and ship a post‑mortem before the cluster falls over (timer + budget + cluster‑health clock).

- **Free DE Tools** – fast, **100 % client‑side** utilities for working data engineers:

  - **Parquet Inspector** – drop a `.parquet` file and read its schema, row groups, column stats, and metadata, all in‑browser (DuckDB‑WASM). Nothing is uploaded anywhere.
  - **SCD Playground** – simulate a customer relocation or tier upgrade and watch dimensions transform under each Slowly Changing Dimension type.
  - **PII Masking Policy Generator** – paste a sample, auto‑detect the PII, and generate ready‑to‑run dynamic data‑masking policies for Snowflake, Databricks, and BigQuery while learning the nuances of hashing, tokenization, redaction, and generalization.

- **The Strata map** – browse the data platform layer by layer, from storage & file formats up to the semantic layer.
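The SCD Playground's customer‑relocation scenario maps onto SCD Type 2. Here is a minimal TypeScript sketch of that technique. It assumes a hypothetical `CustomerDimRow` shape with `validFrom`/`validTo`/`isCurrent` columns and illustrates the idea only; it is not the playground's implementation. Type 2 never overwrites a tracked attribute. It expires the current row and appends a new version, which preserves history.

```typescript
// Hypothetical customer dimension row for SCD Type 2.
interface CustomerDimRow {
  customerId: string;
  city: string;
  tier: string;
  validFrom: string; // ISO date this version became current
  validTo: string | null; // null while this version is current
  isCurrent: boolean;
}

interface CustomerUpdate {
  customerId: string;
  city: string;
  tier: string;
}

// SCD Type 2: expire the current version and append a new one instead of
// overwriting, so every historical value is preserved. Returns a new array.
function applyScd2(
  dim: CustomerDimRow[],
  update: CustomerUpdate,
  effectiveDate: string,
): CustomerDimRow[] {
  const current = dim.find((r) => r.customerId === update.customerId && r.isCurrent);

  // No tracked attribute changed: keep the dimension as-is.
  if (current && current.city === update.city && current.tier === update.tier) {
    return dim;
  }

  // Expire the old version (if any) on the effective date.
  const expired = dim.map((r) =>
    r === current ? { ...r, validTo: effectiveDate, isCurrent: false } : r,
  );

  // Append the new current version.
  return [...expired, { ...update, validFrom: effectiveDate, validTo: null, isCurrent: true }];
}

// Usage: customer c1 relocates from Austin to Denver.
const dimBefore: CustomerDimRow[] = [
  { customerId: "c1", city: "Austin", tier: "silver", validFrom: "2024-01-01", validTo: null, isCurrent: true },
];
const dimAfter = applyScd2(dimBefore, { customerId: "c1", city: "Denver", tier: "silver" }, "2026-06-01");
```

Under SCD Type 1, the same relocation would overwrite `city` in place and lose the Austin history. Type 2 keeps the full audit trail at the cost of extra rows.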

---

## The Comeback Story

This started as scattered notes and a half‑built course engine – an idea buried under “I’ll finish it later.” The bones existed: a lesson renderer, a few Strata, a rough game loop. Nothing hung together.

The finish‑up sprint closed the gap:

- Shipped the **Incident‑Response Arcade** end‑to‑end – game engine, HUD (timer/credits/health), terminal, Slack‑style alert stream, and post‑mortem screen.  
- Built a **free tools hub** – Parquet Inspector, SCD Playground, and PII Masking Policy Generator – all client‑side, each shippable on its own.  
- Wired **content authoring** into a real contract so new incidents and lessons drop in as data, not code.  
- Fixed the unglamorous‑but‑fatal stuff: production SSR/routing, auth, and the rough edges that keep a side project from ever feeling “done.”
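Treating content "as data, not code" usually means a typed schema plus a validator that runs when content loads. Here is a minimal TypeScript sketch of what such a contract could look like. Every field name (`stratum`, `budgetCredits`, `timeLimitSeconds`, and so on) is a hypothetical assumption, not Petascale Labs' actual schema.

```typescript
// Hypothetical content contract for an incident scenario authored as JSON.
// Field names are illustrative assumptions, not the platform's real schema.
type Severity = "sev1" | "sev2" | "sev3";

interface IncidentScenario {
  id: string;
  title: string;
  stratum: string; // Strata layer where the failure lives
  severity: Severity;
  budgetCredits: number;
  timeLimitSeconds: number;
  alerts: string[]; // Slack-style alerts streamed to the player
  rootCause: string;
}

function isSeverity(v: unknown): v is Severity {
  return v === "sev1" || v === "sev2" || v === "sev3";
}

// Validate untrusted JSON when content loads, so a malformed scenario fails
// fast with a clear message instead of breaking the game mid-run.
function parseScenario(raw: unknown): IncidentScenario {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("scenario must be a JSON object");
  }
  const o = raw as Record<string, unknown>;

  const str = (k: string): string => {
    const v = o[k];
    if (typeof v !== "string" || v.length === 0) throw new Error(`${k} must be a non-empty string`);
    return v;
  };
  const positive = (k: string): number => {
    const v = o[k];
    if (typeof v !== "number" || !Number.isFinite(v) || v <= 0) throw new Error(`${k} must be a positive number`);
    return v;
  };

  const severity = o["severity"];
  if (!isSeverity(severity)) throw new Error("severity must be sev1, sev2, or sev3");

  const alerts = o["alerts"];
  if (!Array.isArray(alerts) || !alerts.every((a) => typeof a === "string")) {
    throw new Error("alerts must be an array of strings");
  }

  return {
    id: str("id"),
    title: str("title"),
    stratum: str("stratum"),
    severity,
    budgetCredits: positive("budgetCredits"),
    timeLimitSeconds: positive("timeLimitSeconds"),
    alerts: alerts as string[],
    rootCause: str("rootCause"),
  };
}
```

Scenarios are parsed rather than compiled in, so adding an incident becomes a JSON change. A missing or mistyped field then fails loudly at load time instead of breaking the game.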

It went from a folder I was embarrassed to share to something I’m proud to put a demo link next to.

---

## My Experience with GitHub Copilot

Copilot was most useful in the **glue and grind** – the parts that stall a finishing sprint. Concretely:

- **Boilerplate velocity** – React component scaffolds, TypeScript interfaces for the game state, and repetitive handlers came out fast from a comment or a type signature, letting me focus on game *design* rather than plumbing.

- **In‑editor pattern‑matching** – once one phase component (e.g., the HUD) had a shape, Copilot inferred the pattern for subsequent components, reducing copy‑paste and keeping the codebase consistent.

- **Iterative prototyping** – quick “what‑if” snippets (e.g., a new terminal command or a budget‑calculation helper) could be drafted, tested, and refined without leaving the editor.

- **Unblocking the boring last 20 %** – Go handler stubs, JSON scaffolds for new incident scenarios, and small refactors where momentum matters more than novelty.

Overall, Copilot accelerated the mundane parts, giving me more mental bandwidth to polish the user experience and refine the educational content.

Where I stayed hands‑on: the architecture, the incident pedagogy, and anything touching correctness in production. Copilot is a force multiplier on the typing, not a substitute for the thinking — which is exactly the philosophy we teach.
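A "budget‑calculation helper" of the kind mentioned above is a small, well‑typed function that is easy to draft and test in the editor. Here is a hypothetical TypeScript sketch of the arcade's timer/credits/health rules. `HudState`, `spend`, `tick`, and `outcome` are illustrative names, and the rule that an unresolved incident drains health every second is an assumption, not the game's actual logic.

```typescript
// Hypothetical HUD state for one Incident-Response Arcade run.
interface HudState {
  secondsLeft: number;
  credits: number;
  clusterHealth: number; // 0..100
}

type Outcome = "playing" | "timeout" | "cluster-down";

// Budget check: pay for a remediation action, or return null if the
// action costs more than the remaining credits.
function spend(state: HudState, cost: number): HudState | null {
  if (cost < 0 || cost > state.credits) return null;
  return { ...state, credits: state.credits - cost };
}

// Advance the clock one second; an unresolved incident drains health.
function tick(state: HudState, healthDrainPerSecond: number): HudState {
  return {
    ...state,
    secondsLeft: Math.max(0, state.secondsLeft - 1),
    clusterHealth: Math.max(0, state.clusterHealth - healthDrainPerSecond),
  };
}

// Cluster failure takes priority over running out of time.
function outcome(state: HudState): Outcome {
  if (state.clusterHealth <= 0) return "cluster-down";
  if (state.secondsLeft <= 0) return "timeout";
  return "playing";
}
```

Each rule is a pure function that returns a new state. That keeps the rules easy to unit‑test in isolation and lets a React reducer drive them directly.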

Petascale Labs — understand the data stack from the bytes up.
