Starting Dusty — A Tiny DSL for ETL & Research Data Cleaning

Published: December 11, 2025 at 06:55 AM EST


Source: Dev.to


For the last few weeks I’ve been thinking seriously about building my own programming language. Not a big general‑purpose language, not a Python replacement, and definitely not something with heavy ambitions. I just wanted to create something small, useful, and focused.

That’s where Dusty comes in.

Dusty is a lightweight DSL (domain‑specific language) designed only for ETL tasks and research data cleaning. Nothing more. No huge ecosystem, no package manager, no frameworks. The entire goal is simple: turn messy CSV/JSON cleaning work into short, readable scripts.

I’m starting with problems I’ve personally faced. Whenever I work on research data or hackathon datasets, I end up writing the same pattern again and again:

load CSV
filter rows
fix missing values
rename some fields
join with another file
export the cleaned result

Python works, but the scripts get ugly fast. Pandas is powerful, but it is a heavy dependency for a small one-off cleanup. SQL is good for structured tables but awkward for irregular CSVs. Most ETL tools are built for companies, not students or indie developers.

So Dusty focuses on the middle ground: simple data transformations without the overhead.

What Dusty will look like (early prototype idea)

A Dusty script would look something like this:

source users = csv("users.csv")

transform adults = users
  | filter(r -> int(r.age) >= 18)
  | map(r -> { id: r.id, name: r.name })

save adults to csv("clean_adults.csv")

Readable.
No imports.
No boilerplate.
Just the data flow.
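
For comparison, here is roughly the same pipeline in plain Python, using only the standard csv module. It is a sketch for illustration: the function name clean_adults is made up, and it assumes users.csv has id, name, and age columns, as the Dusty example implies.

```python
import csv


def clean_adults(src_path, dst_path):
    """Keep rows with age >= 18 and write only the id and name columns."""
    # source users = csv("users.csv")
    with open(src_path, newline="") as src:
        rows = list(csv.DictReader(src))

    # filter(r -> int(r.age) >= 18)
    adults = [r for r in rows if int(r["age"]) >= 18]

    # map(r -> { id: r.id, name: r.name })
    projected = [{"id": r["id"], "name": r["name"]} for r in adults]

    # save adults to csv("clean_adults.csv")
    with open(dst_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["id", "name"])
        writer.writeheader()
        writer.writerows(projected)
    return projected
```

Calling clean_adults("users.csv", "clean_adults.csv") does the job, but the file handling and writer setup take more lines than the actual filter and map steps.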

Essential ETL operations

Dusty will support the following core operations:

source
filter
map
select / rename
join
aggregate
save

That’s enough to clean real datasets used in labs, projects, and university research.
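
One way these operations could map onto the Python side is as small functions over iterables of row dictionaries. This is only a sketch under my own assumptions, not Dusty's actual implementation: the function names are hypothetical, join is assumed to be an inner join on one shared column, and aggregate is assumed to mean "group by a column, then reduce each group". source and save are left out because they are just file reading and writing.

```python
from collections import defaultdict


def filter_rows(rows, predicate):
    """Lazily keep the rows for which predicate(row) is true."""
    return (r for r in rows if predicate(r))


def map_rows(rows, fn):
    """Lazily apply fn to every row."""
    return (fn(r) for r in rows)


def select(rows, fields, rename=None):
    """Keep only `fields`, optionally renaming them via {old: new}."""
    rename = rename or {}
    return ({rename.get(f, f): r[f] for f in fields} for r in rows)


def join(left, right, key):
    """Inner join on a shared key column (assumed semantics)."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    for l in left:
        for r in index.get(l[key], []):
            # On a column-name clash, the left row's value wins.
            yield {**r, **l}


def aggregate(rows, key, fn):
    """Group rows by `key`, then reduce each group with fn (assumed semantics)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    return {k: fn(g) for k, g in groups.items()}
```

Because filter_rows, map_rows, and select return generators, they can be chained the way the | operator chains steps in a Dusty script, and no work happens until something consumes the rows.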

How I’m building it

This is my first language project, so I’m keeping things practical:

The Dusty interpreter is written in Python, but Dusty's syntax is its own and has nothing to do with Python's.
Dusty code will live in .dusty files.
Users run it with a simple CLI:

dusty run main.dusty
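
A CLI entry point like that can be sketched with argparse from the standard library. This is a minimal sketch, not the real Dusty CLI: build_parser and main are hypothetical names, and the step where the interpreter would parse and execute the script is only a placeholder.

```python
import argparse


def build_parser():
    """Build a parser for a single `run <script>` subcommand."""
    parser = argparse.ArgumentParser(prog="dusty")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="execute a Dusty script")
    run.add_argument("script", help="path to the script file")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "run":
        with open(args.script, encoding="utf-8") as f:
            source = f.read()
        # Placeholder: a real interpreter would parse and execute `source` here.
        print(f"loaded {len(source)} characters from {args.script}")
        return 0
    return 1


if __name__ == "__main__":
    raise SystemExit(main())
```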

Roadmap for v0.1

My plan is to finish Dusty v0.1 with:

a working parser
CSV support
filter / map operations
save functionality
a couple of example pipelines
basic documentation

I’m not adding a package manager, modules, or big features yet. Dusty v0.1 should be small enough that anyone can understand the whole project in one sitting.
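
To give a sense of how small a first parser can be, here is a sketch that recognises only the source and save statements from the example script earlier in this post. Everything in it is an assumption for illustration: the grammar is inferred from that one example, parse_statement is a hypothetical name, and a real parser would also need to handle transform pipelines and lambda expressions.

```python
import re

# Assumed statement shapes, taken from the example script:
#   source NAME = csv("PATH")
#   save NAME to csv("PATH")
SOURCE_RE = re.compile(r'^source\s+(\w+)\s*=\s*csv\("([^"]*)"\)$')
SAVE_RE = re.compile(r'^save\s+(\w+)\s+to\s+csv\("([^"]*)"\)$')


def parse_statement(line):
    """Turn one line into a small dict, or raise on anything unrecognised."""
    text = line.strip()
    m = SOURCE_RE.match(text)
    if m:
        return {"kind": "source", "name": m.group(1), "path": m.group(2)}
    m = SAVE_RE.match(text)
    if m:
        return {"kind": "save", "name": m.group(1), "path": m.group(2)}
    raise SyntaxError(f"unrecognised statement: {text!r}")
```

Regular expressions are enough for flat statements like these. Nested expressions such as r -> int(r.age) >= 18 would call for a proper tokenizer and a recursive-descent parser instead.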

Why I’m writing this publicly

I’ve noticed something: when you build in silence, you get lost. When you build in public, even quietly, you naturally stay accountable. So this weekly blog is just a way to share the progress, mistakes, and insights along the journey of creating a tiny DSL from scratch.

No big promises.
No hype.
Just consistent work.
