How I Built a CSV Data Cleaner in 4 Days (Python Beginner Working Project)

Published: February 24, 2026 at 07:25 AM EST
3 min read
Source: Dev.to

Background

After 2+ years in QA (Meta, Microsoft) and RPA consulting, I decided to transition to automation engineering. This is my first Python project: built in 4 days and documented from start to finish.

The Challenge

Build a production‑ready CSV cleaner that:

  • Never loses data (even invalid entries)
  • Provides detailed error reports
  • Handles real‑world messy data
  • Uses quality‑first principles

What I Built

A Python script that:

  • ✅ Cleans 1,000+ contacts in seconds
  • ✅ Validates emails, phones, names, ages
  • ✅ Separates valid from invalid data
  • ✅ Generates detailed error reports

The Journey (Day by Day)

Day 1‑2: Python Fundamentals

  • Variables, strings, functions
  • Dictionaries and lists
  • CSV file handling

Hardest part: Understanding loops and data flow

Day 3: Building the Core

  • Wrote 8 cleaning & validation functions
  • Implemented error handling

Breakthrough moment: Realizing each function should return errors as a list

Day 4: Integration & Testing

  • Combined all functions
  • Added file writing
  • Tested with messy data

Key learning: Separation of concerns (cleaning vs validation)
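
To make the cleaning-versus-validation split concrete, here is a minimal sketch. The function names `clean_phone` and `validate_phone` and the 10-digit rule are assumptions for illustration, not the project's actual code.

```python
import re

def clean_phone(raw):
    """Cleaning: normalize the value without judging it."""
    if raw is None:
        return ""
    # Keep digits only, e.g. "(555) 123-4567" -> "5551234567"
    return re.sub(r"\D", "", str(raw))

def validate_phone(phone):
    """Validation: inspect the already-cleaned value and report problems."""
    errors = []
    if not phone:
        errors.append("Phone is empty")
    elif len(phone) != 10:  # assumed rule: 10-digit numbers only
        errors.append(f"Expected 10 digits, got {len(phone)}")
    return errors

cleaned = clean_phone(" (555) 123-4567 ")
print(cleaned, validate_phone(cleaned))
```

Because the cleaner never rejects anything and the validator never modifies anything, each one can be tested and changed on its own.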

Key Code Sections

The Validation Pattern

def validate_email(email):
    """Check email structure"""
    errors = []

    if "@" not in email:
        errors.append("Missing @")

    # More checks...

    return errors

  • Returns a list (can collect multiple errors)
  • Clear error messages
  • Easy to extend
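
As a rough illustration of how the pattern collects several errors at once, the sketch below adds a few extra checks. The function name `validate_email_sketch` and the specific rules are assumptions made for this example; the project's real checks may differ.

```python
def validate_email_sketch(email):
    """Collect every problem found instead of stopping at the first."""
    errors = []
    email = (email or "").strip()  # treat None like an empty string

    if not email:
        return ["Email is empty"]
    if " " in email:
        errors.append("Contains spaces")
    if email.count("@") != 1:
        errors.append("Must contain exactly one @")
    else:
        local, domain = email.split("@")
        if not local:
            errors.append("Missing text before @")
        if "." not in domain:
            errors.append("Domain has no dot")
    return errors

print(validate_email_sketch("a b@c"))
```

An empty list means the value passed every check, so callers can simply test `if errors:`.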

The Main Loop

for row_num, row in enumerate(reader, start=2):
    all_errors = []

    # Clean
    cleaned_name = clean_name(row.get("Name", ""))

    # Validate
    all_errors.extend(validate_name(cleaned_name))

    # Decide
    if all_errors:
        error_contacts.append(...)
    else:
        clean_contacts.append(...)
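
For readers who want to run the loop end to end, here is a self-contained sketch. The bodies of `clean_name` and `validate_name`, the `process` wrapper, the `Row` and `Errors` columns, and the sample data are all assumptions for illustration; an in-memory `io.StringIO` stands in for real files.

```python
import csv
import io

def clean_name(raw):
    # Hypothetical cleaner: collapse whitespace and title-case
    return " ".join((raw or "").split()).title()

def validate_name(name):
    # Hypothetical validator: returns a list of error strings
    return [] if name else ["Name is empty"]

def process(infile):
    clean_contacts, error_contacts = [], []
    reader = csv.DictReader(infile)
    # start=2 because line 1 of the file is the header row
    for row_num, row in enumerate(reader, start=2):
        cleaned_name = clean_name(row.get("Name", ""))
        all_errors = validate_name(cleaned_name)
        if all_errors:
            # Keep the original row so no information is lost
            error_contacts.append(
                {**row, "Row": row_num, "Errors": "; ".join(all_errors)}
            )
        else:
            clean_contacts.append({**row, "Name": cleaned_name})
    return clean_contacts, error_contacts

sample = io.StringIO(
    "Name,Email\n  ada   lovelace ,ada@example.com\n,nobody@example.com\n"
)
clean, errors = process(sample)

# Write the clean rows back out as CSV (to memory here, to a file in practice)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Name", "Email"])
writer.writeheader()
writer.writerows(clean)
print(out.getvalue())
print(errors)
```

Invalid rows keep their original values plus a row number and error text, which is one way to honor the "never lose data" goal.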

What I Learned

Technical Skills

  • Python fundamentals
  • CSV processing
  • Error handling patterns
  • Function design for reusability

Meta‑Skills

  • How to learn efficiently (fundamentals before frameworks)
  • How to debug systematically
  • How to write readable code
  • How to document your work

QA Mindset Applied to Code

  • Test edge cases (empty strings, None values)
  • Detailed error reporting
  • Data integrity (never lose information)
  • Clear documentation
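
One lightweight way to exercise those edge cases is a table of inputs and expected results. The `validate_name` below is a hypothetical stand-in written only for this example.

```python
def validate_name(name):
    """Hypothetical validator used only for this example."""
    if name is None or not str(name).strip():
        return ["Name is empty"]
    return []

# Each pair is (input, expected list of errors)
edge_cases = [
    ("", ["Name is empty"]),
    ("   ", ["Name is empty"]),
    (None, ["Name is empty"]),
    ("Ada", []),
]

for value, expected in edge_cases:
    result = validate_name(value)
    assert result == expected, f"{value!r} gave {result}"
print("All edge cases passed")
```

Adding a new edge case is then a one-line change to the table.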

Mistakes I Made

  • Initially tried to do everything in one function
    • Solution: Split into cleaning and validation
  • Forgot error handling on type conversions
    • Solution: Add try/except blocks wherever needed
  • Wanted to make it “perfect” before shipping
    • Solution: Ship a working version first, then iterate
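
The type-conversion mistake above is the kind of thing a `try/except` around `int()` guards against. Here is a minimal sketch; the name `validate_age` and the 0 to 120 range are assumptions, not the project's actual rules.

```python
def validate_age(raw_age):
    """Return a list of errors for an age read from a CSV cell."""
    errors = []
    try:
        # CSV values arrive as strings; int() raises ValueError on bad input
        age = int(str(raw_age).strip())
    except ValueError:
        return [f"Age is not a whole number: {raw_age!r}"]

    if not 0 <= age <= 120:  # assumed plausible range
        errors.append(f"Age out of range: {age}")
    return errors

print(validate_age("30"), validate_age("abc"), validate_age("150"))
```

Without the `try/except`, a single non-numeric cell would crash the whole run instead of landing in the error report.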

The Results

Project Stats

  • ~200 lines of code
  • 8 functions
  • 4 days from start to finish
  • 100% written by me (with the help of learning resources)

Real‑World Performance

  • 1,000 rows:

Feel free to:

  • Use it for your projects
  • Suggest improvements
  • Ask questions in comments