How I Built a CSV Data Cleaner in 4 Days (Python Beginner Working Project)
Source: Dev.to
Background
After 2+ years in QA (Meta, Microsoft) and RPA consulting, I decided to transition to automation engineering. This is my first Python project, built in 4 days, documented completely.
The Challenge
Build a production‑ready CSV cleaner that:
- Never loses data (even invalid entries)
- Provides detailed error reports
- Handles real‑world messy data
- Uses quality‑first principles
What I Built
A Python script that:
- ✅ Cleans 1000+ contacts in seconds
- ✅ Validates emails, phones, names, ages
- ✅ Separates valid from invalid data
- ✅ Generates detailed error reports
The Journey (Day by Day)
Day 1‑2: Python Fundamentals
- Variables, strings, functions
- Dictionaries and lists
- CSV file handling
Hardest part: Understanding loops and data flow
Day 3: Building the Core
- Wrote 8 cleaning & validation functions
- Implemented error handling
Breakthrough moment: Realizing each function should return errors as a list
Day 4: Integration & Testing
- Combined all functions
- Added file writing
- Tested with messy data
Key learning: Separation of concerns (cleaning vs validation)
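To make the cleaning vs validation split concrete, here is a minimal sketch. The function names match the ones used later in this post, but the bodies and the specific rules (title-casing, the digit check) are assumptions for illustration, not the project's actual code.

```python
def clean_name(raw):
    """Cleaning: normalize the value, never judge it."""
    # Treat None like an empty string, collapse repeated whitespace, title-case.
    return " ".join((raw or "").split()).title()


def validate_name(name):
    """Validation: report problems, never modify the value."""
    errors = []
    if not name:
        errors.append("Name is empty")
    elif any(ch.isdigit() for ch in name):
        errors.append("Name contains digits")
    return errors


print(clean_name("  aDa   lovelace "))   # Ada Lovelace
print(validate_name(clean_name("   ")))  # ['Name is empty']
```

Because the cleaner only normalizes and the validator only reports, each one can be tested and changed without touching the other.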
Key Code Sections
The Validation Pattern
def validate_email(email):
    """Check email structure"""
    errors = []
    if "@" not in email:
        errors.append("Missing @")
    # More checks...
    return errors
- Returns a list (can collect multiple errors)
- Clear error messages
- Easy to extend
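As an illustration of how the list-returning pattern collects several problems at once, here is one possible fuller version. The extra checks are assumptions chosen for the example; they are not necessarily the checks in the original script, and they are far from a complete email validator.

```python
def validate_email(email):
    """Return a list of problems found in an email string (empty list = valid)."""
    if not email:
        return ["Email is empty"]
    errors = []
    if " " in email:
        errors.append("Contains spaces")
    if email.count("@") == 0:
        errors.append("Missing @")
    elif email.count("@") > 1:
        errors.append("More than one @")
    else:
        # Exactly one "@", so the split always yields two parts.
        local, domain = email.split("@")
        if not local:
            errors.append("Nothing before @")
        if "." not in domain:
            errors.append("Domain has no dot")
    return errors


print(validate_email("ada@example.com"))  # []
print(validate_email("ada example.com"))  # ['Contains spaces', 'Missing @']
```

Adding a new rule means adding one more `if` and one more `append`; callers keep working because the return type never changes.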
The Main Loop
for row_num, row in enumerate(reader, start=2):
    all_errors = []
    # Clean
    cleaned_name = clean_name(row.get("Name", ""))
    # Validate
    all_errors.extend(validate_name(cleaned_name))
    # Decide
    if all_errors:
        error_contacts.append(...)
    else:
        clean_contacts.append(...)
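For readers who want to run the loop end to end, here is a self-contained sketch. It assumes `reader` is a `csv.DictReader`, uses an in-memory sample instead of a real file, and invents the `Row` and `Errors` keys plus the tiny helper bodies; the original script may store its results differently.

```python
import csv
import io


def clean_name(raw):
    # Hypothetical helper: None-safe whitespace cleanup and title-casing.
    return " ".join((raw or "").split()).title()


def validate_name(name):
    # Hypothetical helper: a single rule, returned as a list of errors.
    return [] if name else ["Name is empty"]


def process(reader):
    """Split rows into clean and error lists without dropping any row."""
    clean_contacts, error_contacts = [], []
    # start=2 because line 1 of the file is the header row.
    for row_num, row in enumerate(reader, start=2):
        cleaned_name = clean_name(row.get("Name", ""))
        all_errors = validate_name(cleaned_name)
        if all_errors:
            # Keep the original row, plus where and why it failed.
            error_contacts.append(
                {**row, "Row": row_num, "Errors": "; ".join(all_errors)}
            )
        else:
            clean_contacts.append({**row, "Name": cleaned_name})
    return clean_contacts, error_contacts


sample = io.StringIO(
    "Name,Email\n  ada lovelace ,ada@example.com\n,nobody@example.com\n"
)
clean, errors = process(csv.DictReader(sample))
print(clean)
print(errors)
```

Invalid rows go to `error_contacts` with their original values intact, which is what keeps the "never loses data" promise.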
What I Learned
Technical Skills
- Python fundamentals
- CSV processing
- Error handling patterns
- Function design for reusability
Meta‑Skills
- How to learn efficiently (fundamentals before frameworks)
- How to debug systematically
- How to write readable code
- How to document your work
QA Mindset Applied to Code
- Test edge cases (empty strings, None values)
- Detailed error reporting
- Data integrity (never lose information)
- Clear documentation
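Edge-case testing can be as simple as a table of inputs and expected outputs. The sketch below uses a hypothetical `clean_phone` helper (not taken from the project) to show the idea with empty strings and `None`.

```python
def clean_phone(raw):
    """Keep only the digits of a phone value; None and blanks become ''."""
    if raw is None:
        return ""
    return "".join(ch for ch in str(raw) if ch.isdigit())


# Each pair is (input, expected output) for an edge case worth pinning down.
edge_cases = [
    (None, ""),
    ("", ""),
    ("   ", ""),
    ("(555) 123-4567", "5551234567"),
    (5551234567, "5551234567"),
]

for raw, expected in edge_cases:
    assert clean_phone(raw) == expected, f"{raw!r} -> {clean_phone(raw)!r}"
print("all edge cases pass")
```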
Mistakes I Made
-
Initially tried to do everything in one function
- Solution: Split into cleaning and validation
-
Forgot error handling on type conversions
- Solution: Add
try/exceptblocks wherever needed
- Solution: Add
-
Wanted to make it “perfect” before shipping
- Solution: Ship a working version first, then iterate
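The type-conversion mistake is the easiest one to show in code. This is a minimal sketch of guarding `int()` with `try/except`; the function name `validate_age`, the error wording, and the 0 to 120 range are assumptions for illustration.

```python
def validate_age(raw):
    """Return a list of errors for an age value read from a CSV cell."""
    try:
        age = int(str(raw).strip())
    except ValueError:
        # int() raises ValueError for text such as "abc", "" or "12.5".
        return [f"Age is not a whole number: {raw!r}"]
    if not 0 <= age <= 120:
        return [f"Age out of range: {age}"]
    return []


print(validate_age("42"))   # []
print(validate_age("abc"))  # ["Age is not a whole number: 'abc'"]
```

Without the `try/except`, a single bad cell would raise an exception and stop the whole run instead of landing in the error report.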
The Results
Project Stats
- ~200 lines of code
- 8 functions
- 4 days from start to finish
- 100% written by myself (with the help of learning resources)
Real‑World Performance
- 1,000 rows:
Feel free to:
- Use it for your projects
- Suggest improvements
- Ask questions in comments