CSV: The Format Nobody Designed
Source: Dev.to
By Design — Episode 02
No specification. No schema. No data types. No standard encoding. No committee. No owner. No version number.
In 1972, IBM’s Fortran compiler started accepting comma‑separated values as input. Nobody wrote a design document. Nobody proposed a standard. Someone needed to move data from one place to another, separated values by commas, and it worked. That was the entire specification.
Thirty‑three years later, in 2005, Yakov Shafranovich wrote RFC 4180 to formalise what was already everywhere. The format had spread faster than anyone could standardise it.
“CSV is not a real format. No types, no schema, no validation. One misplaced comma and your import breaks. One semicolon‑delimited file from Germany and your pipeline explodes. It is amateur hour in a text file.”
Every data engineer has said this. Most of them said it today.
Why there is no design
- No committee → no politics.
- No schema → no version conflicts.
- No types → every system on earth can read it: databases, spreadsheets, shells, thirty‑year‑old mainframes.
grep finds rows. awk splits columns. sort orders them. The entire Unix toolchain works on CSV without knowing what CSV is.
For simple files it requires no parser beyond “split at delimiter.” It requires no agreement beyond “the first row might be headers.” It requires no dependency, no library, no runtime.
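To make "split at delimiter" concrete, here is a minimal sketch of that whole parser in Python. The function name `parse_naive` and the sample data are illustrative assumptions, and the sketch deliberately ignores quoting, so it only holds up for files whose fields never contain the delimiter.

```python
def parse_naive(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Split each line at the delimiter and treat the first row as headers."""
    # Drop empty lines so a trailing newline does not produce a blank record.
    lines = [line for line in text.splitlines() if line]
    if not lines:
        return []
    headers = lines[0].split(delimiter)
    # Pair every remaining row with the header names, column by column.
    return [dict(zip(headers, line.split(delimiter))) for line in lines[1:]]


# Hypothetical sample data, invented for this sketch.
sample = "id,name,city\n1,Ada,London\n2,Grace,New York\n"
records = parse_naive(sample)
```

With this sample, `records[1]["city"]` is `"New York"`. No import was needed, which is the point: the parser is a string split and a zip.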
Pain points
- Encoding chaos.
- Delimiter conflicts.
- No universally followed escaping convention.
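The first two pain points can be softened with a little defensive reading. The sketch below is one possible approach, not a robust detector: it assumes the file is either UTF-8 or Latin-1, and it guesses the delimiter by counting a short list of candidates in the header line. The helper names and the sample export are hypothetical. Python's standard library also ships `csv.Sniffer` for a more thorough heuristic.

```python
import csv
import io

# Assumption: these are the only delimiters worth considering.
CANDIDATE_DELIMITERS = [",", ";", "\t", "|"]


def decode_bytes(raw: bytes) -> str:
    """Try UTF-8 (with or without a BOM) first, then fall back to Latin-1."""
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Latin-1 never raises, but it can mis-render bytes from other encodings.
        return raw.decode("latin-1")


def guess_delimiter(header_line: str) -> str:
    """Pick the candidate that occurs most often in the header line."""
    return max(CANDIDATE_DELIMITERS, key=header_line.count)


def read_rows(raw: bytes) -> list[list[str]]:
    """Decode the bytes, guess the delimiter, and parse every row."""
    text = decode_bytes(raw)
    first_line = text.splitlines()[0] if text else ""
    delimiter = guess_delimiter(first_line)
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# Hypothetical semicolon-delimited export with decimal commas, saved as Latin-1.
german_export = "Name;Stadt;Betrag\nJürgen;Köln;12,50\n".encode("latin-1")
rows = read_rows(german_export)
```

Here `rows` comes back as three columns per row, with `12,50` intact as a single field. A header-only heuristic like this one fails on single-column files and on headers that themselves contain the wrong candidate, so treat it as a starting point.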
A quoted field that contains a comma, in a comma‑delimited file, breaks any system that does not handle quotes. Edge cases like this are where the fragility lives, and every surprise tends to land on a Friday afternoon.
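The quoted-comma case is easy to reproduce. This short sketch compares a plain string split with Python's standard `csv` module on the same line; the sample record is invented for illustration.

```python
import csv
import io

# One record whose middle field contains the delimiter, protected by quotes.
line = '42,"Smith, John",London'

# A bare split knows nothing about quotes and cuts the name in two.
naive_fields = line.split(",")
# naive_fields == ['42', '"Smith', ' John"', 'London']

# csv.reader honours the quotes and returns three fields.
quoted_fields = next(csv.reader(io.StringIO(line)))
# quoted_fields == ['42', 'Smith, John', 'London']

# Writing goes the other way: csv.writer adds quotes only where needed
# and doubles any quote character inside a field.
buffer = io.StringIO()
csv.writer(buffer).writerow(["42", "Smith, John", 'say "hi"'])
written = buffer.getvalue()
# written == '42,"Smith, John","say ""hi"""\r\n'
```

The naive split yields four fields where the record has three, so every column after the name shifts by one. The doubled-quote escaping shown in `written` matches what RFC 4180 describes, though plenty of producers in the wild use backslashes or nothing at all.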
Adoption
- 60% of enterprises use CSV for data exchange between systems.
- Every spreadsheet application, database export, CRM, ERP, and accounting tool can produce CSV.
RFC 4180 was published in 2005, by which time billions of CSV files already existed.
Competition
- XML tried to replace it: too verbose.
- JSON tried: no tabular structure.
- Parquet tried: requires a runtime.
- Avro tried: requires a schema registry.
CSV survived them all because it requires nothing but a text editor and the ability to count commas.
The paradox of governance
The format that requires no agreement will always beat the format that requires consensus. CSV has no governance, no authority, no design document. That is not a flaw; it is the reason it outlived every format that tried to replace it.
Nobody designed CSV. Fifty‑three years later, everybody uses it.