Your Ruby CSV Import Ran Successfully — Your Data May Still Be Wrong
Source: Dev.to
Why Ruby CSV May Miss Errors
Are you sure that Ruby CSV imported all your data — and correctly? 🤔
While improving the performance of smarter_csv, I added a new round of tests (including some borrowed from Ruby CSV’s own test suite) as a sanity check. This led me to think through error scenarios and discover that Ruby CSV has several failure modes that produce no exception, warning, or indication that anything went wrong. Your import runs, your tests pass, but your data may be quietly wrong.
10 Failure Modes in Ruby CSV
I identified ten ways Ruby’s CSV.read can silently corrupt or lose data. Below are two representative examples; the full list, with reproducible examples you can download and run yourself, is linked further down.
Numeric conversion of leading‑zero strings
With numeric conversion enabled (for example, converters: :numeric), the ZIP code "00123" is silently converted to 83 because Ruby CSV interprets the leading zero as an octal prefix. This isn’t a rounding error; it’s a completely different number. ZIP codes, customer IDs, order numbers, and similar fields can be replaced with wrong integers that still pass validation and look plausible.
Incorrect delimiter handling
A user uploads a tab‑separated file but names it with a .csv extension. The file‑type guard passes, Ruby CSV sees no commas, treats each entire row as a single field, and returns data that appears valid while the column structure is lost.
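Both failure modes can be reproduced in a few lines with Ruby's standard csv library. The following is a minimal sketch rather than the author's own test case: the sample data and variable names are made up for illustration, and the leading-zero case assumes numeric converters are switched on, since fields stay strings when no converters are given.

```ruby
require "csv"

# Hypothetical sample data; column names and values are made up for illustration.
zip_csv = "zip,city\n00123,Springfield\n"

# Without converters, every field stays a String.
plain = CSV.parse(zip_csv, headers: true)
p plain[0]["zip"]      # => "00123"

# With numeric converters, the field goes through Integer("00123"),
# which reads the leading zero as an octal prefix.
converted = CSV.parse(zip_csv, headers: true, converters: :numeric)
p converted[0]["zip"]  # => 83

# A tab-separated payload parsed with the default comma separator.
tsv = "name\tage\nAlice\t30\n"

# No exception is raised; each row comes back as a single field.
collapsed = CSV.parse(tsv)
p collapsed            # => [["name\tage"], ["Alice\t30"]]

# Naming the separator explicitly restores the columns.
columns = CSV.parse(tsv, col_sep: "\t")
p columns              # => [["name", "age"], ["Alice", "30"]]
```

Neither call raises or warns, so a cheap guard in an import job is to check that each row has the expected number of fields and that identifier columns are still strings before anything is written to the database.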
You can download the full set of examples here:
10 Ways Ruby’s CSV.read Can Silently Corrupt or Lose Your Data (link to the repository or gist)
SmarterCSV 1.16
SmarterCSV 1.16 addresses all ten failure modes. In addition to fixing these bugs, it offers:
- Performance improvements: 1.8×–8.6× faster than CSV.read end‑to‑end.
- Bad‑row quarantine system: isolates rows that fail parsing or validation.
- Instrumentation hooks: make it easy to monitor and log parsing behavior.
Read more about the release:
SmarterCSV 1.16 Released (link to the release notes)
Contributing
Found something? Issues, feedback, and bug reports are always welcome on the GitHub Discussions or Issues pages.
If you have a success story to share, we’d love to hear from you!