CSV to Excel Conversion: Why Opening a CSV in Excel Corrupts Your Data
Source: Dev.to
How Excel corrupts CSV data
When Excel opens a CSV file directly, it applies automatic type detection to every column. There is no confirmation dialog and no way to preview the results before they are applied. The most common corruptions:
- Leading zeros stripped – Zip code
07302becomes7302. Any numeric identifier with leading zeros loses them. - Dates created from non‑dates –
"OCT4"(a gene name) becomes October 4th."3-5"becomes March 5th."1/2"becomes January 2nd. This issue caused enough problems in genomics research that 27 human genes were renamed. - Scientific notation – Long numbers like credit‑card numbers (
4111111111111111) display as4.11111E+15, and digits beyond the 15th are replaced with zeros. - Unicode mangling – Characters outside the ASCII range may display incorrectly depending on the encoding. Excel assumes a default encoding that varies by locale.
The correct approach
The safe way to import CSV data into Excel preserves your original data types.
In Excel – Use Data > From Text/CSV (or Get Data > From File > From Text/CSV). This opens an import wizard that lets you set each column’s data type before importing. Set problematic columns to Text to preserve exact values.
Programmatic conversion – Use a library that gives you explicit control over data types. In Python, pandas lets you specify dtypes:
import pandas as pd
df = pd.read_csv('data.csv', dtype={'zip_code': str, 'product_id': str})
df.to_excel('data.xlsx', index=False)The dtype={'zip_code': str} parameter forces those columns to be treated as text, preventing any type coercion.
CSV encoding issues
CSV files have no standard encoding declaration. A file might be UTF‑8, Latin‑1, Windows‑1252, or UTF‑16. Excel guesses, and when it guesses wrong, accented characters, currency symbols, and non‑Latin scripts display as garbled text.
The safest approach is to know your encoding. If you created the CSV, use UTF‑8 with BOM (Byte Order Mark). The BOM is a special character at the start of the file that tells Excel to use UTF‑8. Without it, Excel often defaults to Windows‑1252 on Western systems.
What proper conversion preserves
A well‑done CSV‑to‑Excel conversion:
- Preserves all data exactly as it appears in the CSV
- Sets appropriate column widths for readability
- Preserves Unicode characters with correct encoding
- Optionally applies formatting (number formats, date formats) without changing underlying values
- Handles quoted fields correctly (commas inside quoted strings are not treated as delimiters)
- Preserves empty cells (empty strings vs. null)
The converter
For quick conversion without opening Excel or writing code, try a CSV to Excel converter that handles encoding detection, preserves data types, and produces a clean .xlsx file. No data corruption, no type coercion, no mangled Unicode.