AI-Generated Code Looked Right, but the Data Was Wrong
Source: Dev.to
Background
I’m working on an AI Data Analyst in MLJAR Studio.
The idea is simple: you ask a question in natural language, the AI writes Python code, executes it, and shows the result.
Issue Encountered
While testing a medical data‑analysis use case with a diabetes CSV file, I noticed a discrepancy:
- The first task was to load data from a URL.
- The AI generated Pandas code using `pd.read_csv()`.
- The code executed without errors and displayed a DataFrame with the expected shape (768 rows × 9 columns).
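The generated loading step presumably amounted to a single `pd.read_csv()` call, something like the sketch below (the post does not give the real file URL, so the one in the comment is a placeholder):

```python
import pandas as pd

def load_data(source) -> pd.DataFrame:
    """Load a CSV the way the generated code presumably did."""
    return pd.read_csv(source)

# Placeholder URL - the post does not give the real file location:
# df = load_data("https://example.com/diabetes.csv")
# print(df.shape)  # the post reports a shape of (768, 9)
```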
However, inspecting the DataFrame revealed implausible values:
- In the first row, the Pregnancies column showed `148`; typical values are 0, 1, 2, 6, 8, etc.
- Other anomalies:
  - Pregnancies contained values like `148`, `85`, `183`.
  - Age contained values like `0` and `1`.
  - The Outcome column was empty.
- The entire DataFrame appeared shifted.
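Spotting this kind of problem doesn't require anything fancy: a couple of standard Pandas inspection calls make the anomalies visible. Below is a minimal sketch using synthetic values that mirror the ones described above (the threshold of 20 pregnancies is an illustrative assumption, not from the post):

```python
import pandas as pd

# Tiny synthetic frame standing in for the loaded data; the values
# mirror the anomalies described above (this is not the real file).
df = pd.DataFrame({
    "Pregnancies": [148, 85, 183],
    "Age": [1, 0, 1],
})

# head() and describe() are usually enough to surface implausible values.
print(df.head())
print(df.describe())

# A simple plausibility check that would have flagged this data:
suspicious = bool((df["Pregnancies"] > 20).any())
print(suspicious)  # True: nobody has 148 pregnancies
```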
Root Cause
The CSV file had a subtle formatting issue: an extra comma in the header row. Consequently, Pandas interpreted the first value of each row as the DataFrame index, causing all columns to shift left:
- The value `148` was actually the Glucose measurement, not the number of pregnancies.
- The Glucose column appeared under Pregnancies, Outcome appeared under Age, and the real Outcome column was empty.
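The shift is easy to reproduce with a synthetic file. In the sketch below the comma counts of the header and the data rows disagree by one (a mismatch equivalent to the one described), which makes Pandas silently promote the first field of every row to the index:

```python
import io
import pandas as pd

# Synthetic stand-in for the broken file: each data row has one
# field more than the header, so Pandas uses the first field of
# every row as the index and all columns shift one place left.
csv_text = (
    "Pregnancies,Glucose,Age,Outcome\n"
    "6,148,50,1,\n"
    "1,85,31,0,\n"
)

df = pd.read_csv(io.StringIO(csv_text))
print(df)
# Pregnancies now holds the Glucose values (148, 85), Age holds the
# Outcome values, and the real Outcome column is entirely empty (NaN).

# index_col=False tells Pandas not to infer an index, which realigns
# the columns:
df_fixed = pd.read_csv(io.StringIO(csv_text), index_col=False)
print(df_fixed)
```

`index_col=False` is the documented escape hatch for malformed files with extra delimiters at the end of each line.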
Lessons Learned
- AI‑generated code can look correct: the notebook runs without errors, and the DataFrame displays nicely.
- Correct execution does not guarantee correct data: subtle data‑format issues can produce misleading results that Pandas does not flag.
- Output verification is essential. An LLM that inspects the generated output can catch suspicious values, missing columns, and odd statistics that a human might miss at first glance.
Recommended Workflow
- Ask AI to generate code.
- Execute the code.
- Display the output.
- Let AI inspect the output (e.g., check basic statistics, detect anomalies).
- Human review: apply common sense and domain knowledge to confirm the results.
This loop—code generation → execution → AI‑driven verification → human validation—helps ensure that the analysis is both fast and reliable.
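A lightweight version of the AI-driven verification step can be a plain function that runs cheap sanity checks on the result before a human ever looks at it. The specific checks and thresholds below are illustrative assumptions, not part of the post:

```python
import pandas as pd

def verify_output(df: pd.DataFrame) -> list[str]:
    """Run cheap automated plausibility checks on an analysis result.

    Returns a list of human-readable problems; an empty list means
    the basic checks passed. Thresholds are illustrative only.
    """
    problems = []
    if df.isna().all().any():
        problems.append("at least one column is completely empty")
    if "Pregnancies" in df and df["Pregnancies"].max() > 20:
        problems.append("Pregnancies values are implausibly large")
    if "Outcome" in df and not df["Outcome"].dropna().isin([0, 1]).all():
        problems.append("Outcome should only contain 0 or 1")
    return problems

# Shifted data like the post describes trips several checks at once:
bad = pd.DataFrame({"Pregnancies": [148, 85], "Outcome": [None, None]})
print(verify_output(bad))
```

Feeding this report back to the LLM (or surfacing it to the user) turns "the code ran" into an actual signal about whether the output makes sense.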
Key question in data analysis:
Not "Did the code run?" but the more important one: "Does the output make sense?"