AI-Generated Code Looked Right, but the Data Was Wrong
Source: Dev.to
Background
I’m working on an AI Data Analyst in MLJAR Studio.
The idea is simple: you ask a question in natural language, the AI writes Python code, executes it, and shows the result.
Issue Encountered
While testing a medical data‑analysis use case with a diabetes CSV file, I noticed a discrepancy:
- The first task was to load data from a URL.
- The AI generated Pandas code using `pd.read_csv()`.
- The code executed without errors and displayed a DataFrame with the expected shape (768 rows × 9 columns).
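The generated loading step presumably amounted to a single `pd.read_csv()` call, something like the sketch below (the post does not give the real file URL, so the one in the comment is a placeholder):

```python
import pandas as pd

def load_data(source) -> pd.DataFrame:
    """Load a CSV the way the generated code presumably did."""
    return pd.read_csv(source)

# Placeholder URL - the post does not give the real file location:
# df = load_data("https://example.com/diabetes.csv")
# print(df.shape)  # the post reports a shape of (768, 9)
```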
However, inspecting the DataFrame revealed implausible values:
- In the first row, the Pregnancies column showed `148`; typical values are 0, 1, 2, 6, 8, etc.
- Other anomalies:
  - Pregnancies contained values like `148`, `85`, `183`.
  - Age contained values like `0` and `1`.
  - The Outcome column was empty.
- The entire DataFrame appeared shifted.
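Spotting this kind of problem doesn't require anything fancy: a couple of standard Pandas inspection calls make the anomalies visible. Below is a minimal sketch using synthetic values that mirror the ones described above (the threshold of 20 pregnancies is an illustrative assumption, not from the post):

```python
import pandas as pd

# Tiny synthetic frame standing in for the loaded data; the values
# mirror the anomalies described above (this is not the real file).
df = pd.DataFrame({
    "Pregnancies": [148, 85, 183],
    "Age": [1, 0, 1],
})

# head() and describe() are usually enough to surface implausible values.
print(df.head())
print(df.describe())

# A simple plausibility check that would have flagged this data:
suspicious = bool((df["Pregnancies"] > 20).any())
print(suspicious)  # True: nobody has 148 pregnancies
```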
Root Cause
The CSV file had a subtle formatting issue: an extra comma in the header row. Consequently, Pandas interpreted the first value of each row as the DataFrame index, causing all columns to shift left:
- The value `148` was actually the Glucose measurement, not the number of pregnancies.
- The Glucose column appeared under Pregnancies, Outcome appeared under Age, and the real Outcome column was empty.
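The shift is easy to reproduce with a synthetic file. In the sketch below the comma counts of the header and the data rows disagree by one (a mismatch equivalent to the one described), which makes Pandas silently promote the first field of every row to the index:

```python
import io
import pandas as pd

# Synthetic stand-in for the broken file: each data row has one
# field more than the header, so Pandas uses the first field of
# every row as the index and all columns shift one place left.
csv_text = (
    "Pregnancies,Glucose,Age,Outcome\n"
    "6,148,50,1,\n"
    "1,85,31,0,\n"
)

df = pd.read_csv(io.StringIO(csv_text))
print(df)
# Pregnancies now holds the Glucose values (148, 85), Age holds the
# Outcome values, and the real Outcome column is entirely empty (NaN).

# index_col=False tells Pandas not to infer an index, which realigns
# the columns:
df_fixed = pd.read_csv(io.StringIO(csv_text), index_col=False)
print(df_fixed)
```

`index_col=False` is the documented escape hatch for malformed files with extra delimiters at the end of each line.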
Lessons Learned
- AI‑generated code can look correct: the notebook runs without errors, and the DataFrame displays nicely.
- Correct execution does not guarantee correct data: subtle data‑format issues can produce misleading results that Pandas does not flag.
- Output verification is essential. An LLM that inspects the generated output can catch suspicious values, missing columns, and odd statistics that a human might miss at first glance.
Recommended Workflow
- Ask AI to generate code.
- Execute the code.
- Display the output.
- Let AI inspect the output (e.g., check basic statistics, detect anomalies).
- Human review: apply common sense and domain knowledge to confirm the results.
This loop—code generation → execution → AI‑driven verification → human validation—helps ensure that the analysis is both fast and reliable.
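A lightweight version of the AI-driven verification step can be a plain function that runs cheap sanity checks on the result before a human ever looks at it. The specific checks and thresholds below are illustrative assumptions, not part of the post:

```python
import pandas as pd

def verify_output(df: pd.DataFrame) -> list[str]:
    """Run cheap automated plausibility checks on an analysis result.

    Returns a list of human-readable problems; an empty list means
    the basic checks passed. Thresholds are illustrative only.
    """
    problems = []
    if df.isna().all().any():
        problems.append("at least one column is completely empty")
    if "Pregnancies" in df and df["Pregnancies"].max() > 20:
        problems.append("Pregnancies values are implausibly large")
    if "Outcome" in df and not df["Outcome"].dropna().isin([0, 1]).all():
        problems.append("Outcome should only contain 0 or 1")
    return problems

# Shifted data like the post describes trips several checks at once:
bad = pd.DataFrame({"Pregnancies": [148, 85], "Outcome": [None, None]})
print(verify_output(bad))
```

Feeding this report back to the LLM (or surfacing it to the user) turns "the code ran" into an actual signal about whether the output makes sense.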
Key question in data analysis:
Not "Did the code run?" but the more important one: "Does the output make sense?"