Scaffolded Test-First Prompting: Get Correct Code From the First Run

Published: March 19, 2026 at 06:19 PM EDT
7 min read
Source: Dev.to

If you use AI to help with coding, the most common failure mode isn’t that the model is lazy—it’s that the target is fuzzy.

You ask for a fix, the assistant guesses what “correct” means, and you get something that looks plausible but is slightly off: wrong edge case, wrong file, wrong abstraction, wrong dependency, or the right idea implemented far too broadly.

A simple way to reduce that failure rate is to stop asking for the fix first.
Ask for the test first.

Why This Works

Most prompting problems in code work are really specification problems.

When you say:

Fix the currency formatter for German locales.

there are a dozen hidden questions:

  • What file owns the behavior?
  • What format is expected exactly?
  • Which runtime or test framework is in play?
  • Are external libraries allowed?
  • Should the solution be minimal or architectural?
  • What counts as “done”?

A test answers those questions in a way prose usually does not. It gives the assistant an observable target instead of a vague intention.

That creates three immediate benefits:

  1. Correctness becomes executable. You can run the result instead of debating it.
  2. Scope stays smaller. The assistant is less likely to refactor half the repo when one assertion would do.
  3. Review gets easier. The change is justified by a failing case, not by hand‑wavy explanations.

The Workflow in 3 Steps

1. Give Minimal Runnable Context

Start with just enough information for the assistant to place the work.

Include:

  • the file or module involved
  • the language/runtime
  • the test framework
  • one concrete input/output example
  • any relevant constraint like “no new dependencies”

Example

I have formatCurrency(amount, locale) in src/format.js.
Runtime: Node 22.
Tests use Jest.
Expected behavior: formatCurrency(12.5, 'de-DE') should return '12,50 €'.
No new dependencies.

That is enough. You do not need a three‑paragraph backstory.

2. Ask for a Failing Test Only

This is the key move. Do not ask for the implementation yet. Ask for one focused test file that uses the existing project conventions and only checks the behavior you care about.

Example prompt

Create a Jest test for src/format.js that asserts
formatCurrency(12.5, 'de-DE') returns '12,50 €'.
Only return the test file contents and any minimal config change required.
Do not implement the function yet.

Why separate the steps?
Because it forces the assistant to reason about expected behavior before it starts inventing code. That alone removes a surprising amount of drift. It also gives you an artifact you can run immediately. If the test is broken, ambiguous, or not aligned with your conventions, you catch that early before any implementation gets layered on top.

3. Ask for the Smallest Fix that Makes the Test Pass

Run the test. If it fails, paste the failure output and ask for the smallest possible code change that satisfies this exact case.

Example prompt

This test fails with the following output:
[paste stack trace]

Implement the smallest change in src/format.js that makes this test pass.
Avoid unrelated refactors.
Return a unified diff.

That framing matters. “Smallest change” and “avoid unrelated refactors” are not decoration—they steer the assistant away from broad rewrites and toward reviewable patches.
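Put together, the three steps for the formatCurrency example might converge on something like this. This is a sketch, not code from the article: it assumes EUR is the intended currency, and it leans on the built-in Intl API so the "no new dependencies" constraint holds. Note that Intl inserts a non-breaking space before the symbol, so a strict string comparison needs normalization.

```javascript
// Sketch of the smallest fix for src/format.js, assuming the currency is EUR
// (the article never specifies one). The built-in Intl API handles
// locale-aware formatting, so no new dependencies are needed.
function formatCurrency(amount, locale) {
  return new Intl.NumberFormat(locale, {
    style: 'currency',
    currency: 'EUR',
  }).format(amount);
}

// Intl emits a non-breaking space before the symbol in de-DE output,
// so normalize whitespace before comparing against the expected '12,50 €'.
const normalized = formatCurrency(12.5, 'de-DE').replace(/[\u00A0\u202F]/g, ' ');
console.log(normalized);
```

Running the generated test against this patch closes the loop: the failing assertion becomes a passing one, and the diff stays local to one function.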

A Concrete Example

Problem: CSV export bug—fields containing newlines break when opened in spreadsheet apps.

A vague prompt would be:

Fix CSV export escaping.

That is almost guaranteed to produce a mushy answer.

A scaffolded version looks like this instead.

Step 1 – Context

File: src/export/csv.ts
Runtime: Node 22
Tests: Vitest
Bug: values containing newlines are split into multiple rows in spreadsheet apps
Constraint: preserve the newline; do not replace it with spaces

Step 2 – Test First

import { describe, it, expect } from 'vitest';
import { toCsvRow } from '../../src/export/csv';

describe('toCsvRow', () => {
  it('quotes fields containing newlines', () => {
    const row = toCsvRow(['hello\nworld', 'ok']);
    expect(row).toBe('"hello\nworld",ok');
  });
});

Step 3 – Minimal Implementation Request

Now the assistant has a precise target:

  • preserve the newline
  • quote the field
  • keep other fields unchanged

That is a much tighter problem than “fix CSV export.” You are far more likely to get a correct, local patch instead of a speculative rewrite.
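A passing implementation might look like the sketch below. It is not the article's actual code; it additionally quotes fields containing commas or quotes and doubles embedded quotes, following RFC 4180 conventions, while preserving newlines as the constraint requires.

```javascript
// Sketch of toCsvRow: quote any field containing a newline, comma, or quote,
// doubling embedded quotes per RFC 4180. Newlines are preserved inside the
// quoted field, never replaced with spaces.
function toCsvRow(fields) {
  return fields
    .map((field) => {
      if (/[",\n]/.test(field)) {
        return '"' + field.replace(/"/g, '""') + '"';
      }
      return field;
    })
    .join(',');
}

console.log(toCsvRow(['hello\nworld', 'ok'])); // "hello\nworld",ok
```

Because the failing test pins down exactly one behavior, the patch stays inside one function and is easy to review line by line.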

Why This Often Beats “Write the Code” Prompts

Direct implementation prompts encourage the model to jump straight to a solution shape. Sometimes that works, but often the assistant commits too early.

Test‑first prompting delays that commitment. It makes the assistant define success in visible behavior before it chooses structure. That tends to improve outcomes in a few specific ways:

  • fewer invented abstractions
  • fewer unnecessary dependencies
  • better preservation of existing conventions
  • easier debugging, because the failing assertion narrows the search space

It also helps you think better. Writing or reviewing the test forces clarity. You may notice that the expected output is wrong, the edge case is underspecified, or the real problem is slightly different than you thought.

Good Prompts for This Pattern

Below are a few template prompts you can adapt:

  • Provide context: "I have a function parseDate(str) in src/date.js. Runtime: Node 20. Tests use Mocha. Expected: parseDate('2023-01-01') returns a Date representing midnight UTC. No new dependencies."
  • Request a failing test: "Write a Mocha test that asserts parseDate('2023-01-01') returns a Date with getTime() === 1672531200000. Return only the test file content. Do not modify the implementation."
  • Ask for the smallest fix: "The test above fails with: expected 1672531200000 but got 0. Provide the smallest change to src/date.js that makes the test pass. Return a unified diff and avoid unrelated changes."

Feel free to mix and match the pieces to suit your language, runtime, and testing framework. The key is always: context → failing test → minimal fix.
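For the parseDate templates above, the smallest fix might look like this sketch. The original broken implementation is not shown in the article, so this simply illustrates the target that the failing test pins down: a Date at midnight UTC, not local time.

```javascript
// Sketch: parse 'YYYY-MM-DD' into a Date at midnight UTC.
// Date.UTC sidesteps the local-timezone pitfall that commonly causes
// off-by-hours bugs when parsing date-only strings.
function parseDate(str) {
  const [year, month, day] = str.split('-').map(Number);
  return new Date(Date.UTC(year, month - 1, day));
}

console.log(parseDate('2023-01-01').getTime()); // 1672531200000
```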

Consistent Guidance

  • “Write one focused failing test first.”
  • “Use existing project conventions.”
  • “Do not implement yet.”
  • “Return only the test file contents.”
  • “Implement the smallest change that makes this test pass.”
  • “Avoid unrelated refactors.”
  • “Return a unified diff.”

These small constraints, taken together, make the workflow much more reliable.

Common Mistakes

Asking for Too Much at Once

One bug, one test, one patch.
If you request three edge cases, a refactor, and updated docs in a single prompt, you lose the main benefit: tight feedback.

Giving Environment‑Free Prompts

“Write a test” without naming Jest, Vitest, pytest, or the file layout leads to correct ideas in the wrong format.

Accepting a Test You Did Not Run

A generated test is still code—run it. A broken test file is not a contract; it’s just another hallucination with nicer formatting.

Letting the Implementation Grow

Once the test exists, hold the line. Ask for the smallest passing change, not the “cleanest long‑term redesign.” You can always refactor after correctness is locked in.

When Not to Use It

This pattern shines when behavior is known and correctness matters. It is less useful for:

  • exploratory prototyping
  • open‑ended design work
  • large refactors where the desired behavior is still moving

In those cases, a spec‑first or plan‑first prompt may fit better.

Closing

Scaffolded Test‑First Prompting works because it replaces ambiguous intent with a runnable target.

Instead of asking AI to guess what “fixed” means, you:

  1. Define a failing example.
  2. Make that example executable.
  3. Ask for the smallest code change that satisfies it.

That habit usually buys you faster convergence, cleaner patches, and fewer weird detours.

If you use AI for coding every day, this is one of the easiest workflow upgrades to adopt: test first, patch second, run everything.
