I stopped writing prompts and started writing Python

Published: April 9, 2026 at 05:27 AM EDT

Source: Dev.to

The Prompt Chaos

For a year I treated LLMs like a command line: type instructions, pray for output, tweak wording, add “IMPORTANT:”, move sentences around like a ritual. I ended up with folders of prompts:

v1.txt
v2_final.txt
v2_final_REALLY_final.txt

None of them documented why they worked. When something broke, I couldn’t tell if the issue was the prompt, the model, or the data. There was no version control, no tests—just vibes.

Enter DSPy

DSPy (from Stanford NLP) flips the model: you don’t write prompts, you write Python.

import dspy

class AnalyzeStartup(dspy.Signature):
    """Analyze a startup pitch."""

    pitch: str = dspy.InputField()
    viability_score: int = dspy.OutputField()
    strengths: list[str] = dspy.OutputField()
    weaknesses: list[str] = dspy.OutputField()
    verdict: str = dspy.OutputField()

That’s it: no “You are an expert startup analyst…”, no “Respond in JSON format…”. At call time DSPy turns the signature (docstring, field names, and types) into a prompt and parses the reply back into those typed fields. When you need better prompts, you run an optimizer, and DSPy rewrites them based on examples that work.
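To make the idea concrete, here is a toy sketch of how typed field declarations carry enough information to generate instructions. It is not DSPy's actual prompt builder, and real DSPy prompts look different; `Field`, `render_prompt`, and `STARTUP_FIELDS` are names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a declared signature field."""
    name: str
    type_name: str
    is_input: bool


def render_prompt(task: str, fields: list[Field]) -> str:
    """Render a task description plus field declarations as plain instructions."""
    inputs = [f for f in fields if f.is_input]
    outputs = [f for f in fields if not f.is_input]
    lines = [task, "", "Inputs:"]
    lines += [f"- {f.name} ({f.type_name})" for f in inputs]
    lines += ["", "Respond with these fields:"]
    lines += [f"- {f.name} ({f.type_name})" for f in outputs]
    return "\n".join(lines)


# Mirrors the fields of the AnalyzeStartup signature.
STARTUP_FIELDS = [
    Field("pitch", "str", True),
    Field("viability_score", "int", False),
    Field("strengths", "list[str]", False),
    Field("weaknesses", "list[str]", False),
    Field("verdict", "str", False),
]

if __name__ == "__main__":
    print(render_prompt("Analyze a startup pitch.", STARTUP_FIELDS))
```

The point is only that the declaration is the source of truth: change a field, and the generated instructions change with it.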

From Prompt Tricks to Signatures

Before:

“If I put examples before instructions, it works better. Sometimes. Unless it’s GPT‑4o.”

After:
I write a signature. DSPy figures out the best prompt format. The prompt becomes an implementation detail; I care about inputs, outputs, and behavior—not phrasing.

Testable LLM Code

Before:
Manually checking output.

After:

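def test_startup_analyzer():
    result = startup_analyzer(pitch="We're building AI for dog grooming...")
    # Check the shape of the output, not its exact wording.
    assert result.viability_score >= 1
    assert len(result.strengths) > 0
    assert len(result.weaknesses) > 0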

These are real assertions, and they live in my test suite next to the rest of my tests.
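Structural checks like these can also be pulled into a helper and exercised offline against a stub, so the suite does not need a live model for every run. The sketch below assumes a 1 to 10 score scale, which the signature does not state; `analysis_problems` and `fake_analyzer` are hypothetical names, and the stub only mimics the attribute access a real prediction object would offer.

```python
from types import SimpleNamespace


def analysis_problems(result, min_score=1, max_score=10):
    """Return a list of human-readable problems; an empty list means the result is well-formed."""
    problems = []
    score = getattr(result, "viability_score", None)
    # The 1-10 range is an assumption made for this sketch.
    if not isinstance(score, int) or not (min_score <= score <= max_score):
        problems.append(f"viability_score out of range: {score!r}")
    for name in ("strengths", "weaknesses"):
        items = getattr(result, name, None)
        if not isinstance(items, list) or not items:
            problems.append(f"{name} is empty or not a list")
    verdict = getattr(result, "verdict", None)
    if not isinstance(verdict, str) or not verdict.strip():
        problems.append("verdict is blank")
    return problems


def fake_analyzer(pitch):
    """Stub that returns a canned result instead of calling a model."""
    return SimpleNamespace(
        viability_score=6,
        strengths=["clear niche"],
        weaknesses=["small market"],
        verdict="Promising but narrow.",
    )


def test_stubbed_analyzer_is_well_formed():
    result = fake_analyzer(pitch="We're building AI for dog grooming...")
    assert analysis_problems(result) == []
```

Returning a list of problems rather than a bare boolean makes a failing test say what went wrong.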

Swapping Models in One Line

Before:
Each model required its own prompt tuning (GPT‑4, Claude, Gemini, etc.).

After:

import dspy

# Swap models by changing one string
lm = dspy.LM("openai/gpt-4o-mini")
# lm = dspy.LM("anthropic/claude-3-sonnet")
# lm = dspy.LM("gemini/gemini-2.0-flash")

dspy.configure(lm=lm)

Same code, different model. DSPy handles the prompt translation.
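Since the model is just a string, one option is to read it from the environment so a swap needs no code change at all. A minimal sketch, assuming a hypothetical `LLM_MODEL` environment variable and the `provider/model` naming used in the snippet above; `resolve_model_name` is an illustrative helper, not part of DSPy.

```python
import os

DEFAULT_MODEL = "openai/gpt-4o-mini"


def resolve_model_name(env=None):
    """Pick the model string from LLM_MODEL (hypothetical variable), else fall back to the default."""
    env = os.environ if env is None else env
    name = env.get("LLM_MODEL", "").strip() or DEFAULT_MODEL
    provider, sep, model = name.partition("/")
    # Fail early on strings that do not look like "provider/model".
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {name!r}")
    return name


# Usage with DSPy (not executed here):
#   lm = dspy.LM(resolve_model_name())
#   dspy.configure(lm=lm)
```

Validating the string up front turns a typo in configuration into an immediate error instead of a failed API call later.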

Optimizers Do the Tuning

Instead of manually tweaking prompts, I give DSPy examples of good outputs and let it figure out the best prompt:

from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot(metric=my_metric, max_bootstrapped_demos=4)
optimized = optimizer.compile(StartupAnalyzer(), trainset=train_examples)

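DSPy runs my program on the training examples, keeps the runs that pass the metric as few‑shot demonstrations (up to max_bootstrapped_demos), and builds them into the prompt. I simply review the results.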
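The optimizer call takes a `my_metric` that is never shown. DSPy metrics are plain functions that receive an example, a prediction, and an optional trace, so one possible version looks like the sketch below. The 1 to 10 scale and the tolerance of 2 points are assumptions made for illustration, and the `SimpleNamespace` objects stand in for real examples and predictions.

```python
from types import SimpleNamespace


def my_metric(example, pred, trace=None):
    """Accept a prediction that is well-formed and close to the labeled score."""
    try:
        score = int(pred.viability_score)
    except (AttributeError, TypeError, ValueError):
        return False
    in_range = 1 <= score <= 10  # assumed scale, not stated in the signature
    close = abs(score - example.viability_score) <= 2  # hypothetical tolerance
    has_lists = bool(getattr(pred, "strengths", None)) and bool(
        getattr(pred, "weaknesses", None)
    )
    return in_range and close and has_lists


if __name__ == "__main__":
    gold = SimpleNamespace(viability_score=6)
    pred = SimpleNamespace(viability_score=7, strengths=["niche"], weaknesses=["tiny market"])
    print(my_metric(gold, pred))
```

Because the metric is ordinary Python, it can be unit-tested on its own before any optimizer run spends tokens on it.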

The Paradigm Shift

Old way: LLMs are magic boxes you talk to in English; success depends on prompting skill.
DSPy way: LLMs are function calls. You declare the interface, and the framework handles the implementation.

It’s the difference between scattering raw SQL queries across a codebase and using an ORM: one is brittle, untyped, and hard to refactor; the other is structured, testable, and maintainable.

Going Deeper

I wrote a full guide on building with DSPy—practical chapters, real code, and the hard‑won lessons. It’s called Harmless DSPy; Chapter 1 is free if you want to see if it’s your thing.

DSPy is developed by Omar Khattab and the Stanford NLP team. It’s open source, actively maintained, and has genuinely changed how I build with LLMs.
