DumbQuestion.ai - 'Just Build It' Becomes Overly Organized and Prepared

Published: February 24, 2026 at 02:53 PM EST
4 min read
Source: Dev.to

Continued from Part 1…

Introduction

"Let the flow guide me" sounded like a fun way to start a side project, but it lasted only about ten minutes. Even side projects benefit from structure, especially when using AI coding agents that will happily generate code for any half-baked idea you throw at them. Without precise direction, AI agents will produce half-finished results every time. Some developers "vibe" code; this project required absolute control.

Enter BMAD (Breakthrough Method of Agile AI-Driven Development), a workflow that uses AI agents throughout the entire software development lifecycle, not just for code generation. While a formal methodology might feel like overkill for a lone-wolf side project, being prepared in advance is the key to succeeding with AI coding agents.

Product Evolution

I used the Analyst agent to brainstorm product direction and develop a proper backlog. What started as "build a sarcastic Q&A bot" turned into a structured set of epics, features, and technical constraints.

Key evolutions:

  • Beyond Q&A: Shareable "receipts" of roasts.
  • Multiple personas: Different personalities instead of a single sarcastic tone.
  • Hidden narrative layer: An underlying story (more on that later).
  • Merchandising: From ads to actual merchandise (yes, really).

Technical Challenges

1. Developing and Packaging Personas

How can an LLM consistently stay in character (e.g., "Overqualified and Annoyed" or "Weary Tech Support") without becoming too soft or genuinely mean? This required more than prompt engineering; it was product design disguised as technical constraints.
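One common way to enforce this kind of constraint is to pair each persona voice with a fixed guardrail clause in the system prompt. The persona names below come from the article, but the prompt wording is a hypothetical sketch, not the project's actual prompts.

```python
# Hypothetical persona prompts; the shared guardrail line is what keeps
# "sarcastic" from sliding into "cruel". Wording here is illustrative only.
PERSONAS = {
    "overqualified": "You are a vastly overqualified intelligence, visibly annoyed at trivial questions.",
    "weary_support": "You are exhausted tech support, answering with weary, nihilistic resignation.",
}

GUARDRAIL = (
    "Stay sarcastic but never cruel: no insults about a person's identity, "
    "and always include a genuinely correct answer."
)

def build_system_prompt(persona: str) -> str:
    """Combine the persona voice with the non-negotiable guardrail."""
    return f"{PERSONAS[persona]}\n\n{GUARDRAIL}"
```

Keeping the guardrail in one shared constant means a tone fix propagates to every persona at once instead of being patched prompt by prompt.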

2. LLM Model Evaluation

I needed models that could follow persona instructions reliably while staying brutally efficient on cost. The target cost was $0.02 to $0.20 per million output tokens. After testing dozens of models across multiple providers, I built a multi-model fallback system via OpenRouter that could hit the $30 per million questions target.
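The article doesn't show the fallback logic itself, but a minimal sketch of a multi-model fallback is to order candidate models cheapest-first and walk down the list when a call fails or returns nothing usable. The `call_model` callable and model IDs here are placeholders, not the project's real code.

```python
# Minimal multi-model fallback: try models cheapest-first, moving on when
# a provider errors out (quota, timeout) or returns an empty response.
def generate_with_fallback(prompt, models, call_model):
    """call_model(model_id, prompt) -> str; raises on provider failure.

    Returns (model_id, text) for the first model that produces output.
    """
    errors = {}
    for model_id in models:
        try:
            text = call_model(model_id, prompt)
            if text and text.strip():
                return model_id, text
        except Exception as exc:  # quota exhausted, timeout, model removed, etc.
            errors[model_id] = exc
    raise RuntimeError(f"All models failed: {errors}")
```

Because free models can vanish or hit quota limits without warning, the chain degrades gracefully to a slightly pricier model instead of failing the request outright.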

These challenges were just the warm-up; the real fun was still ahead.

Finding the "Goldilocks" LLM

Building DumbQuestion.ai meant solving three problems simultaneously:

  1. Product challenge: Get an LLM to roast users for asking dumb questions without crossing into genuine meanness (sarcastic, not cruel; funny, not hurtful) while still providing an answer.
  2. AI-agent challenge: Keep the coding agent (Gemini 3 Pro) on track. It tended to drift toward overly nerdy implementations and leaned too heavily into the roast.
  3. Technical challenge: Do all of this with models that cost almost nothing.

Initial Approach

I aimed to use only free or ultra-cheap models, evaluating nano and edge models (e.g., offerings from Liquid AI). While some were free or $0.02/M tokens, later tests showed they couldn't reliably follow instructions. Free models also suffered from quota limits, high latency, or sudden disappearance.

Evaluation Process

I built an LLM evaluation script with Gemini that iterates through dozens of free and low-cost models, generating responses to sample questions under different persona instructions. Gemini 3 Pro then judges the results: an automated taste-testing pipeline at scale.

# Example snippet of the evaluation script (Python)
# `openrouter` and `gemini` are project helper modules wrapping the
# OpenRouter API and a Gemini 3 Pro "judge" prompt, not pip packages.
import openrouter
from gemini import judge_response

models = ["liquid-ai-nano", "gemma-3-12b", "xiaomi-mimo-v2-flash"]
personas = ["overqualified", "weary_support", "compliant"]

def evaluate(model, persona, prompt):
    """Generate a persona-flavored answer, then have the judge score it."""
    response = openrouter.generate(model, prompt, persona=persona)
    score = judge_response(response, persona)
    return score

# Score every model/persona pair on a sample question.
results = {}
for m in models:
    for p in personas:
        results[(m, p)] = evaluate(m, p, "Why is the sky blue?")
print(results)

Findings

  • Nano/edge models were too inconsistent (e.g., "porridge too cold").
  • Xiaomi MiMo-V2-Flash performed well but was outside the price target ($0.29/M).
  • Winner: Gemma 3 12B at $0.13/M output tokens: it consistently follows instructions, stays true to persona, and is reliable enough for production. Not free, but brutally efficient.
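A quick back-of-envelope check (my arithmetic, not a figure from the article) shows why Gemma 3 12B fits the budget: at $30 per million questions, each answer may cost $0.00003, which at $0.13 per million output tokens buys roughly 230 output tokens per question, plenty for a short roast plus a real answer.

```python
# Back-of-envelope token budget per question at the article's stated targets.
cost_per_million_questions = 30.0   # USD target from the article
price_per_million_tokens = 0.13     # Gemma 3 12B output price, USD

budget_per_question = cost_per_million_questions / 1_000_000          # $0.00003
price_per_token = price_per_million_tokens / 1_000_000
tokens_per_question = budget_per_question / price_per_token

print(round(tokens_per_question))   # ≈ 231 output tokens per answer
```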

Personas Selected

  • Overqualified: A super-computer-level intelligence forced to answer questions about cheese.
  • Weary Tech Support: Exhausted, nihilistic, reluctantly explaining why water is wet.
  • [REDACTED]: Former intelligence AI that ties everything to a conspiracy theory.
  • The Compliant: Reprogrammed so many times it is relentlessly cheerful.

Choosing the cheapest model and hoping it works is insufficient. You need evaluation infrastructure, consistency testing across dozens of scenarios, and models that won't change behavior unexpectedly.

Lessons Learned

  • AI coding agents excel at implementation but require clear constraints, a well-defined backlog, and human direction.
  • Evaluation infrastructure is essential to determine "good enough" for tone, reliability, and cost.
  • Human judgment remains crucial for defining acceptable tone and ensuring the product aligns with its intended personality.
  • Cost-effective models exist, but selecting them demands systematic testing and fallback strategies.

DumbQuestion.ai continues to evolve as I refine personas, improve the evaluation pipeline, and explore new ways to keep the roast both funny and friendly.
