DumbQuestion.ai - 'Just Build It' Becomes Overly Organized and Prepared
Source: Dev.to
Continued from Part 1…
Introduction
"Let the flow guide me" sounded like a fun way to start a side project, but it lasted only about ten minutes. Even side projects benefit from structure, especially when using AI coding agents that will happily generate code for any half-baked idea you throw at them. Without precise direction, AI agents will produce half-finished results every time. Some developers "vibe" code; this project required absolute control.
Enter BMAD (Breakthrough Method of Agile AI-Driven Development): a workflow that uses AI agents throughout the entire software development lifecycle, not just for code generation. While a formal methodology might feel like overkill for a lone-wolf side project, being prepared in advance is the key to succeeding with AI coding agents.
Product Evolution
I used the Analyst agent to brainstorm product direction and develop a proper backlog. What started as "build a sarcastic Q&A bot" turned into a structured set of epics, features, and technical constraints.
Key evolutions:
- Beyond Q&A: Shareable "receipts" of roasts.
- Multiple personas: Different personalities instead of a single sarcastic tone.
- Hidden narrative layer: An underlying story (more on that later).
- Merchandising: From ads to actual merchandise (yes, really).
Technical Challenges
1. Developing and Packaging Personas
How can an LLM consistently stay in character (e.g., "Overqualified and Annoyed" or "Weary Tech Support") without becoming too soft or genuinely mean? This required more than prompt engineering; it was product design disguised as technical constraints.
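One way to picture the "packaging" part is bundling each persona's voice with its hard guardrails, so the tone constraints travel with the character instead of living in scattered prompt fragments. The sketch below is purely illustrative; the `Persona` class, `build_system_prompt` helper, and wording are my assumptions, not the project's actual implementation.

```python
# Hypothetical sketch of persona packaging: a persona bundles a voice
# description with hard guardrails. Names and wording here are illustrative,
# not taken from the DumbQuestion.ai codebase.
from dataclasses import dataclass

@dataclass(frozen=True)
class Persona:
    name: str
    voice: str        # how the character sounds
    guardrails: str   # lines the model must never cross

def build_system_prompt(p: Persona) -> str:
    """Assemble a system prompt that keeps the sarcasm inside the guardrails."""
    return (
        f"You are '{p.name}'. {p.voice}\n"
        f"Hard rules: {p.guardrails}\n"
        "Always include a correct answer to the question, however grudgingly."
    )

overqualified = Persona(
    name="Overqualified and Annoyed",
    voice="A supercomputer-level intellect forced to answer trivial questions.",
    guardrails="Mock the question, never the person; no slurs, no cruelty.",
)

prompt = build_system_prompt(overqualified)
```

Keeping the guardrails as data rather than prose makes it cheap to test every persona against the same "sarcastic, not cruel" constraint.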
2. LLM Model Evaluation
I needed models that could follow persona instructions reliably while staying brutally efficient on cost. The target cost was $0.02 to $0.20 per million output tokens. After testing dozens of models across multiple providers, I built a multi-model fallback system via OpenRouter that could hit the $30 per million questions target.
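The fallback idea can be sketched as a simple chain: try the cheapest model first and fall through to the next on failure. The model IDs and the injected `call` function below are placeholders, not the project's real configuration.

```python
# Minimal sketch of a multi-model fallback chain via an OpenRouter-style
# gateway. Model IDs and the injected `call` function are placeholders,
# not the project's actual setup.
from typing import Callable

FALLBACK_CHAIN = ["google/gemma-3-12b-it", "backup/model-a", "backup/model-b"]

def generate_with_fallback(prompt: str, call: Callable[[str, str], str]) -> str:
    """Return the first successful completion, raising only if all models fail."""
    last_error: Exception | None = None
    for model in FALLBACK_CHAIN:
        try:
            return call(model, prompt)
        except Exception as err:  # quota hit, timeout, model quietly withdrawn...
            last_error = err
    raise RuntimeError("all fallback models failed") from last_error
```

Injecting `call` keeps the chain logic testable without hitting any real API, which matters when the whole point is that cheap models disappear or time out unpredictably.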
These challenges were just the warmโup; the real fun was still ahead.
Finding the โGoldilocksโ LLM
Building DumbQuestion.ai meant solving three problems simultaneously:
- Product challenge: Get an LLM to roast users for asking dumb questions without crossing into genuine meanness (sarcastic, not cruel; funny, not hurtful) while still providing an answer.
- AI-agent challenge: Keep the coding agent (Gemini 3 Pro) on track. It tended to drift toward overly nerdy implementations and leaned too heavily into the roast.
- Technical challenge: Do all of this with models that cost almost nothing.
Initial Approach
I aimed to use only free or ultra-cheap models, evaluating nano and edge models (e.g., offerings from Liquid AI). While some were free or $0.02/M tokens, later tests showed they couldn't reliably follow instructions. Free models also suffered from quota limits, high latency, or sudden disappearance.
Evaluation Process
I built an LLM evaluation script with Gemini that iterates through dozens of free and low-cost models, generating responses to sample questions under different persona instructions. Gemini 3 Pro then judges the results: an automated taste-testing pipeline at scale.
```python
# Example snippet of the evaluation script (Python).
# `openrouter` and `gemini` are assumed to be thin project-local wrappers,
# not published packages; `judge_response` asks Gemini 3 Pro to score a
# reply for persona fidelity and tone.
import openrouter
from gemini import judge_response

models = ["liquid-ai-nano", "gemma-3-12b", "xiaomi-mimo-v2-flash"]
personas = ["overqualified", "weary_support", "compliant"]

def evaluate(model: str, persona: str, prompt: str) -> float:
    """Generate a persona-flavored answer, then have the judge score it."""
    response = openrouter.generate(model, prompt, persona=persona)
    return judge_response(response, persona)

# Score every model/persona pair on the same sample question.
results = {}
for m in models:
    for p in personas:
        results[(m, p)] = evaluate(m, p, "Why is the sky blue?")

print(results)
```
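To make the judging step concrete: an LLM-as-judge call like `judge_response` typically wraps a scoring rubric around the reply being graded. The rubric wording and function name below are my own illustration, not the project's actual judge prompt.

```python
# Illustrative sketch of the kind of rubric an LLM-as-judge prompt might
# contain. The axes and wording are assumptions, not the project's real
# judge prompt.
def build_judge_prompt(response: str, persona: str) -> str:
    return (
        f"You are grading a chatbot reply for the '{persona}' persona.\n"
        "Score 0-10 on each axis, then average:\n"
        "1. Persona fidelity: does it stay in character?\n"
        "2. Tone: sarcastic but never cruel.\n"
        "3. Usefulness: does it still actually answer the question?\n"
        f"Reply to grade:\n{response}\n"
        "Output only the final numeric score."
    )
```

Pinning the judge to a numeric rubric keeps scores comparable across dozens of model/persona pairs, which is what makes the pipeline usable at scale.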
Findings
- Nano/edge models were too inconsistent (e.g., "porridge too cold").
- Xiaomi MiMo-V2-Flash performed well but was outside the price target ($0.29/M).
- Winner: Gemma 3 12B at $0.13/M output tokens. It consistently follows instructions, stays true to persona, and is reliable enough for production. Not free, but brutally efficient.
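As a back-of-envelope check on the economics: at $0.13 per million output tokens, the $30-per-million-questions target implies answers averaging roughly 230 output tokens. The answer length below is my assumed figure for illustration; only the $0.13/M price and the $30 target come from the evaluation above.

```python
# Back-of-envelope cost check. The ~230-token answer length is an assumed
# figure; only the $0.13/M price and the $30/M-questions target are from
# the evaluation results.
PRICE_PER_TOKEN = 0.13 / 1_000_000   # Gemma 3 12B output price, USD
TOKENS_PER_ANSWER = 230              # assumed average answer length

cost_per_answer = PRICE_PER_TOKEN * TOKENS_PER_ANSWER
cost_per_million_questions = cost_per_answer * 1_000_000

print(f"${cost_per_million_questions:.2f} per million questions")  # just under $30
```

Input tokens are ignored here, which is roughly fair for short user questions but would tighten the budget for long ones.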
Personas Selected
| Persona | Description |
|---|---|
| Overqualified | A supercomputer-level intelligence forced to answer questions about cheese. |
| Weary Tech Support | Exhausted, nihilistic, reluctantly explaining why water is wet. |
| [REDACTED] | Former intelligence AI that ties everything to a conspiracy theory. |
| The Compliant | Reprogrammed so many times it is relentlessly cheerful. |
Choosing the cheapest model and hoping it works is insufficient. You need evaluation infrastructure, consistency testing across dozens of scenarios, and models that wonโt change behavior unexpectedly.
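Consistency testing can be sketched as repeated trials per scenario with a minimum pass rate before a model is trusted. The 0.9 threshold and the injected `run`/`check` functions below are illustrative assumptions, not the project's actual harness.

```python
# Minimal sketch of consistency testing: run each scenario several times and
# require a minimum pass rate. The 0.9 threshold and the injected run/check
# functions are illustrative assumptions.
from typing import Callable

def consistency_score(scenario: str, trials: int,
                      run: Callable[[str], str],
                      check: Callable[[str], bool]) -> float:
    """Fraction of trials whose output passes the tone/persona check."""
    return sum(check(run(scenario)) for _ in range(trials)) / trials

def is_consistent(score: float, threshold: float = 0.9) -> bool:
    return score >= threshold
```

The point of the repeated trials is exactly the nano-model failure mode above: a model that nails the persona nine times out of ten still fails the one user who gets the cruel reply.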
Lessons Learned
- AI coding agents excel at implementation but require clear constraints, a wellโdefined backlog, and human direction.
- Evaluation infrastructure is essential to determine โgood enoughโ for tone, reliability, and cost.
- Human judgment remains crucial for defining acceptable tone and ensuring the product aligns with its intended personality.
- Costโeffective models exist, but selecting them demands systematic testing and fallback strategies.
DumbQuestion.ai continues to evolve as I refine personas, improve the evaluation pipeline, and explore new ways to keep the roast both funny and friendly.