I built an AI model comparison tool after wasting 12 hours on LLM integration in a project. Launching on Product Hunt today.

Published: February 24, 2026 at 09:34 AM EST

2 min read

Source: Dev.to

The solution

Test AI Models lets you paste your actual prompt and compare quality, speed, and cost across nine LLMs side by side in about 30 seconds.

No API keys required.
Benchmarks are focused on real‑world use cases (e.g., debugging code) rather than generic tasks like “write a poem.”
You can test your production prompts to discover which model truly wins for your specific needs.
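
As a rough illustration of what a side-by-side comparison involves, the sketch below runs one prompt through several models and records latency and an estimated cost for each. Everything in it is hypothetical: the model names, the per-token prices, the whitespace-based token count, and the `fake_model` stand-in are assumptions for illustration, and none of it reflects how Test AI Models is implemented or what any provider charges.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    model: str
    output: str
    seconds: float
    cost_usd: float


def estimate_cost(prompt: str, output: str, usd_per_1k_tokens: float) -> float:
    # Crude assumption: one whitespace-separated word counts as one token.
    tokens = len(prompt.split()) + len(output.split())
    return tokens / 1000 * usd_per_1k_tokens


def compare(
    prompt: str,
    models: dict[str, tuple[Callable[[str], str], float]],
) -> list[Result]:
    results = []
    for name, (call, price) in models.items():
        start = time.perf_counter()
        output = call(prompt)
        elapsed = time.perf_counter() - start
        results.append(
            Result(name, output, elapsed, estimate_cost(prompt, output, price))
        )
    # Cheapest first; quality still has to be judged by reading the outputs.
    return sorted(results, key=lambda r: r.cost_usd)


def fake_model(prefix: str) -> Callable[[str], str]:
    # Stand-in for a real API call so the sketch runs offline.
    return lambda prompt: f"{prefix}: {prompt}"


models = {
    "model-a": (fake_model("A"), 0.50),  # hypothetical USD per 1k tokens
    "model-b": (fake_model("B"), 0.10),  # hypothetical USD per 1k tokens
}

for r in compare("Why does this loop never terminate?", models):
    print(f"{r.model}: {r.seconds:.4f}s, ${r.cost_usd:.5f}")
```

Sorting by cost is only a starting point; the outputs still have to be read and judged against the actual requirement.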

I built the first version in under a week for a Bubble/Contra hackathon (4 LLMs, basic comparison).
It won “Best Use of AI” and a $5K award.
I also noticed developers endlessly debating “ChatGPT vs Claude” on Reddit without concrete evidence, which pointed to a broader problem.

Launched in beta on Feb 24, 2026 (Product Hunt).
Pricing: 50 free test selections, then $9/mo + API credits (1:1, no markup).
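
To get a feel for how far the free allowance goes, here is a minimal sketch of the arithmetic. It assumes each model included in a test consumes one selection, which matches the 5–25 full tests range quoted later in this post; the function name `full_tests` is illustrative and not part of the product.

```python
def full_tests(selections: int, models_per_test: int) -> int:
    """Number of complete side-by-side tests a selection budget covers.

    Assumption: each model included in a test consumes one selection.
    """
    if models_per_test < 1:
        raise ValueError("models_per_test must be at least 1")
    return selections // models_per_test


FREE_SELECTIONS = 50  # free allowance stated in the post

for n in (2, 5, 9):
    print(f"{n} models per test -> {full_tests(FREE_SELECTIONS, n)} full tests")
```

Under that assumption, comparing two models at a time gives 25 full tests, while comparing all nine gives 5 full tests with 5 selections left over.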

I tested alternatives on a tiny feature and switched to Grok, saving $45/year.

Lesson: “Use the cheapest model” isn’t reliable if it fails to meet requirements. You need to test with your actual workload.

Roadmap decisions will be driven by user feedback. Possible directions:

Try it at testaimodels.com – run a single test with your prompt and share any issues or confusing aspects.

I’m building this in public; every decision is shaped by early users.

Is $9/mo too expensive for indie developers? (API credits are 1:1, no markup)
Is the “test selections” pricing confusing? (50 selections = 5–25 full tests depending on the number of models compared)
Which modality matters most after text? (Image, audio, video?)

Drop a comment—I read and reply to everything.