I built an AI model comparison tool after wasting 12 hours on LLM integration in a project. Launching on Product Hunt today.

Published: February 24, 2026 at 09:34 AM EST
2 min read
Source: Dev.to

The solution

Test AI Models lets you paste your actual prompt and compare quality, speed, and cost across nine LLMs side‑by‑side in about 30 seconds.

  • No API keys required.
  • Benchmarks are focused on real‑world use cases (e.g., debugging code) rather than generic tasks like “write a poem.”
  • You can test your production prompts to discover which model truly wins for your specific needs.

How it started

  • Built the first version in under a week for a Bubble/Contra hackathon (4 LLMs, basic comparison).
  • Won “Best Use of AI” and a $5K award.
  • Noticed developers endlessly debating “ChatGPT vs Claude” on Reddit without concrete evidence—this highlighted a broader problem.

Current status

  • Launched in beta on Feb 24, 2026 (Product Hunt).
  • Pricing: 50 free test‑model selections, then $9/mo + API credits (1:1, no markup).

Real‑world example

  • Tested alternatives on a tiny feature and switched to Grok, saving $45/year.

Lesson: “Use the cheapest model” isn’t reliable if it fails to meet requirements. You need to test with your actual workload.
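To make that lesson concrete, here is a minimal Python sketch of "cheapest model that still meets requirements." It is not how Test AI Models works internally. The `Result` fields, the `cheapest_passing` helper, the thresholds, and the placeholder model names and numbers are all assumptions invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Result:
    """One model's outcome on your prompt (hypothetical structure)."""
    model: str
    quality: float    # 0..1 score from your own evaluation
    latency_s: float  # seconds to complete the response
    cost_usd: float   # cost of this single run


def cheapest_passing(
    results: Iterable[Result], min_quality: float, max_latency_s: float
) -> Optional[Result]:
    """Return the cheapest result that meets both requirements, or None."""
    passing = [
        r for r in results
        if r.quality >= min_quality and r.latency_s <= max_latency_s
    ]
    return min(passing, key=lambda r: r.cost_usd, default=None)


# Placeholder numbers, not real benchmark data.
results = [
    Result("model-a", quality=0.62, latency_s=1.1, cost_usd=0.0004),
    Result("model-b", quality=0.86, latency_s=2.3, cost_usd=0.0021),
    Result("model-c", quality=0.91, latency_s=3.0, cost_usd=0.0150),
]

winner = cheapest_passing(results, min_quality=0.8, max_latency_s=5.0)
print(winner.model if winner else "no model meets the requirements")
```

In this made-up data the cheapest option overall (`model-a`) fails the quality bar, so the pick falls to the cheapest model that passes.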

Tech stack

  • Platform: Bubble.io (no‑code)

What’s next

Roadmap decisions will be driven by user feedback. Possible directions:

  • A) Add sub‑models (e.g., GPT‑4o vs GPT‑4o‑mini, Claude Opus vs Sonnet vs Haiku)
  • B) … (other ideas can be suggested)

The ask

Try it at testaimodels.com – run a single test with your prompt and share any issues or confusing aspects.

  • Feedback needed: What’s missing? What would make the tool 10× more useful?
  • Roadmap input: Choose A, B, C, or D (or propose a new feature).

I’m building this in public; every decision is shaped by early users.

Questions I have

  1. Is $9/mo too expensive for indie developers? (API credits are 1:1, no markup)
  2. Is the “test selections” pricing confusing? (50 selections = 5–25 full tests depending on the number of models compared)
  3. Which modality matters most after text? (Image, audio, video?)
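For question 2, the arithmetic behind "50 selections = 5–25 full tests" can be sketched in a few lines of Python. This assumes each model included in a comparison uses one selection, which is my reading of the pricing rather than something stated outright. `full_tests` is a hypothetical helper name.

```python
FREE_SELECTIONS = 50  # free allowance quoted in the pricing


def full_tests(selections: int, models_per_test: int) -> int:
    """Number of complete comparisons a selection budget covers.

    Assumes one selection is spent per model in each test.
    """
    if models_per_test < 1:
        raise ValueError("models_per_test must be at least 1")
    return selections // models_per_test


for n in (2, 5, 9):
    print(f"{n} models per test -> {full_tests(FREE_SELECTIONS, n)} full tests")
```

Under that assumption, comparing two models at a time gives 25 full tests, while comparing all nine gives 5, with 5 selections left over.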

Drop a comment—I read and reply to everything.
