I built an AI model comparison tool after wasting 12 hours on LLM integration in a project. Launching on Product Hunt today.

Published: February 24, 2026 at 09:34 AM EST
2 min read
Source: Dev.to

The solution

Test AI Models lets you paste your actual prompt and compare quality, speed, and cost across nine LLMs side‑by‑side in about 30 seconds.

  • No API keys required.
  • Benchmarks are focused on real‑world use cases (e.g., debugging code) rather than generic tasks like “write a poem.”
  • You can test your production prompts to discover which model truly wins for your specific needs.

How it started

  • Built the first version in under a week for a Bubble/Contra hackathon (4 LLMs, basic comparison).
  • Won “Best Use of AI” and a $5 K award.
  • Noticed developers endlessly debating “ChatGPT vs Claude” on Reddit without concrete evidence, which pointed to a broader problem.

Current status

  • Launched in beta on Feb 24, 2026 (Product Hunt).
  • Pricing: 50 free test selections, then $9/mo plus API credits (billed 1:1, no markup).

Real‑world example

  • Tested alternatives on a tiny feature and switched to Grok, saving $45 / year.

Lesson: “Use the cheapest model” isn’t reliable advice if that model fails to meet your requirements. You need to test with your actual workload.
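
One way to put that lesson into practice is to filter on requirements first and only then pick by price. The snippet below is a minimal sketch, not how Test AI Models works internally: the `ModelResult` fields, the 0 to 1 quality scale, the thresholds, and the placeholder model names and numbers are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelResult:
    name: str         # placeholder label, not a real model
    quality: float    # hypothetical 0.0-1.0 score from your own evaluation
    latency_s: float  # seconds taken to answer the prompt
    cost_usd: float   # cost of one run of the prompt


def cheapest_passing(
    results: List[ModelResult], min_quality: float, max_latency_s: float
) -> Optional[ModelResult]:
    """Return the cheapest model that meets both requirements, or None."""
    passing = [
        r for r in results
        if r.quality >= min_quality and r.latency_s <= max_latency_s
    ]
    # default=None covers the case where no model meets the requirements
    return min(passing, key=lambda r: r.cost_usd, default=None)


# Made-up numbers for illustration only
results = [
    ModelResult("model-a", quality=0.62, latency_s=1.1, cost_usd=0.002),
    ModelResult("model-b", quality=0.88, latency_s=2.4, cost_usd=0.006),
    ModelResult("model-c", quality=0.91, latency_s=3.0, cost_usd=0.015),
]

best = cheapest_passing(results, min_quality=0.8, max_latency_s=5.0)
print(best.name if best else "no model meets the requirements")
```

With these made-up numbers the cheapest model overall (model-a) is rejected on quality, so the pick is the cheapest of the models that actually pass.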

Tech stack

  • Platform: Bubble.io (no‑code)

What’s next

Roadmap decisions will be driven by user feedback. Possible directions:

  • A) Add sub‑models (e.g., GPT‑4o vs GPT‑4o‑mini, Claude Opus vs Sonnet vs Haiku)
  • B) … (other ideas can be suggested)

The ask

Try it at testaimodels.com – run a single test with your prompt and share any issues or confusing aspects.

  • Feedback needed: What’s missing? What would make the tool 10× more useful?
  • Roadmap input: Choose A, B, C, or D (or propose a new feature).

I’m building this in public; every decision is shaped by early users.

Questions I have

  1. Is $9/mo too expensive for indie developers? (API credits are 1:1, no markup)
  2. Is the “test selections” pricing confusing? (50 selections = 5–25 full tests depending on the number of models compared)
  3. Which modality matters most after text? (Image, audio, video?)
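
For question 2, the “5–25 full tests” range can be reproduced with a few lines of arithmetic. This is a sketch under one assumption: a single selection means one model included in one test. The function name `full_tests` is hypothetical.

```python
FREE_SELECTIONS = 50  # free tier stated in the post


def full_tests(models_per_test: int, selections: int = FREE_SELECTIONS) -> int:
    """Number of complete tests that fit in the selection budget.

    Assumes each model compared in a test consumes one selection.
    """
    if models_per_test < 1:
        raise ValueError("models_per_test must be at least 1")
    return selections // models_per_test


for n in (2, 5, 9):
    print(f"{n} models per test -> {full_tests(n)} full tests")
```

Under that assumption, comparing two models at a time gives 25 tests, and comparing all nine gives 5, which matches the range quoted above.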

Drop a comment. I read and reply to everything.
