Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview

Published: April 27, 2026 at 08:35 AM EDT

Source: Hacker News

Results

  • The agent scored 65.2%, versus Google’s official 47.8% and the 64.3% of Junie CLI, the existing top closed‑source model.
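
To put the three scores side by side, here is a minimal sketch of the arithmetic. The helper name `margin_points` and the dictionary labels are made up for illustration and are not part of the agent or the benchmark tooling; only the three percentages come from the post.

```python
# Scores as quoted in the post, in percent.
# The labels are illustrative, not official leaderboard names.
scores = {
    "this agent": 65.2,
    "Google official": 47.8,
    "Junie CLI": 64.3,
}


def margin_points(ours: float, theirs: float) -> float:
    """Return the lead in percentage points, rounded to one decimal."""
    return round(ours - theirs, 1)


ours = scores["this agent"]
for name, value in scores.items():
    if name == "this agent":
        continue
    print(f"Lead over {name}: {margin_points(ours, value)} points")
```

That works out to a lead of 17.4 percentage points over Google’s official number and 0.9 points over Junie CLI.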

Clarifications

  1. No {agents/skills}.md files were inserted at any point, and no cheating mechanisms of any kind were used.
  2. The CLI agent was run in a leaderboard‑compliant way (no modification of resources or timeouts).
  3. The full TerminalBench run was done using the fully open‑source version of the agent; there is no difference between what is on GitHub and what was run.

Context

I was originally going to wait for the result to appear on the leaderboard, but after 8 days without a response from the maintainers (their Hugging Face repo has a large backlog of pull requests), I decided to post anyway.

References

  • Hugging Face PR:
  • Cheating reports on TerminalBench 2.0:
  • Hacker News discussion:

Points: 101 Comments: 32
