Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview
Source: Hacker News
Results
- Scored 65.2%, versus Google’s official 47.8% and the 64.3% of Junie CLI, the existing top closed‑source entry.
Clarifications
- No {agents/skills}.md files were inserted at any point. No cheating mechanisms whatsoever.
- The CLI agent was run in a leaderboard‑compliant way (no modification of resources or timeouts).
- The full TerminalBench run was done using the fully open‑source version of the agent; there is no difference between what is on GitHub and what was run.
Context
I was originally going to wait for it to appear on the leaderboard, but after 8 days without a response from the maintainers (there is a large backlog of pull requests on their Hugging Face repo), I decided to post anyway.
References
- Hugging Face PR:
- Cheating reports on TerminalBench 2.0:
- Hacker News discussion:
Points: 101 Comments: 32