The leaderboard “you can’t game,” funded by the companies it ranks
Source: TechCrunch
Overview
Artificial intelligence models are multiplying fast, and competition is stiff. With so many players crowding the space, which one will be the best — and who decides that? Arena, formerly LM Arena, has emerged as the de facto public leaderboard for frontier LLMs, influencing funding, launches, and PR cycles. In just seven months, the startup went from a UC Berkeley PhD research project to being valued at $1.7 billion.
Interview with Arena Co‑founders
Equity host Rebecca Bellan catches up with Arena co‑founders Anastasios Angelopoulos and Wei‑Lin Chiang about how their platform became the go‑to leaderboard for frontier AI models, and how they’re trying to build a neutral benchmark even as companies like OpenAI, Google, and Anthropic back the project.
How Arena Works
- Harder to game than static benchmarks – Arena’s dynamic, crowd-driven evaluation makes it difficult for model providers to over‑optimize for a fixed test set.
- Structural neutrality – The team explains what “structural neutrality” means in practice and how it helps keep the leaderboard unbiased.
- Current performance leaders – Claude currently tops Arena’s expert leaderboards in legal and medical use cases.
- Product expansion – Arena is moving beyond chat to benchmark agents, coding, and real‑world tasks with a new enterprise product.