Google releases Gemini 3.1 Pro: Benchmark performance, how to try it
Source: Mashable Tech
Google’s Most Advanced Reasoning Model Outperforms Claude and ChatGPT on “Humanity’s Last Exam”
Tech Editor, Mashable
Timothy Beck Werth is the Tech Editor at Mashable, where he leads coverage and assignments for the Tech and Shopping verticals. He has over 15 years of experience as a journalist and editor, covering consumer technology, smart‑home gadgets, and men’s grooming and style products. Previously he was Managing Editor and then Site Director of SPY.com, a men’s product‑review site, and has written for GQ, The Daily Beast, Gear Patrol, and The Awl.
Published on February 19, 2026

Credit: Google
Google released its latest reasoning model, Gemini 3.1 Pro, on Thursday. According to the company, Gemini 3.1 Pro more than doubled the verified performance of Gemini 3 Pro on ARC‑AGI‑2, a benchmark that measures logical reasoning.
Google originally released Gemini 3 and Gemini 3 Pro in November, and the new version shows how quickly AI firms are iterating on their models. Gemini 3.1 Pro now powers the Gemini app and a suite of Google AI tools, including Gemini 3 Deep Think. Google says the model is designed to deliver more creative solutions.
“3.1 Pro is designed for tasks where a simple answer isn’t enough, taking advanced reasoning and making it useful for your hardest challenges,” the company wrote in a Google blog post.
“This improved intelligence can help in practical applications — whether you’re looking for a clear, visual explanation of a complex topic, a way to synthesize data into a single view, or bringing a creative project to life.”
What’s New with Gemini 3.1 Pro?
- Availability: Starting today, Gemini 3.1 Pro is rolling out in the Gemini app, the Gemini API, and NotebookLM.
- Free tier: Free users can try 3.1 Pro in the Gemini app.
- Paid tiers: Users on Google AI Pro and AI Ultra plans receive higher usage limits. Within NotebookLM, access is limited to these paid tiers for now.
- Enterprise & developer access: Available via AI Studio, Antigravity, Vertex AI, Gemini Enterprise, Gemini CLI, and Android Studio.
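For developers, the simplest route is the Gemini API. As a rough sketch only: the model identifier "gemini-3.1-pro" below is an assumption based on Google's naming pattern, so check the official model list for the exact string. This minimal example uses only Python's standard library against the Generative Language REST endpoint:

```python
# Hedged sketch of a Gemini API call. The model string "gemini-3.1-pro"
# is an assumption -- confirm it against Google's published model list.
import json
import os
import urllib.request

API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"

def build_request(prompt: str, model: str = "gemini-3.1-pro") -> tuple:
    """Return the (url, body) pair for a generateContent call."""
    url = f"{API_BASE}/{model}:generateContent"
    body = json.dumps(
        {"contents": [{"parts": [{"text": prompt}]}]}
    ).encode("utf-8")
    return url, body

def generate(prompt: str, api_key: str) -> str:
    """POST the prompt and return the first candidate's text."""
    url, body = build_request(prompt)
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"]

if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        print(generate("Summarize ARC-AGI-2 in one sentence.", key))
    else:
        # No key set: just show where the request would go.
        url, _ = build_request("hello")
        print("Would POST to:", url)
```

In practice, Google's official `google-genai` SDK wraps this endpoint; the raw REST form is shown here to keep the sketch dependency-free.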
Gemini 3.1 Pro was already available to Mashable editors ahead of launch. To try it yourself, open Gemini on the desktop or in the Gemini mobile app.
Sample Output
Two results of the same animation prompt. Credit: Google
Why Gemini 3.1 Pro Matters
When Google released Gemini 3 Pro in November, the model was so impressive that it allegedly caused OpenAI CEO Sam Altman to declare a “code red.” [1] As Gemini 3 Pro surged to the top of AI leaderboards, OpenAI reportedly began losing ChatGPT users to the new competitor.
The latest core ChatGPT model, GPT‑5.2, has slipped down the rankings on leaderboards such as Arena (formerly LMArena) [2], losing significant ground to Google, Anthropic, and xAI.
Key Takeaways
- Performance Gap: Gemini 3 Pro already outperforms GPT‑5.2 on many benchmarks.
- Momentum: With a more advanced “thinking” model, Gemini could widen the lead even further.
- User Shift: Early reports suggest a migration of ChatGPT users toward Gemini‑based services.
Gemini 3.1 Pro: Benchmark Performance
Google released benchmark data showing that Gemini 3.1 Pro outperforms earlier Gemini models, Claude Sonnet 4.6, Claude Opus 4.6, and GPT‑5.2. However, OpenAI’s new coding model GPT‑5.3‑Codex beat Gemini 3.1 Pro on the SWE‑Bench Pro benchmark, according to Google’s own figures.
Notable Highlights
- 44.4% on Humanity’s Last Exam – vs. 40.0% (Claude Opus 4.6) and 34.5% (GPT‑5.2)
- 77.1% on ARC‑AGI‑2 – vs. 31.1% (Gemini 3 Pro), 68.8% (Claude Opus 4.6), and 52.9% (GPT‑5.2)
- 94.3% on GPQA Diamond – vs. 91.9% (Gemini 3 Pro), 91.3% (Claude Opus 4.6), and 92.4% (GPT‑5.2)
- 80.6% on SWE‑Bench Verified – vs. 76.2% (Gemini 3 Pro), 80.8% (Claude Opus 4.6), and 80.0% (GPT‑5.2)
- 54.2% on SWE‑Bench Pro (Public) – vs. 43.3% (Gemini 3 Pro), 55.6% (GPT‑5.2), and 56.8% (GPT‑5.3‑Codex)
- 92.6% on MMLU – vs. 91.1% (Claude Opus 4.6) and 89.6% (GPT‑5.2)
Google also shared an image of the full benchmark results for Gemini 3.1 Pro (the original tweet is no longer available).
Disclosure: In April 2025, Ziff Davis (Mashable’s parent company) filed a lawsuit against OpenAI, alleging that OpenAI infringed Ziff Davis copyrights in training and operating its AI systems.
Author

Timothy Beck Werth – Tech Editor, Mashable
- Education: Print journalism, University of Southern California
- Residences: Brooklyn, NY & Charleston, SC
- Current project: A science‑fiction novel (second book)
Footnotes
[1] Mashable, “OpenAI code red reaction to Google Gemini 3,” https://mashable.com/article/openai-code-red-reaction-to-google-gemini-3
[2] Arena AI Leaderboard, https://arena.ai/leaderboard
