The 70% factuality ceiling: why Google’s new ‘FACTS’ benchmark is a wake-up call for enterprise AI

Published: December 10, 2025 at 06:00 PM EST

1 min read

Source: VentureBeat

Overview

There’s no shortage of generative AI benchmarks designed to measure a model’s performance and accuracy on enterprise tasks, from coding to instruction following to agentic web browsing and tool use. But many of these benchmarks have one major shortcoming: they …

Related posts

BNY builds “AI for everyone, everywhere” with OpenAI

BNY is using OpenAI technology to expand AI adoption enterprise-wide. Through its Eliza platform, 20,000+ employees are building AI agents that enhance efficien...

OpenAI's GPT-5.2 is here: what enterprises need to know

The rumors were true, and the 'Code Red' is over: OpenAI today announced the release of its new frontier large language model (LLM) family, GPT-5.2. It comes at...

Advancing science and math with GPT-5.2

GPT-5.2 is OpenAI’s strongest model yet for math and science, setting new state-of-the-art results on benchmarks like GPQA Diamond and FrontierMath. This post s...

Why your AI assistant lies to you (and how to fix it)

You ask your AI assistant a simple history question about the 184th president of the United States. The model does not hesitate or pause to consider that there...
