Databricks' OfficeQA uncovers disconnect: AI agents ace abstract tests but stall at 45% on enterprise docs

Published: December 9, 2025 at 11:00 AM EST

Source: VentureBeat

AI Benchmark Landscape

There is no shortage of AI benchmarks today: popular options include Humanity’s Last Exam (HLE), ARC‑AGI‑2 and GDPval, among many others. AI agents excel at the abstract math problems and PhD‑level exams that most benchmarks are built on, but Databricks has a question…
