Databricks' OfficeQA uncovers disconnect: AI agents ace abstract tests but stall at 45% on enterprise docs
Source: VentureBeat
AI Benchmark Landscape
There is no shortage of AI benchmarks today: popular options include Humanity’s Last Exam (HLE), ARC‑AGI‑2 and GDPval, among many others. AI agents excel at the abstract math problems and PhD‑level exams on which most of these benchmarks are based, but Databricks has a question…