AI benchmarks

2 weeks ago · ai

Sustainable AI Benchmarks Developers Will Be Asked About in 2026

#sustainable AI #AI benchmarks #model evaluation #AI ethics #carbon footprint #AI development #2026 trends
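1 month ago · ai

Advancing science and math with GPT-5.2

GPT-5.2 is OpenAI's strongest model yet for math and science, setting new state-of-the-art results on benchmarks such as GPQA Diamond and FrontierMath. This article…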

#GPT-5.2 #OpenAI #math AI #scientific research #GPQA Diamond #FrontierMath #large language models #AI benchmarks
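1 month ago · ai

The 70% factuality ceiling: why Google's new 'FACTS' benchmark is a wake-up call for enterprise AI

There is no shortage of generative AI benchmarks designed to measure the performance and accuracy of a given model on various helpful enterprise tasks…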

#AI benchmarks #factuality #enterprise AI #Google FACTS #generative AI evaluation #model accuracy
1 month ago · ai

Amazon is betting that AI benchmarks don't matter

Rohit Prasad, Amazon's SVP of AGI. This is an excerpt of Sources by Alex Heath, a newsletter about AI and the tech industry, syndicated just for The Verge subscribers.

#Amazon #AI benchmarks #model evaluation #AGI #machine learning #industry perspective