evaluation metrics

2 weeks ago · ai

Machine Learning “Advent Calendar” Bonus 1: AUC in Excel

AUC measures a model's ability to rank positive examples above negative ones, independent of any chosen threshold. Article: "Machine Learning “Advent Calendar” Bonus 1: AUC...".

#AUC #machine learning #Excel #evaluation metrics #data science
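
The ranking view of AUC in the summary above can be sketched in a few lines of Python. This is a minimal illustration, not the article's Excel method: the function name `pairwise_auc` and the sample labels and scores are made up for the example. It computes AUC as the fraction of (positive, negative) pairs in which the positive example gets the higher score, counting ties as half.

```python
from itertools import product


def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 positives, 2 negatives.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]

auc = pairwise_auc(labels, scores)            # 5 of 6 pairs are ordered correctly
rescaled = pairwise_auc(labels, [s * 10 for s in scores])  # same ordering, same AUC
```

Because only the ordering of the scores matters, no threshold appears anywhere in the computation, and rescaling the scores leaves the result unchanged. The double loop is quadratic in the number of examples, so it suits small illustrations rather than large datasets.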
2 weeks ago · ai

Agents Under the Curve (AUC)

How to tell whether your agentic solution is actually better. The article "Agents Under the Curve AUC" was first published on Towards Data Science....

#reinforcement learning #evaluation metrics #agents #AUC #machine learning
1 month ago · ai

How to Use System Prompts as Ground Truth for Evaluation

The problem: no clear ground truth. Most teams struggle to evaluate their AI agents because they have no clearly defined ground truth. Typical workflow:...

#system prompts #ground truth #AI evaluation #prompt engineering #LLM evaluation #evaluation metrics
1 month ago · ai

Measuring What Matters: Objective Metrics for Image Generation Evaluation

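Generating high-quality visual content with state-of-the-art models is becoming increasingly easy. Open-source models can run on a laptop, and cloud services turn tex...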

#image generation #evaluation metrics #generative AI #computer vision #quality assessment #Pruna #P-image #AI model benchmarking