evaluation

1 week ago · ai

Task-Free Intelligence Testing of LLMs

#LLM #intelligence testing #evaluation #benchmark #language models
0 months ago · ai

How to Do Evals on a Bloated RAG Pipeline

Comparing metrics across datasets and models. The post "How to Do Evals on a Bloated RAG Pipeline" appeared first on Towards Data Science....

#RAG #retrieval-augmented generation #evaluation #model metrics #datasets #LLM #pipeline optimization #NLP
1 month ago · ai

Six Lessons Learned Building RAG Systems in Production

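Best practices for data quality, retrieval design, and evaluation in production RAG systems. The post "Six Lessons Learned Building RAG Systems in Production"...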

#retrieval-augmented generation #RAG #production systems #data quality #evaluation
1 month ago · ai

Why AI Alignment Starts With Better Evaluation

You can't align what you don't evaluate. The post "Why AI Alignment Starts With Better Evaluation" appeared first on Towards Data Science....

#AI alignment #evaluation #AI safety #machine learning #LLM