evaluation

1 week ago · ai

Task-free intelligence testing of LLMs

Article URL: https://www.marble.onl/posts/tapping/index.html · Comments URL: https://news.ycombinator.com/item?id=46545587 · Points: 11 · Comments: 1...

#LLM #intelligence testing #evaluation #benchmark #language models
0 month ago · ai

How to Do Evals on a Bloated RAG Pipeline

Comparing metrics across datasets and models. The post How to Do Evals on a Bloated RAG Pipeline appeared first on Towards Data Science....

#RAG #retrieval-augmented generation #evaluation #model metrics #datasets #LLM #pipeline optimization #NLP
1 month ago · ai

Six Lessons Learned Building RAG Systems in Production

Best practices for data quality, retrieval design, and evaluation in production RAG systems. The post Six Lessons Learned Building RAG Systems in Production appe...

#retrieval-augmented generation #RAG #production systems #data quality #evaluation
1 month ago · ai

Why AI Alignment Starts With Better Evaluation

You can’t align what you don’t evaluate. The post Why AI Alignment Starts With Better Evaluation appeared first on Towards Data Science....

#AI alignment #evaluation #AI safety #machine learning #LLM