[Paper] MT-PingEval: Evaluating Multi-Turn Collaboration with Private Information Games
We present a scalable methodology for evaluating language models in multi-turn interactions, using a suite of collaborative games that require effective communi...