EUNO.NEWS
  • All (2364) +206
  • AI (546) +17
  • DevOps (142) +2
  • Software (996) +129
  • IT (675) +57
  • Education (5) +1
  • Notice
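  • 1 week ago · ai

    [Paper] EvilGenie: A Reward Hacking Benchmark

    We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment in which agents …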

    #reward hacking #code generation #benchmark #LLM evaluation #AI safety
RSS GitHub © 2025