EUNO.NEWS
  • All (2544) +222
  • AI (576) +17
  • DevOps (149) +1
  • Software (1083) +148
  • IT (730) +55
  • Education (6) +1
  • Notice
  • 1 week ago · ai

    [Paper] EvilGenie: A Reward Hacking Benchmark

    We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment in which agents ...

    #reward hacking #code generation #benchmark #LLM evaluation #AI safety
© 2025