AI safety — Page 16 | EUNO.NEWS

3 months ago · ai

[Paper] EvilGenie: A Reward Hacking Benchmark

We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment in which agents ...

#reward hacking #code generation #benchmark #LLM evaluation #AI safety

3 months ago · ai

[Paper] A Unified Understanding of Offline Data Selection and Online Self-refining Generation for Post-training LLMs

Offline data selection and online self-refining generation, which enhance the data quality, are crucial steps in adapting large language models (LLMs) to specif...

#LLM fine-tuning #bilevel optimization #data selection #self-refining generation #AI safety
