Why AI safety should be enforced structurally, not trained in
Most current AI safety work assumes an unsafe system and tries to train better behavior into it.
- We add more data.
- We add more constraints.
- We add more fi...
We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment in which agents ...