[Paper] How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation
The use of Large Language Models (LLMs) as automatic judges for code evaluation is becoming increasingly prevalent in academic environments. However, their reliability...