[Paper] Beyond Blind Spots: Analytic Hints for Mitigating LLM-Based Evaluation Pitfalls
Large Language Models (LLMs) are increasingly deployed as judges (LLM-as-a-Judge, LaaJ) in code generation pipelines. While attractive for scalability, LaaJs tend to overlook domain s...