[Paper] Remedy-R: Generative Reasoning for Machine Translation Evaluation without Error Annotations
Over the years, automatic MT metrics have hill-climbed benchmarks and shown strong, sometimes human-level, agreement with human ratings. Yet they remain bl...