[Paper] ExecVerify: White-Box RL with Verifiable Stepwise Rewards for Code Execution Reasoning
Code LLMs still struggle with code execution reasoning, especially smaller ones. Existing methods rely on supervised fine-tuning (SFT) with teacher-generat...