[Paper] On the Effectiveness of Training Data Optimization for LLM-based Code Generation: An Empirical Study
Large language models (LLMs) have achieved remarkable progress in code generation, largely driven by the availability of high-quality code datasets for effectiv...