TTT-Discover optimizes GPU kernels 2x faster than human experts — by training during inference
Source: VentureBeat
Overview
Researchers from Stanford, Nvidia, and Together AI have developed a technique for discovering new solutions to very complex problems. In one example, they optimized a critical GPU kernel to run 2× faster than the previous state of the art, which was written by human experts. The technique, called TTT-Discover, keeps training during inference: it combines transformer-based models with reinforcement learning to automatically explore the space of possible kernel implementations.