[Paper] Theoretical Foundations of GPU-Native Compilation for Rapid Code Iteration
Current AI code generation systems suffer from significant latency bottlenecks due to CPU-GPU data transfers during compilation, execution, and testing phases. ...
Current AI code generation systems suffer from significant latency bottlenecks due to CPU-GPU data transfers during compilation, execution, and testing phases. ...
BBVA is expanding its work with OpenAI through a multi-year AI transformation program, rolling out ChatGPT Enterprise to all 120,000 employees. Together, the co...
We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis that models geometry purely through viewpoint conditioning, without expl...
Generative world models are reshaping embodied AI, enabling agents to synthesize realistic 4D driving environments that look convincing but often fail physicall...
The success of foundation models in language and vision motivated research in fully end-to-end robot navigation foundation models (NFMs). NFMs directly map mono...
Visual concept personalization aims to transfer only specific image attributes, such as identity, expression, lighting, and style, into unseen contexts. However...
We propose a decoupled 3D scene generation framework called SceneMaker in this work. Due to the lack of sufficient open-set de-occlusion and pose estimation pri...
Normalizing Flows (NFs) have been established as a principled framework for generative modeling. Standard NFs consist of a forward process and a reverse process...
In this work, we explore an untapped signal in diffusion model inference. While all previous methods generate images independently at inference, we instead ask ...
The success of modern machine learning hinges on access to high-quality training data. In many real-world scenarios, such as acquiring data from public reposito...
Self-supervised pre-training has revolutionized foundation models for languages, individual 2D images and videos, but remains largely unexplored for learning 3D...
Reinforcement learning (RL), earlier proven to be effective in large language and multi-modal models, has been successfully extended to enhance 2D image generat...