[Paper] Stronger Normalization-Free Transformers
Although normalization layers have long been viewed as indispensable components of deep learning architectures, the recent introduction of Dynamic Tanh (DyT) ha...
We establish a precise correspondence between decision-making agents in partially observable Markov decision processes (POMDPs) and one-input process functions,...
The construction of adversarial attacks for neural networks appears to be a crucial challenge for their deployment in various services. To estimate the adversar...
We present Any4D, a scalable multi-view transformer for metric-scale, dense feed-forward 4D reconstruction. Any4D directly generates per-pixel motion and geomet...
Autonomous drone navigation in confined tubular environments remains a major challenge due to the constraining geometry of the conduits, the proximity of the wa...
Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities and safety, but it also ma...
We develop a framework for learning from noisy quantum experiments, focusing on fault-tolerant devices accessing uncharacterized systems through noisy couplings...
Temporal-difference (TD) methods learn state and action values efficiently by bootstrapping from their own future value predictions, but such a self-bootstrappi...
Overview: Participating in the Kaggle AI Agents Intensive was a completely new and exciting experience for me. When I joined, I wasn’t fully confident about how...
Modern LLM pre-training consumes vast amounts of compute and training data, making the scaling behavior, or scaling laws, of different models a key distinguishi...
Transport-based methods have emerged as a leading paradigm for building generative models from large, clean datasets. However, in many scientific and engineerin...
Symbolic regression is a powerful tool for discovering governing equations directly from data, but its sensitivity to noise hinders its broader application. Thi...