[Paper] In Pursuit of Pixel Supervision for Visual Pre-training
At the most basic level, pixels are the source of the visual information through which we perceive the world. Pixels contain information at all levels, ranging ...
In recent multimodal research, the diffusion paradigm has emerged as a promising alternative to the autoregressive paradigm (AR), owing to its unique decoding a...
Interpreting the internal activations of neural networks can produce more faithful explanations of their behavior, but is difficult due to the complex structure...
We present Gaussian Pixel Codec Avatars (GPiCA), photorealistic head avatars that can be generated from multi-view images and efficiently rendered on mobile dev...
This paper proposes a dual-engine AI architectural method designed to address the complex problem of exploring potential trajectories in the evolution of art. W...
Foundation models are vital tools in various Computer Vision applications. They take as input a single RGB image and output a deep feature representation that i...
Active Speaker Detection (ASD) aims to identify who is currently speaking in each frame of a video. Most state-of-the-art approaches rely on late fusion to comb...
In a mathematical model of interacting biological organisms, where external interventions may alter behavior over time, traditional models that assume fixed par...
Early-Exit (EE) is a Large Language Model (LLM) architecture that accelerates inference by allowing easier tokens to be generated using only a subset of the mod...
Autoregressive video diffusion models hold promise for world simulation but are vulnerable to exposure bias arising from the train-test mismatch. While recent w...
Evaluations of image compression performance which include human preferences have generally found that naive distortion functions such as MSE are insufficiently...
We introduce FrontierCS, a benchmark of 156 open-ended problems across diverse areas of computer science, designed and reviewed by experts, including CS PhDs an...