[Paper] End-to-End Training for Autoregressive Video Diffusion via Self-Resampling
Autoregressive video diffusion models hold promise for world simulation but are vulnerable to exposure bias arising from the train-test mismatch. While recent w...
Evaluations of image compression performance which include human preferences have generally found that naive distortion functions such as MSE are insufficiently...
We introduce FrontierCS, a benchmark of 156 open-ended problems across diverse areas of computer science, designed and reviewed by experts, including CS PhDs an...
The misuse of AI-driven video generation technologies has raised serious social concerns, highlighting the urgent need for reliable AI-generated video detectors...
Prevailing Vision-Language-Action Models (VLAs) for robotic manipulation are built upon vision-language backbones pretrained on large-scale, but disconnected st...
Semantic communication aims to transmit information most relevant to a task rather than raw data, offering significant gains in communication efficiency for app...
Future AI agents might run autonomously with elevated privileges. If these agents are misaligned, they might abuse these privileges to cause serious damage. The...
Reinforcement learning has become essential for strengthening the reasoning abilities of large language models, yet current exploration mechanisms remain fundam...
This paper presents a unified framework for the detection, classification, and preliminary localization of anomalies in water distribution networks using multi...
Partial Least Squares (PLS) is a widely used method for data integration, designed to extract latent components shared across paired high-dimensional datasets. ...
This paper proposes a training data augmentation pipeline that combines synthetic image data with neural style transfer in order to address the vulnerability of...
Large language model (LLM) activations are notoriously difficult to understand, with most existing techniques using complex, specialized methods for interpretin...