[Paper] Bellman Calibration for V-Learning in Offline Reinforcement Learning
We introduce Iterated Bellman Calibration, a simple, model-agnostic, post-hoc procedure for calibrating off-policy value predictions in infinite-horizon Markov ...
We present a method and dataset for fine-tuning language models with preference supervision using feedback-driven improvement chains. Given a model response, an...
Automatic Speech Recognition (ASR) in professional settings faces challenges that existing benchmarks underplay: dense domain terminology, formal register varia...
Large language models (LLMs) are increasingly considered for use in high-impact workflows, including academic peer review. However, LLMs are vulnerable to docum...
Language agents increasingly require persistent worlds in which they can act, remember, and learn. Existing approaches sit at two extremes: conventional web fra...
We formulate long-context language modeling as a problem in continual learning rather than architecture design. Under this formulation, we only use a standard a...
We present an online method for guaranteeing calibration of quantile forecasts at multiple quantile levels simultaneously. A sequence of α-level quantile foreca...
We introduce a training-efficient framework for time-series learning that combines random features with controlled differential equations (CDEs). In this approa...
Intrinsic image decomposition is fundamental for visual understanding, as RGB images entangle material properties, illumination, and view-dependent effects. Rec...
The primary research questions of this paper center on defining the amount of context that is necessary and/or appropriate when investigating the relationship b...
Humans learn locomotion through visual observation, interpreting visual content first before imitating actions. However, state-of-the-art humanoid locomotion sy...
Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to ...