[Paper] Many Minds from One Model: Bayesian Transformers for Population Intelligence
Despite their scale and success, modern transformers are almost universally trained as single-minded systems: optimization produces one deterministic set of par...
The Clock and Pizza interpretations, associated with architectures differing in either uniform or learnable attention, were introduced to argue that different a...
Modern ML training and inference now span tens to tens of thousands of GPUs, where network faults can waste 10–15% of GPU hours due to slow recovery. Common ne...
This study presents a conceptual framework and a prototype assessment for Large Language Model (LLM)-based Building Energy Management System (BEMS) AI agents to...
Retrieval-augmented generation (RAG) is highly sensitive to the quality of selected context, yet standard top-k retrieval often returns redundant or near-duplic...
Discriminative approaches to classification often learn shortcuts that hold in-distribution but fail even under minor distribution shift. This failure mode stem...
Transformer language models can generate strikingly natural text by modeling language as a sequence of tokens. Yet, by relying primarily on surface-level co-occ...
Binary choices, as often used for reinforcement learning from human feedback (RLHF), convey only the direction of a preference. A person may choose apples over ...
The aim of this article is to provide a firm mathematical foundation for the application of deep gradient flow methods (DGFMs) for the solution of (high-dimensi...
Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive models for faster inference via parallel token generation. We provide...
We introduce basic inequalities for first-order iterative optimization algorithms, forming a simple and versatile framework that connects implicit and explicit ...
Classifying legal documents is challenging: besides their specialized vocabulary, they can be very long. This means that feeding full documents to a T...