[Paper] Do Generalisation Results Generalise?
A large language model's (LLM's) out-of-distribution (OOD) generalisation ability is crucial to its deployment. Previous work assessing LLMs' generalisation per...
5856 posts from this source
A large language model's (LLM's) out-of-distribution (OOD) generalisation ability is crucial to its deployment. Previous work assessing LLMs' generalisation per...
Recent video generation models demonstrate impressive synthesis capabilities but remain limited by single-modality conditioning, constraining their holistic wor...
Visual generative models (e.g., diffusion models) typically operate in compressed latent spaces to balance training efficiency and sample quality. In parallel, ...
This paper presents the first large-scale field study of the adoption, usage intensity, and use cases of general-purpose AI agents operating in open-world web e...
The escalating sophistication and variety of cyber threats have rendered static honeypots inadequate, necessitating adaptive, intelligence-driven deception. In ...
The quality and diversity of instruction-based image editing datasets are continuously increasing, yet large-scale, high-quality datasets for instruction-based ...
Crowdsourcing platforms, such as Stack Overflow, have changed and impacted the software development practice. In these platforms, developers share and reuse the...
Recent video generators achieve striking photorealism, yet remain fundamentally inconsistent in 3D. We present WorldReel, a 4D video generator that is natively ...
We present a novel graph-based learning of EEG representations with gradient alignment (GEEGA) that leverages multi-domain information to learn EEG representati...
Why do modern language models, trained to do well on next-word prediction, appear to generate coherent documents and capture long-range structure? Here we show ...
Large language models for code (LLM4Code) have greatly improved developer productivity but also raise privacy concerns due to their reliance on open-source repo...
Future AI systems could conceal their capabilities ('sandbagging') during evaluations, potentially misleading developers and auditors. We stress-tested sandbagg...
Qubit readout is a critical operation in quantum computing systems, which maps the analog response of qubits into discrete classical states. Deep neural network...
Embedding a language field in a 3D representation enables richer semantic understanding of spatial environments by linking geometry with descriptive meaning. Th...
We propose Multi-view Pyramid Transformer (MVP), a scalable multi-view transformer architecture that directly reconstructs large 3D scenes from tens to hundreds...
We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two famil...
Storytelling in real-world videos often unfolds through multiple shots -- discontinuous yet semantically connected clips that together convey a coherent narrati...
LLM-based agents are rapidly being plugged into expert decision-support, yet in messy, high-stakes settings they rarely make the team smarter: human-AI teams of...
Carbon-aware schedulers aim to reduce the operational carbon footprint of data centers by running flexible workloads during periods of low carbon intensity. Mos...
We introduce a new paradigm for building large causal models (LCMs) that exploits the enormous potential latent in today's large language models (LLMs). We desc...
Large language models (LLMs) are increasingly deployed in settings where reasoning, such as multi-step problem solving and chain-of-thought, is essential. Yet, ...
Stream processing is a computing paradigm that supports real-time data processing for a wide variety of applications. At Meta, it's used across the company for ...
Recent reinforcement learning (RL) techniques have yielded impressive reasoning improvements in language models, yet it remains unclear whether post-training tr...
Most visual generative models compress images into a latent space before applying diffusion or autoregressive modelling. Yet, existing approaches such as VAEs a...
Leveraging a dataset of paired narratives, we investigate the extent to which large language models (LLMs) can reliably separate incoherent and coherent stories...
Many operational cloud systems use one or more machine learning models that help them achieve better efficiency and performance. But operators do not have tools...
In pre-market drug safety review, grouping related adverse event terms into standardised MedDRA queries or the FDA Office of New Drugs Custom Medical Queries (O...
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in vision-language understanding tasks. While these models often produce ling...
Online incivility has emerged as a widespread and persistent problem in digital communities, imposing substantial social and psychological burdens on users. Alt...
Large Language Models (LLMs) have demonstrated remarkable performance in code intelligence tasks such as code generation, summarization, and translation. Howeve...
As AI agents built on large language models (LLMs) become increasingly embedded in society, issues of coordination, control, delegation, and accountability are ...
Spiking neural networks excel at event-driven sensing yet maintaining task-relevant context over long timescales. However building these networks in hardware re...
Network topology is critical for efficient parameter synchronization in distributed learning over networks. However, most existing studies do not account for ba...
The rapid development of autonomous vehicles has led to a surge in testing demand. Traditional testing methods, such as virtual simulation, closed-course, and p...
Automatically synthesizing verifiable code from natural language requirements ensures software correctness and reliability while significantly lowering the barr...
We investigate how large language models (LLMs) fail when operating as autonomous agents with tool-use capabilities. Using the Kamiwaza Agentic Merit Index (KAM...
DreamerV3 is a state-of-the-art online model-based reinforcement learning (MBRL) algorithm known for remarkable sample efficiency. Concurrently, Kolmogorov-Arno...
Combinations of active automata learning, model-based testing and model checking have been successfully used in numerous applications, e.g., for spotting bugs i...
Despite the effectiveness of large language models (LLMs) for code generation, they often output incorrect code. One reason is that model output probabilities a...
Otus is a high-performance computing cluster that was launched in 2025 and is operated by the Paderborn Center for Parallel Computing (PC2) at Paderborn Univers...
We consider the problem of recovering an unknown low-dimensional vector from noisy, underdetermined observations. We focus on the Generalized Projected Gradient...
The damage caused by cybercrime makes the development of secure software inevitable. Although many tools and frameworks exist to support the development of secu...
Analog computing at the edge is an emerging strategy to limit data storage and transmission requirements, as well as energy consumption, and its practical imple...
Video diffusion models (VDMs) perform attention computation over the 3D spatio-temporal domain. Compared to large language models (LLMs) processing 1D sequences...
Vision-language models (VLMs) have demonstrated impressive multimodal comprehension capabilities and are being deployed in an increasing number of online video ...
The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicat...
Process mining traditionally assumes centralized event data collection and analysis. However, modern Industrial Internet of Things systems increasingly operate ...
While surrogate backpropagation proves useful for training deep spiking neural networks (SNNs), incorporating biologically inspired local signals on a large sca...