[Paper] From Core to Detail: Unsupervised Disentanglement with Entropy-Ordered Flows
Learning unsupervised representations that are both semantically meaningful and stable across runs remains a central challenge in modern representation learning...
Non-Markovian dynamics are commonly found in real-world environments due to long-range dependencies, partial observability, and memory effects. The Bellman equa...
The classification performance of deep neural networks relies strongly on access to large, accurately annotated datasets. In medical imaging, however, obtaining...
Realistic sound propagation is essential for immersion in a virtual scene, yet physically accurate wave-based simulations remain computationally prohibitive for...
Grassroots Logic Programs (GLP) is a concurrent logic programming language with variables partitioned into paired readers and writers, conjuring both linear log...
Can general-purpose AI architectures go beyond prediction to discover the physical laws governing the universe? True intelligence relies on 'world models' -- ca...
Hallucinations in large language models remain a persistent challenge, particularly in multilingual and generative settings where factual consistency is difficu...
Vision capabilities in vision large language models (VLLMs) have consistently lagged behind their linguistic capabilities. In particular, numerous benchmark stu...
Fully unsupervised segmentation pipelines naively seek the most salient object, should this be present. As a result, most of the methods reported in the literat...
Bayesian optimal experimental design (BOED) seeks to maximize the expected information gain (EIG) of experiments. This requires a likelihood estimate, which in ...
Multimodal Diffusion Transformers (MMDiTs) for text-to-image generation maintain separate text and image branches, with bidirectional information flow between t...
The smoothness of the transformer architecture has been extensively studied in the context of generalization, training stability, and adversarial robustness. Ho...
While large-scale text-to-image diffusion models continue to improve in visual quality, their increasing scale has widened the gap between state-of-the-art mode...
Large Language Models (LLMs) often generate code with subtle but critical bugs, especially for complex tasks. Existing automated repair methods typically rely o...
Instructional video editing applies edits to an input video using only text prompts, enabling intuitive natural-language control. Despite rapid progress, most m...
We study a persistent failure mode in multi-objective alignment for large language models (LLMs): training improves performance on only a subset of objectives w...
Multi-turn jailbreaks capture the real threat model for safety-aligned chatbots, where single-turn attacks are merely a special case. Yet existing approaches br...
A central question in cognitive science is whether conceptual representations converge onto a shared manifold to support generalization, or diverge into orthogo...
Ensuring software quality in embedded firmware is critical, especially in safety-critical domains where compliance with functional safety standards (ISO 26262) ...
Ambiguity poses persistent challenges in natural language understanding for large language models (LLMs). To better understand how lexical ambiguity can be reso...
ISAC enables pervasive monitoring, but modern sensing algorithms are often too complex for energy-constrained edge devices. This motivates the development of le...
Recent progress in ML and LLMs has improved vulnerability detection, and recent datasets have reduced label noise and unrelated code changes. However, most exis...
Structural bias (SB) refers to systematic preferences of an optimisation algorithm for particular regions of the search space that arise independently of the ob...
CI/CD pipeline failure management is time-consuming when performed manually. Automating this process is non-trivial because the information required for effecti...
Fixpoint iteration constitutes the algorithmic core of static analyzers. Parallelizing the fixpoint engine can significantly reduce analysis times. Previous app...
Summarizing source code into natural language descriptions (code summarization) helps developers better understand program functionality and reduce the burden o...
Over the last few years, Ethereum has evolved into a public platform that safeguards the savings of hundreds of millions of people and secures more than $650 billio...
Addressing real-world optimization challenges requires not only advanced metaheuristics but also continuous refinement of their internal mechanisms. This paper ...
This paper presents a principled framework for designing energy-aware metaheuristics that operate under fixed energy budgets. We introduce a unified operator-le...
Software development agents powered by large language models (LLMs) have shown great promise in automating tasks like environment setup, issue solving, and prog...
Centralized training is the standard paradigm in deep learning, enabling models to learn from a unified dataset in a single location. In such a setup, isotropic f...
We present a framework for dynamic management of structured parallel processing skeletons on serverless platforms. Our goal is to bring HPC-like performance and...
In LLM serving, reusing the KV cache of prompts across requests is critical for reducing TTFT and serving costs. Cache-affinity scheduling, which co-locates req...
Training billion-parameter models requires distributing model states across GPUs using fully sharded data parallel (i.e., ZeRO-3). While ZeRO-3 succeeds on clus...
In Federated Learning (FL), multiple parties collaboratively train a shared Machine Learning model to encapsulate all private knowledge without exchange of info...
Insect vision supports complex behaviors including associative learning, navigation, and object detection, and has long motivated computational models for under...
Since most countries are coming up with online privacy regulations, such as GDPR in the EU, online publishers need to find a balance between revenue from target...
Understanding TTPs (Tactics, Techniques, and Procedures) in malware binaries is essential for security analysis and threat intelligence, yet remains challenging...
With the rapid rise of AI coding agents, the fundamental premise of what it means to be a software engineer is in question. In this vision paper, we re-examine ...
Mobile applications in large-scale distributed systems are susceptible to backend service failures, yet traditional chaos engineering approaches cannot scale mo...
The Moore-Penrose Pseudo-inverse (PInv) serves as the fundamental solution for linear systems. In this paper, we propose a natural generalization of PInv to the...
Adapting large pretrained models to new tasks efficiently and continually is crucial for real-world deployment but remains challenging due to catastrophic forge...
Multi-image spatial reasoning remains challenging for current multimodal large language models (MLLMs). While single-view perception is inherently 2D, reasoning...
Multi-agent systems built from prompted large language models can improve multi-round reasoning, yet most existing pipelines rely on fixed, trajectory-wide comm...
Multimodal Large Language Models (MLLMs) have made remarkable progress in multimodal perception and reasoning by bridging vision and language. However, most exi...
To complete assignments provided by humans in natural language, robots must interpret commands, generate and answer relevant questions for scene understanding, ...
Recent progress in spatial reasoning with Multimodal Large Language Models (MLLMs) increasingly leverages geometric priors from 3D encoders. However, most exist...
Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GP...