[Paper] Shields to Guarantee Probabilistic Safety in MDPs
Shielding is a prominent model-based technique to ensure safety of autonomous agents. Classical shielding aims to ensure that nothing bad ever happens and comes...
Open-world object counting remains brittle: despite rapid advances in vision-language models (VLMs), reliably counting the objects a user intends is far from so...
Recent GPU generations deliver significantly higher FLOPs using lower-precision arithmetic, such as FP8. While successfully applied to large language models (LL...
Cross-domain few-shot medical image segmentation (CD-FSMIS) requires a model to generalise simultaneously to novel anatomical categories and unseen imaging doma...
Automated question answering (QA) over electronic health records (EHRs) demands precise evidence retrieval, faithful answer generation, and explicit grounding o...
Recent advances in machine learning and large-scale biological data collections have revived the prospect of building a virtual cell, a computational model of c...
Efficient LLM inference research has largely focused on reducing the cost of each decoding step (e.g., using quantization, pruning, or sparse attention), typica...
Recovering editable CAD programs from images or 3D observations is central to AI-assisted design, but progress is difficult to measure because existing evaluati...
Industrial Computer-Aided Design (CAD) code generation requires models to produce executable parametric programs from visual or textual inputs. Beyond recognizi...
Although Large Language Models (LLMs) have made remarkable progress, current preference optimization methods still struggle to align directional consistency whi...
This paper demonstrates RUBEN, an interactive tool for discovering minimal rules to explain the outputs of retrieval-augmented large language models (LLMs) in d...
Current LLM agents are proficient at calling isolated APIs but struggle with the 'last mile' of commercial software automation. In real-world scenarios, tools a...
Grey failures in the computing continuum produce ambiguous overlapping symptoms that existing approaches fail to diagnose reliably, either due to a lack of caus...
Spiking Neural Networks (SNNs) can reduce energy consumption compared to conventional Artificial Neural Networks (ANNs) when spiking activity is sparse and the ...
Rejection Fine-Tuning (RFT) is a standard method for training LLM agents, where unsuccessful trajectories are discarded from the training set. In the context of...
The integration of Artificial Intelligence (AI) with Distributed Ledger Technology (DLT) has become a growing research area, yet contributions tend to cluster a...
We propose a game-theoretic framework for adaptive multi-agent intelligent systems. Unlike classical game theory, which often treats strategies as primitive obj...
Neural networks have proved an effective means of learning control policies for autonomous systems, but these learned policies are difficult to understand due t...
Agentic artificial intelligence (AI) is a natural fit for Internet of Things (IoT) and edge systems, but edge deployments are often constrained to models around...
Existing Meta-Black-Box Optimization (MetaBBO) methods focus on how to search when controlling optimizers, but largely overlook where to search. We propose Meta...
Adaptive behavior requires the brain to transition between distinct contexts while maintaining representations of prior experience. The ability to reconfigure n...
A core challenge in program synthesis is online library learning: the incremental acquisition of reusable abstractions under uncertainty about future task deman...
Millimeter-wave (mmWave) sensing enables privacy-preserving, always-on edge perception, but its measurements are often sparse, temporally irregular, and corrupt...
“You are entering the world at an extraordinary moment,” NVIDIA founder and CEO Jensen Huang told graduates as he delivered the keynote address at Carnegie Mell...
Large Language Models exhibit mode collapse, producing homogeneous outputs that fail to explore valid solution spaces. We present QD-LLM, a framework for parame...
Gradient-based preference optimization methods for large language model (LLM) alignment suffer from preference collapse, converging to narrow behavioral modes w...
Spike-based encodings are sparse and energy-efficient, but have largely been formulated probabilistically, disconnected from most signal processing literature. ...
Hugging Face blog article by sarmaddev (https://huggingface.co/sarmaddev) - The Problem...
AI agents choose tools from shared registries by matching natural-language descriptions. But no human is verifying whether those descriptions are true. I discov...
If you’ve spent any time in the data engineering world, you’ve likely encountered this debate at least once. Maybe twice. Ok, probably a dozen times 😉 “Should we process our da...
Hugging Face blog article; author profile: https://huggingface.co/M...
Here is a scenario that should concern every enterprise architect shipping autonomous AI systems right now: An observability agent is running in production. Its...
Complaint to FERC: The Maryland Office of...
Background: Voice agents have been expensive to run and painful to orchestrate, not because the models can't handle conversation, but because context ceilings f...
The Rise and Fallout of AI Data Centers: Massive new data centers are the physical foundation for tech companies’ hopes and dreams for AI. But the rush to expan...
The pursuit of autonomous driving has produced one of the richest sensor data collections in all of robotics. However, its scale and diversity remain largely un...
Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. H...
Diffusion-based models decompose sampling into many small Gaussian denoising steps -- an assumption that breaks down when generation is compressed to a few coar...
Knowledge Graph Question Answering (KGQA) has shown promise for grounded and interpretable reasoning, yet existing approaches often fail to provide reliable cov...
Decoding imagined speech from non-invasive brain recordings is challenging because imagined datasets are scarce and difficult to align temporally across subject...
Conformal prediction (CP) provides a distribution-free approach to uncertainty quantification with finite-sample guarantees. However, applying CP to graph neura...
Recent event-based image reconstruction methods predominantly rely on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to process complementa...
L_1-Approximating polynomials, i.e., polynomials that approximate indicator functions in L_1-norm under certain distributions, are widely used in computational ...
A standard technique for scaling inference-time reasoning is Self-Consistency, whereby multiple candidate answers are sampled from an LLM and the most common an...
Spatial intelligence in vision-language models (VLMs) is attracting research interest, driven by the practical demand to reason in the 3D world. Despite promising results, ...
Existing Flow Matching (FM) text-to-image models suffer from two critical bottlenecks under multi-task alignment: the reward sparsity induced by scalar-valued r...
We argue that decomposing reward into weighted, verifiable criteria and using an LLM judge to score them provides a partial-credit optimization signal: instead ...
Context window expansion is often treated as a straightforward capability upgrade for LLMs, but we find it systematically fails in multi-agent social dilemmas. ...