[Paper] ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models
Large language models (LLMs) often produce answers with high certainty even when they are incorrect, making reliable confidence estimation essential for deploym...
The numerical optimization of continuous functions is a fundamental task in many scientific and engineering domains, ranging from mechanical design to training ...
Prompt specifications for multi-agent large language model (LLM) systems carry data contracts and integration logic across many interdependent files but are rar...
Prediction sets provide a theoretically grounded framework for quantifying uncertainty in machine learning models. Adapting them to structured generation tasks,...
We present Curated Industrial Developer Repository (CIDR), a large-scale dataset of real-world software repositories collected through direct collaboration with...
This paper experimentally analyzes how the level of harness engineering affects the operational performance of small language models (SLMs, 2-3B parameters). Th...
Agentic AI failures need post-hoc reconstruction: what the agent did, on whose authority, against which policy, and from what reasoning. Cross-regime feasibilit...
Cortical neurons are complex, multi-timescale processors wired into recurrent circuits, shaped by long evolutionary pressure under stringent biological constrai...
Power capping is the standard GPU energy lever in LLM serving, and it appears to work: throughput drops, power readings fall, and energy budgets are met. We sho...
Agentic systems deployed across the compute continuum need discovery mechanisms that remain effective across cloud, edge, and intermittently connected domains. ...
Spiking neural networks (SNNs) promise low-power event-driven computation for temporally rich tasks, but commonly used neuron models often trade off gradient-ba...
The spatial and functional organization of the primate visual cortex is a fundamental problem in neuroscience. While recent computational frameworks like the To...
Rebuilding engineering for speed, scale, and complexity. AutoScout24 Group (https://www.autoscout24.com/) is the largest pan‑European and Can...
We decompose an evolutionary mixture-of-LoRA system on a from-scratch ~150M-parameter widened-D substrate (D=1536, V=32000; D/V approx 0.048; the 'widened-1536'...
Diffusion and flow-based models have become the de facto approaches for generating continuous data, e.g., in domains such as images and videos. Their success ha...
Modelling extreme events and heavy-tailed phenomena is central to building reliable predictive systems in domains such as finance, climate science, and safety-c...
While Mixture-of-Experts (MoE) scales model capacity without proportionally increasing computation, its massive total parameter footprint creates significant st...
Transformers with self-attention modules as their core components have become an integral architecture in modern large language and foundation models. In this p...
Large language model agents increasingly rely on external skills to solve complex tasks, where skills act as modular units that extend their capabilities beyond...
We consider anonymous multi-agent path finding (MAPF) where a set of robots is tasked to travel to a set of targets on a finite, connected graph. We show that M...
Recognition of handwritten Bangla compound characters remains a challenging problem due to complex character structures, large intra-class variation, and limite...
We introduce Shepherd, a functional programming model that formalizes meta-agent operations on target agents as functions, with core operations mechanized in Le...
We consider the problem of synthesizing Clifford quantum circuits for devices with all-to-all qubit connectivity. We approach this task as a reinforcement learn...
This work revisits standard policy gradient methods used on restricted policy classes, which are known to get stuck in suboptimal critical points. We identify a...
The dominant paradigm for AI agents is an 'on-the-fly' loop in which agents synthesize plans and execute actions within seconds or minutes in response to user p...
As model families, training recipes, and compute budgets become increasingly standardized, further gains in machine learning systems depend increasingly on data...
Guardrail Classifiers defend production language models against harmful behavior, but although results seem promising in testing, they provide no formal guarant...
Training deep research agents, namely systems that plan, search, evaluate evidence, and synthesize long-form reports, pushes reinforcement learning beyond the r...
On-policy distillation offers dense, per-token supervision for training reasoning models; however, it remains unclear under which conditions this signal is bene...
Shielding is a prominent model-based technique to ensure safety of autonomous agents. Classical shielding aims to ensure that nothing bad ever happens and comes...
Recent GPU generations deliver significantly higher FLOPs using lower-precision arithmetic, such as FP8. While successfully applied to large language models (LL...
Recent advances in machine learning and large-scale biological data collections have revived the prospect of building a virtual cell, a computational model of c...
Efficient LLM inference research has largely focused on reducing the cost of each decoding step (e.g., using quantization, pruning, or sparse attention), typica...
Recovering editable CAD programs from images or 3D observations is central to AI-assisted design, but progress is difficult to measure because existing evaluati...
Industrial Computer-Aided Design (CAD) code generation requires models to produce executable parametric programs from visual or textual inputs. Beyond recognizi...
Current LLM agents are proficient at calling isolated APIs but struggle with the 'last mile' of commercial software automation. In real-world scenarios, tools a...
Grey failures in the computing continuum produce ambiguous overlapping symptoms that existing approaches fail to diagnose reliably, either due to a lack of caus...
Rejection Fine-Tuning (RFT) is a standard method for training LLM agents, where unsuccessful trajectories are discarded from the training set. In the context of...
The integration of Artificial Intelligence (AI) with Distributed Ledger Technology (DLT) has become a growing research area, yet contributions tend to cluster a...
Neural networks have proved an effective means of learning control policies for autonomous systems, but these learned policies are difficult to understand due t...
Agentic artificial intelligence (AI) is a natural fit for Internet of Things (IoT) and edge systems, but edge deployments are often constrained to models around...
Comments. Privacy (May 11, 2026 8:07 AM): To hide text, try white text on a white background. The human eye won’t see it but the computer will. If you want to te...
Adaptive behavior requires the brain to transition between distinct contexts while maintaining representations of prior experience. The ability to reconfigure n...
A core challenge in program synthesis is online library learning: the incremental acquisition of reusable abstractions under uncertainty about future task deman...
Large Language Models exhibit mode collapse, producing homogeneous outputs that fail to explore valid solution spaces. We present QD-LLM, a framework for parame...
Gradient-based preference optimization methods for large language model (LLM) alignment suffer from preference collapse, converging to narrow behavioral modes w...