[Paper] REPOT: Recoverable Program-of-Thought via Checkpoint Repair
One-shot Program-of-Thought (PoT) emits a Python program that prints a primitive-action plan; a single invalid action silently invalidates the trajectory. We in...
1659 posts from this source
One-shot Program-of-Thought (PoT) emits a Python program that prints a primitive-action plan; a single invalid action silently invalidates the trajectory. We in...
The transition toward Software-Defined Vehicles (SDVs) represents a major paradigm shift in vehicle design, transforming traditional hardware-centric systems in...
We present and show how to implement a non-trivial all-to-all communication algorithm for arbitrary d-dimensional tori effectively in MPI. Given a factorization...
The Random Gradient hyper-heuristic was recently shown to be able to learn the optimal neighbourhood size when optimizing the LeadingOnes benchmark via the Rand...
Consensus protocols form the backbone of distributed systems and blockchains, where implementation bugs can cause data corruption and financial losses. While LL...
QEM is widely regarded as a plausible bridge from NISQ devices to FTQC. Yet the empirical studies used to assess the effectiveness of QEM techniques on concrete...
Context: Technical debt (TD) is a widely studied metaphor that helps to explain how sub-optimal decisions that can harm software maintainability over time. Alth...
Centralised biometric identity systems expose users to single points of failure, opaque verification processes, and irreversible biometric compromise. Decentral...
Large language models (LLMs) have become integral to modern software development, enabling automated code generation at scale. However, validating the correctne...
In recent years, HPC systems and CPU architectures as their central components, have become increasingly complex, making application development and optimizatio...
Sparse tensors are the most used representation of sparse multidimensional data. Operations that decompose them, selecting their most important features while r...
Pipeline parallelism is essential for large-scale model training, but existing asynchronous approaches often degrade convergence due to parameter mismatch betwe...
Maximal Independent Set (MIS) in a graph is a fundamental problem with applications in resource allocation, scheduling, and network optimization. Although graph...
Modern logistics systems tend to generate continuous streams of data from sources such as GPS, IoT sensors, and logistics management systems. The aggregation, p...
The trend of increasing cluster sizes of supercomputers leads to a growing susceptibility to Silent Data Corruption (SDC) that can invalidate program results. A...
Small and medium-sized enterprises (SMEs) represent the majority of firms in most economies and often face financial constraints and higher vulnerability to fin...
Recently, the runtime analysis of multi-valued estimation-of-distribution algorithms in the framework of Ben Jedidia et al. (TCS 2024) has made significant adva...
Evolutionary model merging provides a powerful framework for the automated, training-free composition of LLMs through parameter-space search. However, existing ...
LLM-guided evolutionary search (Evolve systems) has reached state-of-the-art results on mathematical and combinatorial tasks, yet most existing systems report o...
We prove real-rootedness for the Poincaré polynomial [ P_n(t)=sum_{i=0}^{n-3} dim H^{2i}(overline{mathcal M}_{0,n};mathbb{Q})t^i ] of the Deligne--Mumford modul...
Current vision-language models (VLMs) typically stitch together separate image encoders and language decoders via multi-stage alignment, a modular framework tha...
Parameter-efficient finetuning (PEFT) has become the standard approach for adapting large language models, yet evaluations largely emphasize downstream accuracy...
Large language models (LLMs) have become increasingly useful computational models of human language processing, but it remains unclear whether vision-language l...
World models for interactive video generation have largely focused on single-agent settings, where future observations are generated from a single control signa...
Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inferen...
A primary bottleneck in contact-rich manipulation is the difficulty of collecting real-world data. Sim-to-real reinforcement learning offers a scalable alternat...
We present a method for harmonizing the lighting of a foreground video to match a target background scene, adjusting shadows, color tone, and illumination inten...
Functional music applications, from consumer focus and sleep aids to clinical interventions, share a distinctive recommendation problem: success is defined by t...
Class-Incremental Learning (CIL) is important in building real-world learning systems. In CLIP-based CIL, the model performs classification by comparing similar...
Agentic AI systems capable of autonomous planning and extended environmental interaction pose a fundamental control problem: how can humans maintain meaningful ...
Long-term memory is increasingly important for personalized AI agents, yet existing benchmarks and methods remain largely text-centric. Even when images are inc...
Visual outcomes are increasingly central to multimodal large language models, making reliable and fine-grained verification essential for scaling generalist fou...
Vision-Language-Action (VLA) models unify perception, reasoning, and control within a single policy, yet their multi-billion-parameter backbones and diffusion-b...
Free-text explanations extend human label variation (HLV) beyond label disagreement by revealing the reasoning and preferences behind annotators' decisions. We ...
Electroencephalography (EEG) is a critical, non-invasive method to monitor electrical brain activity. EEGs can span anywhere from a couple seconds to multiple h...
On-policy self-distillation (SD) improves LLM reasoning by using teacher-side privileged information (PI) to turn sparse verifier outcomes into dense token-leve...
In the era of autonomous agents, machine-actionable data is critical for data-driven workflows. For more than a decade, semantic metadata like schema.org has an...
Discourse particles, such as well and kind of, are crucial components that enable LLMs to ``speak'' more like humans. They are used to convey emotions, intentio...
Vision classifiers can exploit spurious correlations, achieving high in-distribution accuracy yet failing under distribution shift. Existing approaches to bias ...
Vision-language models (VLMs) generate fluent causal explanations, but current evaluations cannot distinguish linguistic plausibility from faithful causal reaso...
LLMs' linguistically expressed confidence should faithfully reflect their intrinsic uncertainty. While recent work shows LLMs struggle to use epistemic markers ...
Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software domain remains expensive. Small open...
Existing memory-augmented LLM agents often treat memory as a static repository with pre-defined representations and fixed retrieval pipelines, which is brittle ...
Vast quantities of compute (GPU cycles on personal workstations, idle inference servers, and edge devices between jobs) go unused because no incentive-aligned p...
Interactive 3D assets used in games and simulation are typically decomposed into specific semantic parts to support animation, physics, and scripted behaviors, ...
This paper studies preference-shaped expected improvement criteria for Bayesian multiobjective optimization. We consider two indicator families which are often ...
This paper studies preference-shaped expected improvement criteria for Bayesian multiobjective optimization. We consider two indicator families which are often ...
Large Vision-Language Models (LVLMs) are rapidly evolving toward true multimodal reasoning, with visual search representing a concrete instantiation of the thin...