[Paper] From Weights to Activations: Is Steering the Next Frontier of Adaptation?
Post-training adaptation of language models is commonly achieved through parameter updates or input-based methods such as fine-tuning, parameter-efficient adapt...
5586 posts from this source
Post-training adaptation of language models is commonly achieved through parameter updates or input-based methods such as fine-tuning, parameter-efficient adapt...
We present UMI-3D, a multimodal extension of the Universal Manipulation Interface (UMI) for robust and scalable data collection in embodied manipulation. While ...
On-policy knowledge distillation (OPD) trains a student on its own rollouts under token-level supervision from a teacher. Not all token positions matter equally...
We introduce Multistage Conditional Compositional Optimization (MCCO) as a new paradigm for decision-making under uncertainty that combines aspects of multistag...
Semantic Multi-Object Tracking (SMOT) extends multi-object tracking with semantic outputs such as video summaries, instance-level captions, and interaction labe...
Resolving and rewriting references is fundamental in programming languages. Motivated by a real-world decompilation task, we abstract reference rewriting into t...
Human-Object Interaction (HOI) detection is a longstanding computer vision problem concerned with predicting the interaction between humans and objects. Current...
We propose novel particle swarm optimization (PSO) variants incorporated with deep neural networks (DNNs) for particles to pursue globally optimal positions in ...
Deep search agents have emerged as a promising paradigm for addressing complex information-seeking tasks, but their training remains challenging due to sparse r...
Fairness in algorithmic decision-making is often defined in the predictive space, where predictive performance - used as a proxy for decision-maker (DM) utility...
Recent advances in Generative Artificial Intelligence, particularly Large Language Models (LLMs), have stimulated growing interest in automating or assisting Bu...
Modern distributed systems generate large volumes of logs that can be analyzed to support essential AIOps tasks such as fault diagnosis, which plays a crucial r...
Commit signing is widely promoted as a foundation of software supply-chain security, yet prior work has studied it through the lens of individual repositories o...
Diffusion language models have recently emerged as a leading alternative to standard language models, due to their ability for bidirectional attention and paral...
The lack of transparency about code datasets used to train large language models (LLMs) makes it difficult to detect, evaluate, and mitigate data leakage. We pr...
According to constructivist theory, students learn software security more effectively when examples are grounded in their own code. Generic examples often fail ...
Automated code generation remains a persistent challenge in software engineering, as conventional multi-agent frameworks are often constrained by static plannin...
Software engineering research has focused on automating maintenance and evolution processes to reduce costs and improve reliability. The emergence of foundation...
Sentiment analysis in software engineering focuses on understanding emotions expressed in software artifacts. Previous research highlighted the limitations of a...
Merging is a core operation in version control systems such as Git, but traditional line-based algorithms often yield spurious conflicts, particularly in the pr...
Deep Learning (DL) is becoming more and more widespread in clone detection, motivated by achieving near-perfect performance for this task. In particular in case...
To date, various paradigms of soft-Computing have been used to solve many modern problems. Among them, a self organizing combination of fuzzy systems and neural...
Disaggregated storage systems improve resource utilization and enable independent scaling of storage and compute resources by separating storage resources from ...
Self-sustained neural activity in the absence of ongoing external input is a fundamental feature of nervous system dynamics, yet the conditions under which it c...
This study examines the impact of additive and multiplicative noise on both a single leaky integrate-and-fire (LIF) neuron and a trained spiking neural network ...
SAKURAONE is a managed high performance computing (HPC) cluster developed and operated by the SAKURA Internet Research Center. It builds on the KOKARYOKU PHY ba...
Brain digital twins aim to provide faithful, individualized computational representations of brains as dynamical systems, enabling mechanistic understanding and...
The growth of compute-intensive AI tasks highlights the need to mitigate the processing costs and improve performance and energy efficiency. This necessitates t...
We propose a novel end-to-end privacy-preserving framework, instantiated by three efficient protocols for different deployment scenarios, covering both input an...
The travelling thief problem (TTP) is a well-known multi-component optimisation problem that captures the interdependence between two components: the tour acros...
We propose a new sparse matrix format, PackSELL, designed to support diverse data representations and enable efficient sparse matrix-vector multiplication (SpMV...
Open-pit mine scheduling is a complex real world optimization problem that involves uncertain economic values and dynamically changing resource capacities. Evol...
Modern GPU workloads, especially large language model (LLM) inference, suffer from kernel launch overheads and coarse synchronization that limit inter-kernel pa...
Humans and artificial agents must often learn and switch between multiple tasks in dynamic environments. Success in such settings requires cognitive flexibility...
Recent advances in video generation enable a new paradigm for 3D scene creation: generating camera-controlled videos that simulate scene walkthroughs, then lift...
Large Language Models (LLMs) and Vision-Language Models (VLMs) increasingly generate indoor scenes through intermediate structures such as layouts and scene gra...
Snowflake revolutionized data warehousing with an elastic architecture that decouples compute and storage, enabling scalable solutions for diverse data analytic...
While diffusion models dominate the field of visual generation, they are computationally inefficient, applying a uniform computational effort regardless of diff...
The effectiveness of Direct Preference Optimization (DPO) depends on preference data that reflect the quality differences that matter in multimodal tasks. Exist...
Urban areas are increasingly vulnerable to thermal extremes driven by rapid urbanization and climate change. Traditionally, thermal extremes have been monitored...
The explosive growth of system logs makes streaming compression essential, yet existing log anomaly detection (LAD) methods incur severe pre-processing overhead...
The Energy Conserving Descent (ECD) algorithm was recently proposed (De Luca & Silverstein, 2022) as a global non-convex optimization method. Unlike gradien...
Computed tomography (CT) enterography is a primary imaging modality for assessing inflammatory bowel disease (IBD), yet the representational choices that best s...
Computer Use Agents (CUAs) fundamentally rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions, but...
Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across task compr...
AI-driven education platforms have made some progress in personalisation, yet most remain constrained to static adaptation--predefined quizzes, uniform pacing, ...
On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This...
This paper tackles the Electric Capacitated Vehicle Routing Problem (E-CVRP) through a bilevel optimization framework that handles routing and charging decision...