[논문] 강화학습 흐름 정책의 테스트 시 그래디언트 가이드
Expressive continuous control policies, such as diffusion and flow models, form the backbone of recent advances in scaling imitation learning for simulated and ...
1077 posts from this source
Expressive continuous control policies, such as diffusion and flow models, form the backbone of recent advances in scaling imitation learning for simulated and ...
Communication-efficient pre-training of LLMs is increasingly important as training draws on compute distributed across clusters, data centers, and lower-bandwid...
Adopting a single measure can improve the usability, modularity and accountability of software: a commitment to explicit meaning. This entails constructing and ...
Instruction-tuned LLMs are increasingly converted into reasoning models through post-training to improve multi-step task performance. This conversion is usually...
Distributed Software-Defined Networking (SDN) clusters replicate flow state asynchronously between a master node and its backups, leaving a window during which ...
Recent efforts to extend large language models (LLMs) to speech inputs typically rely on cascaded ASR-LLM pipelines, end-to-end speech-language models, or bridg...
Existing deep learning models for Positron Emission Tomography (PET) image denoising often suffer from severe performance degradation under distribution shifts,...
Sequential recommendation aims to predict users' next interaction with items by analyzing their historical behavior. However, the limited quality of item repres...
Measuring subjective constructs in naturally occurring social media text requires annotation procedures that are theoretically grounded, empirically validated, ...
Background and purpose: oART enables daily plan adaptation to interfraction anatomical variations, but cumulative dose estimation remains limited by DIR, segmen...
Large language models are increasingly used to adapt math word problems for personalized learning at scale, but it remains an open question whether those adapta...
OpenClaw has rapidly emerged as a transformative artificial intelligence (AI) agent framework, and its ability to autonomously execute complex, multi-step tasks...
Zinc-based alloys are indispensable emerging absorbable metallic biomaterials, and their macroscopic performance is governed by microstructural characteristics....
While recent advancements in generative AI have substantially accelerated static 3D model creation workflows, the synthesis of category-agnostic 3D animations r...
Visual in-context learning has been proposed as a pathway towards dynamic models that can generate predictions based on a provided context and thereby can adapt...
The deployment of Large Language Model (LLM) agents for computer automation is accelerating, yet their ability to navigate complex, professional-grade productiv...
Warning: This paper contains several toxic and offensive statements. Modern large language models (LLMs) are typically aligned through large-scale post-training...
Long-document question answering (QA) requires large language models (LLMs) to reason over evidence scattered across lengthy documents, where answers often depe...
Existing automated RESTful API testing approaches commonly rely on simple checks (e.g., HTTP status codes, schema conformance), which are insufficient for detec...
Traditional test adequacy metrics measure a system's implementation, not whether it adheres to its expected behaviour. While developers rely heavily on code cov...
Communication skills are increasingly recognized as essential in Software Engineering, yet discussions about them remain fragmented across academic and gray lit...
Developers often struggle to design truly accessible digital solutions in corporate environments. In these environments, accessibility is usually treated as a c...
Level-4+ autonomous driving systems (ADS) must run dozens of heterogeneous deep neural networks (DNNs) as end-to-end (E2E) pipelines under a strict latency cons...
The ioctl system call is Linux's catch-all device-control interface. A userspace program opens a device node and hands the driver a numeric command code and an ...
Many data systems admit multiple admissible outcomes for the same input: concurrent transactions may serialize in one of many orders; a logic program may have m...
While traditional techniques, such as symbolic execution, provide a principled foundation for precise constraint reasoning in program analysis, they struggle to...
The high efficiency of domain-specific hardware has sparked substantial interest in adopting accelerators in data analytics systems. Among many choices, GPUs an...
Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit point cloud memory constructed in RGB space. This des...
Temporal modeling is essential for robotic manipulation, as effective control requires both memory of past interactions and imagination of future states. Howeve...
Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single firs...
Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial compu...
Language models, as multi-task learners, acquire a wide range of abilities during training. A fundamental question is how much task-specific data is needed to l...
Reinforcement learning (RL) has become a key component of post-training large language models (LLMs). In practice, LLM RL is often off-policy because of trainin...
We generalize the universal approximation theorem for functional input neural networks (FNN) to differentiable maps by including the approximation of the deriva...
Standard diffusion models typically use a single time-homogeneous Gaussian terminal distribution as the reference law for generation. While this choice is analy...
Embodied world models have emerged as a pivotal paradigm for visual robotic decision-making and interactive environment simulation. However, conventional embodi...
World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors in...
AI evaluation results are produced at scale but reported inconsistently across leaderboards, model cards, benchmark papers, and company blogs. The cost is inter...
We introduce Topological Neural Operators (TNOs), a principled framework for operator learning on cell complexes that lifts neural operators (NOs) from function...
We present Echo-Memory, a controlled study of memory mechanisms in action-conditioned world models. These models generate multi-segment videos from a first fram...
We consider a variant of the linear contextual stochastic multi-armed bandits, where the learner must provide recommendations to a group of users, each having i...
Multi-agent code generation offers a promising paradigm for autonomous software development by simulating the human software engineering lifecycle. However, sys...
View-dependent appearance modeling remains a challenging problem in novel-view synthesis and reconstruction. Accurately representing complex angular effects oft...
End-to-end co-optimization of optical front-ends (e.g. metasurfaces) and neural network back-ends has been widely applied to imaging tasks, yet a formalism char...
Large-scale document processing requires contextually aware table extraction (TE) that is both accurate and efficient. Yet current approaches require billions o...
This study addresses the challenges composers and sound designers face in creating and refining tools to achieve their musical goals. Using evolutionary process...
Hard safety filters are increasingly placed downstream of learned controllers to guarantee constraint satisfaction at run time. Yet a filtered controller that n...
Advanced scientific simulators expose specialized input languages that turn simulation goals into executable configurations, but learning them can cost domain s...