Source

arXiv

5785 posts from this source

3 months ago · ai · - · -

[Paper] ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought

While Chain-of-Thought (CoT) significantly enhances the performance of Large Language Models (LLMs), explicit reasoning chains introduce substantial computation...

#research #paper #ai #nlp
3 months ago · ai · - · -

[Paper] JobResQA: A Benchmark for LLM Machine Reading Comprehension on Multilingual Résumés and JDs

We introduce JobResQA, a multilingual Question Answering benchmark for evaluating Machine Reading Comprehension (MRC) capabilities of LLMs on HR-specific tasks ...

#research #paper #ai #nlp
3 months ago · software · - · -

[Paper] Do Good, Stay Longer? Temporal Patterns and Predictors of Newcomer-to-Core Transitions in Conventional OSS and OSS4SG

Open Source Software (OSS) sustainability relies on newcomers transitioning to core contributors, but this pipeline is broken, with most newcomers becoming inac...

#research #paper #software
3 months ago · software · - · -

[Paper] From Monolith to Microservices: A Comparative Evaluation of Decomposition Frameworks

Software modernisation through the migration from monolithic architectures to microservices has become increasingly critical, yet identifying effective service ...

#research #paper #software
3 months ago · software · - · -

[Paper] Automated Testing of Prevalent 3D User Interactions in Virtual Reality Applications

Virtual Reality (VR) technologies offer immersive user experiences across various domains, but present unique testing challenges compared to traditional softwar...

#research #paper #software
3 months ago · software · - · -

[Paper] Evaluating the Effectiveness of OpenAI's Parental Control System

We evaluate how effectively platform-level parental controls moderate a mainstream conversational assistant used by minors. Our two-phase protocol first builds ...

#research #paper #software
3 months ago · ai · - · -

[Paper] On the Impact of Code Comments for Automated Bug-Fixing: An Empirical Study

Large Language Models (LLMs) are increasingly relevant in Software Engineering research and practice, with Automated Bug Fixing (ABF) being one of their key app...

#research #paper #ai #machine-learning
3 months ago · software · - · -

[Paper] Uncovering Hidden Inclusions of Vulnerable Dependencies in Real-World Java Projects

Open-source software (OSS) dependencies are a dominant component of modern software code bases. Using proven and well-tested OSS components lets developers redu...

#research #paper #software
3 months ago · software · - · -

[Paper] SolAgent: A Specialized Multi-Agent Framework for Solidity Code Generation

Smart contracts are the backbone of the decentralized web, yet ensuring their functional correctness and security remains a critical challenge. While Large Lang...

#research #paper #software
3 months ago · ai · - · -

[Paper] TriCEGAR: A Trace-Driven Abstraction Mechanism for Agentic AI

Agentic AI systems act through tools and evolve their behavior over long, stochastic interaction traces. This setting complicates assurance, because behavior de...

#research #paper #ai #machine-learning
3 months ago · devops · - · -

[Paper] ERA: Epoch-Resolved Arbitration for Duelling Admins in Group Management CRDTs

Conflict-Free Replicated Data Types (CRDTs) are used in a range of fields for their coordination-free replication with strong eventual consistency. By prioritis...

#research #paper #devops
3 months ago · ai · - · -

[Paper] AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation

The performance of deep learning models critically depends on efficient kernel implementations, yet developing high-performance kernels for specialized accelera...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] SQUAD: Scalable Quorum Adaptive Decisions via ensemble of early exit neural networks

Early-exit neural networks have become popular for reducing inference latency by allowing intermediate predictions when sufficient confidence is achieved. Howev...

#research #paper #ai #machine-learning #computer-vision
3 months ago · devops · - · -

[Paper] CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control

Batch inference for agentic workloads stresses the GPU key-value (KV) cache in a sustained and cumulative manner, often causing severe throughput degradation we...

#research #paper #devops
3 months ago · ai · - · -

[Paper] COBRA++: Enhanced COBRA Optimizer with Augmented Surrogate Pool and Reinforced Surrogate Selection

Real-world optimization problems pose significant challenges for optimization algorithms, such as expensive evaluations and complex con...

#research #paper #ai
3 months ago · ai · - · -

[Paper] HetCCL: Accelerating LLM Training with Heterogeneous GPUs

The rapid growth of large language models is driving organizations to expand their GPU clusters, often with GPUs from multiple vendors. However, current deep le...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] Detect and Act: Automated Dynamic Optimizer through Meta-Black-Box Optimization

Dynamic Optimization Problems (DOPs) are challenging to address due to their complex nature, i.e., dynamic environment variation. Evolutionary Computation metho...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] Fairness-Aware Performance Evaluation for Multi-Party Multi-Objective Optimization

In multiparty multiobjective optimization problems, solution sets are usually evaluated using classical performance metrics, aggregated across decision makers (DMs). However, suc...

#research #paper #ai
3 months ago · devops · - · -

[Paper] Coordinating Power Grid Frequency Regulation Service with Data Center Load Flexibility

AI/ML data center growth has led to higher energy consumption and carbon emissions. The shift to renewable energy and growing data center energy demands can de...

#research #paper #devops
3 months ago · ai · - · -

[Paper] AsyncMesh: Fully Asynchronous Optimization for Data and Pipeline Parallelism

Data and pipeline parallelism are key strategies for scaling neural network training across distributed devices, but their high communication cost necessitates ...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] Towards Resiliency in Large Language Model Serving with KevlarFlow

Large Language Model (LLM) serving systems remain fundamentally fragile, where frequent hardware faults in hyperscale clusters trigger disproportionate service ...

#research #paper #ai #machine-learning #nlp
3 months ago · ai · - · -

[Paper] SAIR: Cost-Efficient Multi-Stage ML Pipeline Autoscaling via In-Context Reinforcement Learning

Multi-stage ML inference pipelines are difficult to autoscale due to heterogeneous resources, cross-stage coupling, and dynamic bottleneck migration. We present...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] Learning Provably Correct Distributed Protocols Without Human Knowledge

Provably correct distributed protocols, which are a critical component of modern distributed systems, are highly challenging to design and have often required d...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] Investigating the Interplay of Parameterization and Optimizer in Gradient-Free Topology Optimization: A Cantilever Beam Case Study

Gradient-free black-box optimization (BBO) is widely used in engineering design and provides a flexible framework for topology optimization (TO), enabling the d...

#research #paper #ai
3 months ago · ai · - · -

[Paper] RedSage: A Cybersecurity Generalist LLM

Cybersecurity operations demand assistant LLMs that support diverse workflows without exposing sensitive data. Existing solutions either rely on proprietary API...

#research #paper #ai #machine-learning #nlp
3 months ago · ai · - · -

[Paper] One-step Latent-free Image Generation with Pixel Mean Flows

Modern diffusion/flow-based models for image generation typically exhibit two core characteristics: (i) using multi-step sampling, and (ii) operating in a laten...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Discovering Hidden Gems in Model Repositories

Public repositories host millions of fine-tuned models, yet community usage remains disproportionately concentrated on a small number of foundation checkpoints....

#research #paper #ai #machine-learning #nlp
3 months ago · ai · - · -

[Paper] Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts

Hybrid Transformer architectures, which combine softmax attention blocks and recurrent neural networks (RNNs), have shown a desirable performance-throughput tra...

#research #paper #ai #machine-learning #nlp
3 months ago · ai · - · -

[Paper] Exploring Reasoning Reward Model for Agents

Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use. However, most methods sti...

#research #paper #ai #machine-learning #nlp
3 months ago · ai · - · -

[Paper] UEval: A Benchmark for Unified Multimodal Generation

We introduce UEval, a benchmark to evaluate unified models, i.e., models capable of generating both images and text. UEval comprises 1,000 expert-curated questi...

#research #paper #ai #nlp #computer-vision
3 months ago · ai · - · -

[Paper] DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation

Manipulating dynamic objects remains an open challenge for Vision-Language-Action (VLA) models, which, despite strong generalization in static manipulation, str...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Late Breaking Results: Conversion of Neural Networks into Logic Flows for Edge Computing

Neural networks have been successfully applied in various resource-constrained edge devices, where usually central processing units (CPUs) instead of graphics p...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] Do VLMs Perceive or Recall? Probing Visual Perception vs. Memory with Classic Visual Illusions

Large Vision-Language Models (VLMs) often answer classic visual illusions 'correctly' on original images, yet persist with the same responses when illusion fact...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] DynaWeb: Model-Based Reinforcement Learning of Web Agents

The development of autonomous web agents, powered by Large Language Models (LLMs) and reinforcement learning (RL), represents a significant step towards general...

#research #paper #ai #machine-learning #nlp
3 months ago · ai · - · -

[Paper] FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale

Due to limited supervised training data, large language models (LLMs) are typically pre-trained via a self-supervised 'predict the next word' objective on a vas...

#research #paper #ai #machine-learning #nlp
3 months ago · ai · - · -

[Paper] JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion

Audio-Visual Foundation Models, which are pretrained to jointly generate sound and visual content, have recently shown an unprecedented ability to model multi-m...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Routing the Lottery: Adaptive Subnetworks for Heterogeneous Data

In pruning, the Lottery Ticket Hypothesis posits that large networks contain sparse subnetworks, or winning tickets, that can be trained in isolation to match t...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers

Reasoning-oriented Large Language Models (LLMs) have achieved remarkable progress with Chain-of-Thought (CoT) prompting, yet they remain fundamentally limited b...

#research #paper #ai #machine-learning #nlp
3 months ago · ai · - · -

[Paper] PRISM: Distribution-free Adaptive Computation of Matrix Functions for Accelerating Neural Network Training

Matrix functions such as square root, inverse roots, and orthogonalization play a central role in preconditioned gradient methods for neural network training. T...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] StepShield: When, Not Whether to Intervene on Rogue Agents

Existing agent safety benchmarks report binary accuracy, conflating early intervention with post-mortem analysis. A detector that flags a violation at step 8 en...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] PI-Light: Physics-Inspired Diffusion for Full-Image Relighting

Full-image relighting remains a challenging problem due to the difficulty of collecting large-scale structured paired data, the difficulty of maintaining physic...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Early and Prediagnostic Detection of Pancreatic Cancer from Computed Tomography

Pancreatic ductal adenocarcinoma (PDAC), one of the deadliest solid malignancies, is often detected at a late and inoperable stage. Retrospective reviews of pre...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Pay for Hints, Not Answers: LLM Shepherding for Cost-Efficient Inference

Large Language Models (LLMs) deliver state-of-the-art performance on complex reasoning tasks, but their inference costs limit deployment at scale. Small Languag...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] SMOG: Scalable Meta-Learning for Multi-Objective Bayesian Optimization

Multi-objective optimization aims to solve problems with competing objectives, often with only black-box access to a problem and a limited budget of measurement...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] World of Workflows: a Benchmark for Bringing World Models to Enterprise Systems

Frontier large language models (LLMs) excel as autonomous agents in many domains, yet they remain untested in complex enterprise systems where hidden workflows ...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents

Test-time scaling has been widely adopted to enhance the capabilities of Large Language Model (LLM) agents in software engineering (SWE) tasks. However, the sta...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos with Diffusion Transformers

Current generative video models excel at producing novel content from text and image prompts, but leave a critical gap in editing existing pre-recorded videos, ...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Creative Image Generation with Diffusion Model

Creative image generation has emerged as a compelling area of research, driven by the need to produce novel and high-quality images that expand the boundaries o...

#research #paper #ai #computer-vision
