Source

arXiv

1621 posts from this source

1 week ago · ai · - · -

[Paper] When Built-in Thinking Helps and Hurts: Constraint-Level Error Shifts in Instruction Following

Large reasoning models (LRMs) often improve math and coding performance, but their effect on instruction following is unclear. We study IFEval with Qwen3 models...

#research #paper #ai #nlp
1 week ago · ai · - · -

[Paper] End-to-End Context Compression at Scale

Long-context language model inference is bottlenecked by memory, as the KV cache grows with context length. Recent techniques to compress the KV cache fall shor...

#research #paper #ai #machine-learning #nlp
1 week ago · ai · - · -

[Paper] Beyond Accuracy: Community Perspectives on Machine Translation

Despite remarkable progress in machine translation (MT), non-AI communities have raised growing concerns about MT systems, suggesting a noticeable gap between t...

#research #paper #ai #nlp
1 week ago · ai · - · -

[Paper] Do Video Foundation Models Understand Intuitive Physics? A Layerwise Probing Analysis

We study whether pretrained video foundation models encode intuitive-physics information in their frozen representations, and how this information varies across...

#research #paper #ai #machine-learning #computer-vision
1 week ago · software · - · -

[Paper] Modeling Components and Connections in Cyber-Physical Systems

Text based configuration files for cyber-physical systems show the hierarchy of component modules well but often hide the details of connections and interfaces ...

#research #paper #software
1 week ago · ai · - · -

[Paper] Where Does the Answer Come From? Benchmarking View-Level Visual Evidence Identification in Multi-View MLLMs for Autonomous Driving

Multimodal large language models (MLLMs) achieve strong results on visual reasoning benchmarks, but answer accuracy alone does not indicate whether a model reli...

#research #paper #ai #nlp #computer-vision
1 week ago · ai · - · -

[Paper] FMplex: Model Virtualization for Serving Extensible Foundation Models

Foundation models (FMs) are increasingly used as backbones for downstream tasks across language, vision, time-series, and multimodal applications. Yet existing ...

#research #paper #ai #machine-learning
1 week ago · software · - · -

[Paper] Agentic Persona Generation with Critique-Refinement: An Industrial Evaluation

Personas are widely used in software engineering to support requirements elicitation, design, and validation, but their manual creation is costly, time-consumin...

#research #paper #software
1 week ago · ai · - · -

[Paper] Gradient-Guided Reward Optimization for Inference-time Alignment

Ensuring the reliability of Large Language Models (LLMs) under distribution drift requires inference-time adaptation. While inference-time alignment methods suc...

#research #paper #ai #nlp
1 week ago · ai · - · -

[Paper] Civil Court Simulation with Large Language Models

Court simulation bridges legal education and judicial practice, yet human-based simulations are costly and difficult to scale. Large language models (LLMs) offe...

#research #paper #ai #nlp
1 week ago · devops · - · -

[Paper] Parent-Hash DAG: A Cost Analysis of Constant-Time Append for On-Chain Registries

Provenance trees are append-only directed acyclic graphs of artifact registrations anchored on a public blockchain, recently introduced as the data substrate of...

#research #paper #devops
1 week ago · ai · - · -

[Paper] Code Is More Than Text: Uncertainty Estimation for Code Generation

Large language models (LLMs) are increasingly deployed as code generators, where silently wrong programs pose real safety and reliability risks. Reliable uncert...

#research #paper #ai #machine-learning #nlp
1 week ago · ai · - · -

[Paper] Hybrid Metaheuristic Combining the Dragonfly Algorithm and Tabu Search for the Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is a classical NP-hard combinatorial optimization problem that aims to find the shortest Hamiltonian cycle visiting each ci...

#research #paper #ai
1 week ago · software · - · -

[Paper] Relocate and Emulate: Re-Hosting Android's Application Layer

Dynamic analysis of Android's application layer typically relies on physical devices, limiting scalability and reproducibility. To compensate, we introduce a sy...

#research #paper #software
1 week ago · ai · - · -

[Paper] Local Search on Vertex Coloring for Bipartite Graphs

Local search is a well-known heuristic method used in optimization. In this thesis, we explore its capabilities on the vertex coloring problem, an NP-hard probl...

#research #paper #ai
1 week ago · ai · - · -

[Paper] Harness Engineering for Physical AI: Robot Middleware Is the Harness Layer

Robot middleware faces a new role in the era of Physical AI. Learned policies, planners, and vision-language-action (VLA) models now enter deployed robots as ca...

#research #paper #ai #machine-learning
1 week ago · software · - · -

[Paper] Empirical Study for Structured Output Control in LLMs for Software Engineering

LLM-generated outputs in software engineering rarely exist in isolation. They must plug into toolchains, APIs, and data pipelines that impose strict, often orga...

#research #paper #software
1 week ago · devops · - · -

[Paper] Coupling Complementary Simulations for Combined Performance and Energy Optimization

Polymer simulations are among the most computationally demanding workloads in soft-matter research, often requiring days of execution and high energy consumptio...

#research #paper #devops
1 week ago · devops · - · -

[Paper] Engineering Scalable Distributed List Ranking

The list ranking problem is one of the classical problems of parallel computing, with nontrivial algorithms and many applications as a subroutine for solving ot...

#research #paper #devops
1 week ago · ai · - · -

[Paper] Quantitative Performance Analysis of Stopping Criteria for CMA-ES

Covariance matrix adaptation evolution strategy (CMA-ES) is a state-of-the-art black-box optimization algorithm. In general, CMA-ES uses a portfolio of multiple...

#research #paper #ai
1 week ago · ai · - · -

[Paper] Resource-aware Computation-Communication Overlap for multi-GPU ML Workloads

The rapid growth of large-scale machine learning (ML) has made distributed training across multiple GPUs a fundamental component of modern ML systems. As model ...

#research #paper #ai #machine-learning
1 week ago · software · - · -

[Paper] Understanding How Enterprises Adopt the Model Context Protocol for LLM-Driven Software Engineering

Large Language Models (LLMs) are increasingly used in AI-based software engineering, but their limitations in complex task execution and multi-tool coordination...

#research #paper #software
1 week ago · ai · - · -

[Paper] CANS: Accelerating Multiuser Collaborative Edge Inference via Cooperative Autodidactic NeuroSurgeon

Recently, mobile edge computing (MEC)-enabled collaborative deep neural network (DNN) inference has emerged as a promising approach for delivering intelligent s...

#research #paper #ai #machine-learning
1 week ago · ai · - · -

[Paper] OpenOpt: An Open-Source SRAM Optimizer Based on Equivalent Circuit Model

This paper proposes a co-optimization framework that jointly optimizes SRAM architecture and transistor sizing using equivalent circuit models. The framework si...

#research #paper #ai
1 week ago · ai · - · -

[Paper] Autonomous Incident Resolution at Hyperscale: An Agentic AI Architecture for Network Operations

Cloud network infrastructure at hyperscale presents unique operational challenges where traditional human-driven incident response cannot keep pace with the vol...

#research #paper #ai #machine-learning
1 week ago · devops · - · -

[Paper] AutoPilot: Learning to Steer High Speed Robust BFT

Recent Byzantine Fault Tolerant (BFT) protocols achieve strong performance by combining the low-latency advantages of leader-based BFT protocols with the high-t...

#research #paper #devops
1 week ago · devops · - · -

[Paper] Concepts in Practice: C++ MPI Bindings for the HPC Ecosystem. From a Standardizable Core to a Composable Interface

The official C++ MPI bindings were removed from the standard in 2008, leaving a gap that numerous third-party libraries have attempted to fill. However, existin...

#research #paper #devops
1 week ago · devops · - · -

[Paper] Chimera: Protocol-Aware Recovery for Confidential BFT Consensus

Trusted Execution Environments (TEEs) have enabled confidential Byzantine Fault-Tolerant (BFT) consensus systems with confidentiality and improved scalability. ...

#research #paper #devops
1 week ago · ai · - · -

[Paper] Context Rot in AI-Assisted Software Development: Repurposing Documentation Consistency for AI Configuration Artifacts

Developers increasingly provide AI coding assistants with persistent context through configuration files such as CLAUDE.md, AGENTS.md, and .cursorrules. These f...

#research #paper #ai #machine-learning
1 week ago · software · - · -

[Paper] Security-First Approach to API Pipeline Development with Zero-Trust Architecture

Modern enterprises face an accelerating onslaught of API-targeted threats amid a rapidly expanding attack surface. Record volumes of software vulnerabilities co...

#research #paper #software
1 week ago · devops · - · -

[Paper] Fairness-Aware and Latency-Controllable Scheduling for Chunked-Prefill LLM Serving

As large language models (LLMs) are increasingly deployed with highly heterogeneous workloads, chunked-prefill execution has emerged as a mainstream serving arc...

#research #paper #devops
1 week ago · software · - · -

[Paper] ATTAIN: Automated Exploit Failure Analysis through Trace-Driven Diff Analysis

Exploits are widely used to check whether library vulnerabilities appear in different versions and to mark affected version ranges. Exploit-based checks sometim...

#research #paper #software
1 week ago · devops · - · -

[Paper] When More Cores Hurts: The Vector Database Scaling Paradox in HPC

Vector databases have been designed and optimized for cloud environments; however, emerging scientific AI workloads (e.g., molecular search, meteorological traj...

#research #paper #devops
1 week ago · devops · - · -

[Paper] A Low-Latency Semantic State Estimator using Latent Predictive Learning for Dynamic Network Monitoring and Orchestration

Closed-loop network monitoring and orchestration increasingly require semantic interpretations of live telemetry beyond raw counter collection. However, dynamic...

#research #paper #devops
1 week ago · devops · - · -

[Paper] Parallel SMT Solving via Dynamic Partitioning, Core-Guided Pruning, and Online Backbone Detection

Exploiting parallelism in modern CPU architectures remains a longstanding challenge in optimizing SMT solvers. We introduce a novel parallel framework that dyna...

#research #paper #devops
1 week ago · ai · - · -

[Paper] Beyond Pass Rate: A Multilingual, Execution-Grounded Evaluation of Open Code LLMs

Code generation models are typically compared using compact execution benchmarks and aggregate pass rates, but such summaries obscure how performance varies acr...

#research #paper #ai #machine-learning
1 week ago · software · - · -

[Paper] Syntax-driven Incremental Program Verification of Matching Logic Properties

Incrementality is a fundamental design principle to master the complexity of large, long-lived software systems. This principle has been embraced by agile devel...

#research #paper #software
1 week ago · ai · - · -

[Paper] Aperon Technical Report: Hierarchical No-Pointer Tangent-Local Search for High-Dimensional Approximate Nearest Neighbors

We present HNTL (Hierarchical No-pointer Tangent-Local), the core vector indexing and candidate generation framework of the Aperon vector memory system. Proximi...

#research #paper #ai #machine-learning
1 week ago · ai · - · -

[Paper] Governance Controls for AI-Generated Test Artifacts in Autonomous Software Testing

Artificial Intelligence (AI) and Large Language Models (LLMs) are increasingly used in autonomous software testing; however, AI-generated test artifacts often s...

#research #paper #ai #machine-learning
