Source

arXiv

1603 posts from this source

3 weeks ago · ai · - · -

[Paper] DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established scienc...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World

Large language model agents are increasingly envisioned as always-on personal assistants with access to anything relevant in the user's digital world. Yet curre...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] VeriTrace: Evolving Mental Models for Deep Research Agents

Deep research agents face vast, interdependent, and pervasively uncertain information. Existing systems explore what evolving intermediate representations shoul...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] Automated Benchmark Auditing for AI Agents and Large Language Models

Modern AI benchmarks operate at a complexity that outpaces traditional verification methods. Tasks authored by domain experts often contain implicit assumptions...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning

Wasserstein policy gradient (WPG) is a policy optimization method for reinforcement learning (RL) that exploits the optimal-transport geometry of action distrib...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] StakeBench: Evaluating Language Understanding Grounded in Market Commitment

Existing financial NLP benchmarks often rely on labels supplied by outside observers, measuring how language is perceived rather than what speakers have committ...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] Active Query Synthesis for Preference Learning

Efficient learning of user preferences is crucial for many modern decision making systems but typically requires costly labeled data. Active learning reduces th...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] WhoSaidIt: Human-LLM Collaborative Annotation for Text-Based Multilingual Speaker-Attribute Classification

Annotating speaker attributes from text is inherently ambiguous, particularly in multilingual settings where demographic and social cues are implicit and cultur...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient ...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals

Activation oracles aim to make the activations of other models legible to humans and yield promising results compared to white-box interpretability techniques. ...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] Peak-Then-Collapse and the Four Interface Channels of Knowledge-Graph Tool Use

We test the standard RLVR tool-use recipe -- GRPO on Qwen2.5-7B-Instruct -- on a deliberately minimal knowledge-graph tool API: four Freebase navigation verbs o...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

We introduce CausaLab, a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike prior evaluations, CausaLab evaluates both wheth...

#research #paper #ai #machine-learning #nlp
3 weeks ago · software · - · -

[Paper] Trustworthy Software Project Generation: a Case Study with an Interactive Theorem Prover

Generating code from natural-language requirements has become a primary route for LLM-assisted software development. Although LLMs can successfully complete sma...

#research #paper #software
3 weeks ago · software · - · -

[Paper] Uncovering multi-channel magnetic hopfion annihilation via a single-node, billion-spin-scale atomistic framework

Modern atomistic spin simulations combine long stochastic trajectories, thermodynamic sampling, static optimization and multi-image transition-path workflows, a...

#research #paper #software
3 weeks ago · software · - · -

[Paper] CelerLog: Fast Log Parsing via Dynamic Routing

Log parsing is a fundamental step for automated log analysis, which transforms raw log messages into structured formats. Existing syntax-based parsers struggle ...

#research #paper #software
3 weeks ago · software · - · -

[Paper] From Early Adoption to Sustained Use: Understanding GenAI Usage Among Software Developers in Italian SMEs

Generative AI tools are rapidly transforming software development practice, prompting unprecedented research interest. However, existing studies have predominan...

#research #paper #software
3 weeks ago · ai · - · -

[Paper] Joint Optimization of Training and Inference in Federated Edge Learning via Constrained Multi-Objective Deep Reinforcement Learning

Federated edge learning (FEEL) has recently emerged as a promising paradigm for achieving edge intelligence (EI) via enabling collaborative model training acros...

#research #paper #ai #machine-learning
3 weeks ago · software · - · -

[Paper] How Agentic AI Coding Assistants Become the Attacker's Shell

Agentic AI coding assistants can edit files, run commands, and access the internet on behalf of developers. However, their reliance on unvetted external artifac...

#research #paper #software
3 weeks ago · devops · - · -

[Paper] Proof of Useful Attestation: A Consensus Primitive for Attestation-Native Chains

Validators on generic Proof of Stake chains earn the same fees whether they handle attestation work correctly or selectively censor it. For chains whose main ac...

#research #paper #devops
3 weeks ago · ai · - · -

[Paper] A Scalable Benchmark Test Suite for Dynamic Multi-Objective Optimization with a Changing Number of Objectives

Dynamic multi-objective optimization with a changing number of objectives has recently attracted increasing attention due to its relevance to real-world problem...

#research #paper #ai
3 weeks ago · devops · - · -

[Paper] An Efficient and Privacy-Preserving Architecture for Cross-Institutional Collaborative RAG

Retrieval-Augmented Generation (RAG) empowers LLMs with external knowledge, making cross-institutional domain-specific knowledge base integration a highly promi...

#research #paper #devops
3 weeks ago · ai · - · -

[Paper] Neural Router: Semantic Content Matching for Agentic AI

Large language models (LLMs) can serve as the semantic-matching engine of a content-based publish/subscribe broker for agentic AI across the edge-cloud computin...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] Profiling-Driven Adaptive Distributed Transformer Inference on Embedded Edge Deployment

Distributing Transformer inference across embedded edge devices can alleviate individual memory and compute constraints, yet practical benefits on real hardware...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] Meta-Engineering Harnesses for AI-Native Software Production: A Contract-Driven Adversarial Verification Architecture with Early Deployment Report

AI-native software development is often evaluated at the level of individual models, prompts, or generated artifacts. This framing is insufficient for productio...

#research #paper #ai #machine-learning
3 weeks ago · devops · - · -

[Paper] Bandwidth-Aware LLM Inference on Heterogeneous Many-Core Supercomputers

Large language model (LLM) inference is limited by high computational cost and memory bandwidth demands, making deployment on heterogeneous many-core processors...

#research #paper #devops
3 weeks ago · devops · - · -

[Paper] When Agents Control Robots: A Zero Trust Policy Model for Agentic Cyber-Physical Systems

Multi-agent systems powered by large foundation models (LFMs) are increasingly deployed to control industrial robots through natural language, creating deployme...

#research #paper #devops
3 weeks ago · ai · - · -

[Paper] Fine-Tuning and Serving Gemma 4 31B on Google Cloud TPU: A Technical Comparison with GPU Baselines

We present the first end-to-end demonstration of fine-tuning and serving Google's Gemma 4 31B model on TPU hardware, providing an empirical comparison of TPU an...

#research #paper #ai #machine-learning
3 weeks ago · devops · - · -

[Paper] DisagFusion: Asynchronous Pipeline Parallelism and Elastic Scheduling for Disaggregated Diffusion Serving

Diffusion-based generation is increasingly powering production content pipelines; however, deploying these models at scale remains a significant challenge. Mode...

#research #paper #devops
3 weeks ago · ai · - · -

[Paper] A Tertiary Review of Large Language Model-Based Code Generating Tasks: Trends, Challenges, and Future Directions

Context. Large language models (LLMs) are increasingly applied to code-generating tasks (CGTs) in software engineering. While reported results are promising, th...

#research #paper #ai #machine-learning
3 weeks ago · software · - · -

[Paper] A Heuristic Approach to Localize CSS Properties for Responsive Layout Failures

Responsive Layout Failures (RLFs) typically arise from CSS properties that hinder proper layout behavior in different screen sizes. To find an accurate and effe...

#research #paper #software
3 weeks ago · devops · - · -

[Paper] Bandwidth-Aware and Cost-Efficient Pipeline Parallel Scheduling in Geo-Distributed LLM Training

The rapid evolution of large language models (LLMs) has made geographically distributed training necessary due to GPU scarcity within a single cloud region. In ...

#research #paper #devops
3 weeks ago · ai · - · -

[Paper] Positivity in classical enumerative geometry: a case study in synchronized AI-assisted mathematics

We study the symmetric polynomial $\prod_{\alpha\in A_{n,d}}\bigl(1+\alpha_1 x_1+\cdots+\alpha_n x_n\bigr)$ where $A_{n,d}:=\{\alpha\in\mathbb{Z}_{\ge 0}^n:|\alpha|=d\}$, which is the total Chern cla...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] Growing a Neural Network in Breadth, Depth, and Time

Spatial and temporal resource constraints are critical for both biological and artificial intelligent systems. Here we define differentiable cost terms for brea...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] Anarchy in the swarm: Testing informed and uninformed diversity-enhancing mechanisms within PSO framework

Particle Swarm Optimization (PSO) frequently suffers from premature convergence. This paper introduces a family of problem-informed diversity-enhancing strategi...

#research #paper #ai
3 weeks ago · ai · - · -

[Paper] Cultivating Machine Intelligence: The OMEGA Shift from Top-Down Optimization to Autopoietic Cognitive Ecologies

The dominant artificial intelligence paradigm trains neural architectures via gradient descent against proxy objectives and reinforcement learning from human fe...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] Convex-Neural RRT*: Fast and Reliable Learning-Guided Sampling for High-Quality Robot Path Planning

Sampling-based algorithms for robot path planning offer probabilistic completeness and strong empirical convergence properties across environments with diverse ...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] Memory Uncertainty Relation and Harmonic Memory in Random Recurrent Networks

We present an inequality that bounds the short-term memory capability of dynamical systems from below. It can be interpreted as an uncertainty relation between ...

#research #paper #ai
3 weeks ago · ai · - · -

[Paper] SkillOpt: Executive Strategy for Self-Evolving Agent Skills

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimiz...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] Geo-Align: Video Generation Alignment via Metric Geometry Reward

Camera-controlled video generation has achieved remarkable progress in recent years. However, existing video-to-video re-rendering methods primarily rely on Sup...

#research #paper #ai #computer-vision
3 weeks ago · ai · - · -

[Paper] PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

Most practical high-resolution text-to-image systems, including latent diffusion and autoregressive models, perform generation in a compact latent space, and a ...

#research #paper #ai #computer-vision
3 weeks ago · ai · - · -

[Paper] LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophi...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

Language agents increasingly improve by reusing skills -- structured procedural artifacts distilled from past experience. In particular, domain-level and model-...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

Vision-Language Models (VLMs) are increasingly deployed in embodied environments, where they need to produce numerical outputs such as action magnitudes and spatia...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] ETCHR: Editing To Clarify and Harness Reasoning

Multimodal Large Language Models have advanced visual reasoning, yet a purely textual chain of thought remains a bottleneck for questions that require fine-grai...

#research #paper #ai #machine-learning #nlp #computer-vision
3 weeks ago · ai · - · -

[Paper] From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain

Identifying which brain regions represent a visual concept in the human brain is a central challenge in neuroscience. Existing approaches have localized coarse ...

#research #paper #ai #computer-vision
3 weeks ago · ai · - · -

[Paper] Complete-muE: Optimal Hyperparameter Transfer and Scaling for MoE Models

We propose Complete-muE, a framework which targets hyperparameter transfer across dense FFN and any Mixture-of-Experts (MoE) setups in transformer blocks. Exist...

#research #paper #ai #machine-learning
3 weeks ago · ai · - · -

[Paper] Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

Visual geometry transformers have become powerful architectures for multi-view 3D reconstruction, enabling joint prediction of multiple 3D attributes in a feed-...

#research #paper #ai #machine-learning #computer-vision
3 weeks ago · ai · - · -

[Paper] Smart-Insertion-V: Photorealistic Video Insertion via a Closed-Loop Feedback Dual-Stream Framework

Mask-free video object insertion has emerged as a challenging task, requiring harmonious integration of reference objects into source videos. However, existing ...

#research #paper #ai #computer-vision
