Source

arXiv

5644 posts from this source

1 month ago · ai · - · -

[Paper] MOOSEnger -- a Domain-Specific AI Agent for the MOOSE Ecosystem

MOOSEnger is a tool-enabled AI agent tailored to the Multiphysics Object-Oriented Simulation Environment (MOOSE). MOOSE cases are specified in HIT '.i' input fi...

#domain-specific AI #retrieval-augmented generation #simulation software #MOOSE #LLM evaluation
1 month ago · software · - · -

[Paper] Behaviour Driven Development Scenario Generation with Large Language Models

This paper presents an evaluation of three LLMs, GPT-4, Claude 3, and Gemini, for automated Behaviour-Driven Development (BDD) scenario generation. To support ...

#behavior-driven development #large language models #prompt engineering #software testing #LLM evaluation
1 month ago · ai · - · -

[Paper] Vibe Code Bench: Evaluating AI Models on End-to-End Web Application Development

Code generation has emerged as one of AI's highest-impact use cases, yet existing benchmarks measure isolated tasks rather than the complete 'zero-to-one' proce...

#research #paper #ai #machine-learning #nlp
1 month ago · ai · - · -

[Paper] SimpliHuMoN: Simplifying Human Motion Prediction

Human motion prediction combines the tasks of trajectory forecasting and human pose prediction. For each of the two tasks, specialized models have been develope...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Accurate and Efficient Hybrid-Ensemble Atmospheric Data Assimilation in Latent Space with Uncertainty Quantification

Data assimilation (DA) combines model forecasts and observations to estimate the optimal state of the atmosphere with its uncertainty, providing initial conditi...

#data assimilation #latent space #uncertainty quantification #autoencoder #weather forecasting
1 month ago · ai · - · -

[Paper] SELDON: Supernova Explosions Learned by Deep ODE Networks

The discovery rate of optical transients will explode to 10 million public alerts per night once the Vera C. Rubin Observatory's Legacy Survey of Space and Time...

#research #paper #ai #machine-learning
1 month ago · ai · - · -

[Paper] A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development

WebGIS development requires rigor, yet agentic AI frequently fails due to five large language model (LLM) limitations: context constraints, cross-session forget...

#agentic AI #AI governance #WebGIS #LLM reliability #software engineering
1 month ago · ai · - · -

[Paper] ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training

Feed-forward transformer models have driven rapid progress in 3D vision, but state-of-the-art methods such as VGGT and π^3 have a computational cost that scales...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] AgentIR: Reasoning-Aware Retrieval for Deep Research Agents

Deep Research agents are rapidly emerging as primary consumers of modern retrieval systems. Unlike human users who issue and refine queries without documenting ...

#retrieval #reasoning-aware #deep research agents #dense retriever #synthetic training data
1 month ago · ai · - · -

[Paper] Turning Trust to Transactions: Tracking Affiliate Marketing and FTC Compliance in YouTube's Influencer Economy

YouTube has evolved into a powerful platform where creators monetize their influence through affiliate marketing, raising concerns about transparency and e...

#research #paper #ai #machine-learning
1 month ago · ai · - · -

[Paper] TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning

Traditional vision-language models struggle with contrastive fine-grained taxonomic reasoning, particularly when distinguishing between visually similar species...

#research #paper #ai #nlp #computer-vision
1 month ago · ai · - · -

[Paper] Helios: Real Real-Time Long Video Generation Model

We introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching ...

#video generation #diffusion models #real-time AI #computer vision #Helios
1 month ago · ai · - · -

[Paper] Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization

As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability wh...

#research #paper #ai #machine-learning
1 month ago · ai · - · -

[Paper] τ-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

Conversational agents are increasingly deployed in knowledge-intensive settings, where correct behavior depends on retrieving and applying domain-specific knowl...

#benchmark #conversational agents #retrieval-augmented generation #LLM evaluation #tool use
1 month ago · ai · - · -

[Paper] Low-Resource Guidance for Controllable Latent Audio Diffusion

Generative audio requires fine-grained controllable outputs, yet most existing methods require model retraining on specific controls or inference-time controls ...

#research #paper #ai #machine-learning
1 month ago · ai · - · -

[Paper] Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

Multimodal web agents that process both screenshots and accessibility trees are increasingly deployed to interact with web interfaces, yet their dual-stream arc...

#multimodal agents #adversarial training #reinforcement learning #web security #cross-modal attacks
1 month ago · ai · - · -

[Paper] Robust Unscented Kalman Filtering via Recurrent Meta-Adaptation of Sigma-Point Weights

The Unscented Kalman Filter (UKF) is a ubiquitous tool for nonlinear state estimation; however, its performance is limited by the static parameterization of the...

#research #paper #ai #machine-learning
1 month ago · ai · - · -

[Paper] Dissecting Quantization Error: A Concentration-Alignment Perspective

Quantization can drastically increase the efficiency of large language and vision models, but typically incurs an accuracy drop. Recently, function-preserving t...

#quantization #post-training quantization #large language models #model compression #machine learning research
1 month ago · ai · - · -

[Paper] RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots

Recent advances in robot learning have accelerated progress toward generalist robots that can perform everyday tasks in human environments. Yet it remains diffi...

#research #paper #ai #machine-learning
1 month ago · ai · - · -

[Paper] Efficient Refusal Ablation in LLM through Optimal Transport

Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent activation-based jail...

#large-language-models #jailbreak-attack #optimal-transport #model-safety #activation-analysis
1 month ago · ai · - · -

[Paper] FocusGraph: Graph-Structured Frame Selection for Embodied Long Video Question Answering

The ability to understand long videos is vital for embodied intelligent agents, because their effectiveness depends on how well they can accumulate, organize, a...

#video-question-answering #multimodal-llm #frame-selection #graph-structured-captions #embodied-ai
1 month ago · ai · - · -

[Paper] RANGER: Sparsely-Gated Mixture-of-Experts with Adaptive Retrieval Re-ranking for Pathology Report Generation

Pathology report generation remains a relatively under-explored downstream task, primarily due to the gigapixel scale and complex morphological heterogeneity of...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Underrepresented in Foundation Model Pretraining Data? A One-Shot Probe

Large-scale Vision-Language Foundation Models (VLFMs), such as CLIP, now underpin a wide range of computer vision research and applications. VLFMs are often ada...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Enhancing Authorship Attribution with Synthetic Paintings

Attributing authorship to paintings is a historically complex task, and one of its main challenges is the limited availability of real artworks for training com...

#authorship attribution #synthetic data #diffusion models #computer vision #stable diffusion
1 month ago · ai · - · -

[Paper] Hold-One-Shot-Out (HOSO) for Validation-Free Few-Shot CLIP Adapters

In many CLIP adaptation methods, a blending ratio hyperparameter controls the trade-off between general pretrained CLIP knowledge and the limited, dataset-speci...

#few-shot learning #CLIP adapters #validation-free training #computer vision #machine learning
1 month ago · ai · - · -

[Paper] Balancing Fidelity, Utility, and Privacy in Synthetic Cardiac MRI Generation: A Comparative Study

Deep learning in cardiac MRI (CMR) is fundamentally constrained by both data scarcity and privacy regulations. This study systematically benchmarks three genera...

#synthetic data #diffusion models #medical imaging #privacy preservation #cardiac MRI
1 month ago · ai · - · -

[Paper] Pointer-CAD: Unifying B-Rep and Command Sequences via Pointer-based Edges & Faces Selection

Constructing computer-aided design (CAD) models is labor-intensive but essential for engineering and manufacturing. Recent advances in Large Language Models (LL...

#research #paper #ai #nlp #computer-vision
1 month ago · ai · - · -

[Paper] PTOPOFL: Privacy-Preserving Personalised Federated Learning via Persistent Homology

Federated learning (FL) faces two structural tensions: gradient sharing enables data-reconstruction attacks, while non-IID client distributions degrade aggregat...

#research #paper #ai #machine-learning
1 month ago · ai · - · -

[Paper] AILS-NTUA at SemEval-2026 Task 12: Graph-Based Retrieval and Reflective Prompting for Abductive Event Reasoning

We present a winning three-stage system for SemEval 2026 Task 12: Abductive Event Reasoning that combines graph-based retrieval, LLM-driven abductive reasoning ...

#abductive reasoning #graph retrieval #reflective prompting #large language models #SemEval-2026
1 month ago · ai · - · -

[Paper] World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings

Recent work interprets the linear recoverability of geographic and temporal variables from large language model (LLM) hidden states as evidence for world-like i...

#research #paper #ai #machine-learning #nlp
1 month ago · ai · - · -

[Paper] $V_1$: Unifying Generation and Self-Verification for Parallel Reasoners

Test-time scaling for complex reasoning tasks shows that leveraging inference-time compute, by methods such as independently sampling and aggregating multiple s...

#research #paper #ai #nlp
1 month ago · ai · - · -

[Paper] The Company You Keep: How LLMs Respond to Dark Triad Traits

Large Language Models (LLMs) often exhibit highly agreeable and reinforcing conversational styles, also known as AI-sycophancy. Although this behavior is encour...

#research #paper #ai #nlp
1 month ago · ai · - · -

[Paper] Position: Vector Prompt Interfaces Should Be Exposed to Enable Customization of Large Language Models

As large language models (LLMs) transition from research prototypes to real-world systems, customization has emerged as a central bottleneck. While text prompts...

#research #paper #ai #nlp
1 month ago · ai · - · -

[Paper] LikeThis! Empowering App Users to Submit UI Improvement Suggestions Instead of Complaints

User feedback is crucial for the evolution of mobile apps. However, research suggests that users tend to submit uninformative, vague, or destructive feedback. U...

#research #paper #ai #machine-learning
1 month ago · ai · - · -

[Paper] FeedAIde: Guiding App Users to Submit Rich Feedback Reports by Asking Context-Aware Follow-Up Questions

User feedback is essential for the success of mobile apps, yet what users report and what developers need often diverge. Research shows that users often submit ...

#multimodal-llm #user-feedback #mobile-app-sdk #context-aware #human‑ai interaction
1 month ago · it · - · -

[Paper] 2-Coloring Cycles in One Round

We show that there is a one-round randomized distributed algorithm that can 2-color cycles such that the expected fraction of monochromatic edges is less than 0...

#distributed-algorithms #formal-verification #Lean4 #LLM-assisted-research
1 month ago · ai · - · -

[Paper] Code Fingerprints: Disentangled Attribution of LLM-Generated Code

The rapid adoption of Large Language Models (LLMs) has transformed modern software development by enabling automated code generation at scale. While these syste...

#code attribution #LLM #machine learning #software provenance #research
1 month ago · ai · - · -

[Paper] CodeTaste: Can LLMs Generate Human-Level Code Refactorings?

Large language model (LLM) coding agents can generate working code, but their solutions often accumulate complexity, duplication, and architectural debt. Human ...

#code-refactoring #large-language-models #benchmark #software-engineering
1 month ago · ai · - · -

[Paper] VietNormalizer: An Open-Source, Dependency-Free Python Library for Vietnamese Text Normalization in TTS and NLP Applications

We present VietNormalizer, an open-source, zero-dependency Python library for Vietnamese text normalization targeting Text-to-Speech (TTS) and Natural Language...

#research #paper #ai #nlp
1 month ago · software · - · -

[Paper] Efficient Time-Aware Partitioning of Quantum Circuits for Distributed Quantum Computing

To overcome the physical limitations of scaling monolithic quantum computers, distributed quantum computing (DQC) interconnects multiple smaller-scale quantum p...

#quantum computing #distributed quantum computing #circuit partitioning #beam search heuristic #quantum compiler
1 month ago · ai · - · -

[Paper] Lyapunov Stability of Stochastic Vector Optimization: Theory and Numerical Implementation

The use of stochastic differential equations in multi-objective optimization has been limited, in practice, by two persistent gaps: incomplete stability analyse...

#research #paper #ai
1 month ago · ai · - · -

[Paper] An Adaptive KKT-Based Indicator for Convergence Assessment in Multi-Objective Optimization

Performance indicators are essential tools for assessing the convergence behavior of multi-objective optimization algorithms, particularly when the true Pareto ...

#multi-objective optimization #KKT convergence indicator #quantile normalization #benchmark evaluation
1 month ago · devops · - · -

[Paper] Performance Optimization in Stream Processing Systems: Experiment-Driven Configuration Tuning for Kafka Streams

Configuring stream processing systems for efficient performance, especially in cloud-native deployments, is a challenging and largely manual task. We present an...

#research #paper #devops
1 month ago · software · - · -

[Paper] A Core Calculus for Type-safe Product Lines of C Programs

In this paper we: (1) propose Lightweight C (LC), namely a core calculus that formalizes a proper subset of the ANSI C without preprocessor directives; (2) defi...

#type systems #C programming #software product lines #programming language theory #research
1 month ago · devops · - · -

[Paper] Lambdas at the Far Edge: a Tale of Flying Lambdas and Lambdas on Wheels

Aggregate Programming (AP) is a paradigm for programming the collective behaviour of sets of distributed devices, possibly situated at the network far edge, by ...

#research #paper #devops
1 month ago · ai · - · -

[Paper] LoRA-MME: Multi-Model Ensemble of LoRA-Tuned Encoders for Code Comment Classification

Code comment classification is a critical task for automated software documentation and analysis. In the context of the NLBSE'26 Tool Competition, we present Lo...

#research #paper #ai #machine-learning
1 month ago · devops · - · -

[Paper] A framework to reason about consistency and atomicity guarantees in a sparsely-connected, partially-replicated peer-to-peer system

For an offline-first collaborative application to operate in true peer-to-peer fashion, its collaborative features must function even in environments where inte...

#consistency models #peer-to-peer #partial replication #distributed systems
1 month ago · ai · - · -

[Paper] Joint Hardware-Workload Co-Optimization for In-Memory Computing Accelerators

Software-hardware co-design is essential for optimizing in-memory computing (IMC) hardware accelerators for neural networks. However, most existing optimization...

#research #paper #ai #machine-learning
