Source

arXiv

1364 posts from this source

5 days ago · ai · - · -

[Paper] PhantomBench: Benchmarking the Non-existential Threat of Language Models

Hallucinations, where language models (LMs) generate factually ungrounded responses, pose serious risks, as users tend to blindly rely on them. This is particul...

#research #paper #ai #machine-learning #nlp
5 days ago · ai · - · -

[Paper] Limitations of Learning Tanh Neural Networks with Finite Precision

We investigate limitations of learning tanh neural networks from point evaluations under finite-precision computations and L^p accuracy guarantees, building on ...

#research #paper #ai #machine-learning
5 days ago · ai · - · -

[Paper] Do Transformers Actually Help Intrusion Detection? A Temporal Sequence Evaluation on CIC-IDS2017

Recent deep learning approaches for network intrusion detection increasingly incorporate temporal architectures such as recurrent networks and Transformers, oft...

#research #paper #ai #machine-learning
5 days ago · ai · - · -

[Paper] IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder

Built on pretrained vision foundation models (VFMs), representation autoencoders (RAEs) have recently emerged as a promising approach for constructing semantica...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] RoboNaldo: Accurate, Stable and Powerful Humanoid Soccer Shooting via Motion-Guided Curriculum Reinforcement Learning

Elite humanoid soccer shooting requires whole-body stability, high-impulse whole-body interactions, and accuracy to targets. Motion tracking-driven reinforcemen...

#research #paper #ai #machine-learning
5 days ago · ai · - · -

[Paper] Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

Expressive continuous control policies, such as diffusion and flow models, form the backbone of recent advances in scaling imitation learning for simulated and ...

#research #paper #ai #machine-learning
5 days ago · ai · - · -

[Paper] The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models

This study investigates cross-lingual distributional skew (the Shibboleth Effect) in frontier large language models (LLMs) subjected to sustained adversarial co...

#research #paper #ai #nlp
5 days ago · ai · - · -

[Paper] Unifying Local Communications and Local Updates for LLM Pretraining

Communication-efficient pre-training of LLMs is increasingly important as training draws on compute distributed across clusters, data centers, and lower-bandwid...

#research #paper #ai #machine-learning
5 days ago · ai · - · -

[Paper] VISTA: A Versatile Interactive User Simulation Toolkit for Agent Evaluation

Evaluation remains a critical bottleneck for interactive agent development. Existing evaluation methods often rely on static benchmarks, which fail to capture t...

#research #paper #ai #nlp
5 days ago · ai · - · -

[Paper] A History-Aware Visually Grounded Critic for Computer Use Agents

Various test-time interventions for Computer Use Agents (CUAs), including critic models, have been developed to improve performance through pre-execution action...

#research #paper #ai #machine-learning #nlp #computer-vision
5 days ago · ai · - · -

[Paper] Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models

With the widespread deployment of Multimodal Large Language Models (MLLMs) in social interaction, understanding and controlling their behavior under complex per...

#research #paper #ai #machine-learning #nlp
5 days ago · ai · - · -

[Paper] T1-Bench: Benchmarking Multi-Scenario Agents in Real-World Domains

Recent advances in reasoning and tool-calling capabilities of large language models (LLMs) have enabled increasingly capable agentic systems. However, existing ...

#research #paper #ai #machine-learning #nlp
5 days ago · ai · - · -

[Paper] Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context rec...

#research #paper #ai #nlp
5 days ago · software · - · -

[Paper] Making Software Meaningful

Adopting a single measure can improve the usability, modularity and accountability of software: a commitment to explicit meaning. This entails constructing and ...

#research #paper #software
5 days ago · ai · - · -

[Paper] Does Reasoning Preserve Alignment? On the Trustworthiness of Large Reasoning Models

Instruction-tuned LLMs are increasingly converted into reasoning models through post-training to improve multi-step task performance. This conversion is usually...

#research #paper #ai #nlp
5 days ago · software · - · -

[Paper] GapFuzz: Cross-Plane Divergence Fuzzing for Distributed SDN Controllers

Distributed Software-Defined Networking (SDN) clusters replicate flow state asynchronously between a master node and its backups, leaving a window during which ...

#research #paper #software
5 days ago · ai · - · -

[Paper] A Spiking Neural Architecture for Coordinating Arm and Locomotor Control

Spiking Neural Networks (SNNs) coupled with neuromorphic hardware offer energy-efficient solutions for humanoid robot control. However, existing SNN-based motor...

#research #paper #ai
5 days ago · ai · - · -

[Paper] AuRA: Internalizing Audio Understanding into LLMs as LoRA

Recent efforts to extend large language models (LLMs) to speech inputs typically rely on cascaded ASR-LLM pipelines, end-to-end speech-language models, or bridg...

#research #paper #ai #machine-learning #nlp
5 days ago · ai · - · -

[Paper] U-TTT: Towards Generalizable PET Image Denoising via Test-Time Training

Existing deep learning models for Positron Emission Tomography (PET) image denoising often suffer from severe performance degradation under distribution shifts,...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] Generative Archetype-Grounded Item Representations for Sequential Recommendation

Sequential recommendation aims to predict users' next interaction with items by analyzing their historical behavior. However, the limited quality of item repres...

#research #paper #ai #machine-learning #nlp
5 days ago · ai · - · -

[Paper] Measuring Human Value Expression in Social Media Texts: Calibrated LLM Annotation and Encoder Transfer

Measuring subjective constructs in naturally occurring social media text requires annotation procedures that are theoretically grounded, empirically validated, ...

#research #paper #ai #nlp
5 days ago · ai · - · -

[Paper] An Uncertainty Estimation Framework for Dose Accumulation in Adaptive Radiotherapy: Application to CBCT-Guided Radiotherapy for Cervical Cancer

Background and purpose: oART enables daily plan adaptation to interfraction anatomical variations, but cumulative dose estimation remains limited by DIR, segmen...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] Who Brought Easter Eggs to Eid? Auditing Cultural Translation of Math Word Problems Across Diverse Languages and Regions

Large language models are increasingly used to adapt math word problems for personalized learning at scale, but it remains an open question whether those adapta...

#research #paper #ai #nlp
5 days ago · ai · - · -

[Paper] Understanding and mitigating the risks of OpenClaw for non-technical users: A practical guide with Skill

OpenClaw has rapidly emerged as a transformative artificial intelligence (AI) agent framework, and its ability to autonomously execute complex, multi-step tasks...

#research #paper #ai #machine-learning
5 days ago · ai · - · -

[Paper] IPSM-Bench: A New Intermediate Phase Segmentation Benchmark in Microstructure Images of Zinc-Based Absorbable Biomaterials

Zinc-based alloys are indispensable emerging absorbable metallic biomaterials, and their macroscopic performance is governed by microstructural characteristics....

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] Analog Quantum Asynchronous Event-Based Graph Neural Network

Asynchronous, event-based graph neural networks (AEGNNs) have recently emerged as an efficient paradigm for processing the sparse and high-temporal-resolution d...

#research #paper #ai #machine-learning
5 days ago · ai · - · -

[Paper] AnimaSpark: A Feed-Forward Method for Animating Arbitrary 3D Objects

While recent advancements in generative AI have substantially accelerated static 3D model creation workflows, the synthesis of category-agnostic 3D animations r...

#research #paper #ai #computer-vision
5 days ago · devops · - · -

[Paper] FairWave: A Fairness-Aware Asynchronous DAG-BFT Consensus

Combining asynchronous Byzantine Fault Tolerant (BFT) consensus with Proof-of-Stake (PoS) creates a trilemma between Sybil resistance, reward distribution fairn...

#research #paper #devops
5 days ago · ai · - · -

[Paper] Quo Vadis, Visual In-Context Learning? A Unified Benchmark Across Domains and Tasks

Visual in-context learning has been proposed as a pathway towards dynamic models that can generate predictions based on a provided context and thereby can adapt...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] Mind the Gap: Can Frontier LLMs Pass a Standardized Office Proficiency Exam?

The deployment of Large Language Model (LLM) agents for computer automation is accelerating, yet their ability to navigate complex, professional-grade productiv...

#research #paper #ai #machine-learning #nlp
5 days ago · software · - · -

[Paper] Context-Based Adversarial Attacks on AI Code Generators: Vulnerability Analysis and Implications

AI-powered code generation systems have transformed software development but introduce critical inference-time security vulnerabilities. This research presents ...

#research #paper #software
5 days ago · ai · - · -

[Paper] It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO

Warning: This paper contains several toxic and offensive statements. Modern large language models (LLMs) are typically aligned through large-scale post-training...

#research #paper #ai #nlp
5 days ago · software · - · -

[Paper] Early Comparative Evaluation of Transformer Models for Multilingual Software Vulnerability Detection

Software vulnerability detection is increasingly important as modern applications combine multiple programming languages. This paper presents an early comparati...

#research #paper #software
5 days ago · ai · - · -

[Paper] Trace Only What You Need: Structure-Aware On-Demand Hypergraph Memory for Long-Document Question Answering

Long-document question answering (QA) requires large language models (LLMs) to reason over evidence scattered across lengthy documents, where answers often depe...

#research #paper #ai #nlp
5 days ago · devops · - · -

[Paper] Dynamic Software Updates using CRDTs

This paper investigates how Conflict-free Replicated Data Types (CRDTs) can be used for dynamic software updates of distributed applications. We propose to mode...

#research #paper #devops
5 days ago · software · - · -

[Paper] From Quality Properties to Practice: A Guideline and Workflow for Explainability Requirements

Explainability is increasingly required in AI-enabled software systems to support transparency, user trust, and compliance. Yet, explainability requirements are...

#research #paper #software
5 days ago · software · - · -

[Paper] Writing Better Software Explanations: A Guideline-Based Approach

As software systems increasingly rely on natural-language explanations to address user-reported explanation needs in requirements communication and support, ens...

#research #paper #software
5 days ago · ai · - · -

[Paper] From Perception to Action: Can UI Interventions Foster Sustainable LLM Chatbot

LLM-powered chatbots are increasingly embedded in everyday workflows, raising sustainability concerns due to their energy use. Most mitigation strategies emphas...

#research #paper #ai #machine-learning
5 days ago · software · - · -

[Paper] Modular2Simple: A Tool for Modular Scenario Creation Based on the OpenSCENARIO Format

The rapid advancement of autonomous driving systems (ADS) has introduced significant challenges, particularly in the creation of realistic and complex scenarios...

#research #paper #software
5 days ago · software · - · -

[Paper] Securing Code Understanding: Detecting Natural Backdoor Vulnerability in Code Language Models

Code Language Models (CodeLMs) have become integral to software engineering, significantly advancing code intelligence tasks. However, their widespread adoption...

#research #paper #software
5 days ago · ai · - · -

[Paper] Inverse Probability Weighting and Age-of-Information Aggregation for Decentralized Federated Learning under Partial Reception

Decentralized Federated Learning (DFL) over lossy wireless networks faces two key challenges: selection bias, where updates from poor-quality links are systemat...

#research #paper #ai #machine-learning
5 days ago · software · - · -

[Paper] DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch

As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebases toward arch...

#research #paper #software
5 days ago · software · - · -

[Paper] Watts and Debts of Agentic Frameworks: An Empirical Study (Registered Report)

Context: Every agentic AI system shipped to production carries two hidden risks: accumulated Technical Debt (TD) and unmonitored runtime energy costs. While fun...

#research #paper #software
5 days ago · devops · - · -

[Paper] Generalizing LCL Complexity Gaps to Unbounded Degree via Monadic Second-Order Properties

The last decade of research on the LOCAL model has seen tremendous progress in understanding locally checkable labeling (LCL) problems, culminating in an almost...

#research #paper #devops
5 days ago · software · - · -

[Paper] Exploring and Complementing End Users' Requirements in IoT enabled System

End users create IoT automation rules via trigger action programming, but their expressions are often fragmented, capturing device operations rather than high l...

#research #paper #software
5 days ago · devops · - · -

[Paper] A Hybrid Edge-Cloud Architecture for Low-Latency Entitlement Verification in Resource-Constrained Devices

As digital media consumption shifts toward large-scale Over-the-Top (OTT) platforms, the efficiency of the control plane, specifically entitlement and identity ...

#research #paper #devops
5 days ago · ai · - · -

[Paper] Achieving Cloud-Grade SLOs for Local Mixture-of-Experts Inference through CPU-GPU Hybrid Design

Local deployment of large Mixture-of-Experts (MoE) models falls short of the service quality achieved in cloud-scale environments, even under low-concurrency wo...

#research #paper #ai #machine-learning
