Source

arXiv

1659 posts from this source

1 week ago · ai · - · -

[Paper] Detection and Interpretability Analysis of Quotation Errors by Large Language Models

Purpose - Quotation error refers to the inconsistency between cited information and its original source. This phenomenon leads to a series of negative impacts, ...

#research #paper #ai #nlp
1 week ago · software · - · -

[Paper] LLM vs. Human Unit Tests: Fault Detection on Real Python Bugs

Large language models (LLMs) have shown considerable promise for automated unit test generation, yet their practical effectiveness relative to human-written tes...

#research #paper #software
1 week ago · ai · - · -

[Paper] Titans-as-a-Layer: Test-Time Memory for Conversational Speech Emotion Recognition

Speech emotion recognition (SER) is commonly formulated as utterance-level classification, although conversational emotion depends on a speaker's usual vocal ra...

#research #paper #ai #machine-learning #nlp
1 week ago · ai · - · -

[Paper] Calibration of Structured Ignorance Certificates for Diagnosing Unknown Unknowns in Reasoning Models

Large language models frequently fail in a characteristic way: rather than acknowledging ignorance, they produce fluent but incorrect answers to questions that ...

#research #paper #ai #machine-learning #nlp
1 week ago · ai · - · -

[Paper] EinSort: Sorting is All We Need for Tensorizing LLM

Tensor networks provide efficient representations for compressing large neural networks. By carefully designing shapes and topologies, they can significantly re...

#research #paper #ai #machine-learning
1 week ago · ai · - · -

[Paper] Inside the LLM Word Factory

Transformer language models process input provided as subword fragments, but natural language semantics usually rely on word-level concepts. Detokenization is t...

#research #paper #ai #nlp
1 week ago · software · - · -

[Paper] FusionVul: A Multimodal Feature Fusion Framework for Source Code Vulnerability Detection

Source code vulnerability detection remains a long-standing challenge due to the increasing scale, structural complexity, and semantic diversity of modern codeb...

#research #paper #software
1 week ago · ai · - · -

[Paper] Quantitative Promise Theory: Intentionality and Inference in Autonomous Agents

I discuss some quantitative representations of Promise Theory for processes involving autonomous agents. Agent models are common in software systems, machine le...

#research #paper #ai #machine-learning
1 week ago · ai · - · -

[Paper] Ishigaki-IDS: An Open-Weight Verifier-Aware Model for Information Delivery Specification Drafting in Building Information Modeling

Building Information Modeling (BIM) projects require information requirements to be described as machine-checkable Information Delivery Specification (IDS) file...

#research #paper #ai #nlp
1 week ago · ai · - · -

[Paper] PAEC: Position-Aware Entropy Calibration for LLM Reasoning in RLVR

Reinforcement learning with verifiable rewards (RLVR) improves large language model reasoning but often suffers from rapid policy-entropy collapse, where the po...

#research #paper #ai #machine-learning
1 week ago · ai · - · -

[Paper] When Video Misreads: Closed-Loop Distillation of Reading Heuristics for Exploratory Manipulation Trace QA

Exploratory manipulation often turns an apparent failed attempt into the key evidence for what to do next. For example, a robot pulls a locked cabinet drawer, f...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai · - · -

[Paper] AgentTrust: A Self-Improving Trust Layer for AI-Agent Actions

AI agents increasingly take consequential actions -- shell commands, cloud operations, and arbitrary tool-calls -- so a trust layer must decide, per action, whe...

#research #paper #ai #machine-learning
1 week ago · ai · - · -

[Paper] Scaffold Effects on GAIA: A Controlled Comparison

Published agent capability scores conflate what a model can do with what its scaffold lets it do, and the magnitude of this elicitation gap is not well characte...

#research #paper #ai #machine-learning #nlp
1 week ago · ai · - · -

[Paper] A Joint Finite-Sample Certificate for Adaptive Selective Conformal Risk Control

Selective predictors answer on confident inputs and abstain elsewhere; deploying one safely needs a single finite-sample certificate that simultaneously upper-b...

#research #paper #ai #machine-learning #nlp
1 week ago · devops · - · -

[Paper] Unifying von-Neumann HPC and Neuromorphic Acceleration via the EBRAINS Research Infrastructure: A Framework for High-Performance Workflows

Modern scientific workflows increasingly span diverse computing architectures, yet executing a single computational model across disparate systems often forces ...

#research #paper #devops
1 week ago · ai · - · -

[Paper] Friend or Foe? Language as an ideological switch in open-weight LLMs under Russian disinformation stress

As Russia's war against Ukraine extends into generative AI, large language models (LLMs) adapted for local post-Soviet languages are deployed in contested infor...

#research #paper #ai #nlp
1 week ago · ai · - · -

[Paper] Back on Track: Aligning Rewards and States for Reasoning in Diffusion Large Language Models

Reinforcement learning (RL) holds immense promise for enhancing the reasoning capabilities of diffusion large language models (dLLMs). However, progress is fund...

#research #paper #ai #nlp
1 week ago · ai · - · -

[Paper] Projecting the Emerging Mindset of SWE Agent by Launching a Wild Code Understanding Journey

Software engineering agents (SWE agents) increasingly work through tool-mediated trajectories in real repositories, yet their behavior remains difficult to char...

#research #paper #ai #machine-learning
1 week ago · ai · - · -

[Paper] Explaining Black-Box Language Models: Learning to Optimize Linguistically-Structured Word Subsets

As deep language models (DLMs) are increasingly deployed in high-stakes domains such as healthcare, understanding their decision rationale becomes paramount for...

#research #paper #ai #machine-learning #nlp
1 week ago · ai · - · -

[Paper] PIPE-Cypher: Automatic Enterprise Benchmark Generation for Text-to-Cypher Systems

Enterprise property graphs vary widely in schema structure, internal terminology, domain assumptions, governance constraints, and user interaction patterns. A d...

#research #paper #ai #machine-learning
1 week ago · ai · - · -

[Paper] FlashCP: Load-Balanced Communication-Efficient Context Parallelism for LLM Training

Context parallelism (CP) is essential for training large-scale, long-context language models, as it partitions sequences to reduce memory overhead. However, exi...

#research #paper #ai #machine-learning
1 week ago · software · - · -

[Paper] An Empirical Comparison of General Context-Free Parsers

Parsing underpins a vast range of software engineering tasks, from compilers and static analyzers to language servers and fuzz testing tools. Yet most parsers d...

#research #paper #software
1 week ago · software · - · -

[Paper] When LLMs Invent Rust Crates: An Empirical Study of Hallucination Patterns and Mitigation

Large Language Models (LLMs) have become powerful tools for code generation, yet they remain prone to hallucinations, producing plausible but incorrect or fabric...

#research #paper #software
1 week ago · ai · - · -

[Paper] Impacts of Histories and Models on LLM Grading: A Study in Advanced Software Engineering Courses

Graduate-level research reading report assessment creates a substantial labor burden for educators. While large language models (LLMs) hold great potential for ...

#research #paper #ai #machine-learning #nlp
1 week ago · software · - · -

[Paper] Minimum Complete MR Subsets under Semantic-Mutation Fault Models: A Support-Set Domination Boundary

This paper asks when MR-subset selection is a real mutant-level requirement for minimum complete evidence in metamorphic testing rather than a coarse fault-clas...

#research #paper #software
1 week ago · devops · - · -

[Paper] Quantifying and Defending against the Privacy Risk in Logit-based Federated Learning

Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among clients. Unlike traditional parameter-bas...

#research #paper #devops
1 week ago · ai · - · -

[Paper] AlignFed: Alignment-Aware Asynchronous Federated Fine-Tuning for Large Language Models in Heterogeneous Edge Environments

Large Language Models (LLMs) have significantly propelled the advancement of edge intelligence and have been widely deployed across various scenarios, including...

#research #paper #ai #nlp
1 week ago · software · - · -

[Paper] TICoder: A Repository-Level Code Generation Framework with Test-Driven Planning and Implementation-Aware Reuse

Repository-level code generation with Large Language Models (LLMs) remains challenging, primarily due to complex dependencies and limited context windows. Recen...

#research #paper #software
1 week ago · ai · - · -

[Paper] Gray-Box Optimization and the Vertex Coloring Problem

Gray-box optimization is an approach for making some problem-specific information available to the algorithm while still relying on fitness information as the m...

#research #paper #ai
1 week ago · software · - · -

[Paper] Identifying unique developers in OSS projects: A family of models

Organizational and logical coupling metrics require reliable identification of unique developers. In OSS, commit metadata is limited to names and emails, and th...

#research #paper #software
1 week ago · devops · - · -

[Paper] Demand-Driven Vulnerability Detection for Cloud Security Posture Management: Removing Human Rule Authoring from the Disclosure-to-Protection Critical Path

Cloud Security Posture Management (CSPM) systems detect known vulnerabilities by maintaining a rule set, distributing it to customers, and evaluating it against...

#research #paper #devops
1 week ago · ai · - · -

[Paper] Cost-Aware Speculative Execution for LLM-Agent Workflows: An Integrated Five-Dimension Method

LLM-agent workflows chain model calls and tool invocations, and spend most of their wall-clock time waiting on upstream operations before downstream ones can st...

#research #paper #ai #machine-learning
1 week ago · ai · - · -

[Paper] Representational Similarity and Model Behavior in Multi-Agent Interaction

Researchers have shown that neural similarity among humans predicts social closeness and cooperative success, whereas innovation often emerges from interactions...

#research #paper #ai #nlp
1 week ago · devops · - · -

[Paper] Large-Scale Regularized Matching on GPU Clusters

Production decision systems such as ad allocation or content matching involve millions of users and thousands of items, reducing to large-scale linear programs ...

#research #paper #devops
1 week ago · ai · - · -

[Paper] How reliable are LLMs when it comes to playing dice?

We investigate the probabilistic reasoning capabilities of large language models through a controlled benchmarking study on discrete probability problems. We co...

#research #paper #ai #machine-learning #nlp
1 week ago · ai · - · -

[Paper] UniSHARP: Universal Sharp Monocular View Synthesis

In this work, we focus on extending SHARP, the popular photorealistic view synthesis method, for universal monocular rendering across a continuum of camera syst...

#research #paper #ai #computer-vision
1 week ago · ai · - · -

[Paper] Agentopia: Long-Term Life Simulation and Learning in Agent Societies

Humans learn from social life. Simulating this process with LLM-powered agents represents a promising research direction, raising a natural question: whether LL...

#research #paper #ai #nlp
1 week ago · ai · - · -

[Paper] MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism

Current Vision-Language Models struggle with hours-long videos because processing full-length visual sequences induces prohibitive token explosion and attention...

#research #paper #ai #machine-learning #nlp #computer-vision
1 week ago · ai · - · -

[Paper] Streaming Video Generation with Streaming Force Control

We introduce StreamForce, a streaming video generation framework that enables physically grounded control through continuous force inputs. Unlike prior video mo...

#research #paper #ai #computer-vision
1 week ago · ai · - · -

[Paper] Differences in Detection: Explainability Where it Matters

We propose Differences in Detection (DnD), an intuitive method to compare two object detection models. Based on the same matching algorithm, it complements the ...

#research #paper #ai #computer-vision
1 week ago · ai · - · -

[Paper] Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

Large language models exhibit impressive zero-shot capabilities across a wide range of downstream tasks. However, they struggle to function as off-the-shelf emb...

#research #paper #ai #nlp
