NLP — Page 5 | EUNO.NEWS

3 weeks ago · ai · - · -

[Paper] CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas

It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet recent works report the opposite trend: LLMs ...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning

Speculative decoding (SD) accelerates large language model inference by allowing a lightweight draft model to propose outputs that a stronger target model verif...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] Context Over Content: Exposing Evaluation Faking in Automated Judges

The LLM-as-a-judge paradigm has become the operational backbone of automated AI evaluation pipelines, yet rests on an unverified assumption: that judges evaluat...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] Learning to Think Like a Cartoon Captionist: Incongruity-Resolution Supervision for Multimodal Humor Understanding

Humor is one of the few cognitive tasks where getting the reasoning right matters as much as getting the answer right. While recent work evaluates humor underst...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] MADE: A Living Benchmark for Multi-Label Text Classification with Uncertainty Quantification of Medical Device Adverse Events

Machine learning in high-stakes domains such as healthcare requires not only strong predictive performance but also reliable uncertainty quantification (UQ) to ...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] Meituan Merchant Business Diagnosis via Policy-Guided Dual-Process User Simulation

Simulating group-level user behavior enables scalable counterfactual evaluation of merchant strategies without costly online experiments. However, building a tr...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] AdaSplash-2: Faster Differentiable Sparse Attention

Sparse attention has been proposed as a way to alleviate the quadratic cost of transformers, a central bottleneck in long-context training. A promising line of ...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] Fabricator or dynamic translator?

LLMs are proving to be adept at machine translation, although, due to their generative nature, they may at times overgenerate in various ways. These overgeneration...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution

This beta technical report asks how reusable experience should be represented so that it can function as effective test-time control and as a substrate for iter...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

Contract intelligence & AI-powered document automation system

Most document workflows break at the same place: a contract is uploaded, then downloaded, and a human spends 30–60 minutes scanning for key clauses, r...

#contract intelligence #document automation #AI-powered extraction #NLP #unstructured data #risk detection #workflow automation
3 weeks ago · ai · - · -

[Paper] SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

Spatial reasoning over three-dimensional scenes is a core capability for embodied intelligence, yet continuous model improvement remains bottlenecked by the cos...

#research #paper #ai #nlp #computer-vision
3 weeks ago · ai · - · -

[Paper] From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space

While reinforcement learning with verifiable rewards (RLVR) significantly enhances LLM reasoning by optimizing the conditional distribution P(y|x), its potentia...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs

Evaluating LLMs is challenging, as benchmark scores often fail to capture models' real-world usefulness. Instead, users often rely on "vibe-testing": informal...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] Rhetorical Questions in LLM Representations: A Linear Probing Study

Rhetorical questions are asked not to seek information but to persuade or signal stance. How large language models internally represent them remains unclear. We...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] Correct Prediction, Wrong Steps? Consensus Reasoning Knowledge Graph for Robust Chain-of-Thought Synthesis

LLM reasoning traces suffer from complex flaws: Step Internal Flaws (logical errors, hallucinations, etc.) and Step-wise Flaws (overthinking, underthinkin...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

While Large Language Models (LLMs) have empowered AI research agents to perform isolated scientific tasks, automating complex, real-world workflows, such as LLM...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding

GUI grounding, which localizes interface elements from screenshots given natural language queries, remains challenging for small icons and dense layouts. Test-t...

#research #paper #ai #machine-learning #nlp #computer-vision
3 weeks ago · ai · - · -

[Paper] Interpretable Stylistic Variation in Human and LLM Writing Across Genres, Models, and Decoding Strategies

Large Language Models (LLMs) are now capable of generating highly fluent, human-like text. They enable many applications, but also raise concerns such as large ...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] From Weights to Activations: Is Steering the Next Frontier of Adaptation?

Post-training adaptation of language models is commonly achieved through parameter updates or input-based methods such as fine-tuning, parameter-efficient adapt...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] π-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data

Deep search agents have emerged as a promising paradigm for addressing complex information-seeking tasks, but their training remains challenging due to sparse r...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] Diffusion Language Models for Speech Recognition

Diffusion language models have recently emerged as a leading alternative to standard language models, due to their capacity for bidirectional attention and paral...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] CollabCoder: Plan-Code Co-Evolution via Collaborative Decision-Making for Efficient Code Generation

Automated code generation remains a persistent challenge in software engineering, as conventional multi-agent frameworks are often constrained by static plannin...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis

Large Language Models (LLMs) and Vision-Language Models (VLMs) increasingly generate indoor scenes through intermediate structures such as layouts and scene gra...

#research #paper #ai #nlp #computer-vision
3 weeks ago · ai · - · -

[Paper] Toward Autonomous Long-Horizon Engineering for ML Research

Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across task compr...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness

Instruction-tuned large language models produce helpful, structured responses, but how robust is this helpfulness when trivially constrained? We show that simpl...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models

Large Language Models (LLMs) are increasingly integrated into real-world decision-making, including in the domain of public policy. Yet, their ability to compre...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] Accelerating Speculative Decoding with Block Diffusion Draft Trees

Speculative decoding accelerates autoregressive language models by using a lightweight drafter to propose multiple future tokens, which the target model then ve...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts

Optical character recognition (OCR) has advanced rapidly with the rise of vision-language models, yet evaluation has remained concentrated on a small cluster of...

#research #paper #ai #nlp #computer-vision
3 weeks ago · ai · - · -

[Paper] MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models

Speech-to-speech language models have recently emerged to enhance the naturalness of conversational AI. In particular, full-duplex models are distinguished by t...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] MetFuse: Figurative Fusion between Metonymy and Metaphor

Metonymy and metaphor often co-occur in natural language, yet computational work has studied them largely in isolation. We introduce a framework that transforms...

#research #paper #ai #nlp
3 weeks ago · ai · - · -

[Paper] Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss

Multilingual benchmarks guide the development of frontier models. Yet multilingual evaluations reported by frontier models are structured similarly to popular rea...

#research #paper #ai #machine-learning #nlp
3 weeks ago · ai · - · -

[Paper] CodeSpecBench: Benchmarking LLMs for Executable Behavioral Specification Generation

Large language models (LLMs) can generate code from natural language, but the extent to which they capture intended program behavior remains unclear. Executable...

#research #paper #ai #nlp
0 month ago · ai · - · -

[Paper] Detecting Safety Violations Across Many Agent Traces

To identify safety violations, auditors often search over large sets of agent traces. This search is difficult because failures are often rare, complex, and som...

#research #paper #ai #machine-learning #nlp
0 month ago · ai · - · -

[Paper] Saar-Voice: A Multi-Speaker Saarbrücken Dialect Speech Corpus

Natural language processing (NLP) and speech technologies have made significant progress in recent years; however, they remain largely focused on standardized l...

#research #paper #ai #nlp
0 month ago · ai · - · -

[Paper] Psychological Concept Neurons: Can Neural Control Bias Probing and Shift Generation in LLMs?

Using psychological constructs such as the Big Five, large language models (LLMs) can imitate specific personality profiles and predict a user's personality. Wh...

#research #paper #ai #nlp
0 month ago · ai · - · -

[Paper] CLSGen: A Dual-Head Fine-Tuning Framework for Joint Probabilistic Classification and Verbalized Explanation

With the recent progress of Large Language Models (LLMs), there is a growing interest in applying these models to solve complex and challenging problems. Modern...

#research #paper #ai #nlp
0 month ago · ai · - · -

[Paper] C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts

Large language models (LLMs) have recently become capable of generating highly fluent textual content. While they offer significant convenience to humans, they also in...

#research #paper #ai #machine-learning #nlp
0 month ago · ai · - · -

[Paper] ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes...

#research #paper #ai #machine-learning #nlp #computer-vision
0 month ago · ai · - · -

[Paper] General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in specialized domains like mathematics and physics....

#research #paper #ai #machine-learning #nlp
0 month ago · ai · - · -

[Paper] Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks

We study parallel test-time scaling for long-horizon agentic tasks such as agentic search and deep research, where multiple rollouts are generated in parallel a...

#research #paper #ai #nlp
0 month ago · ai · - · -

[Paper] HistLens: Mapping Idea Change across Concepts and Corpora

Language change both reflects and shapes social processes, and the semantic evolution of foundational concepts provides a measurable trace of historical and soc...

#research #paper #ai #nlp
0 month ago · ai · - · -

[Paper] LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling

Continuous diffusion models have achieved strong performance across domains such as images. However, in language modeling, prior continuous diffusion language m...

#research #paper #ai #machine-learning #nlp
0 month ago · ai · - · -

[Paper] K-Way Energy Probes for Metacognition Reduce to Softmax in Discriminative Predictive Coding Networks

We present this as a negative result with an explanatory mechanism, not as a formal upper bound. Predictive coding networks (PCNs) admit a K-way energy probe in...

#research #paper #ai #machine-learning #nlp
1 month ago · ai · - · -

[Paper] Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism

Large language models (LLMs) undergo alignment training to avoid harmful behaviors, yet the resulting safeguards remain brittle: jailbreaks routinely bypass the...

#research #paper #ai #machine-learning #nlp
1 month ago · ai · - · -

[Paper] Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision

Evidence-grounded reasoning requires more than attaching retrieved text to a prediction: a model should make decisions that depend on whether the provided evide...

#research #paper #ai #machine-learning #nlp
1 month ago · ai · - · -

[Paper] VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images

Vision-language models (VLMs) still struggle with visual perception tasks such as spatial understanding and viewpoint recognition. One plausible contributing fa...

#research #paper #ai #machine-learning #nlp #computer-vision
1 month ago · ai · - · -

[Paper] VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning

Large Vision Language Models (LVLMs) achieve strong multimodal reasoning but frequently exhibit hallucinations and incorrect responses with high certainty, whic...

#research #paper #ai #machine-learning #nlp #computer-vision