[Paper] NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors
Source: arXiv - 2602.22144v1
Overview
Large Vision‑Language Models (LVLMs) have become the go‑to backbone for multimodal assistants, but they often “hallucinate” objects that aren’t actually in the picture. This paper digs into why that happens and proposes a lightweight, training‑free decoding tweak—NoLan—that dramatically cuts hallucinations without sacrificing performance.
Key Contributions
- Root‑cause analysis: Systematic experiments show that the language decoder’s strong priors, not the vision encoder, are the primary driver of object hallucinations.
- NoLan framework: Introduces a dynamic, inference‑time suppression of language priors based on the discrepancy between multimodal and text‑only output distributions.
- Training‑free solution: No additional model parameters or fine‑tuning are required; the method works as a plug‑in to any existing LVLM.
- Broad validation: Demonstrates consistent hallucination reduction across multiple LVLMs (e.g., LLaVA‑1.5 7B, Qwen‑VL 7B) and tasks (POPE, VQA, captioning).
- Open‑source release: Code and integration scripts are publicly available, encouraging rapid adoption.
Methodology
- Decomposing the pipeline – To isolate the decoder's contribution, the authors run the same language decoder on the text prompt alone (without visual input) and compare its next‑token distribution to that of the full LVLM.
- Measuring prior influence – They compute the KL‑divergence between the multimodal output distribution and the text‑only baseline. A large divergence signals that the language decoder is injecting strong priors.
- Dynamic suppression – During decoding, NoLan scales down the logits (raw token scores) that are overly boosted by language priors. The scaling factor is a function of the observed divergence: the bigger the gap, the stronger the suppression.
- Implementation – The technique is a thin wrapper around the standard beam‑search or sampling decoder; it requires no extra training data, gradients, or architectural changes.
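The divergence‑driven suppression described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' released code: the function name `nolan_step`, the bounded exponential mapping from divergence to suppression strength, and the hyperparameter `alpha_max` are all placeholders.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) over the token vocabulary
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def nolan_step(mm_logits, txt_logits, alpha_max=1.0):
    """One NoLan-style decoding step (sketch).

    mm_logits  : next-token logits from the full LVLM (image + text)
    txt_logits : next-token logits from the same decoder run text-only
    """
    p_mm = softmax(mm_logits)
    p_txt = softmax(txt_logits)
    div = kl_divergence(p_mm, p_txt)
    # Map divergence to a bounded suppression factor (a modeling choice,
    # assumed here): the bigger the gap, the stronger the suppression.
    alpha = alpha_max * (1.0 - np.exp(-div))
    # Subtract the scaled text-only logits: tokens boosted mainly by
    # language priors lose score; visually grounded tokens keep theirs.
    adjusted = mm_logits - alpha * txt_logits
    return adjusted, alpha
```

When the multimodal and text‑only distributions agree, the divergence is zero, the suppression factor vanishes, and decoding proceeds unchanged; suppression only kicks in where the language prior visibly distorts the output.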
Results & Findings
| Model | Task | Baseline (%) | NoLan (%) | Δ (pp) |
|---|---|---|---|---|
| LLaVA‑1.5 7B | POPE (hallucination benchmark) | 71.3 | 77.8 | +6.5 |
| Qwen‑VL 7B | POPE | 68.9 | 76.1 | +7.2 |
| Various LVLMs | VQA & image captioning | comparable | same or higher | no loss, often +1–2 |
Key takeaways
- NoLan consistently lowers the rate of fabricated objects across models and tasks.
- Because the method only modifies the decoding logits, there is virtually no overhead (≈ 1 ms per inference).
- The approach does not degrade the model’s ability to generate fluent, context‑aware language.
Practical Implications
- Deploy‑ready safety layer: Teams can integrate NoLan into existing LVLM services (e.g., chat‑bots, visual assistants) to make outputs more trustworthy without retraining.
- Regulatory compliance: Reducing hallucinations helps meet emerging AI transparency standards that require verifiable outputs.
- Cost‑effective improvement: Since NoLan is inference‑only, it avoids the compute expense of fine‑tuning large multimodal models.
- Better user experience: Fewer false object mentions mean clearer instructions for downstream pipelines (e.g., robotics, AR overlays) that rely on accurate visual grounding.
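Because NoLan is a thin wrapper around the decoder, integration can be pictured as a greedy decoding loop that takes two next‑token‑logit callables, one multimodal and one text‑only, and applies the divergence‑scaled suppression described in the Methodology section at each step. Everything here (function names, the exponential divergence‑to‑strength mapping, the hyperparameters) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def _softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def greedy_decode_with_suppression(mm_fn, txt_fn, prompt_ids,
                                   max_new=10, alpha_max=1.0, eos_id=None):
    """Greedy decoding with NoLan-style prior suppression (sketch).

    mm_fn(ids)  -> next-token logits from the multimodal model
    txt_fn(ids) -> next-token logits from a text-only pass
    Both callables are placeholders for real model forward passes.
    """
    ids = list(prompt_ids)
    for _ in range(max_new):
        mm = np.asarray(mm_fn(ids), dtype=float)
        tx = np.asarray(txt_fn(ids), dtype=float)
        p, q = _softmax(mm), _softmax(tx)
        div = float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))
        alpha = alpha_max * (1.0 - np.exp(-div))
        # Suppress tokens that owe their score mainly to language priors.
        nxt = int(np.argmax(mm - alpha * tx))
        ids.append(nxt)
        if eos_id is not None and nxt == eos_id:
            break
    return ids
```

The key deployment property is visible in the signature: the wrapper needs only logit access, so it can sit in front of any served LVLM without touching weights or training infrastructure.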
Limitations & Future Work
- Scope of hallucinations: The study focuses on object hallucinations; other types (e.g., attribute or relational hallucinations) remain unaddressed.
- Dependency on baseline text‑only model: The effectiveness of the suppression factor hinges on the quality of the text‑only decoder used for comparison.
- Potential over‑suppression: In edge cases where the language prior is actually correct (e.g., commonsense inference), NoLan might dampen useful information.
- Future directions: Extending the dynamic suppression concept to handle attribute hallucinations, exploring adaptive thresholds per token type, and integrating visual grounding checks for a tighter vision‑language feedback loop.
Authors
- Lingfeng Ren
- Weihao Yu
- Runpeng Yu
- Xinchao Wang
Paper Information
- arXiv ID: 2602.22144v1
- Categories: cs.CV, cs.AI, cs.CL
- Published: February 25, 2026