[Paper] Visualizing token importance for black-box language models

Published: December 12, 2025 at 09:01 AM EST
4 min read
Source: arXiv - 2512.11573v1

Overview

The paper introduces Distribution‑Based Sensitivity Analysis (DBSA), a model‑agnostic technique that lets developers peek inside a black‑box large language model (LLM) and see how each input token influences the generated output. By treating the LLM as a stochastic oracle—without needing gradients or internal weights—DBSA offers a quick, plug‑and‑play way to audit models that are only reachable via API calls, a common scenario in production systems handling legal, medical, or compliance‑critical text.

Key Contributions

  • Model‑agnostic token‑level sensitivity metric – Works with any LLM accessible through a black‑box API, no need for source code or gradient access.
  • Distribution‑based approach – Estimates token importance by comparing output distributions under controlled perturbations, handling the inherent randomness of LLM sampling.
  • Lightweight, plug‑and‑play tool – Requires only a handful of API calls per token, making it practical for real‑time debugging or periodic audits.
  • Visualization framework – Generates intuitive heat‑maps that highlight which tokens the model “relies on” for a given generation.
  • Empirical validation – Demonstrates that DBSA surfaces sensitivities missed by existing interpretability methods (e.g., attention‑based scores, gradient‑based saliency) across several benchmark prompts.

Methodology

  1. Prompt Perturbation – For each token t in the input prompt, DBSA creates a set of n perturbed prompts where t is replaced by a neutral placeholder (e.g., a mask token or a synonym).
  2. Output Sampling – The black‑box LLM is queried k times for each perturbed prompt, collecting a sample of generated continuations (or token‑level probabilities).
  3. Distribution Comparison – The original output distribution (from the unperturbed prompt) is compared to each perturbed distribution using a statistical distance (e.g., Jensen‑Shannon divergence).
  4. Sensitivity Score – The average distance across the n perturbed prompts for token t (each output distribution estimated from its k samples) becomes the sensitivity score for t. Higher scores indicate that the model’s output changes noticeably when t is altered.
  5. Visualization – Scores are mapped onto the original prompt as a heat‑map, letting users instantly spot “high‑impact” tokens.
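
To make steps 1–4 concrete, here is a minimal Python sketch. The helper names (`query_llm`, `empirical_dist`, `dbsa_scores`) are illustrative assumptions rather than the authors’ released code, and for simplicity each token gets a single mask perturbation (n = 1):

```python
# Minimal sketch of DBSA steps 1-4, assuming a query_llm(prompt) -> str
# callable that samples one continuation from the black-box model.
from collections import Counter
import math
from typing import Callable, Dict, List

def empirical_dist(samples: List[str]) -> Dict[str, float]:
    """Estimate an output distribution from k sampled continuations."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence (base 2, so values lie in [0, 1])."""
    support = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}
    def kl(a: Dict[str, float]) -> float:
        return sum(a[x] * math.log2(a[x] / m[x]) for x in a if a[x] > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def dbsa_scores(tokens: List[str],
                query_llm: Callable[[str], str],
                k: int = 20,
                mask: str = "[MASK]") -> List[float]:
    """Score each token by how much masking it shifts the output distribution."""
    # Step 2 for the unperturbed prompt: sample k continuations.
    base = empirical_dist([query_llm(" ".join(tokens)) for _ in range(k)])
    scores = []
    for i in range(len(tokens)):
        # Step 1: replace token i with a neutral placeholder.
        perturbed = " ".join(tokens[:i] + [mask] + tokens[i + 1:])
        # Step 2: sample k continuations for the perturbed prompt.
        dist = empirical_dist([query_llm(perturbed) for _ in range(k)])
        # Steps 3-4: distance to the original distribution is the score.
        scores.append(js_divergence(base, dist))
    return scores
```

Calling `dbsa_scores(prompt.split(), query_llm)` returns one score per whitespace-split token; a real application would instead follow the target model’s own tokenization.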

Because the method relies only on repeated forward passes, it sidesteps the need for gradients, making it compatible with any hosted LLM (OpenAI, Anthropic, Cohere, etc.).
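
As an example of such a hosted setup, the `query_llm` callable from the sketch above could wrap a vendor API. The snippet below assumes the OpenAI Python client (v1 style); the model name is illustrative, and any endpoint that supports stochastic sampling would slot in the same way:

```python
# Hypothetical adapter for the query_llm callable used in the sketch above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def query_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any sampling-capable model works
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,      # keep sampling stochastic, as DBSA requires
        max_tokens=32,
    )
    return resp.choices[0].message.content
```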

Results & Findings

  • Synthetic bias probe – Setup: a prompt containing gendered nouns, asking the LLM to generate an occupation. Key observation: DBSA highlighted the gender tokens as highly sensitive, whereas attention scores were diffuse.
  • Legal clause analysis – Setup: a prompt with a contract clause, asking the LLM to summarize it. Key observation: tokens related to liability and dates showed the strongest influence on the summary output.
  • Medical note generation – Setup: a prompt with patient symptoms, requesting a diagnosis. Key observation: symptom tokens received the highest sensitivity scores, confirming clinical relevance.
  • Comparison with baselines – Setup: gradient‑based saliency (when available) and attention weights as reference methods. Key observation: DBSA consistently produced clearer, more localized importance maps, especially under stochastic sampling (top‑p, temperature > 0).

Overall, DBSA succeeded in flagging tokens that, when altered, caused statistically significant shifts in the LLM’s response—often surfacing subtle dependencies that other methods missed.

Practical Implications

  • Compliance Audits – Regulators can use DBSA to verify that a model’s decisions are not unduly driven by protected attributes (e.g., race, gender) hidden in the prompt.
  • Prompt Engineering – Developers can iteratively refine prompts, removing or re‑phrasing high‑sensitivity tokens that cause unwanted model behavior.
  • Safety Guardrails – By monitoring sensitivity scores in production, teams can trigger alerts when a new prompt configuration introduces unexpected token dependencies.
  • Vendor‑agnostic Testing – Since DBSA works with any API‑only LLM, it fits naturally into CI/CD pipelines for products that rely on third‑party language services.
  • User‑Facing Explainability – Front‑end tools can display token‑heatmaps to end‑users (e.g., lawyers reviewing AI‑generated contracts), increasing trust and transparency.
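
To make the last point concrete, here is a minimal rendering sketch. It is our illustration, not the paper’s visualization framework, and simply shades each prompt token by its score from the `dbsa_scores` sketch in the Methodology section:

```python
# Rough token heat-map: one shaded cell per prompt token.
import matplotlib.pyplot as plt
import numpy as np

def plot_token_heatmap(tokens, scores):
    fig, ax = plt.subplots(figsize=(max(6, 0.6 * len(tokens)), 1.6))
    # Render scores as a single-row image; darker red = more sensitive.
    ax.imshow(np.array(scores)[None, :], cmap="Reds", aspect="auto")
    ax.set_xticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=45, ha="right")
    ax.set_yticks([])
    ax.set_title("DBSA token sensitivity")
    fig.tight_layout()
    plt.show()
```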

Limitations & Future Work

  • Sampling Cost – The need for multiple forward passes per token can become expensive for long prompts or high‑throughput services; the authors suggest adaptive sampling to mitigate this (one plausible scheme is sketched after this list).
  • Perturbation Choice – Replacing a token with a generic mask may not capture nuanced semantic shifts; exploring synonym or paraphrase perturbations could improve fidelity.
  • Statistical Distance Sensitivity – Different divergence measures may yield varying scores; a systematic study of alternatives is left for future research.
  • Dynamic Contexts – DBSA currently assumes a static prompt; extending it to multi‑turn conversations or streaming outputs remains an open challenge.
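
For the sampling‑cost point above, one plausible adaptive scheme (our assumption; the paper only names the idea) is to draw samples in small batches and stop once the divergence estimate stabilizes, reusing `empirical_dist` and `js_divergence` from the Methodology sketch:

```python
# Adaptive sampling: grow the sample in small batches and stop early
# once the JS-divergence estimate stops moving by more than tol.
def adaptive_js(base_dist, perturbed_prompt, query_llm,
                batch: int = 5, max_samples: int = 50, tol: float = 0.01):
    samples, prev = [], None
    while len(samples) < max_samples:
        samples += [query_llm(perturbed_prompt) for _ in range(batch)]
        est = js_divergence(base_dist, empirical_dist(samples))
        if prev is not None and abs(est - prev) < tol:
            break  # estimate has stabilized; save further API calls
        prev = est
    return est
```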

The authors envision a richer toolbox that combines DBSA with causal inference techniques and integrates directly into API monitoring dashboards.

Authors

  • Paulius Rauba
  • Qiyao Wei
  • Mihaela van der Schaar

Paper Information

  • arXiv ID: 2512.11573v1
  • Categories: cs.CL, cs.LG
  • Published: December 12, 2025