[Paper] Towards Faithful Multimodal Concept Bottleneck Models

Published: March 13, 2026 at 12:56 PM EDT

Source: arXiv - 2603.13163v1

Overview

The paper introduces f-CBM, a framework for faithful concept-bottleneck models (CBMs) in multimodal settings, such as vision-language systems that both predict and explain their decisions in human-readable terms. It addresses concept detection and "leakage" (hidden information that ends up encoded in the explanation layer) jointly, and the authors report more trustworthy predictions without a loss in accuracy.

Key Contributions

Unified multimodal CBM built on a vision‑language backbone that works for image‑text pairs as well as text‑only tasks.
Differentiable leakage loss that penalizes task-relevant information encoded in the concept layer beyond the concepts themselves, encouraging concept representations that stay interpretable.
Kolmogorov‑Arnold Network (KAN) prediction head that offers enough expressive power to improve concept detection while keeping the model tractable.
Comprehensive empirical evaluation showing the best trade‑off among task accuracy, concept detection quality, and leakage reduction compared to prior CBM variants.
Plug‑and‑play design: f‑CBM can be attached to existing vision‑language models (e.g., CLIP, ViLT) with minimal engineering effort.

Methodology

Backbone encoder – A standard vision‑language transformer (e.g., ViLT) processes raw inputs (images, text, or both) and produces a shared latent representation.
Concept bottleneck layer – The latent features are projected onto a set of predefined, human‑interpretable concepts (e.g., “has wheels”, “mentions price”). This projection is learned jointly with the rest of the network.
Leakage mitigation – A leakage loss measures how much task‑relevant signal can be recovered from the concept vectors using a lightweight probe network. The loss is differentiable, so during training the model is explicitly discouraged from embedding hidden cues in the bottleneck.
Prediction head – Instead of a simple linear classifier, the authors employ a Kolmogorov-Arnold Network. KANs build on the Kolmogorov-Arnold representation theorem, which expresses a continuous multivariate function through sums and compositions of univariate functions; a KAN learns those univariate functions on its edges. This gives the head enough flexibility to map pure concepts to the final task output without needing to hide extra information.
Joint optimization – The total loss combines the standard task loss (e.g., classification cross‑entropy), a concept‑prediction loss (ensuring each concept is correctly detected), and the leakage loss. Gradient descent updates all components simultaneously, so concept detection and leakage control co‑evolve.
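The three-term objective described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the function names, the weights `lambda_concept` and `lambda_leak`, and the choice of binary cross-entropy for the concept term are all hypothetical, and the leakage term is passed in as a precomputed scalar because this summary does not give its exact form.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over all concepts (multi-label setting)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

def joint_loss(task_logits, label, concept_probs, concept_targets,
               leakage, lambda_concept=1.0, lambda_leak=0.5):
    """Task loss + weighted concept loss + weighted leakage penalty.

    `leakage` is a placeholder scalar (assumption): in the paper it would
    come from the differentiable leakage loss.
    """
    task = cross_entropy(task_logits, label)
    concept = binary_cross_entropy(concept_probs, concept_targets)
    return task + lambda_concept * concept + lambda_leak * leakage

example = joint_loss(
    task_logits=[2.0, 0.5, -1.0], label=0,
    concept_probs=[0.9, 0.2, 0.7], concept_targets=[1, 0, 1],
    leakage=0.3,
)
print(round(example, 4))
```

Because the three terms are summed into one scalar, a single backward pass in an autodiff framework would update the encoder, bottleneck, and head together, which is the co-evolution the summary describes.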

Results & Findings

Dataset	Task Accuracy ↑ (vs. baseline CBM)	Concept Detection F1 ↑	Leakage (lower is better)
Multimodal VQA-CB (images + questions)	+2.3 %	+4.1 %	35 % reduction
Text-only sentiment with concepts	+1.1 %	+3.6 %	28 % reduction
Image-only attribute classification	+0.8 %	+2.9 %	31 % reduction

Trade-off: f-CBM consistently sits on the Pareto frontier, improving interpretability (higher concept F1, lower leakage) while matching or slightly improving predictive performance.
Ablation: Removing the leakage loss caused a sharp rise in hidden information (leakage up 70 %) even though task accuracy stayed similar, confirming the loss’s role in enforcing faithful explanations.
KAN vs. linear head: The KAN head improved concept detection by ~3 % without inflating leakage, showing that expressive heads can replace the need for “cheating” via hidden signals.
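To make the KAN-versus-linear comparison more concrete, here is a toy single KAN-style layer in plain Python. It is a simplification for illustration only: KAN implementations typically parameterize each edge function with splines, whereas this sketch uses piecewise-linear interpolation on a fixed grid, and the class names and example values are hypothetical rather than taken from the paper.

```python
import bisect

class PiecewiseLinear:
    """Univariate function given by its values on a uniform grid over [0, 1]."""

    def __init__(self, values):
        self.values = list(values)  # these would be learnable parameters
        n = len(self.values)
        self.grid = [i / (n - 1) for i in range(n)]

    def __call__(self, x):
        x = min(max(x, 0.0), 1.0)  # clamp the input to the grid range
        hi = min(max(bisect.bisect_right(self.grid, x), 1), len(self.grid) - 1)
        lo = hi - 1
        t = (x - self.grid[lo]) / (self.grid[hi] - self.grid[lo])
        return (1 - t) * self.values[lo] + t * self.values[hi]

class KANLayer:
    """Each output is a sum of per-edge univariate functions of the inputs."""

    def __init__(self, edge_values):
        # edge_values[j][i] holds the grid values of the function on edge i -> j
        self.edges = [[PiecewiseLinear(v) for v in row] for row in edge_values]

    def __call__(self, x):
        return [sum(phi(xi) for phi, xi in zip(row, x)) for row in self.edges]

# Two concept activations in [0, 1] mapped to two class scores.
layer = KANLayer([
    [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],  # class 0 peaks at mid-range concept 0
    [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]],  # class 1 uses monotone edge functions
])
scores = layer([0.5, 1.0])
print(scores)
```

The first edge function is a "bump" that rises and then falls, a shape no single linear weight can express. That is the sense in which a KAN head can be more expressive than a linear head while still reading only the concept activations.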

Practical Implications

Debuggable AI services – Deployments that must justify decisions (e.g., medical imaging triage, e‑commerce recommendation) can now surface human‑readable concepts while being confident those concepts truly drive the output.
Regulatory compliance – In jurisdictions demanding “explainable AI”, f‑CBM offers a quantifiable leakage metric that auditors can inspect.
Rapid prototyping – Because f‑CBM plugs into existing vision‑language models, teams can add concept‑level interpretability to products without retraining from scratch.
Active learning & data collection – Accurate concept detectors enable targeted labeling (e.g., ask annotators to verify “has wheels” when the model is uncertain), reducing annotation costs.
Cross‑modal consistency checks – For multimodal systems, developers can verify that the same concept is detected consistently across image and text inputs, catching modality‑specific biases early.
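The last two ideas above lend themselves to a short sketch. The helpers below are hypothetical and not part of f-CBM: they assume the model exposes one probability per concept for each modality, and the `margin` and `threshold` values, as well as the "is outdoors" concept, are illustrative choices.

```python
def uncertain_concepts(concept_probs, names, margin=0.15):
    """Names of concepts whose probability lies within `margin` of 0.5.

    These are candidates to send to annotators for verification.
    """
    return [n for n, p in zip(names, concept_probs) if abs(p - 0.5) <= margin]

def cross_modal_disagreements(image_probs, text_probs, names, threshold=0.5):
    """Names of concepts detected in one modality but not in the other."""
    return [
        n for n, p_img, p_txt in zip(names, image_probs, text_probs)
        if (p_img >= threshold) != (p_txt >= threshold)
    ]

names = ["has wheels", "mentions price", "is outdoors"]
image_probs = [0.92, 0.10, 0.55]  # concept probabilities from the image input
text_probs = [0.88, 0.75, 0.40]   # concept probabilities from the text input

print(uncertain_concepts(image_probs, names))
# ['is outdoors']
print(cross_modal_disagreements(image_probs, text_probs, names))
# ['mentions price', 'is outdoors']
```

A disagreement is not automatically an error, since a product photo cannot "mention price". In practice a check like this would be restricted to concepts that are meaningful in both modalities.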

Limitations & Future Work

Concept definition dependency – The framework assumes a pre‑specified set of concepts; discovering or refining concepts automatically remains an open challenge.
Scalability of KANs – While KANs are more expressive than linear heads, they add modest computational overhead, which could be a bottleneck for very large‑scale deployments.
Evaluation scope – Experiments focus on classification‑type tasks; extending f‑CBM to generation (e.g., captioning, code synthesis) will require new leakage metrics.
User studies – The paper measures fidelity mathematically but does not assess how end‑users interpret the concepts; future work could involve human‑centered evaluations.

Overall, f‑CBM pushes the state of the art in making multimodal AI both accurate and accountable, offering a practical pathway for developers who need trustworthy, concept‑level explanations.

Authors

Pierre Moreau
Emeline Pineau Ferrand
Yann Choho
Benjamin Wong
Annabelle Blangero
Milan Bhan

Paper Information

arXiv ID: 2603.13163v1
Categories: cs.CV, cs.LG
Published: March 13, 2026
