[Paper] Towards Faithful Multimodal Concept Bottleneck Models

Published: March 13, 2026 at 12:56 PM EDT

Source: arXiv - 2603.13163v1

Overview

The paper introduces f‑CBM, a new framework that brings faithful concept‑bottleneck models (CBMs) to multimodal AI—think vision‑language systems that can both predict and explain their decisions in human‑readable terms. By tackling concept detection and “leakage” (when hidden information sneaks into the explanation layer) together, f‑CBM delivers more trustworthy predictions without sacrificing accuracy.

Key Contributions

  • Unified multimodal CBM built on a vision‑language backbone that works for image‑text pairs as well as text‑only tasks.
  • Differentiable leakage loss that penalizes task‑relevant information hidden in the concept layer beyond what the concepts themselves express, encouraging pure, interpretable representations.
  • Kolmogorov‑Arnold Network (KAN) prediction head that offers enough expressive power to improve concept detection while keeping the model tractable.
  • Comprehensive empirical evaluation showing the best trade‑off among task accuracy, concept detection quality, and leakage reduction compared to prior CBM variants.
  • Plug‑and‑play design: f‑CBM can be attached to existing vision‑language models (e.g., CLIP, ViLT) with minimal engineering effort.

Methodology

  1. Backbone encoder – A standard vision‑language transformer (e.g., ViLT) processes raw inputs (images, text, or both) and produces a shared latent representation.
  2. Concept bottleneck layer – The latent features are projected onto a set of predefined, human‑interpretable concepts (e.g., “has wheels”, “mentions price”). This projection is learned jointly with the rest of the network.
  3. Leakage mitigation – A leakage loss measures how much task‑relevant signal can be recovered from the concept vectors using a lightweight probe network. The loss is differentiable, so during training the model is explicitly discouraged from embedding hidden cues in the bottleneck.
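  4. Prediction head – Instead of a simple linear classifier, the authors employ a Kolmogorov‑Arnold Network. KANs are motivated by the Kolmogorov‑Arnold representation theorem, which writes any continuous multivariate function as sums and compositions of continuous univariate functions, and they learn such univariate functions on the network's edges. This gives the model enough flexibility to map pure concepts to the final task output without needing to hide extra information.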
  5. Joint optimization – The total loss combines the standard task loss (e.g., classification cross‑entropy), a concept‑prediction loss (ensuring each concept is correctly detected), and the leakage loss. Gradient descent updates all components simultaneously, so concept detection and leakage control co‑evolve.
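
The joint objective in step 5 can be sketched as a weighted sum of the three terms. This is a minimal illustration rather than the paper's implementation: the weights `lambda_concept` and `lambda_leak`, their default values, and the function names are hypothetical, and the leakage term is passed in as an already computed scalar because its exact form is not given in this summary.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over the concept predictions."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

def total_loss(task_logits, label, concept_probs, concept_targets,
               leakage, lambda_concept=1.0, lambda_leak=0.1):
    """Task loss + weighted concept loss + weighted leakage term.

    The weights are hypothetical; `leakage` is assumed to be a
    non-negative scalar produced elsewhere (e.g. by a probe).
    """
    task = cross_entropy(task_logits, label)
    concept = binary_cross_entropy(concept_probs, concept_targets)
    return task + lambda_concept * concept + lambda_leak * leakage

# Example: two classes, two concepts, a leakage score of 0.4.
loss = total_loss([1.2, -0.3], 0, [0.9, 0.2], [1, 0], leakage=0.4)
```

Raising `lambda_leak` in such a setup trades task fit for a cleaner bottleneck, which is the trade-off the leakage ablation in the results probes.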

Results & Findings

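Dataset                                | Task Accuracy ↑          | Concept Detection F1 ↑ | Leakage (lower is better)
---------------------------------------+--------------------------+------------------------+--------------------------
Multimodal VQA‑CB (images + questions) | +2.3 % over baseline CBM | +4.1 % F1              | ‑35 % reduction
Text‑only sentiment with concepts      | +1.1 % over baseline     | +3.6 % F1              | ‑28 % reduction
Image‑only attribute classification    | +0.8 %                   | +2.9 % F1              | ‑31 % reduction
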
  • Trade‑off: f‑CBM consistently sits on the Pareto frontier—improving interpretability (higher concept F1, lower leakage) while keeping or slightly boosting predictive performance.
  • Ablation: Removing the leakage loss caused a sharp rise in hidden information (leakage up 70 %) even though task accuracy stayed similar, confirming the loss’s role in enforcing faithful explanations.
  • KAN vs. linear head: The KAN head improved concept detection by ~3 % without inflating leakage, showing that expressive heads can replace the need for “cheating” via hidden signals.
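
To make the difference between a linear head and a KAN‑style head concrete, here is a minimal sketch of a single KAN‑style layer. A linear head computes a weighted sum of the concept scores, whereas this layer applies a separate univariate function to each concept before summing. The piecewise‑linear parametrization on a fixed grid is a simplifying assumption made for this example (KAN implementations typically use splines), and the names `piecewise_linear`, `kan_layer`, and `edge_values` are hypothetical, not taken from the paper.

```python
from bisect import bisect_right

def piecewise_linear(x, grid, values):
    """Evaluate a univariate piecewise-linear function.

    `grid` holds sorted knot positions, `values` the function value at
    each knot. Inputs outside the grid are clamped to the end values.
    """
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    k = bisect_right(grid, x) - 1           # index of the knot left of x
    t = (x - grid[k]) / (grid[k + 1] - grid[k])
    return (1.0 - t) * values[k] + t * values[k + 1]

def kan_layer(concepts, grid, edge_values):
    """Compute y[j] = sum_i phi[j][i](concepts[i]).

    `edge_values[j][i]` stores the knot values of the univariate
    function on the edge from concept i to output j; in training these
    would be the learnable parameters.
    """
    return [
        sum(piecewise_linear(c, grid, phi) for c, phi in zip(concepts, row))
        for row in edge_values
    ]

# One output, two concepts, three knots on [0, 1].
grid = [0.0, 0.5, 1.0]
edge_values = [[[0.0, 1.0, 0.0],    # non-monotone response to concept 0
                [0.0, 0.0, 2.0]]]   # thresholded response to concept 1
print(kan_layer([0.25, 1.0], grid, edge_values))  # [2.5]
```

The first edge function rises and then falls, a response that no single linear weight on that concept can express. That extra flexibility is what lets the head work from the concept scores alone.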

Practical Implications

  • Debuggable AI services – Deployments that must justify decisions (e.g., medical imaging triage, e‑commerce recommendation) can now surface human‑readable concepts while being confident those concepts truly drive the output.
  • Regulatory compliance – In jurisdictions demanding “explainable AI”, f‑CBM offers a quantifiable leakage metric that auditors can inspect.
  • Rapid prototyping – Because f‑CBM plugs into existing vision‑language models, teams can add concept‑level interpretability to products without retraining from scratch.
  • Active learning & data collection – Accurate concept detectors enable targeted labeling (e.g., ask annotators to verify “has wheels” when the model is uncertain), reducing annotation costs.
  • Cross‑modal consistency checks – For multimodal systems, developers can verify that the same concept is detected consistently across image and text inputs, catching modality‑specific biases early.
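
The last two points can be sketched with a few lines of code operating on per‑concept probabilities. This is an illustrative sketch only: the function names, the dictionary format, and the thresholds `tol` and `margin` are assumptions made for the example, not part of f‑CBM.

```python
def inconsistent_concepts(image_probs, text_probs, tol=0.3):
    """Names of concepts whose image and text probabilities differ by more than tol."""
    shared = image_probs.keys() & text_probs.keys()
    return sorted(
        name for name in shared
        if abs(image_probs[name] - text_probs[name]) > tol
    )

def uncertain_concepts(probs, margin=0.15):
    """Names of concepts whose probability lies within margin of 0.5."""
    return sorted(name for name, p in probs.items() if abs(p - 0.5) <= margin)

# Hypothetical concept scores for one product listing.
image_probs = {"has wheels": 0.9, "mentions price": 0.2}
text_probs = {"has wheels": 0.4, "mentions price": 0.25}

print(inconsistent_concepts(image_probs, text_probs))  # ['has wheels']
print(uncertain_concepts(text_probs))                  # ['has wheels']
```

Concepts flagged by the first function point to a possible modality‑specific bias, while those flagged by the second are natural candidates to send to annotators for verification.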

Limitations & Future Work

  • Concept definition dependency – The framework assumes a pre‑specified set of concepts; discovering or refining concepts automatically remains an open challenge.
  • Scalability of KANs – While KANs are more expressive than linear heads, they add modest computational overhead, which could be a bottleneck for very large‑scale deployments.
  • Evaluation scope – Experiments focus on classification‑type tasks; extending f‑CBM to generation (e.g., captioning, code synthesis) will require new leakage metrics.
  • User studies – The paper measures fidelity mathematically but does not assess how end‑users interpret the concepts; future work could involve human‑centered evaluations.

Overall, f‑CBM pushes the state of the art in making multimodal AI both accurate and accountable, offering a practical pathway for developers who need trustworthy, concept‑level explanations.

Authors

  • Pierre Moreau
  • Emeline Pineau Ferrand
  • Yann Choho
  • Benjamin Wong
  • Annabelle Blangero
  • Milan Bhan

Paper Information

  • arXiv ID: 2603.13163v1
  • Categories: cs.CV, cs.LG
  • Published: March 13, 2026