[Paper] QIME: Constructing Interpretable Medical Text Embeddings via Ontology-Grounded Questions

Published: March 2, 2026 at 05:18 AM EST
4 min read
Source: arXiv - 2603.01690v1

Overview

The paper introduces QIME, a new way to turn medical text (e.g., clinical notes, research abstracts) into embeddings that are both high‑performing and human‑readable. Instead of opaque dense vectors, QIME represents each document as a series of yes/no answers to clinically grounded questions derived from biomedical ontologies. This makes the embeddings interpretable for clinicians and developers while still rivaling the accuracy of black‑box models.

Key Contributions

  • Ontology‑grounded question generation: Leverages medical concept signatures (e.g., SNOMED‑CT, MeSH) to automatically craft fine‑grained, clinically meaningful binary questions.
  • Training‑free embedding construction: Bypasses the need to train a separate classifier for every question; answers are obtained directly from the language model’s masked‑language‑model (MLM) probabilities.
  • Interpretability without sacrificing performance: Achieves state‑of‑the‑art results on biomedical similarity, clustering, and retrieval tasks, closing the gap to dense black‑box encoders.
  • Scalable and modular design: New medical domains or ontologies can be plugged in with minimal engineering effort.

Methodology

  1. Concept Signature Extraction

    • For each medical ontology cluster (e.g., “cardiovascular diseases”), the authors collect a set of representative terms (signatures) using TF‑IDF and ontology hierarchy information.
  2. Question Generation

    • A template (“Does the text mention X?”) is instantiated with each signature term, producing a pool of candidate yes/no questions.
    • A lightweight scoring model ranks questions by semantic atomicity (how specific and non‑redundant they are) and clinical relevance.
  3. Answering via MLM

    • Instead of training a binary classifier per question, QIME feeds the question and the target text into a pretrained biomedical language model (e.g., BioBERT).
    • The model’s masked token probability for “yes” vs. “no” yields the binary answer, forming one dimension of the final embedding.
  4. Embedding Assembly

    • The concatenated vector of binary answers (e.g., 256‑dim) constitutes the interpretable embedding.
    • Because each dimension maps to a concrete question, developers can read out why two documents are similar (e.g., both answer “yes” to “Is hypertension mentioned?”).
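The four steps above can be sketched end to end. This is a minimal, hypothetical illustration: the real method scores each question by comparing the MLM's probabilities for "yes" vs. "no" from a pretrained biomedical model (e.g., BioBERT), which is stubbed out here with a simple keyword check so the example stays self-contained.

```python
# Toy sketch of the QIME pipeline. All names and the keyword-based
# answering function are illustrative stand-ins, not the paper's code.

SIGNATURES = ["hypertension", "anticoagulants", "myocardial infarction"]

def generate_questions(signatures):
    """Step 2: instantiate the template for each signature term."""
    return [f"Does the text mention {term}?" for term in signatures]

def answer_question(question, text):
    """Step 3 (stub): stand-in for the MLM's yes/no decision.
    In QIME this would compare P('yes') vs. P('no') for a masked
    token in a prompt built from the question and the document."""
    term = question.removeprefix("Does the text mention ").rstrip("?")
    return 1 if term in text.lower() else 0

def embed(text, questions):
    """Step 4: one binary dimension per question."""
    return [answer_question(q, text) for q in questions]

questions = generate_questions(SIGNATURES)
note = "Patient with hypertension, started on anticoagulants."
print(embed(note, questions))  # -> [1, 1, 0]
```

Because each position in the output vector is tied to a named question, the embedding can be inspected dimension by dimension rather than treated as an opaque vector.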

Results & Findings

| Benchmark | QIME vs. Prior Interpretable Methods | QIME vs. Black‑Box Encoders |
| --- | --- | --- |
| Biomedical Semantic Similarity (BIOSSES) | +12.4% Spearman improvement | −3.1% relative to BioBERT (gap narrowed) |
| Clustering (k‑means on PubMed abstracts) | Adjusted Rand Index +0.18 | Within 5% of dense embeddings |
| Retrieval (MeSH‑based queries) | Recall@10 +9.7% | Within 2% of SOTA dense retrievers |

Key takeaways

  • The training‑free variant even outperforms the classifier‑based version, showing that MLM‑based answering is a strong signal.
  • Qualitative analysis reveals that the top‑ranked questions often correspond to clinically decisive concepts (e.g., “Is the patient on anticoagulants?”).

Practical Implications

  • Explainable AI for Clinical Decision Support: Developers can embed patient notes and instantly surface the exact clinical facts driving similarity scores, aiding trust and auditability.
  • Rapid Prototyping of Retrieval Systems: Since no per‑question training is required, teams can spin up a searchable knowledge base by simply defining the ontology of interest.
  • Regulatory Compliance: Interpretable embeddings satisfy emerging “right‑to‑explain” requirements in healthcare AI deployments.
  • Domain Adaptation: Adding a new specialty (e.g., oncology) only needs the corresponding ontology; QIME automatically generates relevant questions without retraining the whole model.
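The explainability point above can be made concrete: because every dimension of a QIME embedding corresponds to a named question, the "clinical facts driving similarity" are simply the questions both documents answer "yes" to. A small sketch (questions and vectors here are illustrative, not from the paper):

```python
# Hedged sketch: reading out why two interpretable embeddings are similar.
# The questions and binary vectors below are made-up examples.

def explain_similarity(emb_a, emb_b, questions):
    """Return the questions that both documents answer 'yes' to."""
    return [q for q, a, b in zip(questions, emb_a, emb_b) if a == 1 and b == 1]

questions = [
    "Does the text mention hypertension?",
    "Does the text mention anticoagulants?",
    "Does the text mention diabetes?",
]
emb_note_1 = [1, 1, 0]
emb_note_2 = [1, 0, 0]

print(explain_similarity(emb_note_1, emb_note_2, questions))
# -> ['Does the text mention hypertension?']
```

A dense black-box encoder could report the same similarity score, but not this per-dimension justification, which is the property the auditability and "right-to-explain" arguments rest on.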

Limitations & Future Work

  • Ontology Dependence: The quality of embeddings hinges on the completeness and granularity of the underlying medical ontology; rare or emerging concepts may be missed.
  • Binary Question Scope: Complex relations (e.g., temporal or causal) are reduced to yes/no, potentially oversimplifying nuanced clinical narratives.
  • Scalability of Question Set: Very large signature pools can inflate embedding dimensionality; future work could explore adaptive pruning or hierarchical question encoding.
  • Evaluation on Real‑World Clinical Workflows: The paper focuses on benchmark datasets; deploying QIME in live EHR systems and measuring impact on clinician workflow remains an open step.

Authors

  • Yixuan Tang
  • Zhenghong Lin
  • Yandong Sun
  • Anthony K. H. Tung

Paper Information

  • arXiv ID: 2603.01690v1
  • Categories: cs.CL, cs.AI
  • Published: March 2, 2026