[Paper] QIME: Constructing Interpretable Medical Text Embeddings via Ontology-Grounded Questions
Source: arXiv - 2603.01690v1
Overview
The paper introduces QIME, a new way to turn medical text (e.g., clinical notes, research abstracts) into embeddings that are both high‑performing and human‑readable. Instead of opaque dense vectors, QIME represents each document as a vector of yes/no answers to clinically grounded questions derived from biomedical ontologies. This makes the embeddings interpretable for clinicians and developers while still rivaling the accuracy of black‑box models.
Key Contributions
- Ontology‑grounded question generation: Leverages medical concept signatures (e.g., SNOMED‑CT, MeSH) to automatically craft fine‑grained, clinically meaningful binary questions.
- Training‑free embedding construction: Bypasses the need to train a separate classifier for every question; answers are obtained directly from the language model’s masked‑language‑model (MLM) probabilities.
- Interpretability without sacrificing performance: Achieves state‑of‑the‑art results on biomedical similarity, clustering, and retrieval tasks, closing the gap to dense black‑box encoders.
- Scalable and modular design: New medical domains or ontologies can be plugged in with minimal engineering effort.
Methodology
1. Concept Signature Extraction
- For each medical ontology cluster (e.g., “cardiovascular diseases”), the authors collect a set of representative terms (the concept’s signature) using TF‑IDF scores and ontology hierarchy information.
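The signature-extraction step can be sketched in plain Python. This is an illustrative stand-in, not the paper's implementation: it ranks each cluster's terms by a simple TF-IDF score and keeps the top-k, while the ontology-hierarchy component the authors also use is omitted.

```python
import math
from collections import Counter

def concept_signatures(cluster_terms, k=3):
    """Toy signature extraction: for each ontology cluster, score its
    terms by TF-IDF (term frequency within the cluster, inverse document
    frequency across clusters) and keep the top-k as the signature.
    Hypothetical sketch; the paper additionally uses hierarchy info."""
    n = len(cluster_terms)
    df = Counter()                       # in how many clusters a term appears
    for terms in cluster_terms.values():
        df.update(set(terms))
    signatures = {}
    for cluster, terms in cluster_terms.items():
        tf = Counter(terms)
        scores = {t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf}
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        signatures[cluster] = [t for t, _ in ranked[:k]]
    return signatures
```

Terms shared across all clusters (e.g., "patient") get a zero IDF and drop out, leaving cluster-specific vocabulary as the signature.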
2. Question Generation
- A template (“Does the text mention X?”) is instantiated with each signature term, producing a pool of candidate yes/no questions.
- A lightweight scoring model ranks questions by semantic atomicity (how specific and non‑redundant they are) and clinical relevance.
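The template instantiation plus de-duplication step above can be sketched as follows. The token-Jaccard filter here is a hypothetical stand-in for the paper's scoring model for semantic atomicity and clinical relevance:

```python
def generate_questions(signature_terms,
                       template="Does the text mention {term}?"):
    """Instantiate the yes/no template for each signature term, then
    greedily drop near-duplicate questions. The token-Jaccard overlap
    test is a crude proxy for the paper's lightweight scoring model."""
    kept, kept_tokens = [], []
    for term in signature_terms:
        toks = set(term.lower().split())
        # Skip a candidate whose term heavily overlaps an already-kept one.
        if any(len(toks & prev) / len(toks | prev) > 0.5
               for prev in kept_tokens):
            continue
        kept.append(template.format(term=term))
        kept_tokens.append(toks)
    return kept
```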
3. Answering via MLM
- Instead of training a binary classifier per question, QIME feeds the question and the target text into a pretrained biomedical language model (e.g., BioBERT).
- The model’s masked token probability for “yes” vs. “no” yields the binary answer, forming one dimension of the final embedding.
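The yes/no decision reduces to comparing two token probabilities at a masked answer slot. The sketch below separates that logic from the model: `token_logprob` is a hypothetical interface standing in for a real biomedical MLM such as BioBERT, and the toy scorer included here is only a keyword heuristic so the example runs without a model download.

```python
def mlm_answer(question, text, token_logprob):
    """One QIME dimension: 1 if the (stand-in) MLM rates 'yes' as more
    probable than 'no' at the masked answer position, else 0."""
    prompt = f"{question} [SEP] {text} [SEP] Answer: [MASK]."
    return 1 if token_logprob(prompt, "yes") > token_logprob(prompt, "no") else 0

def toy_logprob(prompt, token):
    """Crude stand-in scorer, NOT the paper's model: 'yes' is likelier
    when the questioned term appears in the text part of the prompt."""
    question, text, _ = prompt.split(" [SEP] ")
    term = question.removeprefix("Does the text mention ").rstrip("?")
    mentioned = term.lower() in text.lower()
    return 0.0 if (token == "yes") == mentioned else -1.0
```

In practice `token_logprob` would be backed by a fill-mask forward pass over a pretrained biomedical MLM, restricted to the "yes"/"no" vocabulary entries.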
4. Embedding Assembly
- The concatenated vector of binary answers (e.g., 256‑dim) constitutes the interpretable embedding.
- Because each dimension maps to a concrete question, developers can read out why two documents are similar (e.g., both answer “yes” to “Is hypertension mentioned?”).
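Assembly and the read-out of an explanation can be sketched together. The `toy_answer` function below is a hypothetical substring check standing in for the MLM-based answering step; only the assembly and explanation logic mirror the description above.

```python
def toy_answer(question, text):
    """Hypothetical answerer: substring check instead of an MLM."""
    term = question.removeprefix("Does the text mention ").rstrip("?")
    return int(term.lower() in text.lower())

def embed(text, questions, answer_fn=toy_answer):
    """Assemble the interpretable embedding: one 0/1 per question."""
    return [answer_fn(q, text) for q in questions]

def explain_similarity(questions, emb_a, emb_b):
    """Dimensions where both documents answer 'yes' explain the match."""
    return [q for q, a, b in zip(questions, emb_a, emb_b) if a == b == 1]
```

Because every dimension carries its question, the explanation is just the list of questions both documents answer "yes" to.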
Results & Findings
| Benchmark | QIME vs. Prior Interpretable Methods | QIME vs. Black‑Box Encoders |
|---|---|---|
| Biomedical Semantic Similarity (BIOSSES) | +12.4% Spearman improvement | -3.1% relative to BioBERT (gap narrowed) |
| Clustering (k‑means on PubMed abstracts) | Adjusted Rand Index +0.18 | Within 5% of dense embeddings |
| Retrieval (MeSH‑based query) | Recall@10 +9.7% | Within 2% of SOTA dense retrievers |
Key takeaways
- The training‑free variant even outperforms the classifier‑based version, showing that MLM‑based answering is a strong signal.
- Qualitative analysis reveals that the top‑ranked questions often correspond to clinically decisive concepts (e.g., “Is the patient on anticoagulants?”).
Practical Implications
- Explainable AI for Clinical Decision Support: Developers can embed patient notes and instantly surface the exact clinical facts driving similarity scores, aiding trust and auditability.
- Rapid Prototyping of Retrieval Systems: Since no per‑question training is required, teams can spin up a searchable knowledge base by simply defining the ontology of interest.
- Regulatory Compliance: Interpretable embeddings satisfy emerging “right‑to‑explain” requirements in healthcare AI deployments.
- Domain Adaptation: Adding a new specialty (e.g., oncology) only needs the corresponding ontology; QIME automatically generates relevant questions without retraining the whole model.
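For the rapid-prototyping point above, a retrieval sketch over binary QIME-style embeddings is short: rank documents by the number of "yes" answers they share with the query. This is an assumed setup, not the paper's retrieval pipeline (which is benchmarked against dense retrievers):

```python
def retrieve(query_emb, corpus_embs, top_k=2):
    """Rank corpus documents by shared 'yes' answers with the query
    (binary dot product). Minimal sketch of a QIME-backed retriever."""
    scores = [(i, sum(q & d for q, d in zip(query_emb, emb)))
              for i, emb in enumerate(corpus_embs)]
    return [i for i, _ in sorted(scores, key=lambda s: (-s[1], s[0]))[:top_k]]
```

Each returned hit's score decomposes into the specific shared questions, giving the auditability described above for free.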
Limitations & Future Work
- Ontology Dependence: The quality of embeddings hinges on the completeness and granularity of the underlying medical ontology; rare or emerging concepts may be missed.
- Binary Question Scope: Complex relations (e.g., temporal or causal) are reduced to yes/no, potentially oversimplifying nuanced clinical narratives.
- Scalability of Question Set: Very large signature pools can inflate embedding dimensionality; future work could explore adaptive pruning or hierarchical question encoding.
- Evaluation on Real‑World Clinical Workflows: The paper focuses on benchmark datasets; deploying QIME in live EHR systems and measuring impact on clinician workflow remains an open step.
Authors
- Yixuan Tang
- Zhenghong Lin
- Yandong Sun
- Anthony K. H. Tung
Paper Information
- arXiv ID: 2603.01690v1
- Categories: cs.CL, cs.AI
- Published: March 2, 2026