[Paper] DentalX: Context-Aware Dental Disease Detection with Radiographs
Source: arXiv - 2601.08797v1
Overview
DentalX tackles a real pain point for dentists and AI developers alike: automatically spotting dental diseases in X‑ray images, where the visual clues are often faint and ambiguous. By teaching the model to understand the surrounding oral anatomy, the researchers boost detection accuracy well beyond what standard object‑detection pipelines, designed for natural images, achieve on radiographs.
Key Contributions
- Context‑aware detection framework that jointly learns dental disease classification and semantic segmentation of oral structures.
- Structural Context Extraction (SCE) module that converts anatomy segmentation maps into a rich feature representation for the disease detector.
- End‑to‑end training strategy that lets the two tasks (segmentation + detection) reinforce each other without extra annotation cost beyond the disease labels.
- Comprehensive benchmark on a curated dental radiograph dataset, showing sizable gains over state‑of‑the‑art detectors (e.g., YOLOX, Faster R‑CNN).
- Open‑source implementation (DentYOLOX) released for reproducibility and community extension.
Methodology
- Backbone & Detection Head – The core detector follows a YOLOX‑style architecture, optimized for speed and accuracy on high‑resolution radiographs.
- Auxiliary Segmentation Branch – Parallel to the detection head, a lightweight decoder predicts pixel‑wise labels for teeth, gums, bone, and other oral structures.
- Structural Context Extraction (SCE) – The segmentation output is transformed into a context tensor (e.g., via atrous spatial pyramid pooling) that captures spatial relationships such as “lesion near the root of tooth #12” (a minimal sketch follows this list).
- Feature Fusion – The context tensor is concatenated with the detector’s feature maps before the final prediction layers, allowing the disease classifier to reason with anatomical cues.
- Joint Loss – A combined loss (detection loss + segmentation loss) drives the network to improve both tasks simultaneously, exploiting the natural correlation between anatomy and pathology (see the fusion‑and‑loss sketch after this list).
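To make the SCE idea concrete, here is a minimal PyTorch sketch of an ASPP‑style context extractor. The class name, channel sizes, and dilation rates are illustrative assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class SCE(nn.Module):
    """Structural Context Extraction (sketch): turn anatomy-segmentation
    logits into a context tensor with parallel dilated (atrous) convs."""
    def __init__(self, num_anatomy_classes: int, out_channels: int = 64,
                 rates: tuple = (1, 6, 12, 18)):
        super().__init__()
        # One dilated 3x3 conv per rate; padding == dilation preserves H and W.
        self.branches = nn.ModuleList(
            nn.Conv2d(num_anatomy_classes, out_channels, kernel_size=3,
                      padding=r, dilation=r)
            for r in rates
        )
        self.project = nn.Conv2d(out_channels * len(rates), out_channels,
                                 kernel_size=1)

    def forward(self, seg_logits: torch.Tensor) -> torch.Tensor:
        # Each branch reads the anatomy map at a different receptive field,
        # so the pooled tensor can encode relations such as
        # "lesion adjacent to a tooth root".
        context = torch.cat([b(seg_logits) for b in self.branches], dim=1)
        return self.project(context)
```

Sampling the anatomy map at several dilation rates lets a single pass see both a tooth and its wider neighborhood, which is what makes relational cues of this kind expressible.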
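The fusion and joint-loss steps are simple in outline. The sketch below uses assumed names (fuse_context, lambda_seg); the paper's exact loss weighting is not given here:

```python
import torch
import torch.nn.functional as F

def fuse_context(det_feat: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
    """Resize the SCE context tensor to the detector feature map's
    resolution, then concatenate along the channel axis."""
    context = F.interpolate(context, size=det_feat.shape[-2:],
                            mode="bilinear", align_corners=False)
    # The prediction head must be built to accept the extra channels.
    return torch.cat([det_feat, context], dim=1)

def joint_loss(det_loss: torch.Tensor, seg_loss: torch.Tensor,
               lambda_seg: float = 1.0) -> torch.Tensor:
    # Weighted sum of the two task losses; lambda_seg is a hypothetical
    # hyper-parameter, not a value reported in the paper.
    return det_loss + lambda_seg * seg_loss
```

Because both losses backpropagate through shared backbone features, gradients from the segmentation task regularize the detector and vice versa, which is the correlation the authors exploit.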
Results & Findings
- Detection AP improves by 12.4% over vanilla YOLOX on the test split, with the largest gains on subtle lesions such as early caries and periapical infections.
- Segmentation IoU improves by 8.7% over a standalone U‑Net baseline, demonstrating that disease detection also sharpens anatomical understanding.
- Ablation studies confirm that the SCE module contributes the bulk of the performance boost; removing it drops AP back to near‑baseline levels.
- Inference speed remains practical for clinical settings (~45 FPS on a single RTX 3080), showing that the added segmentation branch does not compromise real‑time use.
Practical Implications
- Clinical Decision Support – Dentists can receive AI‑highlighted suspect regions on radiographs, reducing review time and catching early‑stage disease that might be missed by the human eye.
- Workflow Integration – Because DentalX runs at near‑real‑time speeds, it can be embedded into existing PACS or dental imaging software without bottlenecking patient throughput.
- Training Data Efficiency – The joint learning approach leverages readily available anatomy annotations (or even weak labels) to improve disease detection, lowering the barrier for building robust models in other medical imaging domains.
- Extensibility – The open‑source DentYOLOX codebase makes it straightforward for developers to fine‑tune the model on their own datasets, add new disease categories, or adapt the context module to 3‑D modalities like CBCT scans.
Limitations & Future Work
- Annotation Dependency – While the segmentation branch improves performance, it still requires a modest number of pixel‑level anatomy annotations, which may be scarce in some clinics.
- Generalization Across Modalities – The study focuses on 2‑D bitewing and periapical X‑rays; extending the approach to panoramic or cone‑beam CT images remains an open challenge.
- Explainability – Although the model highlights disease regions, deeper interpretability (e.g., why a particular anatomical context triggered a detection) is not fully explored.
- Future Directions – The authors suggest investigating self‑supervised pre‑training for anatomy understanding, incorporating patient metadata (age, dental history), and testing the framework on multi‑center, heterogeneous datasets to validate robustness.
Authors
- Zhi Qin Tan
- Xiatian Zhu
- Owen Addison
- Yunpeng Li
Paper Information
- arXiv ID: 2601.08797v1
- Categories: cs.CV
- Published: January 13, 2026