[Paper] Toward Reliable and Explainable Nail Disease Classification: Leveraging Adversarial Training and Grad-CAM Visualization

Published: February 4, 2026 at 01:08 PM EST
3 min read
Source: arXiv

Overview

A new study proposes a deep‑learning pipeline that can automatically identify six common nail diseases from photos with >95 % accuracy. By combining adversarial training for robustness and visual explanation tools (Grad‑CAM/SHAP), the authors aim to deliver a model that not only performs well but also tells clinicians why it made a particular decision—an essential step toward trustworthy AI in dermatology.

Key Contributions

  • Benchmarking four state‑of‑the‑art CNNs (InceptionV3, DenseNet201, EfficientNetV2, ResNet50) on a public nail‑image dataset (3,835 samples, 224 × 224 px).
  • Achieving top‑tier performance: InceptionV3 reaches 95.57 % accuracy, surpassing the other architectures.
  • Introducing adversarial training to harden the classifier against noisy or borderline images, reducing misclassifications on challenging cases.
  • Providing model interpretability through Grad‑CAM heatmaps (and SHAP values) that highlight the nail regions driving each prediction, helping users verify that the model focuses on medically relevant features.
  • Packaging the workflow as a reusable Python pipeline, ready for integration into clinical decision‑support tools or tele‑dermatology apps.

Methodology

  1. Data preparation – All images were resized to a uniform 224 × 224 px resolution and normalized. The dataset was split into training/validation/test sets using stratified sampling to preserve class balance.
  2. Model training – Each CNN was fine‑tuned on the nail‑disease data with standard cross‑entropy loss and the Adam optimizer. Early stopping and learning‑rate scheduling were used to prevent overfitting.
  3. Adversarial robustness – During training, the authors generated FGSM (Fast Gradient Sign Method) perturbations on‑the‑fly and fed these adversarial examples back into the network, encouraging it to learn more invariant features.
  4. Explainability – After inference, Grad‑CAM was applied to the final convolutional layer to produce heatmaps over the nail image. In parallel, SHAP (SHapley Additive exPlanations) values were computed to quantify each pixel’s contribution to the predicted class.
  5. Evaluation – Accuracy, precision, recall, and F1‑score were reported per class, and robustness was measured by the drop in performance on adversarially perturbed test images.
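The stratified split in step 1 can be sketched in plain Python. The 70/15/15 ratios below are illustrative defaults, not figures reported in the paper:

```python
import random
from collections import defaultdict

def stratified_split(labels, train=0.7, val=0.15, seed=42):
    """Split sample indices per class so each split preserves the class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    splits = {"train": [], "val": [], "test": []}
    for _, idxs in by_class.items():
        rng.shuffle(idxs)
        n_train = int(len(idxs) * train)
        n_val = int(len(idxs) * val)
        splits["train"] += idxs[:n_train]
        splits["val"] += idxs[n_train:n_train + n_val]
        splits["test"] += idxs[n_train + n_val:]
    return splits
```

Because the shuffle and cut are done per class, each disease category contributes proportionally to every split, which matters for a six-class dataset of only 3,835 images.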
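The FGSM perturbation in step 3 is a single signed gradient step on the input. The sketch below uses a toy logistic model with an analytic input gradient as a stand-in for the CNNs in the paper; the model, `eps` value, and function names are illustrative:

```python
import numpy as np

def fgsm_perturb(x, grad_x, eps=0.03):
    """FGSM: nudge each input feature by eps in the direction that increases the loss."""
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

def loss_and_input_grad(w, x, y):
    """Toy differentiable model: logistic regression with an analytic d(loss)/dx."""
    p = 1.0 / (1.0 + np.exp(-x @ w))                    # predicted probability
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))   # binary cross-entropy
    grad_x = (p - y) * w                                # gradient of loss w.r.t. the input
    return loss, grad_x
```

During training, the adversarial examples are generated on the fly and fed back in alongside the clean batch: compute `grad_x`, form `x_adv = fgsm_perturb(x, grad_x)`, and train on both `(x, y)` and `(x_adv, y)`.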
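The Grad-CAM computation in step 4 reduces to a few array operations once the final convolutional layer's activations and gradients are in hand; this NumPy sketch (the function name and shapes are ours, not the paper's) shows the core of it:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM: weight each feature map by its average gradient, sum, then ReLU.

    activations: (K, H, W) feature maps from the last conv layer
    gradients:   (K, H, W) gradients of the class score w.r.t. those maps
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))   # alpha_k: global-average-pooled gradients
    cam = (weights[:, None, None] * activations).sum(axis=0)
    cam = np.maximum(cam, 0.0)              # ReLU keeps only positive evidence
    if cam.max() > 0:
        cam /= cam.max()
    return cam
```

In a full pipeline the low-resolution heatmap would then be upsampled to the 224 × 224 input size and overlaid on the nail photo, which is how the visualizations described in the results are produced.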

Results & Findings

| Model          | Accuracy | Robustness (Δ on adversarial test) |
| -------------- | -------- | ---------------------------------- |
| InceptionV3    | 95.57 %  | –1.2 %                             |
| DenseNet201    | 94.79 %  | –1.5 %                             |
| EfficientNetV2 | 93.6 %   | –2.0 %                             |
| ResNet50       | 92.3 %   | –2.3 %                             |
  • Adversarial training reduced the accuracy loss on perturbed images by roughly 30 % compared with vanilla training, confirming improved resilience.
  • Grad‑CAM visualizations consistently highlighted the nail plate and surrounding lesions, aligning with dermatologists’ visual cues.
  • SHAP analysis revealed that color variations (e.g., pallor, discoloration) and texture patterns were the strongest predictive features, offering quantitative insight into the model’s reasoning.

Practical Implications

  • Clinical decision support – A lightweight InceptionV3 model can be embedded in electronic health‑record (EHR) systems or mobile apps to give doctors a second opinion, speeding up triage for nail‑related conditions.
  • Tele‑dermatology – Patients can upload a selfie of their nail; the backend runs the robust classifier and returns a confidence score plus an explanation heatmap, helping remote clinicians assess whether an in‑person visit is needed.
  • Quality assurance – The visual explanations act as a sanity check for developers, allowing quick detection of dataset bias (e.g., background artifacts) before deployment.
  • Regulatory readiness – By providing interpretable outputs, the system aligns with emerging AI‑in‑health guidelines that demand transparency and traceability.
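A tele-dermatology backend of the kind described above might wrap the classifier's raw outputs like this. The `refer_below` threshold and the class names used in the test are hypothetical, not values from the paper:

```python
import numpy as np

def triage(logits, class_names, refer_below=0.80):
    """Turn raw model logits into a (label, confidence, needs_referral) decision.

    Low-confidence predictions are flagged so a remote clinician can decide
    whether an in-person visit is needed.
    """
    z = logits - logits.max()               # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()     # softmax over the disease classes
    top = int(probs.argmax())
    confidence = float(probs[top])
    return class_names[top], confidence, confidence < refer_below
```

In the deployed system described in the paper, the response would also carry the Grad-CAM heatmap alongside the label and confidence, so the clinician can see which nail region drove the prediction.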

Limitations & Future Work

  • Dataset scope – The public dataset contains only 3,835 images across six disease categories; rare nail conditions are not represented, limiting generalizability.
  • Clinical validation – The study stops at technical evaluation; prospective trials with dermatologists are needed to confirm diagnostic utility and safety.
  • Real‑time performance – While InceptionV3 is relatively fast, deploying on edge devices (e.g., smartphones) may require model pruning or quantization.
  • Future directions – Expanding the dataset with multi‑ethnic samples, incorporating multimodal data (patient history, lab results), and exploring self‑supervised pre‑training to further boost robustness and explainability.

Authors

  • Farzia Hossain
  • Samanta Ghosh
  • Shahida Begum
  • B. M. Shahria Alam
  • Mohammad Tahmid Noor
  • Md Parvez Mia
  • Nishat Tasnim Niloy

Paper Information

  • arXiv ID: 2602.04820v1
  • Categories: cs.CV, cs.AI, cs.LG
  • Published: February 4, 2026