[Paper] Predicting Early and Complete Drug Release from Long-Acting Injectables Using Explainable Machine Learning
Source: arXiv - 2601.02265v1
Overview
Long‑acting injectables (LAIs) are polymer‑based drug depots that release medication over weeks or months, which can improve adherence to treatment for chronic diseases. In this paper, Robles and Samad show how a purpose‑built, explainable machine‑learning pipeline can predict early (24‑72 h) and complete release profiles for 321 LAI formulations, while also revealing which material attributes drive those outcomes.
Key Contributions
- Custom data transformation that converts heterogeneous in‑vitro release curves into a format amenable to standard ML models.
- Three predictive tasks:
  - Regression of cumulative release at 24 h, 48 h, and 72 h.
  - Classification of release‑profile type (e.g., monophasic, biphasic, triphasic).
  - Full‑curve prediction of complete release kinetics.
- Explainability via SHAP (Shapley additive explanations) to quantify the impact of formulation variables (polymer type, drug loading, particle size, etc.) on early vs. late release.
- Time‑independent modeling that outperforms conventional time‑dependent approaches for complex biphasic/triphasic release patterns.
- Open‑source implementation (code and trained models) enabling reproducibility and rapid adoption by formulation scientists.
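As a rough illustration of what turning heterogeneous release curves into a fixed-length, time-independent representation could look like, the sketch below interpolates each curve at 24, 48, and 72 h and adds two summary descriptors. It is not the authors' transformation: the function names, the feature layout, the clamping rule, and the choice of a 24 h window for the slope are all assumptions made for the example.

```python
from bisect import bisect_right


def interpolate_release(times_h, release_pct, query_h):
    """Linearly interpolate cumulative release (%) at query_h hours.

    Assumes times_h is sorted ascending. Queries outside the sampled
    range are clamped to the nearest measured point (an assumption).
    """
    if query_h <= times_h[0]:
        return release_pct[0]
    if query_h >= times_h[-1]:
        return release_pct[-1]
    i = bisect_right(times_h, query_h)
    t0, t1 = times_h[i - 1], times_h[i]
    r0, r1 = release_pct[i - 1], release_pct[i]
    return r0 + (r1 - r0) * (query_h - t0) / (t1 - t0)


def auc_trapezoid(times_h, release_pct):
    """Area under the release curve (percent x hours), trapezoid rule."""
    return sum(
        0.5 * (release_pct[i] + release_pct[i - 1]) * (times_h[i] - times_h[i - 1])
        for i in range(1, len(times_h))
    )


def summarize_curve(times_h, release_pct, grid_h=(24, 48, 72)):
    """Map one irregularly sampled curve to a fixed-length feature dict.

    The key names are hypothetical, chosen only for this example.
    """
    features = {
        f"release_{t}h": interpolate_release(times_h, release_pct, t)
        for t in grid_h
    }
    features["auc"] = auc_trapezoid(times_h, release_pct)
    # Early-release slope: mean release rate (%/h) over the first 24 h.
    r0 = interpolate_release(times_h, release_pct, 0)
    r24 = interpolate_release(times_h, release_pct, 24)
    features["early_slope"] = (r24 - r0) / 24.0
    return features


# Two curves sampled at different time points end up with the same keys.
curve_a = summarize_curve([0, 12, 36, 96], [0, 10, 30, 60])
curve_b = summarize_curve([0, 24, 72, 168, 336], [0, 5, 8, 40, 95])
print(sorted(curve_a) == sorted(curve_b))
```

Because every curve maps to the same keys regardless of its sampling schedule, the resulting rows can be stacked into a table for tree-based regressors or classifiers.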
Methodology
- Dataset Curation – 321 LAI formulations from the literature were digitized, each annotated with 23 physicochemical descriptors (polymer chemistry, drug properties, particle morphology, etc.) and the corresponding in‑vitro release curves.
- Feature Engineering – Release curves were summarized using a set of time‑independent descriptors (e.g., area under the curve, early‑release slope) to decouple the learning problem from explicit time‑series modeling.
- Model Suite – Gradient‑boosted trees (XGBoost) and random forests were trained for regression and classification tasks. Hyperparameters were tuned via nested cross‑validation, which keeps hyperparameter selection separate from performance estimation and so limits overly optimistic scores on the relatively small dataset.
- Explainability – SHAP values were computed for each prediction, allowing the authors to rank the importance of formulation attributes and visualize how they push a prediction higher or lower.
- Evaluation – Pearson’s r for early‑release regression (> 0.65 at 72 h), macro‑averaged F1‑score for profile‑type classification (0.87), and mean absolute error for full‑curve prediction were reported against held‑out test sets.
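To make the explainability step more concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every feature coalition. It is a brute-force illustration of the quantity that SHAP approximates or computes efficiently, not the SHAP library itself and not the authors' code. The toy model, its three inputs, and the all-zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(v):
    """Hypothetical predictor with one interaction term (v[0] * v[2])."""
    return 10 + 2 * v[0] + 3 * v[1] + v[0] * v[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Additivity: contributions sum to prediction minus baseline prediction.
gap = toy_model(x) - toy_model(baseline)
print(phi, sum(phi), gap)
```

A positive value means the feature pushes this prediction above the baseline prediction and a negative value pushes it below. The interaction term is split equally between the two features involved.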
Results & Findings
- Early‑Release Prediction: Correlation between predicted and measured cumulative release reached 0.71 at 24 h and improved to 0.78 at 72 h, indicating the model captures the dominant early‑release mechanisms.
- Profile‑Type Classification: The model distinguished monophasic, biphasic, and triphasic release curves with an overall F1‑score of 0.87, demonstrating reliable categorization even with limited data.
- Complete‑Release Modeling: A single, time‑independent model could reconstruct full release curves, accurately reproducing delayed biphasic and triphasic patterns that traditional time‑dependent models struggle with.
- Feature Insights: SHAP analysis highlighted polymer degradation rate, drug‑polymer affinity (log P), and particle size distribution as the top drivers for early release, while polymer molecular weight and cross‑link density dominated the later, complete‑release phase.
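For readers less familiar with the metrics quoted above, here is a minimal pure-Python sketch of Pearson's r, macro-averaged F1, and mean absolute error. The sample values and labels are made up for illustration and are not taken from the paper. Averaging F1 over every class that appears in either the true or predicted labels is an assumption about the convention used.

```python
from math import sqrt


def pearson_r(y_true, y_pred):
    """Pearson correlation between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / sqrt(var_t * var_p)


def macro_f1(labels_true, labels_pred):
    """Unweighted mean of per-class F1 scores."""
    pairs = list(zip(labels_true, labels_pred))
    classes = sorted(set(labels_true) | set(labels_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up cumulative release values (%) at one time point.
measured = [12.0, 30.0, 45.0, 60.0]
predicted = [15.0, 28.0, 50.0, 55.0]
print(pearson_r(measured, predicted), mean_absolute_error(measured, predicted))

# Made-up profile-type labels.
true_types = ["mono", "mono", "bi", "bi", "tri", "tri"]
pred_types = ["mono", "bi", "bi", "bi", "tri", "mono"]
print(macro_f1(true_types, pred_types))
```

Macro averaging gives each profile type equal weight, so a rare class such as a triphasic profile influences the score as much as a common one.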
Practical Implications
- Accelerated Formulation Design: Development teams can input candidate polymer‑drug combinations into the publicly available model to obtain rapid early‑release estimates, cutting down on costly bench experiments.
- Risk Mitigation: By understanding which attributes most affect delayed release, manufacturers can prioritize robust control strategies (e.g., tighter particle‑size specifications) early in the scale‑up process.
- Regulatory Support: Explainable predictions provide a data‑driven rationale for formulation choices, which can be incorporated into IND/MAA submissions to demonstrate a mechanistic understanding of release behavior.
- Platform Extension: The time‑independent framework can be adapted to other depot systems (e.g., microspheres, in situ forming gels) with minimal re‑training, offering a reusable tool across drug‑delivery programs.
Limitations & Future Work
- Dataset Size & Diversity: Although 321 formulations is sizable for LAI research, the chemical space is still limited; performance on novel polymers or biologics remains to be validated.
- In‑Vitro vs. In‑Vivo Translation: The models predict in‑vitro release; bridging to in‑vivo pharmacokinetics will require additional physiological descriptors (e.g., tissue diffusion, immune response).
- Dynamic Conditions: The current approach assumes static release media; future work could incorporate pH or enzymatic degradation variations to model more realistic implantation environments.
- Model Generalization: Exploring deep‑learning architectures that directly ingest raw release curves may further improve accuracy for highly irregular profiles, but at the cost of interpretability.
Authors
- Karla N. Robles
- Manar D. Samad
Paper Information
- arXiv ID: 2601.02265v1
- Categories: q-bio.BM, cs.LG
- Published: January 5, 2026