[Paper] Transcriptome-Conditioned Personalized De Novo Drug Generation for AML Using Metaheuristic Assembly and Target-Driven Filtering
Source: arXiv - 2512.21301v1
Overview
A new computational pipeline links a patient’s RNA‑seq profile directly to the design of brand‑new drug candidates for acute myeloid leukemia (AML). By turning transcriptomic signatures into structural “hot‑spots” on disease‑relevant proteins and then assembling molecules around those spots with a custom evolutionary algorithm, the authors demonstrate a scalable route to truly personalized de novo drug discovery.
Key Contributions
- Transcriptome‑driven target selection: Used TCGA‑LAML bulk RNA‑seq and Weighted Gene Co‑expression Network Analysis (WGCNA) to pinpoint 20 high‑value biomarkers (e.g., HK3, SIGLEC9).
- Structure prediction for non‑crystallized targets: Applied AlphaFold 3 to generate high‑confidence 3D models for all selected proteins.
- Hot‑spot quantification: Employed DOGSiteScorer to locate and score druggable pockets on each model.
- Reaction‑first evolutionary metaheuristic: Designed a novel fragment‑assembly algorithm that builds molecules by iteratively applying chemical reactions, guided by multi‑objective optimization (binding alignment, synthetic feasibility, drug‑likeness).
- End‑to‑end validation: Integrated ADMET filtering, QED scoring, and SwissDock docking; highlighted lead Ligand L1 with a predicted binding free energy of –6.57 kcal/mol against the A08A96 biomarker.
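As a rough illustration of the transcriptome-driven target selection step, the sketch below builds a WGCNA-style soft-threshold adjacency (absolute Pearson correlation raised to a power beta) and ranks genes by connectivity. It is a simplification, not the authors' code: full WGCNA also computes topological overlap, clusters genes into modules, and correlates modules with traits. The gene names, the synthetic data, and beta = 6 (a commonly used default for unsigned networks) are assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """expr maps gene name -> expression values (one per sample).

    Returns a nested dict with adjacency a_ij = |cor(i, j)| ** beta.
    """
    genes = list(expr)
    adj = {g: {} for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            adj[gi][gj] = a
            adj[gj][gi] = a
    return adj


def rank_by_connectivity(adj):
    """Sort genes by summed adjacency to all other genes, highest first."""
    k = {g: sum(nbrs.values()) for g, nbrs in adj.items()}
    return sorted(k, key=k.get, reverse=True)


# Synthetic data (hypothetical): three co-expressed genes plus three noise genes.
random.seed(0)
n_samples = 50
driver = [random.gauss(0, 1) for _ in range(n_samples)]
expr = {}
for name in ("GENE_A", "GENE_B", "GENE_C"):
    expr[name] = [d + 0.1 * random.gauss(0, 1) for d in driver]
for name in ("NOISE_1", "NOISE_2", "NOISE_3"):
    expr[name] = [random.gauss(0, 1) for _ in range(n_samples)]

ranking = rank_by_connectivity(soft_threshold_adjacency(expr))
print(ranking[:3])
```

Raising correlations to a power suppresses weak, likely spurious links while keeping strong ones, so the co-expressed genes end up with the highest connectivity.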
Methodology
- Data ingestion: Bulk RNA‑seq from the TCGA‑LAML cohort was processed to generate a co‑expression network. Modules most correlated with disease outcome were mined, yielding 20 candidate genes.
- Structure modeling: For each gene product, AlphaFold 3 produced a 3D structure. The DOGSiteScorer engine scanned these models to rank pockets by size, depth, and druggability.
- Fragment library preparation: A curated set of ~10 k commercially available fragments (rule‑of‑three compliant) served as building blocks.
- Metaheuristic assembly:
  - Reaction‑first encoding: Instead of assembling atoms arbitrarily, the algorithm selects a chemical reaction (e.g., amide coupling, Suzuki coupling) and then chooses compatible fragments.
  - Multi‑objective fitness: Each candidate molecule is scored on (a) geometric alignment of key pharmacophore features to the pocket, (b) synthetic accessibility, and (c) drug‑likeness (QED).
  - Evolutionary loop: Populations evolve through mutation (alternative reactions), crossover (fragment swapping), and selection, converging on high‑scoring chemotypes.
- In‑silico vetting: The top 200 molecules undergo ADMET prediction (toxicity, metabolism) and docking with SwissDock to estimate binding free energies.
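The reaction-first assembly loop can be sketched as a toy genetic algorithm. Everything concrete here is an assumption for illustration: the fragment names, the random placeholder scores standing in for pocket alignment, synthetic accessibility, and QED, the weighted-sum fitness (the paper describes multi-objective optimization without this summary specifying the scheme), and the truncation selection. It shows only the encoding idea: pick a reaction first, then fill it with fragments that carry the required reactive handles.

```python
import random

# Hypothetical reaction templates: each needs one fragment per reactive handle.
REACTIONS = {
    "amide_coupling": ("carboxylic_acid", "amine"),
    "suzuki_coupling": ("aryl_halide", "boronic_acid"),
}

# Hypothetical fragment library with placeholder scores in [0, 1):
# (pocket alignment, synthetic accessibility, drug-likeness).
_lib_rng = random.Random(42)
HANDLES = ["carboxylic_acid", "amine", "aryl_halide", "boronic_acid"]
FRAGMENTS = {
    f"{h}_{i}": {"handle": h, "scores": tuple(_lib_rng.random() for _ in range(3))}
    for h in HANDLES
    for i in range(5)
}
WEIGHTS = (0.5, 0.25, 0.25)  # assumed objective weights, summing to 1


def compatible(handle):
    """Names of all fragments carrying the given reactive handle."""
    return [n for n, f in FRAGMENTS.items() if f["handle"] == handle]


def random_candidate(rng, exclude=None):
    """Reaction-first: choose a reaction, then compatible fragments."""
    rxn = rng.choice([r for r in REACTIONS if r != exclude])
    h1, h2 = REACTIONS[rxn]
    return (rxn, rng.choice(compatible(h1)), rng.choice(compatible(h2)))


def is_valid(cand):
    """True if both fragments match the handles the reaction requires."""
    rxn, a, b = cand
    h1, h2 = REACTIONS[rxn]
    return FRAGMENTS[a]["handle"] == h1 and FRAGMENTS[b]["handle"] == h2


def fitness(cand):
    """Weighted sum of the per-objective scores averaged over both fragments."""
    _, a, b = cand
    sa, sb = FRAGMENTS[a]["scores"], FRAGMENTS[b]["scores"]
    return sum(w * (x + y) / 2 for w, x, y in zip(WEIGHTS, sa, sb))


def mutate(cand, rng):
    """Mutation: switch to an alternative reaction and redraw fragments."""
    return random_candidate(rng, exclude=cand[0])


def crossover(p, q):
    """Crossover: swap in q's second fragment when the reactions match."""
    return (p[0], p[1], q[2]) if p[0] == q[0] else p


def evolve(generations=30, pop_size=20, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    pop = [random_candidate(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the top half
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            child = crossover(p, q)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


best = evolve()
print(best, round(fitness(best), 3))
```

Because every operator only produces reaction-compatible fragment pairs, each candidate in the population corresponds to a product of a named reaction, which is the property the reaction-first encoding is meant to guarantee. A real implementation would replace the placeholder scores with computed properties of the assembled molecule.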
Results & Findings
- Biomarker hotspot quality: All 20 targets displayed at least one pocket with a DOGSiteScorer score >0.7, indicating high druggability.
- Chemical novelty: The generative run produced >15 k unique scaffolds, with a median Tanimoto similarity <0.3 to any molecule in ChEMBL, indicating that the scaffolds are structurally distinct from known compounds.
- Drug‑likeness: The QED distribution peaked between 0.5 and 0.7, comparable to known oral drugs.
- Lead identification: Ligand L1 (MW = 312 Da, QED = 0.68) docked to the A08A96 hotspot with ΔG = –6.57 kcal/mol, and passed all ADMET thresholds (no predicted hERG inhibition, low hepatotoxicity).
- Scalability: The entire pipeline, from RNA‑seq to lead list, completed in ~48 hours on a modest GPU‑enabled workstation, suggesting that the workflow is computationally feasible on a clinically relevant timescale.
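To put the reported docking energy for Ligand L1 in perspective, the standard relation ΔG = RT ln(Kd) converts a binding free energy into a dissociation constant. The sketch below applies it to –6.57 kcal/mol, assuming 298.15 K and a 1 M standard state. Docking scores are approximate, so the result is an order-of-magnitude reading under those assumptions, not a measured affinity.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_to_kd(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant (mol/L) from delta_G = RT * ln(Kd)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


kd = delta_g_to_kd(-6.57)
print(f"Kd is about {kd * 1e6:.0f} micromolar")
```

Under these assumptions the value comes out at roughly 15 µM, the range usually associated with an early hit and not with an optimized lead.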
Practical Implications
- Patient‑specific drug pipelines: Oncology labs could feed a newly sequenced AML sample into the workflow and receive a shortlist of chemically tractable leads within days, shortening the “bench‑to‑bedside” cycle.
- Accelerated hit‑to‑lead: Because the algorithm builds molecules from known reactions, the resulting compounds are synthetically accessible, easing the transition to medicinal chemistry synthesis.
- Beyond AML: The modular nature (target discovery → structure → hotspot → assembly) means the same framework can be repurposed for other heterogeneous cancers, infectious diseases, or rare genetic disorders where transcriptomics is available.
- Integration with existing pipelines: The generated leads can be fed directly into high‑throughput screening or AI‑driven activity prediction platforms, complementing traditional virtual screening that relies on pre‑existing libraries.
Limitations & Future Work
- Bulk vs. single‑cell data: The study used bulk RNA‑seq, which may mask sub‑clonal expression patterns; incorporating single‑cell transcriptomics could refine target selection.
- In‑vitro validation: All efficacy claims are computational; experimental binding assays and cellular viability tests are needed to confirm activity.
- Docking accuracy: SwissDock provides a fast estimate of binding energy but lacks explicit solvation and entropy terms; future work will integrate more rigorous free‑energy methods (e.g., MM‑GBSA).
- Algorithm generalization: While the reaction‑first metaheuristic performed well on AML targets, benchmarking against other protein families will be necessary to prove broad applicability.
Bottom line: This paper showcases a proof‑of‑concept that merges systems‑level transcriptomics with a custom, reaction‑driven molecular generator, opening a realistic path toward truly personalized drug design for AML and potentially many other diseases.
Authors
- Abdullah G. Elafifi
- Basma Mamdouh
- Mariam Hanafy
- Muhammed Alaa Eldin
- Yosef Khaled
- Nesma Mohamed El‑Gelany
- Tarek H. M. Abou‑El‑Enien
Paper Information
- arXiv ID: 2512.21301v1
- Categories: cs.LG, q-bio.QM
- Published: December 24, 2025