[Paper] Developing a novel Comorbidities Index for predicting 10-year mortality in Prostate Cancer patients: A computational data-driven approach

Published: May 29, 2026 at 08:18 AM EDT
Source: arXiv - 2605.31213v1

Overview

A team of researchers has built a new, data‑driven comorbidity index that predicts 10‑year mortality for men with prostate cancer who are candidates for radical prostatectomy. By re‑weighting, and in some cases re‑formulating, the classic Charlson Comorbidity Index (CCI) with population‑based optimization techniques, they improve survival discrimination as measured by the concordance index (C‑index). The advance could help clinicians avoid overtreatment while still offering curative surgery to the right patients.

Key Contributions

  • Tailored comorbidity index: Created a prostate‑cancer‑specific version of the CCI that reflects contemporary survival trends.
  • Population‑Based Bio‑Inspired Algorithms (PBBIAs): Leveraged genetic programming, particle‑swarm optimization, and other meta‑heuristics to automatically discover optimal weightings and symbolic formulas.
  • Comprehensive benchmark: Compared six approaches on the same dataset: the classic CCI, a previously published prostate‑specific CCI, a standard Cox model, and three optimization strategies (GA, FST‑PSO, and GPLearn).
  • Interpretability focus: Produced compact, human‑readable models (e.g., via GPLearn) without sacrificing predictive power.
  • Performance gain: Achieved up to a 0.10 increase in concordance index (C‑index) over the original CCI when prostate‑cancer‑specific variables were added.
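
Since every comparison in this summary is expressed in C‑index points, a minimal sketch of how Harrell's C‑index can be computed for right‑censored data may help. The function name and the toy inputs are illustrative assumptions, not taken from the paper, and pairs with tied follow‑up times are simply skipped here.

```python
from itertools import combinations


def concordance_index(times, events, scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    scores : risk score per patient (higher = expected to die sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the two times differ and the
        # shorter follow-up ended in an observed death.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0      # higher risk died first: concordant
        elif scores[a] == scores[b]:
            concordant += 0.5      # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: two observed deaths, two patients censored later.
times = [2, 5, 7, 10]
events = [1, 1, 0, 0]
print(concordance_index(times, events, [3, 2, 1, 0]))  # perfectly ordered scores
print(concordance_index(times, events, [1, 1, 1, 1]))  # uninformative scores
```

A value of 0.5 corresponds to an uninformative score and 1.0 to perfect ranking, which is why differences of a few hundredths between indices are treated as meaningful.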

Methodology

  1. Data collection – Retrospective cohort from a single high‑volume urology center, containing patient demographics, comorbidities, tumor characteristics, treatment details, and 10‑year follow‑up outcomes.
  2. Feature engineering – Started with the 19 comorbidity items used in the Charlson index, then added prostate‑cancer‑specific predictors (e.g., Gleason score, PSA level).
  3. Optimization pipelines
    • Genetic Programming (GP): Evolved symbolic expressions that map comorbidity counts to a risk score, guided by a fitness function based on the C‑index.
    • Population‑Based Metaheuristics: Implemented a Genetic Algorithm (GA) and Fuzzy Self‑Tuning Particle Swarm Optimization (FST‑PSO) to search the weight space for the best linear combination of comorbidities.
    • Baseline models: The original CCI, a previously published prostate‑cancer‑specific CCI (PCCI), and conventional Cox proportional hazards models.
  4. Model selection & validation – Used 5‑fold cross‑validation to guard against overfitting, reporting average C‑index and calibration plots for each approach.
  5. Interpretability check – Selected the most parsimonious GP‑derived formulas (via GPLearn) that still met a pre‑defined performance threshold.
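
The weight‑search step can be illustrated with a small genetic algorithm that maximizes the C‑index of a linear comorbidity score. This is a sketch under stated assumptions, not the authors' pipeline: the cohort is synthetic, the generating weights, population size, mutation rate, and weight range are made up, and fitness is evaluated in‑sample rather than with the 5‑fold cross‑validation described above.

```python
import math
import random
from itertools import combinations


def c_index(times, events, scores):
    """Harrell's C-index; pairs with tied follow-up times are skipped for simplicity."""
    num, den = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # not comparable: tie, or the earlier patient was censored
        den += 1
        if scores[a] > scores[b]:
            num += 1.0
        elif scores[a] == scores[b]:
            num += 0.5
    return num / den


def make_cohort(rng, n=80):
    """Synthetic cohort (illustrative only): 5 binary comorbidity flags per patient."""
    true_w = [0.2, 1.5, 0.1, 0.9, 0.4]  # made-up log-hazard weights
    X, times, events = [], [], []
    for _ in range(n):
        x = [int(rng.random() < 0.3) for _ in true_w]
        hazard = 0.05 * math.exp(sum(w * v for w, v in zip(true_w, x)))
        t = rng.expovariate(hazard)
        X.append(x)
        times.append(min(t, 10.0))     # administrative censoring at 10 years
        events.append(int(t <= 10.0))
    return X, times, events


def fitness(weights, X, times, events):
    """C-index of the linear score sum(weight * flag)."""
    scores = [sum(w * v for w, v in zip(weights, x)) for x in X]
    return c_index(times, events, scores)


def genetic_search(X, times, events, rng, pop_size=20, generations=15, max_w=6):
    """Search integer weights in [0, max_w] with elitism, tournaments, and mutation."""
    n_items = len(X[0])
    pop = [[1] * n_items]  # seed with unit weights (a plain comorbidity count)
    pop += [[rng.randint(0, max_w) for _ in range(n_items)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda w: fitness(w, X, times, events), reverse=True)
        nxt = [ranked[0][:], ranked[1][:]]  # elitism: carry over the two best
        while len(nxt) < pop_size:
            # Tournament selection: best-ranked of three random candidates.
            p1 = ranked[min(rng.sample(range(pop_size), 3))]
            p2 = ranked[min(rng.sample(range(pop_size), 3))]
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            child = [rng.randint(0, max_w) if rng.random() < 0.15 else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=lambda w: fitness(w, X, times, events))


rng = random.Random(0)
X, times, events = make_cohort(rng)
unit_c = fitness([1] * 5, X, times, events)
best_w = genetic_search(X, times, events, rng)
best_c = fitness(best_w, X, times, events)
print(f"unit weights C-index: {unit_c:.3f}")
print(f"GA weights {best_w} C-index: {best_c:.3f}")
```

Because the two best candidates are always carried over, the best in‑sample C‑index never decreases across generations. A held‑out evaluation would still be needed to tell a real gain from overfitting.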

Results & Findings

| Approach | C‑index | Δ vs. CCI |
| --- | --- | --- |
| Original CCI | 0.68 | |
| PCCI (clinical) | 0.71 | +0.03 |
| GA‑optimized linear weights | 0.77 | +0.09 |
| FST‑PSO | 0.76 | +0.08 |
| GPLearn (compact GP) | 0.75 | +0.07 |
| Standard Cox model (all vars) | 0.73 | +0.05 |

  • Interpretability: The top GP models consisted of fewer than 10 terms, making them easy to translate into a bedside calculator.
  • Calibration: All data‑driven models showed tighter alignment between predicted and observed 10‑year mortality across risk deciles compared with the original CCI.
  • Impact of PCa‑specific variables: Adding Gleason score and PSA boosted every model’s discrimination, suggesting that comorbidities alone are insufficient for accurate prognosis in this population.
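
The calibration comparison across risk deciles can be sketched as follows. This is a simplified illustration that assumes every patient has complete 10‑year follow‑up, so the outcome is a plain 0/1 flag; with censored data, a Kaplan‑Meier estimate per group would be needed instead. The function name and the toy inputs are hypothetical.

```python
def calibration_by_group(predicted, died, n_groups=10):
    """Return (mean predicted risk, observed death fraction, group size) per risk group.

    predicted : predicted 10-year mortality probability per patient
    died      : 1 if the patient died within 10 years, else 0
    """
    # Sort patient indices from lowest to highest predicted risk.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    rows = []
    for g in range(n_groups):
        lo = g * len(order) // n_groups
        hi = (g + 1) * len(order) // n_groups
        idx = order[lo:hi]
        if not idx:
            continue  # fewer patients than groups
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = sum(died[i] for i in idx) / len(idx)
        rows.append((mean_pred, observed, len(idx)))
    return rows


# Toy example with two risk groups instead of ten.
predicted = [0.1, 0.8, 0.1, 0.8]
died = [0, 1, 0, 1]
for mean_pred, observed, size in calibration_by_group(predicted, died, n_groups=2):
    print(f"predicted={mean_pred:.2f} observed={observed:.2f} n={size}")
```

A well‑calibrated model shows predicted and observed values close to each other in every group, which is the "tighter alignment" the summary refers to.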

Practical Implications

  • Clinical decision support: A lightweight, interpretable risk score can be embedded into electronic health records (EHRs) or surgical planning tools, instantly flagging patients whose 10‑year non‑cancer mortality exceeds the threshold for radical prostatectomy.
  • Resource allocation: Hospitals can better triage surgical slots, focusing operating‑room time on patients with the highest net benefit.
  • Personalized counseling: Surgeons can present patients with a transparent, data‑backed estimate of competing mortality risk, improving shared decision‑making.
  • Software integration: Because the best‑performing models are either simple linear combinations or short symbolic expressions, they can be implemented in any language (Python, R, Java) with negligible computational overhead, which suits real‑time dashboards and mobile apps.
  • Regulatory pathway: The interpretability of the GP‑derived formulas aligns with emerging AI‑in‑healthcare guidelines that favor transparent models over black‑box deep nets.
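
To show how small such a linear score is in code, here is a minimal sketch of a weighted comorbidity calculator. The weights learned in the paper are not given in this summary, so the classic Charlson weights for a subset of conditions stand in for them; the dictionary keys and the function name are hypothetical.

```python
# Classic Charlson weights for a subset of the 19 items (stand-in values;
# a learned index would replace this dictionary with its own weights).
CLASSIC_CCI_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "chronic_pulmonary_disease": 1,
    "diabetes_uncomplicated": 1,
    "hemiplegia": 2,
    "moderate_severe_renal_disease": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
}


def comorbidity_score(conditions, weights=CLASSIC_CCI_WEIGHTS):
    """Sum the weights of the conditions a patient has.

    Unknown condition names raise KeyError instead of being silently ignored,
    and duplicate entries are counted once.
    """
    present = set(conditions)
    unknown = present - set(weights)
    if unknown:
        raise KeyError(f"no weight defined for: {sorted(unknown)}")
    return sum(weights[c] for c in present)


print(comorbidity_score(["myocardial_infarction", "hemiplegia"]))  # 1 + 2
```

Swapping in a different weight dictionary is the only change needed to move from the classic index to a re‑weighted one, which is what makes this family of models easy to embed in an EHR or a dashboard.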

Limitations & Future Work

  • Single‑institution data: The cohort reflects the practice patterns and patient demographics of one center; external validation on multi‑center or population‑level datasets is needed.
  • Retrospective design: Unmeasured confounders (e.g., socioeconomic status, lifestyle factors) could bias the learned weights.
  • Static model: The index does not currently account for changes in comorbidity status over time; a dynamic survival model could capture longitudinal risk better.
  • Extension to other treatments: Future research should test whether the same framework can improve risk stratification for radiation therapy, active surveillance, or systemic treatments.

By marrying classic epidemiologic scoring with modern bio‑inspired optimization, the study offers a pragmatic, developer‑friendly blueprint for updating legacy clinical indices to reflect today’s patient outcomes.

Authors

  • Davide Farinati
  • Francesco Barletta
  • Paolo Zaurito
  • Simone Scuderi
  • Nicholas Raison
  • Alejandro Granados
  • Prokar Dasgupta
  • Giorgio Gandaglia
  • Alberto Briganti

Paper Information

  • arXiv ID: 2605.31213v1
  • Categories: cs.NE
  • Published: May 29, 2026