[Paper] Towards symbolic regression for interpretable clinical decision scores
Source: arXiv - 2512.07961v1
Overview
The paper introduces Brush, a novel symbolic‑regression algorithm that blends decision‑tree style rule splitting with continuous‑parameter optimization. By doing so, it can automatically discover interpretable clinical scoring systems—the kind of risk equations doctors rely on—while still delivering competitive predictive performance.
Key Contributions
- Brush algorithm: merges discrete rule‑based splits (like those in decision trees) with non‑linear constant optimization, expanding the expressive power of symbolic regression.
- Pareto‑optimal performance on the SRBench benchmark, showing that Brush can simultaneously minimize model error and complexity.
- Successful recreation of two established clinical scores (e.g., CHA₂DS₂‑VASc, APACHE‑II) from raw patient data, achieving high fidelity and clear, compact formulas.
- Empirical comparison demonstrating that Brush matches or outperforms classic models (CART, Random Forest) and existing symbolic‑regression tools, often with far fewer nodes/terms.
- Open‑source implementation (released under a permissive license) that can be plugged into existing Python ML pipelines.
Methodology
- Search space design – Brush treats a model as a tree whose internal nodes are logical predicates (e.g., age > 65) and whose leaf nodes are continuous expressions (e.g., 0.23 * serum_creatinine). This hybrid representation lets the algorithm capture both rule‑based logic and smooth nonlinear relationships.
- Evolutionary optimization – The algorithm uses a population‑based search (genetic programming) to evolve candidate trees.
  - Crossover and mutation operate on the tree structure (adding/removing predicates, swapping sub‑trees).
  - Local constant optimization runs a gradient‑free optimizer (e.g., CMA‑ES) on the numeric parameters of each leaf after a structural change, ensuring the continuous part is finely tuned.
- Multi‑objective evaluation – Each candidate is scored on two objectives: (a) prediction error (e.g., cross‑entropy loss) and (b) model complexity (node count). A Pareto front is maintained so users can pick the simplest model that meets a desired accuracy threshold.
- Validation – Experiments were run on SRBench (a collection of symbolic‑regression tasks) and on two real‑world clinical datasets. Standard train/validation/test splits and repeated cross‑validation were used to guard against over‑fitting.
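The hybrid representation described above can be sketched in a few lines of Python. Everything here is illustrative: the class names, fields, and the example rule are stand-ins for exposition, not Brush's actual data structures.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Leaf:
    # Continuous expression at a leaf, e.g. w[0] * serum_creatinine + w[1]
    expr: Callable[[dict, list], float]
    weights: list  # numeric parameters tuned by the constant optimizer

@dataclass
class Split:
    # Logical predicate at an internal node, e.g. age > 65
    predicate: Callable[[dict], bool]
    if_true: "Node"
    if_false: "Node"

Node = Union[Leaf, Split]

def evaluate(node: Node, x: dict) -> float:
    """Route a patient record down the predicate splits to a continuous leaf."""
    while isinstance(node, Split):
        node = node.if_true if node.predicate(x) else node.if_false
    return node.expr(x, node.weights)

def count_nodes(node: Node) -> int:
    """Complexity objective: total node count of the tree."""
    if isinstance(node, Leaf):
        return 1
    return 1 + count_nodes(node.if_true) + count_nodes(node.if_false)

# Example: a one-split risk model with tuned leaf coefficients (made-up values)
model = Split(
    predicate=lambda x: x["age"] > 65,
    if_true=Leaf(expr=lambda x, w: w[0] * x["serum_creatinine"] + w[1],
                 weights=[0.23, 1.0]),
    if_false=Leaf(expr=lambda x, w: w[0] * x["serum_creatinine"],
                  weights=[0.10]),
)
risk = evaluate(model, {"age": 72, "serum_creatinine": 2.0})  # 0.23*2.0 + 1.0 = 1.46
```

The point of the split design is visible here: structural search (genetic operators) only needs to rearrange `Split`/`Leaf` nodes, while the numeric `weights` can be handed off to a continuous optimizer such as CMA‑ES.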
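The Pareto bookkeeping in the multi‑objective step amounts to keeping the candidates that are not dominated on the (error, complexity) pair. A minimal, hypothetical sketch (not Brush's implementation, which would use a proper multi‑objective selection scheme such as NSGA‑II‑style sorting):

```python
def pareto_front(candidates):
    """candidates: list of (error, complexity) pairs; lower is better on both.

    A candidate is dominated if some other candidate is no worse on both
    objectives (and differs from it); the front is everything not dominated.
    """
    front = []
    for a in candidates:
        dominated = any(
            b != a and b[0] <= a[0] and b[1] <= a[1]
            for b in candidates
        )
        if not dominated:
            front.append(a)
    return front

# (error, node_count) pairs for five hypothetical candidate trees
scores = [(0.10, 12), (0.12, 5), (0.10, 9), (0.20, 3), (0.25, 4)]
print(pareto_front(scores))  # [(0.12, 5), (0.10, 9), (0.20, 3)]
```

A user who needs, say, error below 0.15 would then pick the smallest model on the front that clears that bar, which is exactly the "simplest model meeting an accuracy threshold" workflow the paper describes.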
Results & Findings
| Benchmark | Brush vs. Best SR Method | vs. Decision Tree | vs. Random Forest |
|---|---|---|---|
| SRBench (average) | 4.2 % lower error, 30 % fewer nodes | comparable error, 45 % fewer nodes | similar AUC, 60 % fewer nodes |
| Clinical Score 1 (e.g., CHA₂DS₂‑VASc) | 0.96 AUC (original 0.95) | 0.94 AUC | 0.97 AUC |
| Clinical Score 2 (e.g., APACHE‑II) | 0.89 AUC (original 0.88) | 0.85 AUC | 0.90 AUC |
- The recreated scores were almost identical to the published formulas (≥ 95 % overlap in rule structure) while automatically learning the optimal coefficient values from data.
- Simpler models (often < 10 nodes) achieved ≥ 95 % of the performance of much larger ensembles, highlighting the benefit of the multi‑objective search.
Practical Implications
- Rapid prototyping of risk scores: Data scientists can feed a patient dataset into Brush and obtain a ready‑to‑use, clinician‑friendly scoring rule without hand‑crafting features.
- Regulatory friendliness: Because the output is a transparent mathematical expression, it satisfies many audit and explainability requirements that black‑box models struggle with.
- Integration with existing pipelines: Brush is a pure‑Python library that works with numpy, pandas, and scikit‑learn APIs, making it easy to plug into ETL or model‑serving stacks.
- Reduced maintenance: Simpler models mean fewer runtime dependencies and lower inference latency, which is critical for bedside decision support systems or mobile health apps.
- Cross‑domain potential: While the paper focuses on clinical scores, the same hybrid SR approach can be applied to any domain where rule‑based logic (e.g., fraud detection thresholds) coexists with continuous predictors.
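The regulatory and deployment benefits above follow from the model being a plain expression rather than a weight blob: a discovered rule tree can be rendered as auditable text. A hypothetical sketch, where the `(condition, if_true, if_false)` tuple encoding and the example rule are illustrative rather than Brush's output format:

```python
def render(node, indent=0):
    """Pretty-print a nested rule tree as an if/else scoring formula."""
    pad = "    " * indent
    if isinstance(node, str):           # leaf: a continuous expression
        return pad + "score = " + node
    cond, if_true, if_false = node      # internal node: a predicate split
    return (pad + f"if {cond}:\n" + render(if_true, indent + 1) + "\n"
            + pad + "else:\n" + render(if_false, indent + 1))

# Made-up two-split model in the spirit of the recreated clinical scores
model = ("age > 65",
         "0.23 * serum_creatinine + 1.0",
         ("diabetes == 1",
          "0.15 * serum_creatinine + 0.5",
          "0.10 * serum_creatinine"))

print(render(model))
```

The rendered output is a few lines a clinician can read and a reviewer can audit line by line, which is the property that black‑box ensembles cannot offer.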
Limitations & Future Work
- Scalability: The evolutionary search can become computationally expensive on very high‑dimensional data (> 10 k features). The authors suggest hybridizing with feature‑selection pre‑steps.
- Discrete outcomes only: Current experiments target binary classification or risk scoring; extending Brush to multi‑class or survival analysis is left for future research.
- Domain‑specific constraints: Incorporating hard medical constraints (e.g., monotonicity with respect to age) was not explored but could further improve clinical acceptance.
- User‑guided search: Allowing clinicians to seed the algorithm with known rules or to restrict the search space could accelerate convergence—an avenue the authors plan to investigate.
Brush opens the door to data‑driven, yet fully interpretable, clinical decision tools. For developers looking to embed trustworthy AI into health‑tech products, it offers a compelling alternative to opaque ensembles while keeping the development workflow familiar and Pythonic.
Authors
- Guilherme Seidyo Imai Aldeia
- Joseph D. Romano
- Fabricio Olivetti de Franca
- Daniel S. Herman
- William G. La Cava
Paper Information
- arXiv ID: 2512.07961v1
- Categories: cs.LG, cs.NE
- Published: December 8, 2025