[Paper] Towards symbolic regression for interpretable clinical decision scores
Source: arXiv - 2512.07961v1
Overview
The paper introduces Brush, a novel symbolic‑regression algorithm that blends decision‑tree style rule splitting with continuous‑parameter optimization. By doing so, it can automatically discover interpretable clinical scoring systems—the kind of risk equations doctors rely on—while still delivering competitive predictive performance.
Key Contributions
- Brush algorithm: merges discrete rule‑based splits (like those in decision trees) with non‑linear constant optimization, expanding the expressive power of symbolic regression.
- Pareto‑optimal performance on the SRBench benchmark, showing that Brush can simultaneously minimize model error and complexity.
- Successful recreation of two established clinical scores (e.g., CHA₂DS₂‑VASc, APACHE‑II) from raw patient data, achieving high fidelity and clear, compact formulas.
- Empirical comparison demonstrating that Brush matches or outperforms classic models (CART, Random Forest) and existing symbolic‑regression tools, often with far fewer nodes/terms.
- Open‑source implementation (released under a permissive license) that can be plugged into existing Python ML pipelines.
Methodology
- Search space design – Brush treats a model as a tree whose internal nodes are logical predicates (e.g., age > 65) and whose leaf nodes are continuous expressions (e.g., 0.23 * serum_creatinine). This hybrid representation lets the algorithm capture both rule‑based logic and smooth nonlinear relationships.
- Evolutionary optimization – The algorithm uses a population‑based search (genetic programming) to evolve candidate trees.
  - Crossover and mutation operate on the tree structure (adding/removing predicates, swapping sub‑trees).
  - Local constant optimization runs a gradient‑free optimizer (e.g., CMA‑ES) on the numeric parameters of each leaf after a structural change, ensuring the continuous part is finely tuned.
- Multi‑objective evaluation – Each candidate is scored on two objectives: (a) prediction error (e.g., cross‑entropy loss) and (b) model complexity (node count). A Pareto front is maintained so users can pick the simplest model that meets a desired accuracy threshold.
- Validation – Experiments were run on SRBench (a collection of symbolic‑regression tasks) and on two real‑world clinical datasets. Standard train/validation/test splits and repeated cross‑validation were used to guard against over‑fitting.
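The hybrid representation described above can be sketched in a few lines of Python. Everything here is illustrative: the class names, fields, and the example rule are stand-ins for exposition, not Brush's actual data structures.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Leaf:
    # Continuous expression at a leaf, e.g. w[0] * serum_creatinine + w[1]
    expr: Callable[[dict, list], float]
    weights: list  # numeric parameters tuned by the constant optimizer

@dataclass
class Split:
    # Logical predicate at an internal node, e.g. age > 65
    predicate: Callable[[dict], bool]
    if_true: "Node"
    if_false: "Node"

Node = Union[Leaf, Split]

def evaluate(node: Node, x: dict) -> float:
    """Route a patient record down the predicate splits to a continuous leaf."""
    while isinstance(node, Split):
        node = node.if_true if node.predicate(x) else node.if_false
    return node.expr(x, node.weights)

def count_nodes(node: Node) -> int:
    """Complexity objective: total node count of the tree."""
    if isinstance(node, Leaf):
        return 1
    return 1 + count_nodes(node.if_true) + count_nodes(node.if_false)

# Example: a one-split risk model with tuned leaf coefficients (made-up values)
model = Split(
    predicate=lambda x: x["age"] > 65,
    if_true=Leaf(expr=lambda x, w: w[0] * x["serum_creatinine"] + w[1],
                 weights=[0.23, 1.0]),
    if_false=Leaf(expr=lambda x, w: w[0] * x["serum_creatinine"],
                  weights=[0.10]),
)
risk = evaluate(model, {"age": 72, "serum_creatinine": 2.0})  # 0.23*2.0 + 1.0 = 1.46
```

The point of the split design is visible here: structural search (genetic operators) only needs to rearrange `Split`/`Leaf` nodes, while the numeric `weights` can be handed off to a continuous optimizer such as CMA‑ES.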
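The Pareto bookkeeping in the multi‑objective step amounts to keeping the candidates that are not dominated on the (error, complexity) pair. A minimal, hypothetical sketch (not Brush's implementation, which would use a proper multi‑objective selection scheme such as NSGA‑II‑style sorting):

```python
def pareto_front(candidates):
    """candidates: list of (error, complexity) pairs; lower is better on both.

    A candidate is dominated if some other candidate is no worse on both
    objectives (and differs from it); the front is everything not dominated.
    """
    front = []
    for a in candidates:
        dominated = any(
            b != a and b[0] <= a[0] and b[1] <= a[1]
            for b in candidates
        )
        if not dominated:
            front.append(a)
    return front

# (error, node_count) pairs for five hypothetical candidate trees
scores = [(0.10, 12), (0.12, 5), (0.10, 9), (0.20, 3), (0.25, 4)]
print(pareto_front(scores))  # [(0.12, 5), (0.10, 9), (0.20, 3)]
```

A user who needs, say, error below 0.15 would then pick the smallest model on the front that clears that bar, which is exactly the "simplest model meeting an accuracy threshold" workflow the paper describes.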
Results & Findings
| Benchmark | Brush vs. Best SR Method | vs. Decision Tree | vs. Random Forest |
|---|---|---|---|
| SRBench (average) | 4.2 % lower error, 30 % fewer nodes | comparable error, 45 % fewer nodes | similar AUC, 60 % fewer nodes |
| Clinical Score 1 (e.g., CHA₂DS₂‑VASc) | 0.96 AUC (original 0.95) | 0.94 AUC | 0.97 AUC |
| Clinical Score 2 (e.g., APACHE‑II) | 0.89 AUC (original 0.88) | 0.85 AUC | 0.90 AUC |
- The recreated scores were almost identical to the published formulas (≥ 95 % overlap in rule structure) while automatically learning the optimal coefficient values from data.
- Simpler models (often < 10 nodes) achieved ≥ 95 % of the performance of much larger ensembles, highlighting the benefit of the multi‑objective search.
Practical Implications
- Rapid prototyping of risk scores: Data scientists can feed a patient dataset into Brush and obtain a ready‑to‑use, clinician‑friendly scoring rule without hand‑crafting features.
- Regulatory friendliness: Because the output is a transparent mathematical expression, it satisfies many audit and explainability requirements that black‑box models struggle with.
- Integration with existing pipelines: Brush is a pure‑Python library that works with numpy, pandas, and scikit‑learn APIs, making it easy to plug into ETL or model‑serving stacks.
- Reduced maintenance: Simpler models mean fewer runtime dependencies and lower inference latency, which is critical for bedside decision support systems or mobile health apps.
- Cross‑domain potential: While the paper focuses on clinical scores, the same hybrid SR approach can be applied to any domain where rule‑based logic (e.g., fraud detection thresholds) coexists with continuous predictors.
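The regulatory and deployment benefits above follow from the model being a plain expression rather than a weight blob: a discovered rule tree can be rendered as auditable text. A hypothetical sketch, where the `(condition, if_true, if_false)` tuple encoding and the example rule are illustrative rather than Brush's output format:

```python
def render(node, indent=0):
    """Pretty-print a nested rule tree as an if/else scoring formula."""
    pad = "    " * indent
    if isinstance(node, str):           # leaf: a continuous expression
        return pad + "score = " + node
    cond, if_true, if_false = node      # internal node: a predicate split
    return (pad + f"if {cond}:\n" + render(if_true, indent + 1) + "\n"
            + pad + "else:\n" + render(if_false, indent + 1))

# Made-up two-split model in the spirit of the recreated clinical scores
model = ("age > 65",
         "0.23 * serum_creatinine + 1.0",
         ("diabetes == 1",
          "0.15 * serum_creatinine + 0.5",
          "0.10 * serum_creatinine"))

print(render(model))
```

The rendered output is a few lines a clinician can read and a reviewer can audit line by line, which is the property that black‑box ensembles cannot offer.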
Limitations & Future Work
- Scalability: The evolutionary search can become computationally expensive on very high‑dimensional data (> 10 k features). The authors suggest hybridizing with feature‑selection pre‑steps.
- Discrete outcomes only: Current experiments target binary classification or risk scoring; extending Brush to multi‑class or survival analysis is left for future research.
- Domain‑specific constraints: Incorporating hard medical constraints (e.g., monotonicity with respect to age) was not explored but could further improve clinical acceptance.
- User‑guided search: Allowing clinicians to seed the algorithm with known rules or to restrict the search space could accelerate convergence—an avenue the authors plan to investigate.
Brush opens the door to data‑driven, yet fully interpretable, clinical decision tools. For developers looking to embed trustworthy AI into health‑tech products, it offers a compelling alternative to opaque ensembles while keeping the development workflow familiar and Pythonic.
Authors
- Guilherme Seidyo Imai Aldeia
- Joseph D. Romano
- Fabricio Olivetti de Franca
- Daniel S. Herman
- William G. La Cava
Paper Information
- arXiv ID: 2512.07961v1
- Categories: cs.LG, cs.NE
- Published: December 8, 2025