Evolutional Math: Cross-Validated Island-Model Genetic Programming for Interpretable Symbolic Regression on Small, Wide Datasets

arXiv:2606.28381v1. Announce type: cross. Abstract: Symbolic regression via genetic programming routinely fails on small, wide datasets (a regime common in clinical-trial monitoring, biostatistics, and engineering pilot studies) by converging on bloated, overfit expressions that exploit correlation rather than prediction. We present Evolutional Math, an open-source genetic programming system that combines four design choices to yield compact, interpretable formulas in this regime. First, fitness is measured by R-squared on held-out cross-validation folds rather than Pearson correlation on the
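The contrast the abstract draws between correlation and prediction can be illustrated with a minimal sketch. It is not the Evolutional Math implementation: the function names (`pearson`, `r_squared`, `cv_r2`), the toy data, and the interleaved fold assignment are all assumptions made for illustration. The candidate formulas are treated as fixed, so only the scoring step is shown; in a full system the candidate would presumably be evolved without access to the held-out fold.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cv_r2(candidate, rows, targets, k=3):
    """Mean R-squared of a fixed candidate formula over k held-out folds.

    Fold membership is assigned by index modulo k (an illustrative choice).
    """
    scores = []
    for fold in range(k):
        held_out = [i for i in range(len(rows)) if i % k == fold]
        preds = [candidate(rows[i]) for i in held_out]
        truth = [targets[i] for i in held_out]
        scores.append(r_squared(truth, preds))
    return statistics.fmean(scores)


# Toy data (hypothetical): one feature, target y = 2 * x.
rows = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
targets = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]


def good(row):
    """Candidate that matches the target in scale and offset."""
    return 2.0 * row[0]


def bad(row):
    """Candidate that only tracks the trend; scale and offset are wrong."""
    return 20.0 * row[0] + 100.0


bad_preds = [bad(r) for r in rows]
corr_bad = pearson(targets, bad_preds)   # trend matches, so correlation is high
r2_bad = r_squared(targets, bad_preds)   # predictions are far off, so R-squared is negative
cv_good = cv_r2(good, rows, targets)
cv_bad = cv_r2(bad, rows, targets)
```

Because Pearson correlation is unchanged by positive rescaling and shifting of the predictions, `bad` scores as well as `good` under it, while R-squared penalizes the wrong scale and offset. This is one way a correlation-based fitness can reward expressions that do not predict.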
Symbolic regression on small, wide datasets is a long-standing problem in fields such as medicine and engineering, and this paper targets it directly with a cross-validated, island-model genetic programming system.
The paper proposes a method for deriving interpretable, accurate models from limited data, which matters in high-stakes applications where black-box models are unacceptable and data is scarce.
Reliably generating compact, interpretable formulas from such datasets could widen adoption of AI in fields that require explainable outcomes and lower the barrier to entry in data-poor environments.
- Biostatisticians
- Clinical trial monitoring
- Engineering pilot studies
- AI explainability researchers
- Opaquely complex AI models
- Traditional statistical modeling
Improved model deployment in critical sectors due to enhanced interpretability and reliability with limited data.
Accelerated discovery and validation cycles in scientific research and product development where data collection is expensive or slow.
Potential for new regulatory frameworks for AI systems that prioritize interpretability and robustness on diverse datasets.