Discovery and inference beyond linearity for epidemiological data by integrating Bayesian regression, tree ensembles and Shapley values

arXiv:2505.00571v3. Abstract: Machine Learning (ML) is gaining popularity in epidemiology and healthcare studies for hypothesis-free discovery of risk and protective factors. ML is strong at discovering nonlinearities and interactions, but this power is compromised by a lack of reliable inference. Although Shapley values provide local measures of features' effects, valid uncertainty quantification for these effects is typically lacking, thus precluding statistical inference. We propose RuleSHAP, a framework that addresses this limitation by combining a dedicated Bay
The increasing adoption of Machine Learning in sensitive fields like epidemiology is driving a demand for more robust and interpretable inference methods.
RuleSHAP targets a critical limitation of ML in scientific and medical applications, the lack of reliable uncertainty quantification for feature effects; closing that gap could foster greater trust and support regulatory acceptance.
With valid uncertainty quantification, Machine Learning models could move beyond mere prediction to support statistical inference on feature effects in complex datasets, particularly in health and epidemiological studies.
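To make the abstract's point about Shapley values concrete, here is a minimal sketch of an exact Shapley computation for a single prediction. It is not RuleSHAP: the toy risk model, the feature names (age, smoker, BMI) and the baseline profile are all hypothetical and chosen only for illustration. The result is one point estimate per feature with no interval around it, which is the gap the abstract describes.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical risk score (an assumption, not from the paper):
    # nonlinear in age, plus a smoker-by-BMI interaction.
    age, smoker, bmi = x
    return 0.02 * (age - 50) ** 2 + 1.5 * smoker * (bmi - 25) + 0.1 * bmi


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a baseline profile.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


x = [60, 1, 30]          # hypothetical individual: age 60, smoker, BMI 30
baseline = [50, 0, 25]   # hypothetical reference profile
phi = shapley_values(model, x, baseline)

for name, contribution in zip(["age", "smoker", "bmi"], phi):
    print(f"{name}: {contribution:.2f}")
```

The attributions sum to the difference between the individual's prediction and the baseline prediction, and the interaction term is split evenly between the smoker and BMI features. Nothing in this calculation says how much each attribution would vary under resampling or model uncertainty.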
- Machine Learning researchers
- Epidemiologists
- Healthcare sector
- Pharmaceutical companies
- Purely black-box ML models
- Traditional statistical methods lacking scalability
Improved accuracy and reliability of epidemiological predictions and risk factor identification using advanced ML.
Faster development and deployment of targeted public health interventions and drug discovery based on more robust data insights.
Enhanced AI-driven policy making for public health, potentially leading to significant improvements in global health outcomes.
Read at arXiv cs.LG