CleanSurvival: Automated data preprocessing for time-to-event models using reinforcement learning

arXiv:2502.03946v5. Abstract: Data preprocessing is often paid little attention in machine learning, despite its potentially significant impact on model performance. While automated machine learning pipelines are starting to recognize and integrate data preprocessing into their solutions for classification and regression tasks, this integration is lacking for more specialized tasks like time-to-event models for censored data. As a result, survival analysis not only faces the general challenges of data preprocessing but also suffers from the lack of tailored, automated sol
Two pressures drive the need for preprocessing automation: data tasks in AI are growing more complex, and current automated machine learning tools do not cover specialized areas such as time-to-event models.
Automated preprocessing for niche applications such as survival analysis can reduce manual effort, improve model reliability, and make these techniques more accessible in domains like healthcare and finance.
Applying reinforcement learning to data preprocessing for time-to-event models extends automation to a part of specialized AI pipelines that has so far been underserved or hard to automate.
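To make the idea of learning a preprocessing choice from a survival metric concrete, here is a minimal sketch. It is not CleanSurvival's method, which this brief does not describe. It assumes a toy right-censored dataset, three hypothetical imputation actions, and an epsilon-greedy bandit (a one-step simplification of reinforcement learning) whose reward is Harrell's concordance index. All names and data below are illustrative assumptions.

```python
import random

# Toy right-censored data (hypothetical): follow-up time, event flag
# (1 = event observed, 0 = censored), and one feature with missing values.
TIMES = [2, 4, 5, 7, 9, 11, 12, 15]
EVENTS = [1, 1, 0, 1, 0, 1, 1, 0]
FEATURE = [9.0, None, 7.5, 6.0, None, 3.0, 2.5, 1.0]

# Hypothetical preprocessing actions the agent can choose between.
ACTIONS = ["mean", "median", "zero"]


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when the subject with the shorter time
    had an observed event. Ties in risk count as half-correct.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else 0.5


def impute(values, strategy):
    """Fill missing entries (None) with the mean, the median, or zero."""
    observed = sorted(v for v in values if v is not None)
    if strategy == "mean":
        fill = sum(observed) / len(observed)
    elif strategy == "median":
        mid = len(observed) // 2
        if len(observed) % 2:
            fill = observed[mid]
        else:
            fill = (observed[mid - 1] + observed[mid]) / 2
    else:  # "zero"
        fill = 0.0
    return [fill if v is None else v for v in values]


def reward(action):
    """Reward = C-index when the imputed feature is used as the risk score."""
    return concordance_index(TIMES, EVENTS, impute(FEATURE, action))


def choose_preprocessing(episodes=50, epsilon=0.2, seed=0):
    """Epsilon-greedy bandit with sample-average value estimates."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}
    counts = {a: 0 for a in ACTIONS}
    for episode in range(episodes):
        if episode < len(ACTIONS):
            action = ACTIONS[episode]        # try every action once first
        elif rng.random() < epsilon:
            action = rng.choice(ACTIONS)     # explore
        else:
            action = max(ACTIONS, key=q.get) # exploit current best estimate
        r = reward(action)
        counts[action] += 1
        q[action] += (r - q[action]) / counts[action]
    return q, max(ACTIONS, key=q.get)


if __name__ == "__main__":
    values, best = choose_preprocessing()
    print(values, best)
```

A realistic pipeline would chain several preprocessing steps, fit an actual survival model, and score it on held-out data; the sketch only shows how a censoring-aware metric can serve as the reward signal.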
- AI researchers and data scientists
- Healthcare and pharmaceutical industries
- Financial modeling and risk assessment
- Manual data preprocessing specialists
- Less adaptable AI platforms
Improved accuracy and efficiency in predictive modeling for fields relying on time-to-event data.
Faster development and deployment of AI solutions in critical sectors, leading to more robust decision-making.
Increased adoption of AI in areas previously constrained by complex data preparation, accelerating innovation and competitive advantage.
Read at arXiv cs.LG