Fusion is not one-size-fits-all: Cross-Modal Representation Alignment for Time-to-Event Modeling

arXiv:2606.15038v1 Announce Type: new Abstract: Accurate time-to-event (TTE) prediction from multimodal clinical data remains challenging due to modality imbalance and distribution shift. We introduce a foundation model-driven framework for cross-modal representation alignment between CT imaging and longitudinal EHR data, designed to generalize across tasks and institutions. CT and EHR modalities are encoded independently using domain-specific foundation models and aligned in a shared latent space through four principled fusion strategies: late fusion, contrastive alignment, cross-attention, a
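As a rough illustration of the contrastive alignment strategy named in the abstract, the sketch below computes a symmetric InfoNCE (CLIP-style) loss between paired CT and EHR embeddings. This is a generic formulation, not necessarily the loss the paper uses. The function names, the temperature value, and the toy vectors are all assumptions, and the embeddings are assumed to be nonzero and already projected to a common dimension.

```python
import math


def l2_normalize(v):
    """Scale a nonzero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(ct_embs, ehr_embs, temperature=0.1):
    """Symmetric InfoNCE loss; row i of each input is assumed to be the same patient."""
    ct = [l2_normalize(v) for v in ct_embs]
    ehr = [l2_normalize(v) for v in ehr_embs]
    n = len(ct)
    # Temperature-scaled cosine similarity between every CT/EHR pair.
    logits = [
        [sum(a * b for a, b in zip(ct[i], ehr[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(mat):
        # Cross-entropy where the correct "class" for row i is column i.
        total = 0.0
        for i, row in enumerate(mat):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(mat)

    transposed = [list(col) for col in zip(*logits)]
    # Average the CT-to-EHR and EHR-to-CT directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


# Toy data (hypothetical): two patients, 2-dimensional embeddings.
ct_toy = [[1.0, 0.0], [0.0, 1.0]]
ehr_toy = [[2.0, 0.1], [0.1, 2.0]]
aligned_loss = info_nce(ct_toy, ehr_toy)
shuffled_loss = info_nce(ct_toy, ehr_toy[::-1])
```

With correctly paired rows the loss is close to zero, while reversing the EHR rows so that each CT embedding is matched with the wrong patient produces a much larger loss. Minimizing this quantity is what pulls matched CT and EHR representations together in the shared space.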
The increasing availability of diverse clinical data modalities and the maturation of foundation models are enabling more sophisticated approaches to TTE prediction.
This research addresses a critical challenge in clinical AI by improving the accuracy and generalizability of predictions, which is vital for personalized medicine and proactive healthcare interventions.
More reliable ways to combine and align multimodal patient data for time-to-event modeling could improve diagnosis, prognosis, and treatment planning in healthcare.
- Healthcare AI developers
- Medical research institutions
- Patients with complex conditions
- Clinical diagnostics companies
- Traditional statistical modeling approaches
- Single-modality diagnostic tools
Improved predictive accuracy in clinical settings and more robust generalizability of AI models across different institutions.
Accelerated development of AI-driven personalized treatment plans and proactive disease management strategies.
Potential for reduced healthcare costs through earlier and more effective interventions, and a shift towards preventative medicine aided by AI.
Read at arXiv cs.AI