Abstract
Motivation
Multi-omics integration methods are now common in cancer studies, but results remain sensitive to design choices, including when fusion occurs, what is fused, and how missingness is handled. As a result, it is difficult to compare studies and determine which integration choices are most reliable for cross-cohort cancer analyses.
Results
From a PRISMA-guided review of 30 studies (2020–2025), we find that graph-based or hybrid pipelines dominate, with deep learning as the next most common family, and survival prediction as the main use case. Method families tend to align with the task and the timing of fusion: graph-hybrid approaches favour early- to intermediate-stage fusion, while deep learning spans all three fusion stages. Across studies, three recurring trade-offs emerge: early-to-intermediate fusion can stabilize high-dimensional inputs but is sensitive to modality imbalance; shared latent-space designs better preserve partially observed samples; and late fusion supports more stable subtype structure but makes feature attribution less direct. The main message is that integration works best when fusion choices match the data’s noise, sparsity, and missingness, and when interpretability is built into the architecture rather than added later.
| Original language | English |
|---|---|
| Journal | Bioinformatics Advances |
| Volume | 6 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - 15 Apr 2026 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
- SDG 9 Industry, Innovation, and Infrastructure
Keywords
- Cancer
- Machine learning
- Deep learning
- Data integration
- Multi-omics
Fingerprint
Dive into the research topics of 'A review of multi-omics integration techniques across five machine learning method families'. Together they form a unique fingerprint.