ecancermedicalscience

Research

Integrative systematic review and transcriptomic-machine learning analysis of molecular signatures in metaplastic breast cancer

7 May 2026
Joshua Agilinko, Sonam Patel, Jogitha Selvarajah, Nicholas Tekkis, Mathew Vithayathil, Suzette Samlalsingh

Background: Metaplastic breast cancer (MpBC) is a rare and aggressive breast cancer subtype characterised by marked histological heterogeneity, therapeutic resistance and poor clinical outcomes. Despite increasing molecular research, existing evidence remains fragmented, heterogeneous and poorly integrated, limiting clinical translation and biomarker validation.

Methods: We developed an integrative analytical framework combining systematic review, quantitative meta-analysis, transcriptomic profiling and interpretable machine learning to identify and prioritise molecular markers in MpBC. A systematic review guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) was conducted across PubMed, arXiv and Semantic Scholar. Effect sizes were standardised to Cohen’s d and synthesised using a random-effects model. Transcriptomic analysis was performed on the GSE165407 dataset using DESeq2 in R (RStudio version 1.1.463), with differentially expressed genes cross-referenced against literature-derived biomarkers. Supervised models, including a multi-layer perceptron and a boosted random forest, were applied, with performance evaluated using receiver operating characteristic analysis. Model interpretability was assessed using SHapley Additive exPlanations.
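As an illustration of the random-effects synthesis step, the following is a minimal Python sketch that pools per-study Cohen's d values. It assumes the DerSimonian-Laird estimator for the between-study variance, which the abstract does not specify; the function name `random_effects_pool` and the toy inputs in the usage note are hypothetical and are not data from the included studies.

```python
import math


def random_effects_pool(effects, variances, z=1.96):
    """Pool effect sizes with a random-effects model.

    Assumption: DerSimonian-Laird estimate of tau^2 (the abstract does not
    name the estimator). `effects` are per-study Cohen's d values and
    `variances` their sampling variances.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * di for wi, di in zip(w, effects)) / sum_w

    # Cochran's Q measures dispersion of studies around the fixed estimate.
    q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, effects))
    df = k - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * di for wi, di in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)

    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "ci_low": pooled - z * se,
        "ci_high": pooled + z * se,
        "tau2": tau2,
        "i2": i2,
    }
```

For example, `random_effects_pool([0.0, 1.0], [0.01, 0.01])` uses two made-up studies that disagree strongly; the estimated between-study variance is then large and the confidence interval widens accordingly, whereas identical study effects give a between-study variance of zero.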

Results: Eleven studies met inclusion criteria. Meta-analysis demonstrated low heterogeneity and a pooled effect size of d = 0.74 (95% CI 0.59–0.88), indicating a consistent moderate-to-large biomarker signal across studies. Pathway enrichment revealed convergence on PI3K/AKT/mTOR signalling, immune modulation and epithelial-mesenchymal transition. Transcriptomic profiling demonstrated concordance with literature-derived markers. The random forest model achieved strong classification performance (AUC = 0.91), with high specificity and minimal misclassification. SHapley Additive exPlanations analysis identified both canonical (PIK3CA, RPL39, EXO1) and non-canonical (CD55, LARGE2) contributors to model prediction.
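To make the AUC figure concrete, the sketch below computes the area under the receiver operating characteristic curve as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (the Mann-Whitney formulation). It is an illustrative sketch only, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    `labels` are 0/1 class labels and `scores` the model's predicted
    scores. Ties between a positive and a negative count as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): three of the four positive/negative
# pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Under this reading, an AUC of 0.91 means that roughly 91% of positive/negative sample pairs are ranked in the correct order by the model.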

Conclusion: This study provides an integrated synthesis linking systematic evidence, transcriptomic validation and interpretable machine learning in MpBC. By reconciling fragmented literature with data-driven modelling, we identify a biologically coherent and clinically tractable molecular signature, offering a foundation for biomarker-driven stratification and translational validation.
