ecancermedicalscience

Research

Integrative systematic review and transcriptomic–machine learning analysis of molecular signatures in metaplastic breast cancer

7 May 2026
Joshua Agilinko, Sonam Patel, Jogitha Selvarajah, Nicholas Tekkis, Mathew Vithayathil, Suzette Samlalsingh

Background: Metaplastic breast cancer (MpBC) is a rare and aggressive breast cancer subtype characterised by marked histological heterogeneity, therapeutic resistance and poor clinical outcomes. Despite increasing molecular research, existing evidence remains fragmented, heterogeneous and poorly integrated, limiting clinical translation and biomarker validation.

Methods: We developed an integrative analytical framework combining systematic review, quantitative meta-analysis, transcriptomic profiling and interpretable machine learning to identify and prioritise molecular markers in MpBC. A systematic review guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement was conducted across PubMed, arXiv and Semantic Scholar. Effect sizes were standardised to Cohen’s d and synthesised using a random-effects model. Transcriptomic analysis was performed on the GSE165407 dataset using DESeq2 in R (RStudio version 1.1.463), and differentially expressed genes were cross-referenced against literature-derived biomarkers. Supervised models, including a multi-layer perceptron and a boosted random forest, were applied, and their performance was evaluated using receiver operating characteristic (ROC) analysis. Model interpretability was assessed using SHapley Additive exPlanations (SHAP).
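To illustrate the effect-size synthesis step, the following is a minimal sketch of standardising a two-group comparison to Cohen’s d and pooling several such estimates under a random-effects model. The abstract does not state which between-study variance estimator was used, so the DerSimonian-Laird estimator shown here is an assumption, as are the function names and the large-sample variance approximation for d. It is not the authors’ implementation.

```python
import math


def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def d_variance(d, n1, n2):
    """Common large-sample approximation to the sampling variance of d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def random_effects_pool(effects, variances):
    """Pool effect sizes with the DerSimonian-Laird estimator (assumed here).

    Returns the pooled estimate, a 95% confidence interval, the
    between-study variance tau^2 and the I^2 heterogeneity statistic (%).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures dispersion of studies around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }
```

When the studies agree closely, tau² shrinks towards zero and the random-effects estimate approaches the fixed-effect estimate, which is the situation a low-heterogeneity result describes.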

Results: Eleven studies met inclusion criteria. Meta-analysis demonstrated low heterogeneity and a pooled effect size of d = 0.74 (95% CI 0.59–0.88), indicating a consistent moderate-to-large biomarker signal across studies. Pathway enrichment revealed convergence on PI3K/AKT/mTOR signalling, immune modulation and epithelial–mesenchymal transition. Transcriptomic profiling demonstrated concordance with literature-derived markers. The random forest model achieved strong classification performance (AUC = 0.91), with high specificity and minimal misclassification. SHAP analysis identified both canonical (PIK3CA, RPL39, EXO1) and non-canonical (CD55, LARGE2) contributors to model prediction.
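For readers less familiar with the AUC metric reported above, the sketch below computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the toy labels and scores are hypothetical and are not drawn from the study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of model scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: three of the four positive/negative pairs
# are ranked correctly, so the AUC is 0.75.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

On this reading, an AUC of 0.91 means that a positive sample outranks a negative one in roughly nine of every ten such pairs.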

Conclusion: This study provides an integrated synthesis linking systematic evidence, transcriptomic validation and interpretable machine learning in MpBC. By reconciling fragmented literature with data-driven modelling, we identify a biologically coherent and clinically tractable molecular signature, offering a foundation for biomarker-driven stratification and translational validation.
