Background: Metaplastic breast cancer (MpBC) is a rare and aggressive breast cancer subtype characterised by marked histological heterogeneity, therapeutic resistance and poor clinical outcomes. Despite increasing molecular research, existing evidence remains fragmented, heterogeneous and poorly integrated, limiting clinical translation and biomarker validation.
Methods: We developed an integrative analytical framework combining systematic review, quantitative meta-analysis, transcriptomic profiling and interpretable machine learning to identify and prioritise molecular markers in MpBC. A systematic review guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) was conducted across PubMed, arXiv and Semantic Scholar. Effect sizes were standardised to Cohen’s d and synthesised using a random-effects model. Transcriptomic analysis was performed on the GSE165407 dataset using DESeq2 in R (RStudio version 1.1.463), with differentially expressed genes cross-referenced against literature-derived biomarkers. Supervised models, including a multi-layer perceptron and a boosted random forest, were applied, with performance evaluated using receiver operating characteristic analysis. Model interpretability was assessed using SHapley Additive exPlanations.
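As an illustration of the effect-size synthesis step, the following is a minimal sketch of standardising group differences to Cohen's d and pooling them under a random-effects model. It assumes the DerSimonian-Laird estimator for between-study variance and a normal-approximation 95% confidence interval; the abstract does not state which estimator was used, and the function names are hypothetical.

```python
import math


def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    return (mean1 - mean2) / pooled_sd


def d_variance(d, n1, n2):
    """Standard large-sample approximation to the sampling variance of d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def random_effects_pool(effects, variances):
    """Pool effect sizes with the DerSimonian-Laird estimator (an assumption).

    Returns (pooled effect, 95% CI tuple, tau^2, I^2 in percent).
    """
    # Fixed-effect (inverse-variance) weights and estimate
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * di for wi, di in zip(w, effects)) / sw

    # Cochran's Q and the between-study variance tau^2
    q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    pooled = sum(wi * di for wi, di in zip(w_star, effects)) / sw_star
    se = math.sqrt(1.0 / sw_star)
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)

    # I^2: share of total variability attributable to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, ci, tau2, i2
```

When the studies agree closely, the estimated between-study variance is zero and the result coincides with the fixed-effect estimate, which is the situation a low-heterogeneity meta-analysis approaches.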
Results: Eleven studies met inclusion criteria. Meta-analysis demonstrated low heterogeneity and a pooled effect size of d = 0.74 (95% CI 0.59–0.88), indicating a consistent moderate-to-large biomarker signal across studies. Pathway enrichment revealed convergence on PI3K/AKT/mTOR signalling, immune modulation and epithelial-mesenchymal transition. Transcriptomic profiling demonstrated concordance with literature-derived markers. The boosted random forest model achieved strong classification performance (AUC = 0.91), with high specificity and minimal misclassification. SHapley Additive exPlanations analysis identified both canonical (PIK3CA, RPL39, EXO1) and non-canonical (CD55, LARGE2) contributors to model prediction.
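For readers less familiar with the classification metrics reported here, the sketch below shows how an area under the ROC curve and a specificity value can be computed from true labels and predicted scores. It is a generic illustration, not the study's evaluation code: the function names and the 0.5 decision threshold are assumptions.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Equivalent to the area under the ROC curve; ties count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("Both classes must be present to compute AUC.")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity(y_true, scores, threshold=0.5):
    """True-negative rate at a given decision threshold (0.5 is an assumption)."""
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    if tn + fp == 0:
        raise ValueError("No negative samples to compute specificity.")
    return tn / (tn + fp)
```

The pairwise formulation is quadratic in sample size, which is acceptable for small cohorts; for larger datasets a rank-based computation gives the same value more efficiently.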
Conclusion: This study provides an integrated synthesis linking systematic evidence, transcriptomic validation and interpretable machine learning in MpBC. By reconciling fragmented literature with data-driven modelling, we identify a biologically coherent and clinically tractable molecular signature, offering a foundation for biomarker-driven stratification and translational validation.