ecancermedicalscience

Research

Benchmarking of radiobiological NTCP models in head and neck radiotherapy using independent computational pipelines: an institutional validation study with machine learning augmentation

16 Jun 2026
Kalyan Mondal, Abhijit Mandal, Anuj Vijay, Ganeshkumar Patel

Background & purpose: Normal tissue complication probability (NTCP) models require institutional validation before clinical implementation. Traditional radiobiological models, such as the Lyman–Kutcher–Burman (LKB) and Equivalent Uniform Dose (EUD) models, provide mechanistic dose–response frameworks, while machine learning (ML) approaches offer exploratory, data-driven alternatives that remain inadequately characterised in South Asian populations.

Methods: This retrospective study included 51 head and neck cancer patients treated with definitive radiotherapy. Binary endpoints were Grade ≥2 xerostomia (n = 3), dysphagia (n = 5) and mucositis (n = 4), scored using Common Terminology Criteria for Adverse Events version 5.0. NTCP calculations were performed using two independent computational pipelines (MATLAB-based RBMODELv1 and a Python implementation), with agreement assessed using Bland–Altman analysis. Traditional NTCP models (LKB, EUD) were evaluated and compared with artificial neural networks and XGBoost in a hypothesis-generating framework using a stratified 70:30 train–test split. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy and Spearman’s rank correlation.
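As a point of reference, the LKB model is commonly written as NTCP = Φ((gEUD − TD50) / (m · TD50)), where Φ is the standard normal cumulative distribution and gEUD is the generalised equivalent uniform dose computed from a differential dose–volume histogram with volume parameter n. The Python sketch below illustrates that textbook form only; it is not the RBMODELv1 code or the study's Python pipeline. The function names `geud` and `lkb_ntcp` are hypothetical, and the values m = 0.4 and n = 1.0 are illustrative assumptions rather than fitted parameters from this study.

```python
import math


def geud(doses, volumes, n):
    """Generalised EUD from a differential DVH.

    doses   -- bin doses in Gy
    volumes -- volume in each bin (any consistent unit; normalised here)
    n       -- LKB volume-effect parameter (n = 1 gives the mean dose)
    """
    total = sum(volumes)
    return sum((v / total) * d ** (1.0 / n) for d, v in zip(doses, volumes)) ** n


def lkb_ntcp(doses, volumes, td50, m, n):
    """LKB NTCP: standard normal CDF of (gEUD - TD50) / (m * TD50)."""
    t = (geud(doses, volumes, n) - td50) / (m * td50)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Hypothetical two-bin DVH; m and n are illustrative assumptions.
example = lkb_ntcp(doses=[20.0, 40.0], volumes=[0.5, 0.5], td50=34.1, m=0.4, n=1.0)
```

By construction, a uniform dose equal to TD50 returns an NTCP of 0.5, which is a convenient sanity check when comparing two independent implementations.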

Results: Excellent agreement was observed between computational pipelines (mean bias 0.8%, 95% limits of agreement −1.9% to 3.5%). Traditional models demonstrated strong rank-order correlation with toxicity grades (ρ = 0.61–0.79, p < 0.001) and high accuracy (LKB: 90.0%–94.1%). Institution-specific parameters differed from Quantitative Analyses of Normal Tissue Effects in the Clinic (QUANTEC) values, including a lower parotid TD50 (34.1 versus 39.0 Gy). Exploratory ML analyses showed numerically higher discrimination for parallel organs but not for mixed-architecture structures; however, severe class imbalance (3–5 events per endpoint) limits statistical reliability.
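The agreement statistics reported for the two pipelines follow the usual Bland–Altman form: the bias is the mean of the paired differences, and the 95% limits are conventionally taken as bias ± 1.96 standard deviations of those differences. The sketch below shows that calculation under this assumption; the function name `bland_altman` and the input values are hypothetical and are not data from the study.

```python
import statistics


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Assumes the conventional 95% limits: bias +/- 1.96 * SD of the differences.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical NTCP values (%) from two pipelines for the same plans.
pipeline_a = [10.0, 20.0, 30.0]
pipeline_b = [9.0, 18.0, 30.0]
bias, lower, upper = bland_altman(pipeline_a, pipeline_b)
```

The limits are symmetric about the bias by construction, so a reported interval can be checked quickly against its stated mean difference.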

Conclusion: Dual computational pipelines enable reproducible NTCP modelling for institutional use. Traditional radiobiological models perform acceptably after local calibration, while exploratory ML findings suggest potential organ-architecture-dependent patterns that require validation in adequately powered multi-institutional cohorts.
