Prognostic value of a patient-reported functional score versus physician-reported Karnofsky Performance Status Score in brain metastases

Introduction: Our aim was to investigate the added prognostic value of a patient-reported functional outcome score over Karnofsky Performance Status (KPS) in patients with non-small-cell lung cancer (NSCLC) with brain metastases.
Materials and methods: The baseline data are from a prospective cohort study involving 140 consecutive patients presenting at our institute. A patient-reported performance status (PRPS) was obtained by summing the physical- and role-functioning scale scores of the EORTC QLQ C30 questionnaire. Nested Cox proportional hazards models predicting survival were developed including both KPS and PRPS (full model), KPS only (KPS model), and PRPS only (PRPS model). The incremental value of the addition of KPS or PRPS was ascertained using the likelihood ratio test, model adequacy index and integrated discrimination improvement (IDI).
Results: PRPS was an independent and statistically significant prognostic factor and had only a moderate degree of agreement with KPS. All models showed nearly the same discrimination and calibration accuracy, but the likelihood ratio test comparing the full model to the KPS model was significant (L.R. Chi2 = 5.34, p = 0.02). Model adequacy index for the KPS model was 85% versus 95% for the PRPS model. IDI when comparing the KPS model to the full model was 0.0279, while it was 0.008 for the PRPS model versus the full model.
Conclusions: Use of patient-reported functional outcomes like PRPS can provide the same prognostic information as KPS in patients with NSCLC with brain metastases.
Highlights: Patient-reported functional status (PRPS) has a moderate degree of agreement with KPS. PRPS is an independent and significant predictor of survival in brain metastases. PRPS can replace KPS without loss of prognostic information.


Introduction
Performance status (PS) has been defined by the National Cancer Institute as 'a measure of how well a patient is able to perform ordinary tasks and carry out daily activities' [38]. The baseline PS of patients is one of the most important factors influencing their prognosis [2]. In the context of patients with brain metastases, the baseline PS is the key prognostic factor determining survival [13,32-34]. This is exemplified by the fact that the baseline PS is included in the diagnosis-specific graded prognostic assessment (DS-GPA) score across different primary sites [33]. The physician-reported performance status is usually reported in the form of a summary score, and the two most commonly used scales are the Karnofsky performance status (KPS) [18] and the Eastern Cooperative Oncology Group Performance Status (ECOG-PS) scales [23]. Of these, KPS has been used in successive trials in patients with brain metastases and is also used for validated prognostic indices like the GPA [33].
PS is usually estimated by health care professionals during the course of a routine health care visit. However, estimation of PS using the KPS or ECOG scales suffers from interobserver variation due to the subjective nature [7,28,31,36] of the assessment scale. Further, given the subjective nature of assessment, it is not surprising that there is only a moderate degree of agreement between the PS score as assessed by the patient themselves and the physician [3,5,6,17,22].
It is well known that the judgement of physicians and other health care workers about subjective measures, like a patient's pain, quality of life, anxiety, depression, etc., differs from that of the patients themselves [30]. Given the inherent subjectivity in PS assessment and the previous reports of disagreement between physician- and patient-assessed PS, it is reasonable to question whether the use of patient-reported functional status would yield better prognostic estimates than KPS.
Functional status can be ascertained through the functional domains of Health-Related Quality of Life (HR-QOL) instruments like the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire C-30 (QLQ-C30) [20]. This questionnaire comprises 30 items, of which 15 items contribute to five functional scales looking at physical, role, emotional, cognitive and social functioning.
Between June 2012 and April 2015, we enrolled 140 consecutive patients with non-small-cell lung cancer (NSCLC) with brain metastases planned for palliative whole brain radiotherapy in a prospective cohort study (CTRI Number: CTRI/2013/01/003299). As a part of this study, patients received usual care and underwent quality of life assessment using the EORTC QLQ C30. The objective of the current study is to ascertain whether the patient-reported functional status, as ascertained using the EORTC QLQ C30, provides additional prognostic information over and above KPS.

Materials and methods
One hundred and forty (140) consecutive patients with NSCLC with brain metastases were enrolled in this IRB-approved prospective cohort study after written informed consent (Clinical Trial Registry of India number: CTRI/2013/01/003299). The study was conducted between June 2012 and April 2015 at a tertiary cancer centre in India. The quality of life (QOL) assessments and mini-mental state examination (MMSE) [11] were performed at baseline, and information regarding traditional prognostic variables was recorded. QOL was assessed using the EORTC QLQ C-30 [1] and BN-20 questionnaires [21]. All questionnaires were self-reported. All patients underwent palliative whole brain radiotherapy (WBRT) to a dose of 20 Gy in five fractions over one week. Similar QOL and MMSE assessments were also performed at subsequent follow-up. Data regarding baseline quality of life scores were available for all patients. www.ecancer.org ecancer 2017, 11:779
For the purpose of this study, we calculated the baseline values of the functional scales of the EORTC QLQ C30 questionnaire. The calculation was done as per the methodology suggested by the EORTC QLQ scoring manual [10]. This involved calculation of the raw score for each scale by obtaining the average score for all the items in the scale. This was followed by linear transformation of the raw score to obtain a score ranging between 0 and 100. For each functional scale, increasing scores represented better functioning. As KPS primarily focusses on physical functioning, we chose the physical- and role-functioning scales for constructing a Patient-Reported Performance Scale (PRPS). The seven items which are a part of this PRPS are shown in Table 1.
Table 1 notes: Items taken from the EORTC QLQ C30 questionnaire. The scale to which each item belongs is also indicated.
The PRPS was the average of the physical and role functioning scale scores. No further score transformation was done.
\[ \mathrm{PRPS} = \frac{\text{Physical Functioning Scale Score} + \text{Role Functioning Scale Score}}{2} \qquad \text{(Equation 1)} \]
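The scoring steps described above (raw score as the item average, linear transformation to 0-100, then Equation 1) can be sketched as follows. This is a minimal illustration, not the study code: the transformation formula and the item range of 3 (responses coded 1 to 4) are assumptions based on the usual EORTC functional-scale scoring convention, the split of five physical and two role items is assumed from the seven items mentioned in the text, and the example responses are hypothetical.

```python
from statistics import mean


def functional_scale_score(responses, item_range=3):
    """Score one functional scale on 0-100, higher = better functioning.

    Assumption: items are coded 1 ("not at all") to 4 ("very much"),
    so item_range = 4 - 1 = 3, and the linear transformation is
    (1 - (raw - 1) / range) * 100.
    """
    raw = mean(responses)  # raw score = average of the items in the scale
    return (1 - (raw - 1) / item_range) * 100


def prps(physical_responses, role_responses):
    """Equation 1: average of the physical and role functioning scale scores."""
    pf = functional_scale_score(physical_responses)
    rf = functional_scale_score(role_responses)
    return (pf + rf) / 2


# Hypothetical patient: five physical-functioning items, two role-functioning items.
example_physical = [2, 1, 1, 2, 1]
example_role = [3, 2]
example_prps = prps(example_physical, example_role)
print(round(example_prps, 1))
```

A patient answering "not at all" to every item scores 100 and one answering "very much" to every item scores 0, which matches the direction of the scale described in the text.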
The above calculation methodology was adopted, as it retained the original scale definition as well as the original score calculation method as defined by the EORTC. We also used another method of calculation, where the PRPS was derived from the seven questions directly. However, the PRPS calculated using this alternate methodology did not alter the essential results of the study (results not shown).
Prior to analysing the utility of PRPS in predicting prognosis, we ascertained the degree of agreement between the PRPS and the KPS. Traditional measures of agreement like the weighted kappa score are not useful in this setting, as the two scales are different. Hence, we used polychoric correlation as a measure of agreement. Polychoric correlation allows estimation of agreement between two ordinal scales measuring an underlying latent construct which is assumed to be continuous [12]. After quantifying the degree of agreement between the KPS and PRPS, we ascertained the impact of the addition of PRPS as a prognostic factor. In order to ascertain this, successive nested Cox proportional hazards models were fit. These models were as follows:
1. Full model: age, gender (male/female), number of brain metastases (1-3 or more than 3), extracranial disease (present/absent), epidermal growth factor receptor (EGFR) mutation (mutated/wild type/not tested), KPS and PRPS.
2. KPS Model: Same as model 1 except PRPS was excluded.
3. PRPS Model: Same as model 1 except KPS was excluded.

Patient characteristics
The demographic and disease-related characteristics of the patient population are described in Table 2. The database was closed for analysis in August 2016, by which time 111 patients (79.3%) had died. The median overall survival calculated from the date of completion of WBRT was 166 days (95% CI: 108-242 days). All deaths were related to disease progression. The 30-day, 60-day and 120-day survivals were 92%, 79% and 55%, respectively (Figure 1). Assuming a rule of thumb of 10 events per variable, 111 events gave us sufficient power to examine 11 variables in a prognostic model [24,25].
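The events-per-variable arithmetic above can be checked with a short sketch. The event and patient counts are taken from the text; the per-covariate degrees of freedom are an assumption about how the full model's covariates would be coded (one parameter per binary or linear continuous term, two for the three-level EGFR variable).

```python
events = 111          # deaths at database closure (from the text)
patients = 140        # enrolled patients (from the text)
events_per_variable = 10  # rule of thumb quoted in the text

max_variables = events // events_per_variable
death_fraction = events / patients

# Assumed coding of the full model's covariates (illustrative only).
full_model_df = {
    "age": 1,
    "gender": 1,
    "number_of_brain_metastases": 1,
    "extracranial_disease": 1,
    "egfr_mutation": 2,  # three levels -> two parameters
    "kps": 1,
    "prps": 1,
}
total_df = sum(full_model_df.values())

print(max_variables, round(100 * death_fraction, 1), total_df)
```

Under these assumptions the full model uses fewer parameters than the rule of thumb allows.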

Agreement between PRPS and KPS
The polychoric correlation coefficient between KPS and PRPS was 0.46 (standard error: 0.07), indicating that only a modest degree of agreement existed between KPS and PRPS. The test for bivariate normality was not significant (Chi-square = 159.8, df = 229, p = 0.9998), indicating that the assumptions of polychoric correlation were not violated. As can be seen in Figure 2, there is wide variability between the PRPS and KPS. The important prognostic variables which have been included in the model did not seem to influence the correlation between PRPS and KPS (Figure 2 and Appendix). Patients with poorer cognitive function in general had better agreement between the KPS and PRPS scores, though the degree of agreement was still modest, with the highest polychoric coefficient being 0.67 for patients with poor cognitive function as defined by the MMSE test (Appendix).

Comparative evaluation of prognostic models
The three pre-specified models are presented in Table 3. As can be seen, PRPS was a significant predictor of survival in the full model as well as in the PRPS model. While KPS was a significant predictor of survival in the KPS model, it was not significant in the full model when PRPS was also included. Model assumptions were checked as shown in the Appendix.
Contrasts have been depicted for continuous variables for a better understanding of the hazard ratios. Statistically significant variables in the model are indicated with *. A p value < 0.05 was taken as significant and all coefficients have been rounded to two decimal places. EGFR: epidermal growth factor receptor; KPS: Karnofsky performance status; LR Chi2: likelihood ratio chi-square value; PRPS: patient-reported performance scale. See Appendix for the graphical representation and the summary statistics for the individual models.
Figure 3 shows the actual time-dependent AUC values for the three models at different time points. As can be seen, the KPS model has slightly better discrimination ability at shorter follow-up times, while the converse is true of the PRPS model. However, on the whole, the discriminative ability of the PRPS model is nearly the same as that of the full model at all time points.
The findings were confirmed in the risk assessment plots (RAP) and the values of integrated discrimination improvement (IDI) (Figure 4 and Appendix). While the addition of PRPS to a model with KPS resulted in a slight improvement in discrimination among non-events, there was no difference in discrimination for patients with events. On the other hand, addition of KPS to a model with PRPS did not result in any discernible improvement in the predictive ability for patients either with or without events.
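The event and non-event components of the IDI referred to above can be illustrated with a small sketch. It assumes the commonly used definition for a binary outcome at a fixed time horizon (change in mean predicted risk among events, plus the reduction in mean predicted risk among non-events) and ignores censoring, so it is a simplification rather than the computation used in the study. The function name and the example probabilities are hypothetical.

```python
from statistics import mean


def idi_components(p_reference, p_new, events):
    """Return (IDI for events, IDI for non-events, total IDI).

    p_reference, p_new: predicted event probabilities from the two models.
    events: 1/True if the patient had the event, 0/False otherwise.
    """
    ev_ref = [p for p, e in zip(p_reference, events) if e]
    ev_new = [p for p, e in zip(p_new, events) if e]
    ne_ref = [p for p, e in zip(p_reference, events) if not e]
    ne_new = [p for p, e in zip(p_new, events) if not e]

    # Events: a better model assigns higher risk on average.
    idi_events = mean(ev_new) - mean(ev_ref)
    # Non-events: a better model assigns lower risk on average.
    idi_nonevents = mean(ne_ref) - mean(ne_new)
    return idi_events, idi_nonevents, idi_events + idi_nonevents


# Hypothetical predictions for four patients (two events, two non-events).
example_events = [1, 1, 0, 0]
example_reference = [0.6, 0.5, 0.4, 0.3]
example_new = [0.7, 0.5, 0.3, 0.3]
example_idi = idi_components(example_reference, example_new, example_events)
print(example_idi)
```

Reporting the two components separately mirrors the way the results above distinguish improvement among non-events from improvement among events.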
Bootstrap optimism-corrected calibration plots of the three models showed a good fit with mean absolute errors and 0.9 quantile of the absolute error being 0.035 (0.07), 0.036 (0.06), and 0.037 (0.08) for the full model, KPS model, and PRPS model, respectively. Model calibration plots are shown in the Appendix.

Discussion
As a construct, performance status (PS) has both a subjective and an objective domain, but the current methods of assessment in the clinic use mainly subjective measures [19]. It is, therefore, surprising that physicians continue to rate the patient's performance status, while other subjective issues, like pain and quality of life, are usually patient reported. The EORTC QLQ C30 has seven questions that deal with the physical and role function, and these questions are more specific as compared to traditional PS assessment methods like KPS or ECOG scales. Hence, intraobserver variability is likely to be less as compared to traditional PS assessment as the subjectivity arising out of physician assessment would be eliminated [35].
Guzelant et al have previously shown that of all the scales in the EORTC QLQ C30 questionnaire, physical functioning and role functioning scales have the strongest correlation with KPS [15]. In the current study, we demonstrate that a composite score of the physical and role functioning score (PRPS) provides valid prognostic information over and above KPS and may be used instead of KPS with little loss in model predictiveness and accuracy.
Previous studies evaluating physician- and patient-reported performance scales have shown significant disagreement between the two [3,5,6,17,22]. As shown in this study, there is only modest agreement between patient-reported functional scores and physician-reported PS. In addition to inter-observer variability in rating PS [3], it is known that assessment of PS is influenced by physician bias. For example, Broderick et al have demonstrated that younger patients are generally assigned more favourable PS scores by physicians [7]. This is despite the fact that PS does not correlate well with comorbidity and the disease stage [9].
While the current study demonstrates that PRPS can substitute for KPS without loss of prognostic information, results from previous studies have been conflicting. For example, Ando et al have reported that oncologist-rated PS scores best fit the observed survival [3]. However, unlike in the current study, the prognostic model comparisons were not based on nested models, and robust internal validation or calibration was not performed. One of the concerns regarding the use of patient-reported performance status has been that patients tend to rate their own performance status poorly [3,8,22,29]. The reasons behind this are poorly understood but may be related to depression [17] or a subconscious desire to seek help [3]. The use of patient-reported PS can result in almost half the patients excluding themselves from clinical trials where PS is a part of the inclusion criteria [8]. The use of PRPS as in the present study can be a way around this, wherein the patient-reported scores are used for determining prognosis, while physician-determined PS is used to determine clinical trial eligibility. Further, the use of specific questions, as in the EORTC QLQ C30, may reduce the possibility of patients rating their performance status artificially lower [35].
As Suh et al have reported, a composite PS derived from a patient-reported review of systems can be obtained longitudinally in the clinic, and longitudinal changes in it influence prognosis [35]. Further, unlike the findings reported by Suh et al, we found that baseline PRPS was a significant and independent predictor of survival [35].
Our current study has several strengths in this regard. It was prospectively conducted in a cohort of patients with a single disease site. Consecutive patients were recruited to minimise selection bias. As all patients received the same radiotherapy, treatment variability, which may have been a factor in previous studies [3,5], was also minimised. Follow-up was also complete and adequate. The final event (death) was observed in nearly 80% of the patients (111 of 140). Baseline patient-reported QOL data were complete, as was information regarding other prognostic variables. Further, all this information was collected in a standardised proforma, which further minimised interobserver variability. Hence, this data set was ideal to compare and contrast prognostic models that employ baseline predictive factors. The findings from the study lend further credence to the observation that patient-reported outcome measures are better predictors of survival than traditional PS [14].
The limitations of the study include the fact that it was conducted in a single centre and most patients had an adenocarcinoma histology. Hence, applicability of this model to patients with other histologies and brain metastases from other sites may be limited. Nonetheless, performance status is a part of all diagnosis-specific GPA indices [33], which is an indicator of its importance in patients with brain metastases irrespective of the site of origin.
Further, all patients in this study had received whole brain radiotherapy. Patients who receive stereotactic radiotherapy for brain metastases usually have a lower intracranial disease burden and hence are likely to have better performance status. However, even in this setting, a score like the PRPS is likely to discriminate better between subtle grades of functional impairment than the ECOG and KPS scales. As shown in the Appendix, patients with better MMSE scores had a poorer correlation between the KPS and PRPS scales.
Patients' assessment of their own HRQoL is influenced by several factors like comorbidities, cognitive function, and response bias. However, the same issues affect assessment of a subjective domain like functional status when done by other observers. Eliminating other observers from the assessment process has the potential to reduce inter-observer bias, as the patient directly reports his/her functional status. In this regard, the finding that patients with poorer cognitive function had better agreement between the PRPS and the KPS score provides some insights. It is likely that the functional deficits produced in the presence of major cognitive deficits would have caught the attention of the treating physician, resulting in better agreement in grading the functional deficits. In patients with better cognitive function (as assessed by the MMSE), the functional deficits would have been subtle and hence would have either not been noted by the physician or not been graded adequately using KPS.

Conclusion
Our study shows that a patient-reported functional score derived from the EORTC QLQ C30 (PRPS) was both a statistically significant and an independent predictor of mortality in patients with brain metastases. It also gave the same prognostic information as KPS in our prognostic model. There was significant disagreement between PRPS and KPS, although both were measured at the same time point. The results of this study highlight the importance of patient-reported outcome measures in patients with brain metastases and should spur further research into the use of patient-reported functional outcomes in this population.

Funding
The project was supported by an intramural grant from Tata Memorial Hospital.

PRPS calculation
As indicated in the manuscript, the PRPS was calculated as the sum of the physical and role function scores obtained from the baseline EORTC QLQ C30 questionnaires.

Physical Function Scale
As can be seen from the above, the correlation coefficient seems to vary between 0.17 and 0.56, with the largest correlations being observed for the EGFR unknown subgroup and the subset with 1-3 brain metastases. However, overall the correlation coefficients indicate only a weak-moderate correlation.
In order to formally ascertain the magnitude of agreement between KPS and PRPS, we will use the polychoric correlation. The results of the polychoric correlation analysis are indicated below.

Model specification
First, we will build the first model including both PRPS and KPS. The other explanatory variables to be included are:
1. Age (modelled as a continuous variable)

PRPS (modelled as continuous variable)
The three continuous variables would be expanded by restricted cubic splines using four knots. The choice of four knots for restricted cubic splines stems from the observation by Harrell et al that using four knots offers an adequate fit of the model and is a good compromise between flexibility and loss of precision caused by overfitting a small sample. As shown by the results of the ANOVA, the linearity assumption seems to hold true.
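To make the four-knot restricted cubic spline expansion concrete, the sketch below builds the spline basis for a single value using the truncated-power form that is linear beyond the outer knots. It is an illustrative reimplementation under stated assumptions, not the routine used in the analysis: the scaling of the nonlinear terms by the squared knot range is one common convention, and the knot locations in the example are hypothetical.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for one value x.

    Returns k - 1 columns for k knots: the linear term followed by
    k - 2 nonlinear terms constrained to be linear beyond the outer knots.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are required")

    def pos_cubed(u):
        # Truncated power function (u)_+^3
        return u ** 3 if u > 0 else 0.0

    last_gap = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # assumed normalisation convention
    basis = [float(x)]
    for j in range(k - 2):
        term = (
            pos_cubed(x - t[j])
            - pos_cubed(x - t[-2]) * (t[-1] - t[j]) / last_gap
            + pos_cubed(x - t[-1]) * (t[-2] - t[j]) / last_gap
        )
        basis.append(term / scale)
    return basis


# Hypothetical knot locations for a continuous covariate such as age.
example_knots = [40, 50, 60, 70]
example_basis = rcs_basis(55, example_knots)
print(example_basis)
```

With four knots each continuous variable contributes three columns (one linear, two nonlinear); a test that the nonlinear coefficients are jointly zero is what supports proceeding with the linear form.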

Checking linearity assumption
Further confirmation is obtained by visual examination of the plots of martingale residuals against the continuous variables, which allows us to check the functional forms of the covariates for the linearity assumption. As the loess smooth lines and the 95% confidence intervals show, the lines are approximately centred around 0 with no discernible pattern, indicating that we can proceed with a linear assumption.

Checking proportional hazards assumption
We now check the proportional hazards assumption using scaled Schoenfeld residuals, using both hypothesis testing and graphical methods.
As can be seen from the plots above, the proportional hazards assumption holds true. The global test of proportional hazards, penalised for the 8 degrees of freedom, is non-significant with a p value of 0.13, indicating that the proportional hazards assumption is not violated.

Outlier detection
In the plot below, we plot the deviance residuals against the observation ID to detect overtly influential outliers. As can be seen, there are no significant outliers detected. The loess smooth line is approximately centred around 0 with no definite pattern. Most observations lie within 1 standard deviation.
From the above, we can see that the assumptions of the Cox proportional hazards model hold true. The final full model can thus be specified as a linear combination of the covariates.
As we can see on comparing the full model with the model including PRPS, the likelihood ratio test shows that inclusion of KPS does not account for enough variance for us to reject the null hypothesis that the coefficient for KPS equals 0. In other words, the two models are effectively the same: the goodness of fit does not improve with the addition of KPS to a model which already includes PRPS.
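The likelihood ratio comparison of nested models can be sketched as follows. The statistic is twice the difference in log (partial) likelihoods and, when the models differ by a single linear term, it is referred to a chi-square distribution with 1 degree of freedom; the single degree of freedom is an assumption that follows from the linear specification described above. The value 5.34 is the likelihood ratio chi-square reported in the abstract for the full model versus the KPS model; the function names are illustrative.

```python
import math


def lr_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio chi-square for two nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def lr_p_value_1df(lr_chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(lr_chi2 / 2.0))


# Reported comparison: full model versus KPS model (adding PRPS).
p_adding_prps = lr_p_value_1df(5.34)
print(round(p_adding_prps, 2))
```

A non-significant result for the reverse comparison (adding KPS to the PRPS model) is what the paragraph above describes.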

C-statistic
Harrell's C index is a global index for validation of a prognostic model. It is a unitless index of the rank correlation between the predicted prognosis and the actual observed prognosis. A model with a higher predictive discriminatory ability has a higher C-index. A higher value implies that the model assigns a higher probability of survival to patients with longer survival times.
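A minimal sketch of how a concordance index of this kind can be computed for right-censored data is given below. It counts a pair as usable only when the patient with the shorter follow-up time had the event, scores tied predictions as half-concordant, and ignores pairs with tied times; these are simplifying assumptions, and the data in the example are hypothetical.

```python
def harrell_c(times, events, risk):
    """Concordance between predicted risk and observed survival.

    times:  follow-up time for each patient
    events: 1/True if death was observed, 0/False if censored
    risk:   predicted risk score (higher = worse predicted prognosis)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # shorter survivor had higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied prediction
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical data: four patients, the third one censored.
example_times = [100, 200, 300, 400]
example_events = [1, 1, 0, 1]
example_risk = [0.9, 0.7, 0.8, 0.1]
example_c = harrell_c(example_times, example_events, example_risk)
print(example_c)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by predicted risk.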

Reclassification scatterplots
We divided the survival outcomes by the median survival to derive two groups of patients: one with survival times of 182.5 days or less and the other whose survival times were more than 182.5 days. Patients with survival less than or equal to the median were classified as having a 'Poor Outcome', while the others were considered to have a 'Good Outcome'. Predicted mortality probabilities at 182.5 days were then obtained from the three models. Scatter plots of predicted mortality by the full model versus the KPS model and the PRPS model were then created. The density or the number of points above and below the diagonal line was thus an indicator of the accuracy of prediction. For patients with a Good Outcome (i.e. survival > 182.5 days), points lying below and to the right of the diagonal indicated that the full model was a better predictor of the actual survival and vice versa.
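The classification rule and the reading of the scatter plot described above can be written out as a short sketch. The 182.5-day cut-off and the outcome labels are taken from the text; the function names are hypothetical, the full model is assumed to be on the vertical axis, and censoring before the cut-off is ignored for simplicity.

```python
CUTOFF_DAYS = 182.5  # cut-off used in the text


def outcome_group(survival_days):
    """Label a patient by survival relative to the cut-off."""
    return "Poor Outcome" if survival_days <= CUTOFF_DAYS else "Good Outcome"


def favoured_model(p_full, p_reduced, survival_days):
    """Say which model's predicted mortality agrees better with the outcome.

    Good Outcome: the lower predicted mortality is the better prediction.
    Poor Outcome: the higher predicted mortality is the better prediction.
    """
    if p_full == p_reduced:
        return "tie"  # point lies on the diagonal
    if outcome_group(survival_days) == "Good Outcome":
        return "full" if p_full < p_reduced else "reduced"
    return "full" if p_full > p_reduced else "reduced"


# Hypothetical patient surviving 300 days, with two predicted mortalities.
example_label = favoured_model(p_full=0.3, p_reduced=0.5, survival_days=300)
print(outcome_group(300), example_label)
```

Counting how often each label occurs within each outcome group gives a numerical summary of the point densities on either side of the diagonal.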

Risk assessment plots and IDI
Risk assessment plots are plots in which sensitivity for those with events and 1 - specificity for those without events are depicted against the calculated risk in the same figure. The dotted lines represent the curves drawn from the data obtained from the reference model, while the solid lines represent those obtained from the new model. In our case, we designated the reference model as the KPS and PRPS models, respectively, and the new model as the full model to ascertain the added benefit of adding PRPS or KPS, respectively. The areas between the reference-model and new-model curves are used to derive the Integrated Discrimination Improvement. In this case, the package shows the integrated discrimination improvement for both events and non-events. This makes the risk assessment plot more informative than an ROC plot because it illustrates separately how good each model is for those with and those without events. Improved performance for assigning lower risk to non-event individuals moves the reference curve (red dashed line) toward the lower-left corner (red solid line), whereas improved performance for assigning higher risk to event individuals moves the reference curve (

Packages used for analysis
The following packages were used for analysis: