Active case-finding method improves completeness and accuracy of data reported to the rural Eastern Cape Cancer Registry in South Africa

The quality and accuracy of the data provided by cancer registries have a significant impact on decision making. Over decades, high-income countries have been successful in monitoring their cancer burden because of well-established data abstraction techniques such as digital systems. Conversely, in low- and middle-income countries, sparsely distributed cancer registries that use less costly but imprecise alternative methods struggle to capture all cancer cases. A population-based cancer registry in South Africa covering a resource-limited rural population faces challenges in case finding, yet the quality and accuracy of its data have a significant impact on decision making. The objective of this study was to assess data quality using two data quality attributes, ‘completeness’ and ‘accuracy’, and to determine the benefits of using active and passive case-finding methods for cancer registration in this population. Data collected between January 2014 and December 2015 from four hospitals were used to compare the quality of the active and passive case-finding methods. For all four hospitals and the same period, a first set of data obtained through passive reporting was compared with a second set obtained through active case finding. Covering multiple facilities during active case finding can significantly improve the quality of data, whereas passive case finding is limited because data collection is confined to one specific health facility. Greater investment in active case finding is recommended in settings with disparities in resource distribution.


Introduction
Cancer registries play an important role in monitoring the globally increasing cancer burden. They are also of pivotal importance in the planning and evaluation of cancer intervention programmes [1]. Consequently, the quality and accuracy of the data provided by cancer registries have a significant impact on decision making. The International Agency for Research on Cancer (IARC) of the World Health Organization, the global cancer surveillance controlling body, sets the rules and guidelines for cancer registries. Each registry's data quality is measured against those rules, and the registry's value depends on the quality of the data it generates. In cancer registration, quality of data is described by four attributes: comparability, accuracy/validity, completeness and timeliness [2][3][4][5][6]. For timeliness, however, there are no international guidelines at present, although specific standards for abstraction and reporting have been set out by certain organisations [4].
Over decades, high-income countries have been successful in monitoring their cancer burden because of well-established data abstraction techniques such as digital systems [7]. Conversely, in low- and middle-income countries, sparsely distributed cancer registries that use less costly but imprecise alternative methods struggle to capture all cancer cases. In Africa, particularly in rural areas, cancer registries face challenges in case finding. These include limited resources for accurate diagnosis, such as laboratories and specialist oncologists; inaccessibility of hospitals due to poor infrastructure; and poor record keeping, which hampers tracing and follow-up of patients [8]. These challenges hinder the generation of reliable, valid and complete data.
The Eastern Cape Cancer Registry (ECCR) is the only rural population-based cancer registry in South Africa, covering eight magisterial areas [9][10] with a population of 1.2 million [11]. However, merely collecting cases and establishing a cancer registry is insufficient. Cancer registries have a vested interest in promoting the use of their data and must ensure that the data generated are of good quality [12]. The ECCR uses both active and passive case-finding methods to mitigate the discrepancies and shortfalls of either method alone [13]. Using both methods remains financially burdensome to the registry but is beneficial in terms of data quality. The active case-finding method involves annual visits to collaborating hospitals by ECCR staff, who review the records of patients with a cancer diagnosis and extract all available information [13]. The passive case-finding method, by contrast, involves data collectors in only four collaborating hospitals who collect data and send them to the ECCR on a monthly basis. For uniformity, a standardised data collection tool is used in both methods [14,15]. The ECCR contributes cancer incidence (CI) data globally and regionally to Cancer Incidence in Five Continents (CI5), volumes X and XI [16,17], Cancer in sub-Saharan Africa [18] and the collaborative survival studies CONCORD 2 and 3 [19,20]. Apart from these developmental milestones, no internal study has been conducted to evaluate ECCR data quality. The objective of this study was to assess the quality of data generated by the ECCR with respect to two data quality attributes, 'completeness' and 'accuracy'. Completeness is the extent to which all incident cancers occurring in a population are included in the registry [2], whereas accuracy is defined as the proportion of cases in the data with a given characteristic (e.g. topography and morphology) that truly had the attribute [4].
This study focused on these two quality attributes as an initial attempt to check the internal consistency and uniformity of registry records produced using active and passive data collection methods in a resource-limited setting.

Data collection and study sample
This study was conducted between January 2014 and December 2015. In total, 15 hospitals and their associated pathology laboratories constitute the main data source used by the ECCR. Data from these hospitals are generally obtained by registry staff who periodically review patient charts, also termed active case finding. Four of the 15 hospitals are resourced so that hospital staff independently abstract data and report them to the cancer registry, also termed passive case finding. Data from these four hospitals were used to compare the quality of the active and passive case-finding methods. From all four hospitals, in 2014 and 2015, a first set of data was obtained through passive reporting. The second dataset consisted of cases collected by cancer registry staff from the same hospitals during the same period. The four hospitals included in the study were two peripheral hospitals (Tafalofefe and St Elizabeth) and two referral hospitals (Frere and Umtata General Hospital Complex (UGHC)). All types of malignant cancer cases were included and coded according to the International Classification of Diseases for Oncology [20].

Data and analysis
For this study, the main information used included variables for patient identification (name, surname, address, age, sex and ethnicity), the primary organ invaded by the tumour (topography), the histological classification of the cancer tissue (morphology), the incidence date (the first date on which the patient was seen by the doctor and cancer was diagnosed) and the healthcare facility in which the patient was first diagnosed with cancer. The variables selected are the mandatory variables required for each case to be accepted as valid in the database. The complete standardised data collection tool used during both active and passive case finding is attached (Appendix 1: Confidential cancer notification form). Before analysis, duplicates were removed once each patient's information had been consolidated.

(i) Dataset to assess completeness
For each hospital, completeness of active and passive case finding was assessed by dividing the number of patients captured during 2014 and 2015 by each method by the total number of patients identified through both methods combined. With new cases from active case finding denoted $N_a$, new cases from passive case finding denoted $N_p$ and the combined total denoted $T_{cases}$, completeness was calculated as $C_a = (N_a / T_{cases}) \times 100$ and $C_p = (N_p / T_{cases}) \times 100$. Patients captured by both methods were considered identical if all identification variables matched.
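The completeness calculation described above can be illustrated with a minimal Python sketch. It assumes that each patient has already been reduced to a single de-duplicated key built from the identification variables; the keys, the function name `completeness` and the example values are hypothetical and are not study data.

```python
from typing import Hashable, Set


def completeness(method_cases: Set[Hashable], all_cases: Set[Hashable]) -> float:
    """Percentage of all identified cases that one method captured."""
    if not all_cases:
        raise ValueError("no cases identified by either method")
    return 100.0 * len(method_cases & all_cases) / len(all_cases)


# Hypothetical de-duplicated patient keys (illustrative only, not study data).
active = {"p01", "p02", "p03", "p04", "p05", "p06", "p07", "p08", "p09"}
passive = {"p01", "p02", "p03", "p04", "p05", "p06", "p10"}

# Total cases: every patient found by at least one of the two methods.
total = active | passive

c_active = completeness(active, total)
c_passive = completeness(passive, total)
print(f"active: {c_active:.0f}%, passive: {c_passive:.0f}%")
```

Taking the union as the denominator means a patient found by both methods is counted once, which matches the rule that records are treated as the same patient only when all identification variables agree.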

(ii) Dataset to assess accuracy
To assess the accuracy of information on topography and morphology, data were anonymised by assigning each patient a registry number as identification. Only the 1,040 cases recorded by both methods (active and passive) were included in the analysis (Figure 1). Of these, 10% (N = 104) were randomly selected to assess and calculate the agreement between the two case-finding methods in the information provided on topography and morphology (Figure 1). Three reviewers independently assessed these records, and if their results differed, the records were re-examined. The data were subsequently analysed using Microsoft Excel 2016. Testing for statistically significant differences between active and passive case-finding methods and between hospitals was done in R using McNemar's and Pearson's Chi-squared tests, respectively (Table 1a and b).
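The sampling and agreement steps described above could be sketched as follows. This is only an illustration under stated assumptions: the study does not describe how the random draw was implemented, so the seeded draw, the function names `draw_sample` and `percent_agreement`, and the topography codes assigned to each registry number are all hypothetical.

```python
import random


def draw_sample(case_ids, fraction=0.10, seed=0):
    """Randomly select a fraction of the cases without replacement."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the draw reproducible
    k = round(len(case_ids) * fraction)
    return rng.sample(list(case_ids), k)


def percent_agreement(active_codes, passive_codes, sample_ids):
    """Share of sampled cases whose codes are identical in both datasets."""
    matches = sum(active_codes[i] == passive_codes[i] for i in sample_ids)
    return 100.0 * matches / len(sample_ids)


# Hypothetical registry numbers and topography codes (not study data).
case_ids = list(range(1, 1041))  # the 1,040 cases recorded by both methods
active_codes = {i: "C50" for i in case_ids}
passive_codes = {i: ("C53" if i % 10 == 0 else "C50") for i in case_ids}

sample_ids = draw_sample(case_ids)
agreement = percent_agreement(active_codes, passive_codes, sample_ids)
print(len(sample_ids), "sampled cases")
print(f"agreement: {agreement:.1f}%")
```

The same `percent_agreement` function can be applied separately to topography and morphology codes, and to subsets of the sample by hospital or year.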

Table 1. (a) How did the performance of active and passive case finding differ between peripheral and referral hospitals?
By year, p-values for pairwise comparison of the sensitivity of active and passive case finding between peripheral and referral hospitals using Pearson's Chi-squared test with Yates' continuity correction

Completeness
We compared cancer registry data obtained through both active and passive case-finding methods from four hospitals in the Eastern Cape Province of South Africa. These consisted of two peripheral hospitals (Tafalofefe and St Elizabeth) and two referral hospitals (Frere and UGHC). During the study period (2014-2015), using active and passive case finding combined, a total of 2,961 individual cancer cases were identified at these four hospitals (Figure 1, Table 2). The number of cases identified by both case-finding methods was 2,176 (74% of the total number of cases diagnosed; Table 2, Figure 2a and b). Neither of the two methods alone identified all diagnoses reported. However, among all cancer cases diagnosed, active case finding identified a significantly higher proportion than passive case finding (p < 0.05; Tables 1a, b and 3).
For the peripheral hospitals St Elizabeth and Tafalofefe, respectively, active case finding identified 94% and 96% of all cases diagnosed (95% for both peripheral hospitals combined; Table 2). In contrast, passive case finding covered significantly fewer of the cases identified (66% and 85%, respectively, or 71% for both peripheral hospitals combined; p < 0.05; Table 2). For the two peripheral hospitals combined, adding active case finding to the passive case-finding method increased the number of cancer cases identified by 41% (529 cases were reported through passive case finding alone and 215 cases were added by including active case finding; Tables 2 and 3). In contrast, adding passive case finding to the active case-finding method increased case detection by only 6% (703 cases were reported through active case finding alone and 41 cases were added by including passive case finding).
The significantly higher performance of active over passive case finding at the peripheral hospitals was not observed for the two referral hospitals, Frere and UGHC. For these hospitals combined, among all cancer cases diagnosed, the proportion of cases captured by active or passive case finding alone was very similar (88%; Table 2). Nevertheless, adding active case finding to the passive case-finding method still increased the number of cancer cases identified by 13% (Tables 2 and 3).
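The peripheral-hospital percentages in this section can be re-derived from the four counts quoted in the text (529, 215, 703 and 41), as the sketch below shows. The McNemar statistic at the end is an assumption-laden illustration rather than a reproduction of the authors' analysis: it treats the 215 and 41 method-exclusive cases as the discordant pairs and applies a continuity correction (the default of R's `mcnemar.test`), neither of which the text states explicitly.

```python
import math

# Counts reported for the two peripheral hospitals combined.
passive_total = 529     # cases reported through passive case finding
added_by_active = 215   # further cases found only by active case finding
active_total = 703      # cases reported through active case finding
added_by_passive = 41   # further cases found only by passive case finding

# Both routes must arrive at the same combined total.
all_cases = passive_total + added_by_active
assert all_cases == active_total + added_by_passive

# Share of all cases captured by each method alone.
pct_active = 100 * active_total / all_cases
pct_passive = 100 * passive_total / all_cases

# Relative gain from adding the other method.
gain_from_active = 100 * added_by_active / passive_total
gain_from_passive = 100 * added_by_passive / active_total

# McNemar's test uses only the discordant cases (found by one method only).
# Assumption: continuity-corrected chi-squared statistic with 1 degree of freedom.
b, c = added_by_active, added_by_passive
chi2 = (abs(b - c) - 1) ** 2 / (b + c)
p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-squared survival function, 1 d.f.

print(f"captured: active {pct_active:.1f}%, passive {pct_passive:.1f}%")
print(f"gain: +{gain_from_active:.1f}% from active, +{gain_from_passive:.1f}% from passive")
print(f"McNemar chi2 = {chi2:.2f}, p < 0.05: {p_value < 0.05}")
```

The relative gains round to the 41% and 6% reported, and the passive share rounds to 71%. The active share computes to about 94.5%, which the text reports as 95%.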

Accuracy
The accuracy of information abstracted from the medical records by each method was assessed for data on tumour topography and morphology. The overall proportion of patient records matching for tumour topography was 89% in 2014 and 90% in 2015 (Table 4). However, agreement for morphology was lower, at 83% in 2014 and 76% in 2015 (79% for both years combined; Table 5, Figure 3).
Specifically, for data on tumour topography, agreement between the active and passive case-finding methods was 100% at both peripheral hospitals, albeit with a very small sample (N = 19 records in total). For referral hospitals, agreement was 87% at UGHC and 92% at Frere (89% in total; N = 85; Table 4).
For data on tumour morphology, at peripheral hospitals, agreement was 82% at St Elizabeth and 75% at Tafalofefe (79% in total; N = 19; Table 5). For referral hospitals, agreement of active and passive case finding was 71% at UGHC and 85% at Frere (79% in total; N = 85; Table 5).

Discussion
The main objective of this study was to assess the completeness and accuracy of cancer patients' data reported to a rural population-based cancer registry in South Africa. Data on the burden of cancer in Sub-Saharan Africa are very scarce and the limited information available is mostly biased towards urban centres with better infrastructure and possibly distinct disease epidemiology. Our study contributes significantly to the understanding of the importance of CI even in rural populations of Africa with resource limitations.
The overall results of the study indicated less variation in completeness than in accuracy, with a range of 83%-96% in referral hospitals and 83% in peripheral hospitals (Figure 4). However, differences were noted between individual hospitals, with a range of 65%-97%. Neither of the two methods alone identified all patients reported. Since registry staff reviewed the same records used by data collectors during data abstraction, human error in recording resulted in cases being missed by either case-finding method. However, across the entire study, active case finding contributed more cases than passive case finding (p < 0.05). We also observed that for the resource-constrained peripheral hospitals, active case finding captured a significantly larger portion of the total number of patients recorded than passive case finding (95% versus 71%). Hence, active case finding significantly improved case detection, increasing the number of cases identified by as much as 41%. At the better-equipped referral hospitals, passive and active case finding identified a similar proportion of patients among all cases recorded (88%). Nonetheless, for referral hospitals too, active case finding significantly increased the number of patients captured, by 13%.
Agreement was higher for data on topography, with disagreement in only 17%-21% of records, which is not a negligible result, especially in under-resourced settings where data are collected manually. Passive case finding yielded poorer information on cancer morphology than active case finding. Disagreement was frequently based on different or incorrect information recorded regarding diagnoses. Higher accuracy of recording was observed in referral hospitals than in peripheral hospitals. Frere Hospital, which is a referral hospital, showed the highest accuracy in recording. Unlike the others, this hospital has an in-house pathology laboratory whose data are linked to patients' records. The hospital is also fully equipped with oncology and radiation units and has several practising oncologists and registrars. Similar performance of passive reporting (Appendix 2: Resource differences between hospitals observed during the study, possibly impacting quality of passive reporting) was recorded at UGHC, which is a referral hospital, and at Tafalofefe Hospital, which, although a peripheral hospital, has a similar infrastructure that is supportive of and relevant to cancer registration. In contrast, St Elizabeth Hospital had the lowest completeness for passive case finding (Table 2), owing to frequently interrupted data submissions and incorrectly recorded diagnostic information, which raised some concern. The high morphology disagreement reflects the absence of oncology-trained specialists at both peripheral hospitals and the lack of follow-up of referred cases, which would allow information about the diagnosis to be completed and improved.
[Figure note: the proportions indicate the cases added by active case finding relative to the number reported by passive case finding. *Peripheral hospitals; **Referral hospitals.]
This is the first study of its kind in South Africa and one of very few in Africa, owing to the general scarcity of cancer registries on this continent.
The results highlight the challenges experienced in achieving cancer registration with consistently good-quality data. These include a lack of funding for registries and a lack of political will and commitment to support data generation activities. Consequently, infrastructure is lacking (geographical remoteness, no transport, etc.), as are human resources (oncologists) and material resources (laboratories, proper documentation, etc.), and health care centres are difficult to access.
Similar studies include one by Al-Haddad et al. [8], whose findings in Nigeria resemble ours in that both studies found evidence of incompleteness in case finding. This was attributed to infrastructure challenges, including the human and material resources needed for proper diagnosis and data recording. Urban bias was observed in another study, of the Kilimanjaro Cancer Registry in Tanzania [22]. High performance was evident, with 98% completeness and 94% accuracy, and can be linked to the number of specialists in the hospital, which tends to improve the quality of data collection [4]. The same was observed in two recent studies from high-income countries: Singapore (Fung et al. [23]) and Switzerland (Wanner et al. [24]). In both studies infrastructure is not an issue, but these registries showcase the importance of high-quality data, which underpins confidence in published results and the trust of those who use the data, including public health planners.

Limitations
Our study has some limitations. The study population was small; only two peripheral and two referral hospitals were included, which may be only marginally representative of the reporting capabilities of other hospitals in South Africa. Furthermore, their results cannot be compared within the ECCR, as they are the only hospitals using the passive case-finding method, which constitutes only 20% of the total case-finding methods. Patients were double counted, especially those collected by active case finding. However, the use of multiple sources improves the quality of diagnoses. These limitations did not affect the general findings of our study but highlighted the challenges of passive reporting, especially in resource-limited areas.

Conclusion
Distinct hospital-specific challenges in cancer registration were observed, particularly in peripheral hospitals. More than a third of the patients diagnosed would not have been reported. Infrequent reporting is associated with staff shortages, which result in delays in reporting and, in some cases, missing information. This study highlights the importance of regular training for data collectors and of retaining them in the project for longer, so that exposure and experience improve their understanding of concepts such as 'completeness' and 'accuracy' in cancer registration and of their impact on cancer epidemiology. Furthermore, the passive case-finding method alone has negative implications for the quality of data. The two methods used complement each other, particularly in settings like South Africa with disparities in resource distribution. It was also evident that using and having access to multiple sources, including pathology laboratories, during active case finding improved data quality in the ECCR.
Cancer registration in Africa is possible but it has proved to be a difficult task due to many factors, some of which include: difficulty in tracing cases, lack of medical information systems and lack of funding or underfunding. Africa must seriously consider investing in special staff training for efficient digital information exchange between external pathology laboratories and hospitals to address these challenges. Investment in active case finding will be cost effective and is recommended.

Acknowledgments, funding and conflicts of interest
This study was sponsored entirely by the SAMRC. Co-author Dr Borna Müller is employed by F. Hoffmann-La Roche Ltd, a pharmaceutical producer of cancer drugs and diagnostics. No other co-author nor the project received financial support from Roche.
This study was presented in part as a poster at the African Organization for Research and Training in Cancer (AORTIC) 11th International Conference, Kigali, Rwanda, in 2017.
The following colleagues are acknowledged for their valuable inputs: Mr Steady Chasimpha of Malawi Cancer Registry for advice and guidance at the early conception of this study, data collectors in the four selected hospitals without whom data would be incomplete, collaborating hospitals in and outside the registration area for co-operation and support during data collection, Mr Thendo Ramaliba for the technical support, Miss Ngcwalisa Jama for editing the earliest version of the draft manuscript and IARC and African Cancer Registry Network for technical training skills and support.
The authors have no conflicts of interest to declare.