The National Cancer Institute (NCI) announced that it had removed all prostate-specific antigen (PSA) data from the SEER (Surveillance, Epidemiology and End Results) and SEER-Medicare programmes.
The PSA data were removed after quality control checks revealed that a substantial number of PSA values included in the programmes were incorrect.
An editorial published in The Journal of Urology explores the ramifications of the removal of these data for researchers, clinicians, and administrators within the health care community, as well as the use and accuracy of large administrative data sets in general.
The SEER programme, initiated by NCI in 1973 and one of the oldest and most highly regarded cancer registries in the world, is legislatively mandated to collect cancer incidence and survival data from 17 population-based cancer registries across the United States, representing roughly 28% of the U.S. population.
The SEER-Medicare data set links the cancer information in SEER to administrative claims data for patients in SEER covered under the Medicare programme.
The editorial's author, David F Penson, MD, MPH, Director of the Center for Surgical Quality and Outcomes Research, Professor and Chair of the Department of Urologic Surgery at Vanderbilt University, and of the VA Tennessee Valley Geriatric Research, Education, and Clinical Center, Nashville, TN, cautions that the withdrawal of these data from SEER will have two major impacts on the field of prostate cancer research.
“First, ongoing analyses using SEER and SEER-Medicare that include PSA data will have to be redesigned in light of the problems with these data. Simply put, journals will not be able to accept SEER studies that rely on the PSA data as a primary variable of interest, including those that use PSA in risk stratification systems to adjust for confounding or in cohort identification. This effect is relatively straightforward and should not cause great problems in the field going forward.”
According to the author, “The greater problem, however, is the impact of the flawed PSA data on the existing urological literature. SEER and SEER-Medicare data have been used to address questions about screening and effectiveness of treatments for localised and advanced disease. How can we now trust these studies given the problems with the PSA data?”
Dr Penson cautions that while large administrative databases like SEER, if used properly, have tremendous value in answering difficult clinical and health care policy questions, researchers should think twice before publishing secondary data analyses merely because the data are relatively easy to obtain and analyse.
“We cannot ask these data sets to answer questions that they are not capable of answering. In that situation we have to do the really hard work and collect primary data. It’s time for us to stop doing big data fishing expeditions and taking the easy way out.”