To address a major challenge in analysing large single-cell RNA-sequencing datasets, researchers from The University of Texas MD Anderson Cancer Center have developed a new computational technique that accurately distinguishes cancer cells from the variety of normal cells found within tumour samples.
The work was published today in Nature Biotechnology.
The new tool, dubbed CopyKAT (copy number karyotyping of aneuploid tumours), allows researchers to more easily examine the complex data obtained from large single-cell RNA-sequencing experiments, which deliver gene expression data from many thousands of individual cells.
CopyKAT uses that gene expression data to look for aneuploidy, or the presence of abnormal chromosome numbers, which is found in most cancers, said study senior author Nicholas Navin, Ph.D., associate professor of Genetics and Bioinformatics & Computational Biology.
The tool also helps to identify distinct subpopulations, or clones, within the cancer cells.
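The general idea of reading copy number signals out of expression data can be illustrated with a small sketch. This is not CopyKAT's algorithm. It is a simplified stand-in that assumes genes are ordered by genomic position and that normal cells are the majority in the sample. It centres each gene on its across-cell median, averages neighbouring genes with a sliding window, and scores each cell by how much its smoothed profile varies. The function names, the window size, the midpoint threshold and the simulated data are all hypothetical and chosen for illustration only.

```python
import numpy as np


def smoothed_profiles(log_expr, window=25):
    """Centre each gene on its across-cell median, then average over a
    sliding window of neighbouring genes.

    Assumes the columns of `log_expr` (cells x genes) are ordered by
    genomic position and that most cells are normal (diploid).
    """
    centred = log_expr - np.median(log_expr, axis=0, keepdims=True)
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, centred
    )


def aneuploidy_score(log_expr, window=25):
    """Per-cell spread of the smoothed profile. Large chromosomal gains or
    losses shift whole runs of genes and therefore raise this value."""
    return smoothed_profiles(log_expr, window).std(axis=1)


def simulate(n_normal=80, n_tumour=20, n_genes=600, seed=0):
    """Simulated log-expression matrix (illustrative only, not real data)."""
    rng = np.random.default_rng(seed)
    log_expr = rng.normal(0.0, 0.5, size=(n_normal + n_tumour, n_genes))
    log_expr[n_normal:, 100:250] += 0.8  # simulated gain in tumour cells
    log_expr[n_normal:, 400:520] -= 0.8  # simulated loss in tumour cells
    labels = np.array([False] * n_normal + [True] * n_tumour)
    return log_expr, labels


log_expr, is_tumour = simulate()
scores = aneuploidy_score(log_expr)

# Crude two-group split: halfway between the lowest and highest score.
threshold = (scores.min() + scores.max()) / 2
predicted_tumour = scores > threshold
```

On real data the expression values would first need normalisation and genes would need to be mapped to chromosomal coordinates, and a fixed midpoint threshold would be too fragile. The sketch only shows why runs of consistently raised or lowered expression can point to copy number changes.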
"We developed CopyKAT as a tool to infer genetic information from the transcriptome data. By applying this tool to several datasets, we showed that we could unambiguously identify, with about 99% accuracy, tumour cells versus the other immune or stromal cells present in a mixed tumour sample," Navin said. "We could then go one step further to discover the subclones present and understand their genetic differences."
Historically, tumours have been studied as a mixture of all cells present, many of which are not cancerous.
The advent of single-cell RNA sequencing in recent years has enabled researchers to analyse tumours at much greater resolution, examining the gene expression of each individual cell to develop a picture of the tumour landscape, including the surrounding microenvironment.
However, it's not easy to distinguish between cancer cells and normal cells without a reliable computational approach, Navin explained.
Former postdoctoral fellow Ruli Gao, Ph.D., now assistant professor of Cardiovascular Sciences at Houston Methodist Research Institute, developed the CopyKAT algorithms, which improve upon older techniques by increasing accuracy and adjusting for the newest generation of single-cell RNA-sequencing data.
The team first benchmarked the tool against whole-genome sequencing data, and the comparison showed high accuracy in predicting copy number changes.
In three additional datasets from pancreatic cancer, triple-negative breast cancer and anaplastic thyroid cancer, the researchers showed that CopyKAT was accurate in distinguishing between tumour cells and normal cells in mixed samples.
These analyses were made possible through collaborations with Stephen Y. Lai, M.D., Ph.D., professor of Head and Neck Surgery, Stacy Moulder, M.D., professor of Breast Medical Oncology, and the Breast Cancer Moon Shot. The Breast Cancer Moon Shot is part of MD Anderson's Moon Shots Program, a collaborative effort to rapidly develop scientific discoveries into meaningful clinical advances that save patients' lives.
In analysing these samples, the researchers also showed the tool is effective in identifying subpopulations of cancer cells within the tumour based on copy number differences, as confirmed by experiments in triple-negative breast cancers.
"By using CopyKAT, we were able to identify rare subpopulations within triple-negative breast cancers that have unique genetic alterations not widely reported, including those with potential therapeutic implications," Gao said. "We hope this tool will be useful to the research community to make the most of their single-cell RNA-sequencing data and to drive new discoveries in cancer."
The tool is freely available to researchers.
The authors note that the tool is not applicable to the study of all cancer types.
Because it relies on detecting aneuploidy, it is less suited to cancers in which aneuploidy is relatively rare, such as paediatric and haematologic cancers.