Integrating multiple omics information in the era of precision oncology
Dr Yu Shyr - Vanderbilt-Ingram Cancer Center, Nashville, USA
We all know that next-generation sequencing data is huge. It is not only the amount of data per subject; the number of subjects with whole-genome sequencing data is also very large. How big is it? Several years ago we had just finished the 1000 Genomes Project; now we are talking about ten thousand genomes, and we will reach a million people with whole-genome data in three to five years. So with that amount of data, how can we really dig in and get the information out that helps us design clinical trials more efficiently?
How will you do this?
I’m a statistical bioinformatician, so that is my job. My opinion is that we need to develop more tools and better computing, even a private cloud computing system, because the data is huge. We also need to work closely not only with clinicians but also with basic science researchers. Even when the goal is to use big data and data mining technology, you still need to go back to the wet lab to confirm that what you are finding is reasonable and makes biological sense, and then move on to clinical trials. So the answer is this: we need more training programmes for big data, more people who want to join the data science field, and more technologies. Honestly, at this stage a new so-called omics technology becomes available every week, but the algorithms and the data analysis lag behind. That gap exists, so we need to work harder.
How can we bridge the distance between the wet lab and the data?
My view is that there are two ways. The first way: traditionally, the wet lab finds something in a cell line or an animal model and then publishes that data. If you use data scientists in the right way, you take whatever you are finding in your animal data and ask a data scientist to mine the publicly available patient datasets. Then you can immediately see whether your finding has any clinical impact based on the existing data. For instance, for a certain mutation or set of mutations, do patients with that mutation survive longer or shorter? So you immediately know whether it has any clinical impact. Your lab data plus the publicly available data: that’s number one.
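The survival comparison mentioned here, asking whether patients who carry a mutation survive longer or shorter than those who do not, is commonly done with a log-rank test. Below is a minimal sketch in Python, assuming each patient record is a (time, event) pair with event = 1 for death and 0 for censoring. The function name `log_rank_test` and the toy cohorts are hypothetical illustrations, not taken from any specific dataset or from the speaker's own workflow.

```python
import math
from typing import List, Tuple

# Hypothetical record format: (survival time, event), event = 1 for death, 0 for censored.
Patient = Tuple[float, int]


def log_rank_test(group_a: List[Patient], group_b: List[Patient]) -> Tuple[float, float]:
    """Two-group log-rank test. Returns (chi-square statistic, p-value with 1 df)."""
    event_times = sorted({t for t, e in group_a + group_b if e == 1})
    obs_minus_exp = 0.0  # observed minus expected deaths in group_a
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t
        n_a = sum(1 for time, _ in group_a if time >= t)
        n_b = sum(1 for time, _ in group_b if time >= t)
        # Deaths occurring exactly at time t
        d_a = sum(1 for time, e in group_a if time == t and e == 1)
        d_b = sum(1 for time, e in group_b if time == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        if n < 2:
            continue  # a single patient at risk contributes no information
        obs_minus_exp += d_a - d * n_a / n
        variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times; cannot compute the test")
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy cohorts (invented for illustration): mutation carriers vs non-carriers
carriers = [(1, 1), (2, 1), (3, 1)]
non_carriers = [(10, 1), (11, 1), (12, 1)]
chi2, p = log_rank_test(carriers, non_carriers)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

On real public cohorts a maintained survival-analysis library (for example `lifelines.statistics.logrank_test` in Python) would normally be preferred over a hand-rolled version; the sketch only shows what the comparison computes.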
Number two, in my opinion: in traditional cancer research we do things in so many traditional ways. We try the compounds, we try all the cell lines, we try to find the dose at which 50% of the cells are killed, and so on. Can we start using the big data concept, using existing, published data to guide us at the very beginning of an experiment? So the first way is: my lab has finished, so does my finding have clinical impact? You can use data scientists for that. The second is: even at the very beginning, can I use the available data? Don’t underestimate cell line data; there are tonnes of cell line data available. Can we use existing information to help form a hypothesis and start the experiment?
Have you had any successes so far?
Yes, we have a lot of success stories. Today you can see PD-1, PD-L1, all the success stories; those are all marker-based results that have really helped improve patients’ survival time. But what I really want to say is that soon we will have identified pretty much all the alterations with a prevalence rate above 3%, or even 1%. Then people will ask what the next step is: what more can we do? You can see that now we are focusing more on particular alterations, like a 17q deletion. But in addition to that, I think in the future we can introduce the phenome-wide association study, or PheWAS. What is a genome-wide association study (GWAS)? It means I have cancer cases and controls, I have all of the SNPs, and I identify a set of SNPs that differ between them. A PheWAS goes the other way, across all the diseases coded in ICD-9: I have all the patients with all the diseases, I look at a certain SNP, and I ask whether any disease shares the same pattern for that SNP. In that case we can ask whether drug-drug interactions could help us improve survival, and we may identify a certain drug that can be applied to a different disease. That’s very, very interesting, and it is the challenge for the future.
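The SNP-by-disease scan described here, taking one SNP and testing it against every ICD-9 diagnosis, is usually called a phenome-wide association study (PheWAS). Below is a minimal sketch in Python of the core idea as a series of 2x2 tests, assuming per-patient carrier status and diagnosis codes are already available as sets. The function names, the plain chi-square test, the Bonferroni threshold, and the toy codes `dx_A`/`dx_B` are all illustrative assumptions, not the speaker's method; published PheWAS analyses typically use regression models that adjust for covariates such as age and sex.

```python
import math
from typing import Dict, List, Set, Tuple


def chi2_2x2_pvalue(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square test (1 df, no continuity correction) on [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # an empty margin: no association can be tested
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))


def phewas_scan(carriers: Set[str], diagnoses: Dict[str, Set[str]],
                patients: Set[str], alpha: float = 0.05) -> List[Tuple[str, float]]:
    """Test one SNP's carrier status against every diagnosis code.

    Returns the codes that pass a Bonferroni threshold, sorted by p-value.
    """
    non_carriers = patients - carriers
    results = []
    for code, cases in diagnoses.items():
        a = len(carriers & cases)        # carriers with the diagnosis
        b = len(carriers - cases)        # carriers without it
        c = len(non_carriers & cases)    # non-carriers with the diagnosis
        d = len(non_carriers - cases)    # non-carriers without it
        results.append((code, chi2_2x2_pvalue(a, b, c, d)))
    threshold = alpha / len(diagnoses)  # Bonferroni correction over all codes
    return sorted((r for r in results if r[1] < threshold), key=lambda r: r[1])


# Toy data (invented): 100 patients, the first 50 carry the SNP of interest
patients = {f"p{i}" for i in range(100)}
carriers = {f"p{i}" for i in range(50)}
diagnoses = {
    "dx_A": {f"p{i}" for i in range(30)} | {"p50", "p51"},  # enriched in carriers
    "dx_B": {f"p{i}" for i in range(10)} | {f"p{i}" for i in range(50, 60)},  # balanced
}
print(phewas_scan(carriers, diagnoses, patients))
```

Any code that passes the threshold is only a candidate shared pattern; as the speaker stresses elsewhere, it would still need biological confirmation before informing a repurposing trial.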
How will this data affect the process of patient diagnosis?
If you cannot interpret your results, they are garbage. So I think another challenge of big data is how we interpret results. Think about it: an oncologist may have only thirty seconds, a minute, ninety seconds after receiving the report before entering the treatment room. It’s a very short time. So we do need genetic counsellors, this kind of specialty, to help us interpret the results. There are two ways I can think about this. First, informaticians need to collect enough recently published results and summarise them, and we should help the genetic counsellor summarise all of this. The genetic counsellor then uses their knowledge to interpret the results, either to help oncologists or to interact directly with the patient; that is the way. In my opinion, after you finish the sequencing, the first paragraph should tell the patient which of their genes are mutated. The second should say what the latest findings are about those mutated genes. The third should list any clinical trials at your institution or, if your institution doesn’t have one, at other institutions. I think patients really care about that information. So we should go from three billion variables, which is how big the sequencing data is, down to about three paragraphs that the patient can really understand.
What is your take home message?
This really is the time we need to promote team science. In the past we always thought, ‘This is my paper, this is my project, this is my protocol.’ I think it’s time for true team science, where everybody plays their role in delivering precision medicine. I still want to emphasise one thing: medical research using big data, especially data mining, is far behind the rest of industry. For instance, if you go to Target, you can see that every night they change the distribution of their products based on data mining; they respond to their data very quickly, they just do it. A lot of other industries work more aggressively than the medical field, so we need to change the culture; that’s number one. Second, we need more respect for team scientists, including from institutions: they should promote people who are on the team but who may not be the PI. So I think all those things need to change.