News

Lies, damned lies and statistics: SPYing on trial design

20 Jul 2016

Independent ecancer blogger Dr Bishal Gyawali rounds up the latest news in clinical trial design

The New England Journal of Medicine is currently swamped with papers on adaptive design of clinical trials, with recent papers including two clinical trials, one perspective, one review and one editorial. If you are like me, you must have had a difficult time understanding what was going on. Damned statistics!

No one would dispute that we need better cancer drugs for our patients, and we need them ASAP. Clinical trials take a long time to conduct and many end with negative results. Hence, we need better-designed clinical trials that give rapid and hopefully positive results. Adaptive randomisation is one such design. In this design, multiple drug arms can be tested simultaneously against a common control.

Using Bayesian probability, the trial assigns patients to the drug arms that are more likely to benefit them, based on their specific biomarkers. Bayes' theorem updates probabilities as data accumulate, i.e. it calculates the probability of success from the prior probability and the results observed so far. More simply, it predicts the probability of an outcome based on what is already known. Thus, the sample size and the allocation of patients to treatment arms keep changing as new results come in.
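To make the mechanism concrete, here is a minimal sketch of adaptive randomisation using Thompson sampling with Beta-Bernoulli arms. The arm names, "true" response rates and trial size are invented for illustration; this is not the actual I-SPY 2 algorithm, which is far more elaborate (biomarker signatures, longitudinal modelling), just the core idea of allocation probabilities shifting as results accrue.

```python
import random

random.seed(42)

# Hypothetical arms with hidden "true" pCR rates (invented numbers)
arms = {"control": 0.25, "drug_A": 0.40, "drug_B": 0.30}
successes = {a: 0 for a in arms}
failures = {a: 0 for a in arms}

for patient in range(300):
    # Draw one sample from each arm's Beta posterior (flat Beta(1, 1) prior)
    draws = {a: random.betavariate(successes[a] + 1, failures[a] + 1)
             for a in arms}
    # Assign this patient to the arm whose draw is highest:
    # promising arms are sampled more often as evidence accumulates
    chosen = max(draws, key=draws.get)
    responded = random.random() < arms[chosen]  # simulated outcome
    if responded:
        successes[chosen] += 1
    else:
        failures[chosen] += 1

for a in arms:
    n = successes[a] + failures[a]
    rate = successes[a] / n if n else 0.0
    print(f"{a}: {n} patients, observed response rate {rate:.2f}")
```

Run it and the better-performing arm typically ends up with the largest share of the 300 patients, which is exactly the behaviour, and the potential pitfall, described above: allocation depends on interim estimates, not fixed ratios.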

This approach looks very seductive for many reasons: one, you can test multiple drugs simultaneously; two, patients have a higher chance of being randomised to the drug that works for them; three, a smaller sample size will suffice; four, it can better predict which drugs are more likely to succeed in phase 3; five, you can change the sample size and even the primary endpoint midway if needed. There can be few downsides to Bayesian design, right?

In fact, there can. Like every other statistical method, adaptive randomisation is just a statistical tool. It’s a means, not the end. A road, not the destination. Just like p-values that are used, abused and hacked, Bayesian design has the potential to be hijacked. In fact, compared to classic trial design, adaptive randomisation has more potential to be misused to hijack evidence-based medicine. Indeed, most of us clinicians without proper knowledge of statistics will have no other option than to uncritically believe the published results - reading between the lines in the “Methods” sections will be a Herculean task.

So the first lesson, at least for me, from this avalanche of papers on adaptive trials in NEJM was: "we doctors need to sharpen our statistical skills."

Statistics should be taught in more detail in undergraduate and postgraduate medical courses. Without a good knowledge of statistics, future (and present) doctors won't be able to make informed judgements about any trial results, and will have to take the pharmaceutical reps at their word.

Now let’s look at the important I-SPY 2 Trial that incorporated this adaptive randomisation technique. This is a phase 2 trial of the neoadjuvant approach in breast cancer, with pathological complete response (pCR) rate as the primary endpoint. The trial has 13 arms: one control arm of standard neoadjuvant chemotherapy with paclitaxel (plus trastuzumab if HER2-positive) and 12 other arms with experimental agents added to paclitaxel for 12 weeks.

This is followed by cyclophosphamide plus doxorubicin for four further cycles. Out of these 12 different experimental arms, the results for two arms have been published: the veliparib-carboplatin arm and the neratinib arm. Patients are assigned to these different arms based on their biomarkers - a total of 10 “signatures” based on hormone receptor (HR), HER2 and a 70-gene assay status.


Veliparib-carboplatin was considered in three subsets of HER2-negative patients: all HER2-negative, HR-positive/HER2-negative and HR-negative/HER2-negative; the last of these is also known as triple-negative breast cancer (TNBC). Of these, veliparib-carboplatin “graduated” in the TNBC subset, i.e. it achieved the pre-specified efficacy threshold, with an 88% predicted probability of success in a subsequent phase 3 trial of 300 patients. The estimated pCR rate in the TNBC subset was 51% (95% PI, 36%-66%) in the veliparib-carboplatin cohort vs 26% (95% PI, 9%-43%) in the control cohort.

Neratinib was tested in patients across 10 different biomarker subtypes, and graduated only in the HR-negative/HER2-positive subtype: mean estimated pCR of 56% (95% PI, 37%-73%) vs 33% (95% PI, 11%-54%).
All fine and exciting, but... you knew there was going to be a but, didn’t you? In fact, there are many buts:

1. It is controversial whether pCR can serve as a surrogate endpoint in breast cancer. In fact, pooled analyses have failed to validate pCR as a surrogate endpoint for improved event-free or overall survival. So, does achieving higher pCR rates in I-SPY 2 trials imply clinical benefit to patients? We can’t be sure until we have data for survival. In addition, the phase 3 trials that have been planned based on these results from I-SPY 2 also have pCR as primary endpoint. Thus, even if these phase 3 trials turn out positive, we can’t be sure that we are improving our patients’ survival with this approach.

2. As discussed earlier, Bayesian probabilities are calculated and updated continually as the ongoing results arrive. Thus, it is important to understand that the pCR rates mentioned above are not the true pCR rates, but estimated (probable) rates based on Bayes' theorem. You might have noticed that the rates come with a range of probabilities: a 95% probability interval (PI) instead of the usual 95% confidence interval. In this longitudinal model, the pCR probabilities are continually estimated and updated based on another surrogate marker for pCR, i.e. Magnetic Resonance Imaging (MRI) scans! We don’t know yet whether pCR is a reliable surrogate for survival, and here pCR itself is estimated from reductions in tumour volume on MRI scans. A surrogate based on another surrogate. Both surrogates still lack validation.
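To see where a "95% probability interval" comes from, here is a minimal sketch under a simple Beta-Bernoulli model. The counts (20 responders out of 39 patients) are invented for illustration, not the actual I-SPY 2 data, and the trial's real model also folds in the MRI-based longitudinal predictions, which this sketch omits entirely.

```python
import random

random.seed(0)

responders, n = 20, 39  # hypothetical counts, not trial data

# With a flat Beta(1, 1) prior, the posterior for the pCR rate
# after observing the data is Beta(responders + 1, n - responders + 1).
# Approximate its 95% probability (credible) interval by Monte Carlo:
samples = sorted(random.betavariate(responders + 1, n - responders + 1)
                 for _ in range(100_000))

lo = samples[int(0.025 * len(samples))]  # 2.5th percentile
hi = samples[int(0.975 * len(samples))]  # 97.5th percentile
mean = (responders + 1) / (n + 2)        # posterior mean

print(f"posterior mean ~ {mean:.2f}, 95% PI ~ ({lo:.2f}, {hi:.2f})")
```

The interval is a statement about the probable value of the rate given the data and the prior, which is why it shifts every time new results (or MRI-based predictions standing in for results) are folded in.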

3. The efficacy of carboplatin in TNBC has already been shown by a meta-analysis. Studies have shown that adding carboplatin to paclitaxel improves pCR in TNBC patients. Carboplatin is also known to be effective in patients with BRCA mutations. In this trial, 17% of patients in veliparib-carboplatin arm had BRCA mutations versus 5% in the control arm. Thus, the benefit of veliparib-carboplatin seen in I-SPY 2 could be the result of the known efficacy of carboplatin in this setting: the added benefit of veliparib is not known.

4. The probability that neratinib would succeed in a phase 3 trial is 79%. The pre-specified efficacy threshold was a probability of success of at least 85% in a phase 3 trial. Still, neratinib “graduated”. So, why was neratinib given grace marks to help it graduate? As the authors mention in the discussion, neratinib had achieved the 85% score before all patients had completed neoadjuvant therapy and undergone surgery; that earlier estimate was, of course, based on MRI predictions! But once all patients had completed neoadjuvant treatment and the true pCR results were updated, the probability of success fell to 79%. This looks like another rociletinib story to me! In the rociletinib study, the true response rate was much lower than the reported rate because the responses observed at first glance were not confirmed on a later scan. In this neratinib trial, the MRI-based estimated probability was higher, but once true pCR data became available, the updated probability diminished. Can we really call this agent a “success”? Or did our continued SPYing find us another TIGER? Only time will tell.

5. Another concern is conducting randomised phase 3 trials based on these results: will patients accept being randomised to control arms when they are informed that there is a more than 85% chance that the investigational agent will perform better in the trial? Will physicians be comfortable asking patients to enrol in a trial where one agent clearly has a greater chance of success? What about the principle of clinical equipoise in clinical trials? This article on “Equipoise and the ethics of clinical research” from NEJM - published a month before I was born - still seems very relevant.

6. Finally, will the results for the other 10 experimental arms be published? I understand that it is an ongoing trial, but I hope that the results for all the arms will be published, whether positive or negative. On that note, should the results of these two drug arms (or better, all 12 arms) from the same trial have been published as a single paper?

But make no mistake - I am not against adaptive randomisation trials. I think this design is a great improvement on classical trials. But, as with all powerful tools, I worry about this design being misused and manipulated. Indeed, statistics is one of the best tools for deceiving people: as the quote mistakenly attributed to Mark Twain goes, "lies, damned lies and statistics!"

The I-SPY 2 trial is a great endeavour and the researchers should be applauded for their work. However, we should not forget our goal. Our goal is to help patients. We do that by either helping them live longer or live better. The Bayesian probability curves do not mean a thing to our patients - nor do the pCR rates.

Indeed, a look at recent trials reminds us that we may have forgotten our destination and become attached to the road instead. That’s why we often see trials with sub-standard control arms designed just to prove that the tested drug is superior. Have we forgotten that our target is to help patients, not to prove that a drug is better or to get it approved? Sometimes these mean the same thing; sometimes they don’t.

Philippe Moreau and S. Vincent Rajkumar have provided some excellent suggestions on how clinical trials can really help advance multiple myeloma treatment for patients in a beautiful piece in the Lancet. You can easily extrapolate these recommendations to any other cancer.

Trials should focus on helping patients. If that objective is met, the tools used to get there don’t matter. I think I’ll let Dr. Lehman say it instead: “I’m not decrying it as an approach by any means, but it means that honest thinking, open data and collaborative working will be needed more than ever if medical science is not going to sink into a conceptual and statistical morass.”

 


Bishal Gyawali (MD) is an independent blogger. He is undergoing his postgraduate training in medical oncology at the Graduate School of Medicine, Nagoya University, Japan, where he is also a PhD candidate under the Japanese government scholarship. He also serves as visiting faculty at the department of Hemato-Oncology in Nobel Hospital, Kathmandu, Nepal. He graduated in medicine from the Institute of Medicine, Tribhuwan University, Nepal in 2011 with seven gold medals, and has been honoured with the “Student of the Decade” and “Best Student” awards for academic excellence in Nepal. His areas of interest include evidence-based oncology practice, cost-effectiveness of cancer therapies and economic feasibility of cancer management in low-income countries. Dr Gyawali tweets at @oncology_bg. Dr Gyawali is an independent blogger and his views are not representative of ecancer.