Sanger Institute Hits 1 Terabase

2 Jul 2008

The Wellcome Trust Sanger Institute has sequenced the equivalent of 300 human genomes in just over six months. The Institute has just passed the staggering total of 1,000,000,000,000 (one trillion) letters of genetic code, data that researchers worldwide will use to understand the role of genes in health and disease. Scientists will be able to answer questions that were unthinkable even a few years ago, and human medical genetics will be transformed.
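To see how "one trillion letters" relates to "300 human genomes", a quick back-of-envelope check helps. This sketch assumes each "letter" is one DNA base read and that one "genome equivalent" means roughly one copy of the human genome (about 3 billion bases). It only divides the two figures quoted above.

```python
# Back-of-envelope check of the headline figures (assumption: one "letter" = one base read)
TOTAL_BASES = 1_000_000_000_000  # one terabase, the total quoted above
GENOMES = 300                    # "the equivalent of 300 human genomes"

# Implied number of bases per genome equivalent
bases_per_genome = TOTAL_BASES / GENOMES
print(f"{bases_per_genome / 1e9:.2f} billion bases per genome")
```

The result, about 3.3 billion bases, is close to the size of one copy of the human genome. That is why a terabase corresponds to roughly 300 genomes' worth of sequence.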

The amount of data is remarkable: every two minutes the Institute produces as much sequence as was deposited in the first five years of the international DNA sequence databases, which began in 1982.

The Institute has major roles in projects such as the 1000 Genomes Project, the International Cancer Genome Consortium and the second round of the Wellcome Trust Case Control Consortium, all of which will depend on DNA sequencing to uncover genetic variants that are important in human disease. Next-generation sequencing is also driving the Institute's own research programmes.

The Sanger Institute's Cancer Genome Project team, co-led by Professor Mike Stratton and Dr Andy Futreal, has spent eight years searching for genes that are mutated in common cancers. Until now that has meant a piecemeal approach, focusing either on a few samples or on only a few hundred regions of the genome. Although this approach has been hugely successful, next-generation sequencing means that all genes and gene regions in many cancer samples can now be examined simultaneously. "We have already published results from a study of lung cancer samples that illustrate the complexity and diversity of cancer genomes, and have obtained more data in six months than in the previous five years," explains Professor Stratton. "The advent of the next-generation sequencing technologies now allows us to search for all the types of somatic change in cancer genomes and to begin complete resequencing of whole cancer genomes, acquiring full catalogues of somatic changes, ultimately in thousands of cancers, as a leading player in the International Cancer Genome Consortium."

Raw data are produced by the next-generation sequencing platforms at the Sanger Institute on a massive scale: currently more than 50 terabytes of quality-filtered data per week. These data are being deposited in both local and international databases.