Hot Topic: Clinical Applications of Next Generation Sequencing - Part 2
Published: January 2014
Technological improvements in instrumentation and methodologies for next generation sequencing have changed the landscape of diagnosis and monitoring of disease. Mayo Clinic has implemented the first next generation sequencing test for identification of hereditary colorectal cancer with several other projects in development. Knowledge of an individual’s genome sequence provides information on life, health, risk factors, and also pharmacogenomic data to influence clinical decisions.
Presenter: David I. Smith, PhD
- Division of Experimental Pathology and Laboratory Medicine at Mayo Clinic in Rochester, Minnesota
Questions and Feedback
Questions regarding this Hot Topic can be submitted for the presenter using the evaluation form at the end of the presentation. All questions and answers will be published on the Q&A approximately one month after the Hot Topic posting.
Transcript
Welcome to Mayo Medical Laboratories’ Hot Topics. These presentations provide short discussions of current topics and may be helpful to you in your practice. Our speaker for this program is David I. Smith, PhD, from the Division of Experimental Pathology and Laboratory Medicine at Mayo Clinic. In Part 2 of this 2-part series, Dr. Smith reviews next-generation sequencing and explains the clinical applications for this platform.
Thank you, Heidi, for that kind introduction. There are no financial disclosures to make.
Next Generation Sequencing
In a previous Hot Topic recording, I talked about the technologies that led to Next Gen sequencing. This is the second presentation, and in this one I’m going to be speaking about how these technologies can be, and will be, utilized in clinical practice. To summarize what I discussed previously, next generation sequencing is a group of platforms that utilize massively parallel sequencing to generate gigabases of DNA sequence. These platforms include the Illumina HiSeq 2000, which is capable of generating 600 Gb of DNA sequence on 2 flow cells in 2 weeks, which corresponds to 6 billion paired-end reads. Another platform is the Ion Proton, which is capable of generating 10 Gb in 2 to 4 hours and produces 60 to 80 million reads. There are also other sequencing platforms, including Pacific Biosciences, Oxford Nanopore, NabSys, and GnuBio. But by and large, most of the Next Gen sequencing I’m going to be speaking about is being done on the Illumina or the Ion Proton platform.
Uses of Next Generation Sequencing
There are a variety of uses of next generation sequencing. The very first use, and the one most people think about, is whole genome sequencing, abbreviated WGS. This requires 100-plus Gb of sequence generated from an individual, plus a lot of bioinformatic analysis afterwards to assemble that genome. A simpler form of sequencing is known as whole exome sequencing, or WES. This one captures all the exons (there are 200,000 exons in the human genome, corresponding to about 38 Mb), and these are sequenced. This is very easy relative to whole genome sequencing because you have to sequence so much less; hence there’s less sequencing and less post-sequencing analysis. There is also targeted sequencing, where you capture your regions of interest and sequence them. This also requires less sequencing and analysis. Finally, there is transcriptome sequencing, otherwise abbreviated as RNAseq, which determines what is actively being transcribed in the cells of interest.
Whole Genome Sequencing-WGS
Going through each of these individually, whole genome sequencing, or WGS, starts by creating a library from DNA isolated from the individual that you want to sequence. This can be DNA isolated from white blood cells, but it can also be DNA isolated from tissues. One produces mate-pair libraries because these provide linked coverage and facilitate mapping complex genomes that contain many repetitive sequences. Indeed, about 45% of the human genome consists of highly repetitive sequences. You need at least 100 Gb of sequence to have sufficient coverage to produce an accurate sequence. If you are characterizing a cancer genome (and cancer genomes can be quite heterogeneous), you actually need to sequence even more. The bioinformatic analysis is also very intensive with whole genome sequencing.
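To put the 100 Gb figure in perspective, here is a minimal sketch of the coverage arithmetic. The haploid human genome size of roughly 3.1 Gb is an assumption that is not stated in the talk; the 38 Mb exome size is the figure quoted earlier. The function name `mean_coverage` is made up for this illustration.

```python
def mean_coverage(total_bases: float, target_size: float) -> float:
    """Average number of times each base of the target is sequenced."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return total_bases / target_size


HUMAN_GENOME_BP = 3.1e9  # assumption: approximate haploid human genome size
EXOME_BP = 38e6          # exome size quoted in the talk

# 100 Gb spread over the whole genome gives roughly 32-fold average coverage.
wgs_depth = mean_coverage(100e9, HUMAN_GENOME_BP)

# The same 100 Gb spread over the exome alone would be thousands-fold,
# which is why exome and targeted sequencing need far less raw data.
exome_depth_same_data = mean_coverage(100e9, EXOME_BP)

print(f"WGS: about {wgs_depth:.0f}x; exome with the same data: about {exome_depth_same_data:.0f}x")
```

This is only an average; real coverage is uneven, which is one reason heterogeneous cancer genomes call for more sequence than the average suggests.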
Mate-Pair Genomic DNA Library
I am going to briefly describe how mate-pair genomic DNA libraries are actually produced. One isolates the DNA from the individual that you want to sequence, and then that DNA is fragmented into smaller pieces that are approximately 5 kb in length. Biotin is attached to the ends of those DNA pieces, and then those pieces are circularized, so that the 2 biotin-labeled ends, which were originally 5 kb apart, are now adjacent to each other. After the circularization, if that circularized molecule is fragmented, the various pieces that you generate are all different from each other, but only one piece carries the two biotin labels. If you select for that piece with streptavidin magnetic beads, you can simply purify that tiny fragment, which contains 2 sequences that were originally 5 kb apart.
Mate-Pair Next Gen Sequencing
Those purified biotin-labeled fragments are then the source for next generation sequencing. They are put into the next generation sequencer, which generates 50, 100, or 150 bases of sequence from both ends of each fragment. Now, those ends were originally 5 kb apart in the human genome. So when you align the sequences that you’ve determined to the genome, the 2 pieces should line up along the same chromosome, and they should be about 5 kb apart. The exciting thing about this is that, by sequencing a small amount of DNA, you’re actually producing information about a 5 kb region; hence, less sequencing is required to characterize a region. In addition, these pieces which were 5 kb apart facilitate mapping, especially with a genome that contains a large number of repetitive sequences, which are very difficult to map.
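The expectation described here (both ends on the same chromosome, about 5 kb apart) can be sketched as a simple check on aligned mate pairs. This is an illustration only: the `MatePair` record, the `classify_pair` function, and the 1,500-base tolerance are assumptions, and the sketch ignores read orientation and mapping quality, which real pipelines also consider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatePair:
    """Aligned positions of the two ends of one mate-pair fragment (hypothetical record)."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def classify_pair(pair: MatePair, expected_span: int = 5000, tolerance: int = 1500) -> str:
    """Label a pair by whether its two ends map as the library design predicts."""
    if pair.chrom1 != pair.chrom2:
        # Ends on different chromosomes: a mapping error or a possible rearrangement.
        return "different_chromosomes"
    span = abs(pair.pos2 - pair.pos1)
    if abs(span - expected_span) <= tolerance:
        return "concordant"
    # Same chromosome but much closer or farther than about 5 kb.
    return "unexpected_span"


pairs = [
    MatePair("chr1", 10_000, "chr1", 15_200),
    MatePair("chr1", 10_000, "chr9", 500),
    MatePair("chr1", 10_000, "chr1", 90_000),
]
for p in pairs:
    print(p, classify_pair(p))
```

Pairs that are not concordant are the interesting ones: many of them pointing at the same place is the kind of signal that suggests a structural change rather than noise.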
WES or Targeted Sequencing
Now, as compared to whole genome sequencing, which requires over 100 Gb of sequence and a large amount of bioinformatic analysis, a much easier way to do it is WES, or whole exome sequencing, or, even smaller than that, targeted sequencing. So, instead of sequencing the entire genome, why not just go after the specific areas of interest? These can be small groups of genes, for example, genes involved in some pathway or disease, or the entire exome (the 200,000 exons that comprise the coding portion of our genome, which is only 38 Mb). But the very first question, of course, is how to capture your targets, and the answer is baits. Baits are simply sequences which are exactly homologous to the sequences you would like to capture. This is one of the strengths of DNA. Once you have the sequence of interest, it is very easy to design a sequence to capture what you would like.
This slide depicts the way that this targeted capture is actually done. One synthesizes a variety of biotinylated baits, shown on the top right. These are DNA sequences that are complementary to the sequences which you would like to capture. They are labeled with biotin because it binds very strongly to streptavidin, and this is the way you’re going to purify these baits out after they’ve reached out and gotten their homologous sequences. The genomic DNA sample, shown on the top left, is fragmented into pieces, and the pieces have PCR primers added to their ends so that after the selection procedure, when you have very small amounts of DNA, you can PCR amplify it. Now what you do is you mix the baits with the fragments that have the primers on their ends. You hybridize those together, and after hybridization you can purify the pieces that have hybridized to the baits by using streptavidin-coated magnetic beads. These bind very strongly to the biotin which is attached to the baits. The baits bind very strongly to the DNA sequences you would like to capture. And after a series of washes, you have a dramatic enrichment of just the sequences that you’re interested in characterizing in more detail.
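The point that a bait is easy to design once the target sequence is known can be illustrated with a reverse complement. This is a minimal sketch, not how any commercial capture kit designs its baits: the function names are made up, and real bait design also deals with bait length, tiling across the target, and avoiding repeats, none of which is shown.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand that base-pairs with `seq`, written 5' to 3'."""
    seq = seq.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {sorted(unexpected)}")
    return seq.translate(_COMPLEMENT)[::-1]


def design_bait(target: str) -> str:
    """Hypothetical helper: the bait is simply the reverse complement of the target strand."""
    return reverse_complement(target)


target = "ATGGCCATTGTAATGGGCCGC"  # made-up target fragment
bait = design_bait(target)
print(bait)

# Taking the reverse complement twice returns the original strand.
assert reverse_complement(bait) == target
```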
Advantages of Targeted Sequencing
One of the advantages of targeted sequencing is there’s less to sequence and analyze, and that makes the post-sequencing analysis that much easier. You also get much greater depth of coverage of your targeted region; and because of this, you can actually determine rare variants that are present in what you have captured. So this is another strength to targeted sequencing.
Transcriptome Sequencing
Transcriptome sequencing is the sequencing of the RNA to determine what is actively being transcribed in your tissues of interest. Now, one of the problems with transcribed RNA is that the majority of it, over 90% of the total RNA, is ribosomal RNA; but there are various techniques that can be utilized to purify the transcripts that you want. Traditionally, the transcripts you’re interested in are polyadenylated transcripts. In addition, if you want, there are techniques and technologies to specifically hybridize away the ribosomal sequences. It is interesting, though, that there is much more to transcription than just the 20,000 protein-coding genes. Indeed, today we believe that in addition to protein-coding genes, there may be up to 300,000 regulatory elements in the RNA. These are RNA molecules which are produced that do not code for protein, but they apparently regulate the expression of the genes.
First NGS Test at Mayo Clinic
Now, the very first Next Gen sequencing test at the Mayo Clinic is a select gene panel. It is called the Hereditary Colorectal Cancer Panel, and it’s a multi-gene panel. What they have done is they have chosen 22 genes that are potentially involved in hereditary colorectal cancer. They use SureSelect to pull down the exons from these 22 genes; there are some 250 exons in these genes. They utilize the HiSeq 2000 to sequence the captured exons to obtain at least 100X sequence depth. That means that every nucleotide within the captured exons of those 22 genes is covered at least 100 times. Then bioinformatic analysis and Work Bench are utilized to visualize the results so that they can be reported, in an understandable fashion, back to the clinician and then back to the patient. This very first test developed at Mayo Clinic went live April 24, 2013.
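What "at least 100X" means can be sketched as a per-base depth check over one captured region. The interval representation (half-open start and end positions), the function names, and the toy reads are all assumptions for illustration; this is not the panel's actual pipeline.

```python
from typing import List, Sequence, Tuple


def per_base_depth(target_start: int, target_end: int,
                   reads: Sequence[Tuple[int, int]]) -> List[int]:
    """Count how many reads cover each position of the half-open target [start, end)."""
    depth = [0] * (target_end - target_start)
    for read_start, read_end in reads:
        lo = max(read_start, target_start)
        hi = min(read_end, target_end)
        for pos in range(lo, hi):  # empty when the read misses the target
            depth[pos - target_start] += 1
    return depth


def meets_min_depth(depth: Sequence[int], minimum: int = 100) -> bool:
    """True only if every position reaches the minimum depth."""
    return len(depth) > 0 and min(depth) >= minimum


# Toy data: 100 reads covering positions 0-9 and 20 reads covering positions 5-14.
reads = [(0, 10)] * 100 + [(5, 15)] * 20

exon_a = per_base_depth(0, 10, reads)   # every base is covered at least 100 times
exon_b = per_base_depth(0, 12, reads)   # positions 10 and 11 are covered only 20 times

print(meets_min_depth(exon_a), meets_min_depth(exon_b))
```

The requirement is on the minimum, not the average: a region with high average depth still fails if a single base falls below the threshold.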
There are other NGS tests being developed. For example, there is a test being worked on to detect mutations in mitochondrial genomes. We each contain 50 to 100 mitochondria in every cell in our body; and when mitochondria start to have mutations in them, what happens is you have mixed populations. There are some molecules that have mutations and some that don’t. It is very difficult to determine this with capillary electrophoresis, and it is very straightforward to determine this with next generation sequencing. This is important because as certain individuals get older and accumulate enough mutations in their mitochondria, a variety of different genetic diseases arise from this; and this is a very nice assay to detect something we could not detect before. In addition, there’s a cardiovascular gene panel. It’s a series of genes that are known to be involved in key cardiovascular diseases, and there are a whole series of additional gene panels in various stages of development. The only things needed are the target genes that you’re interested in studying and baits made for those genes, and then that panel can screen a large number of genes in one fell swoop.
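The mixed mitochondrial populations described above come down to counting reads. As a rough sketch, assuming you already have the number of reads supporting the reference base and the mutant base at one mitochondrial position (the function name and the counts are made up):

```python
def heteroplasmy_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at one position that carry the mutant base."""
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts cannot be negative")
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total


# Made-up example: 1,000 reads at a position, 50 of them carrying the mutation.
fraction = heteroplasmy_fraction(ref_reads=950, alt_reads=50)
print(f"{fraction:.1%} of mitochondrial molecules appear to carry the mutation")
```

With deep coverage, a small mutant fraction is supported by many independent reads, which is why sequencing handles mixed populations more easily than a single capillary trace. A real assay would also have to separate true low-level variants from sequencing errors, which this sketch does not attempt.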
Mate-Pair Sequencing: Replacement for Array CGH or Cytogenetics
It turns out that next generation sequencing can be used for a large number of things in the clinic. I’m going to give you a few examples, but there are many others. For example, mate-pair sequencing, which I briefly described, where you sequence pieces that were originally 5 kb apart, can be utilized not just for whole genome sequencing but also as a replacement for array CGH, which itself has recently replaced cytogenetic analysis. One does mate-pair sequencing; but instead of sequencing 100 Gb to cover the complete genome, you only sequence 10 Gb of the resulting libraries. The cost is less than one-tenth of the cost of sequencing a whole genome; and because of the mate-pair coverage, this produces a very high-resolution map across each genome. You get all the information that you obtain from array CGH but at a much higher resolution, and it detects things that array CGH cannot detect. There are significant bioinformatic challenges to this, and there are also challenges to clinical implementation; but there is no question that in the next few years we’re going to see a transition from cytogenetics being done with array CGH to being done with next generation sequencing.
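Array CGH reports gains and losses as a ratio of patient signal to reference signal, and sequencing data can be summarized the same way by counting reads in fixed-size bins along the genome. The talk does not describe the algorithm used, so the following is a generic read-depth sketch with made-up counts and a hypothetical function name, not the Mayo method.

```python
import math
from typing import List, Optional, Sequence


def log2_ratios(sample_counts: Sequence[int],
                reference_counts: Sequence[int]) -> List[Optional[float]]:
    """Per-bin log2(sample / reference) after scaling each library to its own total."""
    if len(sample_counts) != len(reference_counts):
        raise ValueError("both inputs must have one count per bin")
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    ratios: List[Optional[float]] = []
    for s, r in zip(sample_counts, reference_counts):
        if s == 0 or r == 0:
            ratios.append(None)  # no usable signal in this bin
        else:
            ratios.append(math.log2((s / sample_total) / (r / reference_total)))
    return ratios


# Made-up read counts for five bins; the third bin has about half the expected reads.
sample = [1000, 1020, 480, 990, 1010]
reference = [1000, 1000, 1000, 1000, 1000]
for i, ratio in enumerate(log2_ratios(sample, reference)):
    print(i, None if ratio is None else round(ratio, 2))
```

A bin well below zero suggests a loss and a bin well above zero suggests a gain. The mate-pair information adds what this sketch cannot show: discordant pairs can reveal rearrangements that leave copy number unchanged.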
NGS and Pediatrics
Next generation sequencing is also making dramatic impacts on pediatrics. For example, the current success rate for determining the genetic cause of many pediatric abnormalities is low, at approximately 10%; and there is frequently something known as the diagnostic odyssey. This is when you have a child with some type of genetic defect, and what clinicians do is test one gene after another in an attempt to find the gene that is wrong in that child. After you have done this for 2 to 3 genes, the total cost is in the many thousands of dollars. It turns out it might actually be much easier, instead of doing this on a gene-by-gene basis, to think about sequencing the exome or the genome of an individual. Baylor is starting to do this right now with whole exome sequencing, and the immediate success rate that they’re getting is up to 30% with this strategy; and the total cost of this is, at this point, not much more than screening 1 to 2 genes on a gene-by-gene basis.
NGS and Cancer Care
Next generation sequencing is also having very dramatic impacts on cancer care. I’m going to give you just a couple of examples of this, several of them commercial, and then some examples that are being utilized at the Mayo Clinic. For example, Genomic Health, a company in California, utilizes next generation sequencing to do RNAseq on formalin-fixed paraffin-embedded breast tumors. They’re doing this because they have a gene expression signature that indicates which patients will be responsive to tamoxifen therapy. Currently, only about 4% of the women given tamoxifen actually benefit from that therapy, a therapy that costs $40,000. With this test, you can enrich for a higher proportion of women who are more likely to be responsive to that therapy. The test is called Oncotype DX, and Genomic Health is actually releasing similar tests for who would be responsive to different chemotherapies for prostate and for colon cancer. Another company is Foundation Medicine, which comes out of Boston. What they are utilizing is next generation sequencing after capture. They capture all the exons from over 400 cancer-related genes, and they sequence these from the tumors. What they look for is specific mutations, mutations which could actually be targeted by existing drugs, and the idea is that you could treat the tumors based upon the key genes that are mutated. A number of places, including the Mayo Clinic, are starting to send their cancer patients’ tissue to Foundation Medicine to obtain this information and better devise therapies that are more likely to benefit the patient.
Breast Cancer Genome Guided Therapy (BEAUTY)
At the Mayo Clinic, we have a number of similar efforts under way. One of the most exciting ones is the Breast Cancer Genome Guided Therapy, or BEAUTY, project. It’s a genome sequence guided adaptive trial. There are 2 phases to this trial. In phase I, they have proposed to sequence germline and tumor genomes from women with breast cancer and identify novel mutations to potentially identify drug targets. They want to determine, however, the functional significance of the mutations that they detect; and to do this, they are creating xenografts, which means transplanting tumor from the individual’s breast cancer into a mouse that is immunocompromised. Those mice effectively become avatars for the breast cancer in the woman. So, based on the mutations that they detect that suggest certain drugs, those drugs can be given to the mouse to see which ones work against the tumor in the avatar, rather than having the patient be the guinea pig for the experimentation. In phase II, the genomic information gained from phase I is used to individualize breast cancer drug therapy for each individual patient. This shows the actual process of neoadjuvant therapy for women with invasive breast cancer; and through this, there are multiple points where tumor biopsies could be taken: before treatment, after HER2 treatment, and before and after surgery.
BEAUTY Tumor Biopsies
And the next slide shows all of the molecular analyses that are going to be performed on the breast tumor. So for example, in addition to sequencing the exome, they’re also going to do RNAseq to see what is transcribed in the breast tumor. They’re going to do a SNP array and even analyze the methylation, another type of modification in the DNA. All of those together will begin to suggest which pathways are altered in a specific breast cancer. The xenografts, which are the tumors from the breast cancer now growing in the mouse, can then be treated with the various drugs that are suggested by the omics tests at the top, to find those drugs that give the best response. Then you can turn around and give those drugs to the woman, and you have a higher probability that her tumors are going to respond to the drugs that you’re utilizing.
NGS and "Wellness"
In addition to all these things, there is a very nice model for how next generation sequencing could actually be a part of normal wellness. Rather than thinking about this as a technology that’s utilized after disease has occurred, can you use this to monitor yourself during your normal life? There’s a wonderful test case of this, from Dr. Michael Snyder, who is a professor at Stanford University. He wanted to show, with a willing subject, how all of these technologies together could guide life, and the willing subject became himself. What he did was take DNA from himself and his mother (his father is no longer alive), and he sequenced both genomes. He included his mother because then he could tell which genes he had potentially inherited, which connects him to his family and to his family’s medical records. In addition to having his full genome sequenced, every 2 weeks he goes into the clinic and gets a blood draw. That blood draw is then examined by RNAseq to see what is being transcribed. There is a whole proteomic panel. There are various panels to do metabolomics, and all of these things are done to monitor him as he goes through time. This produced really interesting insights. For example, even though he’s a healthy 58-year-old, he was able to detect a prediabetic condition developing within himself well before he would otherwise have seen it; and he was able to treat it by exercising even more rather than going on various diabetic-type drugs. In addition, each time he would get a cold or a flu, the analysis that he was doing enabled them to determine the exact strain that he had, and they can continue to monitor him. So this raises the question: when will whole genome sequencing just be a routine part of everyone’s medical record? And the reality is that in the next couple of years, this is going to become more and more common.
Other Clinical Applications of NGS
There are other clinical applications of next generation sequencing too. Indeed, in this very short talk, I can only give a few examples. But one very important one is metagenomics. It turns out that we house hundreds of bacterial and viral species inside of us. There are actually 100 times more foreign bacteria in us than our own human cells, and you can use next generation sequencing to sequence this large number of different bacterial and viral genomes; and this can tell you a lot about your overall health and wellness. Now, of course, when will whole genome sequencing become a routine part of our medical record? Cost is actually a small issue. The larger issue is bioinformatics. All of this leads to true individualized medicine. Knowledge of an individual’s genome sequence provides information on life, health, and risk factors, but also pharmacogenetic data. From one sequence, we can determine the concentrations of drugs that are going to be most effective in your body, that are not going to be toxic but are still going to be strong enough to do what they need to do. So all of this integrated genomic data can be applied, for example, in cancer to develop specific therapies for the treatment of that cancer and that patient, and this is one of the first areas where we’re going to see complete adoption of this technology to transform clinical care.
Thank you for your time.