Hot Topic

The Human Genome Project


Published: March 2009

Dr. Smith discusses how the human genome was sequenced and the resulting changes in genomic testing that will advance the detection and treatment of cancer.

Presenter: David I. Smith, PhD


Welcome to Mayo Medical Laboratories' Hot Topics. These presentations provide short discussions of current topics and may be helpful to you in your practice.

Our presenter for this program is David I. Smith, PhD, from the Division of Experimental Pathology at Mayo Clinic. Dr. Smith will be discussing how the human genome was sequenced and the resulting changes in genomic testing that will advance the detection and treatment of cancer.

How Do We Obtain Genetic Information?

Every single one of us begins life at the exact same instant: the moment an egg is fertilized. At that moment, each of us has all of the genetic information we will ever have for our entire life: 3 billion base pairs of DNA from our mother and 3 billion base pairs from our father. When that fertilized egg begins to develop, the very first thing that happens is that it divides to form 2 cells, then 4 cells, then 16 cells. At some point, those cells begin to differentiate from each other, so that eventually polar bodies are formed and the cells can begin to differentiate into the different types. All of the genetic information for every single one of those processes had to have been present in the fertilized egg.

Cell Cross-Section

Each cell has multiple components within it. This is the structure of an average mammalian cell, and you can see the various cellular components. There are over 1 million proteins present in the average cell, and all of the genetic information for all of those components must have been present in the original fertilized egg. If that is not impressive enough…

Different Cell Types

By the time you are born, that fertilized egg will ultimately have differentiated into 200 different cell types. The bottom of the slide shows an example of some of those different cell types. All of them share the exact same genetic information, and all of the information for all of that must have been present in the original fertilized egg. But damage can occur to that genetic information, and the next slide shows a very good example of that…

What Happens When You Sit Outside in the Sun?

This is what happens when you sit outside in the sun. The sun actually causes damage to your DNA, producing thymine dimers, a form of DNA damage that leads to mutation, and that type of mutation can lead to something like this example.


Melanoma

So you see, the mutations have actually caused alterations in that DNA, and now the program of those cells is different. These cells are unregulated and are beginning to grow out of control. From this simple analysis we begin to understand something very straightforward.

DNA is the Altered Target in Cancer Cells

And that is that DNA is the altered target in cancer cells. Agents that alter or damage DNA cause cancer; for example, the more time you spend in the sun, the more opportunity you give your body for DNA damage, and there is a relationship between the amount of exposure you have and your probability of developing cancer. The other thing we begin to learn is that the more alterations that occur, the greater the chance of cancer formation. But there are many questions we do not know the answer to. For example, how many alterations cause cancer? Is it just a few alterations, or a much larger number? Even when we talk about cancer of one specific type, how similar are those cancers? If you took 100 women with ovarian cancer, are we looking at the same entity in every single woman, or is each cancer unique and distinct?

DNA Structure

This is the structure of DNA. It is a very simple structure: a double-stranded molecule comprised of 4 different building blocks, A, C, G, and T. It is the actual sequence of those bases, in order across the 3 billion base pairs of your DNA, that determines who you are and determines your risk for various diseases. So if we want to understand and study something that is occurring in cancer, we have to have some understanding of what is occurring in the DNA.

How to Tackle a Problem as Difficult as Cancer?

So how do we tackle a problem as difficult as cancer? The very first thing is to develop a catalog of all known genes. Once you have that, each of those genes can be examined for mutations and other alterations that occur in cancer. And when you find that a whole subset of those genes is altered, an obvious question is how many of those alterations are important in the process; that is why it becomes necessary to test the altered genes for their functional effect. Realizing that it is probably more than one gene, one starts to become interested in systems biology, which is the interplay of how genes interact. But the first step for all of this is to have a list of all the genes, and this comes from the genomic sequence.

Sequencing DNA

Well, to sequence DNA (and the sequencing method used to produce the human genome sequence in 1999 was the enzymatic chain-termination procedure, also called Sanger sequencing after Fred Sanger, the scientist who developed it), it is necessary to first subclone little pieces of the DNA into bacteria so that you can propagate them and then sequence them. You have 2 choices: you can either direct your sequencing to specific regions, or you can take all of the DNA and fragment it into small, overlapping pieces, hence random sequencing. The key for both, directed or random, is to use computers to assemble all of the pieces together.

More on DNA Structure

This is the structure of DNA, and it is important to realize several things about it. Because DNA is double-stranded, if you have one strand of the DNA, you actually have all the information necessary to generate the second strand. The second thing is that the structure of DNA facilitates many things, including the polymerase chain reaction: if you know the sequence of the DNA, you can construct primers to amplify any portion of it. The third thing that is very interesting about the structure is that if you are able to determine the sequence of one strand, you will also have the sequence of the second strand of DNA.

Replicating a Strand of DNA

DNA replication actually takes advantage of this. You have a template and a primer, which is a short stretch of sequence, and the template dictates the nucleotides that are going to be added. So, for example, with the very first available base, which is a G, the only base that can be put in opposite it is a cytosine, or C. That base is put in; cleavage of a high-energy phosphate bond couples it to the open and available hydroxyl group. Once the C is in position, there is a new available hydroxyl group, which enables the addition of the next base. That base is an A, which dictates that the next base in the growing strand is a T. So, as you add on additional bases: the first available base was a G, which takes a C; the next is an A, which takes a T; the next is another A, which takes another T; and the last is a T, which takes an A. By those additions, you can actually determine the sequence of the top strand. Now, the bases that are added have an available hydroxyl group so that they can accept the next base. But what Sanger realized was that if that hydroxyl group were changed to just a hydrogen instead, no additional bases could be added to that growing chain.
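
The base-pairing rule described above can be sketched in a few lines of Python. This is a toy illustration, not sequencing software; the template string is taken from the slide's G-A-A-T example:

```python
# Watson-Crick pairing: each template base dictates its partner.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize(template: str) -> str:
    """Return the strand built base-by-base against a template,
    just as in the slide: G takes a C, A takes a T, and so on."""
    return "".join(PAIR[base] for base in template)

# The slide's example: template G-A-A-T dictates C-T-T-A.
print(synthesize("GAAT"))  # -> CTTA
```

Applying the rule twice returns the original strand, which is exactly why knowing one strand gives you both.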

Developing Dideoxy Chain-Termination Sequencing

Taking advantage of this is how dideoxy chain-termination sequencing was developed. One takes a very low concentration of a dead-end inhibitor, dideoxy-A, and adds it to the reaction, with perhaps a 1:1000 ratio between the dideoxy-ATP and the normal deoxy-ATP, so that most of the time the normal base is added and that base enables an additional base to be added; but when the dideoxy base is added, it stops the chain from growing. If one has 4 separate reactions (a dideoxy-A, a dideoxy-G, a dideoxy-C, and a dideoxy-T), those separate reactions can be used to monitor the growing of the chain. Sometimes the chain will stop at the first available A, sometimes at the second, at the third, and so on, and what happens is you end up generating a ladder. The bases that were originally put in were radioactive, so the growing chains were radioactive and could be detected with autoradiography. After the gel is run, a piece of radiographic film is superimposed on top of it and developed, and you can actually see the bands where the chain terminated at the very first A, the next available G, and then 2 bands where C's have been added. One literally reads up this ladder to determine the sequence: you get the sequence of the growing chain, and from that sequence you can infer the sequence of the original molecule you started with.
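
The logic of the four dideoxy reactions and the resulting ladder can be simulated in miniature. This is a toy Python sketch, assuming an idealized reaction in which a band appears at every possible termination position; the template sequence is made up:

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def termination_fragments(template: str, dd_base: str):
    """Lengths at which chains terminate in the reaction containing
    dideoxy-`dd_base`: every position where the growing (complementary)
    strand incorporates that base."""
    grown = [PAIR[b] for b in template]
    return [i + 1 for i, b in enumerate(grown) if b == dd_base]

def read_ladder(template: str) -> str:
    """Reconstruct the growing chain by reading the ladder bottom-up:
    each fragment length appears in exactly one of the four lanes."""
    bands = {}
    for dd in "ACGT":
        for length in termination_fragments(template, dd):
            bands[length] = dd
    return "".join(bands[i] for i in sorted(bands))

seq = read_ladder("GATTACA")
print(seq)  # the growing chain, complementary to the template
```

From the growing chain one then infers the original template, just as described above.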

Reading a DNA Sequence

When you have set up these reactions and everything is together, the next step is to load them onto a very long plate, 2 meters long. This plate is extremely thin; the gel is poured inside of it, and your samples are loaded onto the top.

Gel Electrophoresis

The plates are then placed into a gel electrophoresis apparatus, where you put a little bit of buffer on the top and a little bit on the bottom. You attach electrodes so that the positive electrode is toward the bottom. Since DNA is negatively charged, it will be attracted to the positive electrode, and the various DNA fragments that have been synthesized will migrate through the gel. The smaller fragments traverse the gel more quickly than the larger fragments, so one can see the single base at the very bottom of the gel, then fragments of 2, 3, 4 bases, and so on, all the way up the gel; one simply looks at where a band sits along the gel to determine the length of the growing chain. When this is done, you simply open the gel up.

Two DNA Sequences Seen in Gel Electrophoresis

You put autoradiographic film on top of it and develop it. What you see here is the sequence of 2 strands of DNA; it takes 4 lanes to see a single sequence. The other thing you can see is that you get very good separation between the bands in the lower molecular-weight range, but as one moves up the gel, the bands get closer and closer together, and this is one of the limitations of this technology. It can be used to read maybe 800 to 900 bases, but above that the bands are simply too close together to read, so one simply does this on a lot of different strands to assemble a DNA sequence. Now, Sanger made a number of very important contributions, one of them being Sanger sequencing.

Overlapping Pieces of DNA

But a second contribution was the realization that a very novel way to sequence DNA would simply be to fragment that DNA into a set of overlapping pieces. This is a representation of some of those overlapping pieces. If one takes a small gene, in this case maybe 5,000 base pairs, and fragments it into overlapping pieces, one can generate the complete sequence of that gene. It was realized a number of years ago that the strategy that works for a small stretch of DNA might actually work for something as large as the 3 billion base pair genome: the entire genome would be fragmented and then the tiny pieces would be sequenced. The only limitation is that you actually have to sequence more than the length of the human genome, because you have to cover it multiple times with these overlapping pieces.
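
The fragment-and-overlap idea can be illustrated with a toy greedy assembler in Python. This is a deliberately naive sketch (real assemblers must handle sequencing errors, repeats, and both strands); the reads are invented for the example:

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def assemble(fragments):
    """Greedy assembly: repeatedly merge the pair with the best overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left; coverage was insufficient
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)]
        frags.append(merged)
    return frags[0]

# Overlapping reads covering a short made-up "genome" more than once.
reads = ["ATGCGTAC", "CGTACGGA", "ACGGATTC", "GATTCAAG"]
print(assemble(reads))
```

Because each read overlaps its neighbors, the assembler recovers a sequence longer than any single read, which is exactly why the genome must be sequenced several times over.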

Requirements to Sequence the Human Genome?

So what are the requirements to sequence the human genome with these technologies, which were state-of-the-art in the 1980s? The realization is that you can only read about 700 to 900 base pairs on a sequencing gel, and to sequence the 3 billion base pair genome randomly, you have to cover it multiple times. Hence, you actually need to sequence some 25 to 30 billion bases to do the genome sequence. With traditional technologies, this would require 111 million lanes, equivalent to 10 to 15 million gels; if one applied 1,000 graduate students working 24/7, it would take them 10 years to complete, at a cost of trillions of dollars. So clearly there had to be some technological advances to simplify this problem and make it more doable. And the very first thing that was actually done…
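
The arithmetic behind those figures can be checked with a quick back-of-the-envelope calculation; the per-read and per-lane numbers below are assumptions taken from the ranges quoted above:

```python
# Back-of-the-envelope check on the lane count quoted in the talk.
raw_bases = 25e9       # 25-30 billion bases of redundant coverage
bases_per_read = 900   # readable length on one sequencing ladder
lanes_per_read = 4     # radioactive Sanger: one lane per ddNTP reaction

reads = raw_bases / bases_per_read
lanes = reads * lanes_per_read
print(f"{reads / 1e6:.0f} million reads, {lanes / 1e6:.0f} million lanes")
```

At roughly 25 billion bases and 900 bases per ladder, the lane count lands near the 111 million figure quoted above.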

Advances in Sanger Sequencing

…was to generate something that got you away from radioactivity. The idea was that rather than putting in a radioactive dideoxynucleotide, one would put a fluorescent tag on each of the dideoxynucleotides, one color for each, and with fluorescent dyes you have no need for radioactivity and autoradiograms. In addition, if you put a CCD camera at the bottom of the gel, it can monitor the sequences coming off the gel. So it is the same strategy as radioactive Sanger sequencing, but now all 4 reactions can be put into a single lane, so right away this requires one-fourth the number of gels of the radioactive approach.

Sequencing with Fluorescent Dye

This is a result from sequencing with the fluorescent dyes. One of the things you can see is that this is a single lane, and you can still see all 4 of the nucleotides together. This reads some 300 bases of sequence, and it enabled the method to be developed into a high-throughput technology.

Advances in Fluorescent Sequencing

When this was first developed, it was still done with polyacrylamide gels, and that would still require multiple millions of gels. But very quickly people switched from gels to long capillary tubes that the samples could be run through. These capillary tubes can be reused many times: you put a sample on, run it through, wash to remove anything residual, and add another sample. You are then not required to continually prepare new polyacrylamide gels, and this sets you up for automation. Initially these were single-capillary devices that would run one sample through, so one capillary could do the complete sequence of one material, but very quickly 6-capillary devices were developed, and eventually this led to a 96-capillary device.

Celera Genomics

If one goes back 10 years and looks at the most state-of-the-art sequencing machine, it was the ABI 3700. This was a 96-capillary machine that enabled you to do 96 sequences simultaneously; 3 hours later one could put on another set of samples, and thus one could do 8 runs a day. What Celera Genomics did was take advantage of this technology: Celera purchased 300 of these quarter-of-a-million-dollar machines and used the strategy of whole-genome shotgun sequencing. They assembled a supercomputer, one of the fastest computers of 10 years ago, to assemble the sequences. They took the genome of the founder of Celera, Craig Venter, and generated his genome sequence.

Capacity: 96 Capillary Sequencing

The capacity of 96-capillary sequencing is that, with a 3-hour run time, one can do 8 cycles a day, so each machine can run 960 sequences a day. Hence, the 330 machines that Celera had were able to process 316,800 sequences a day, with up to 1,000 base pairs per sequence. That is 316,800,000 base pairs of sequencing per day, and when this was run continuously for 3 to 4 months, they were able to generate the complete human genome sequence of the Craig Venter genome, doing some 25 gigabases of sequencing to completely cover his genome.

Computers and the Human Genome Project

One critical component of all of this was the need for sufficiently fast computers. You needed a computer that was both quick and had sufficient memory: sufficient RAM to assemble all the sequences, and sufficient hard-drive space to crunch and analyze the sequence, and such computers were not developed until the 1990s. One of the fastest supercomputers of the day was utilized to assemble those sequences into the first human genome draft sequence, but today computers are over 1,000 times faster, with 500 times more memory. So a portable computer that you carry around under your arm has more computing power than the fastest computer on the planet from 10 years ago.

Where are We Today?

Where are we today? Well, there have been many, many genomes completely sequenced: E. coli; baker's yeast; C. elegans, the tiny worm; Drosophila melanogaster, the fruit fly; and Arabidopsis, which is a very nice flowering plant. Today, thousands of bacterial genomes have been sequenced. Of course the Venter genome gave us the human genome sequence, and in addition many different animals from around the planet have been sequenced: chimp, rhesus, dog, cat, cow, mouse, rat, opossum, chicken, and zebrafish. More genomes are coming online every day. The total cost of the human genome sequence in 1999 was over $200 million, and now, 12 years later, we are at the dawn of a new era where the cost of doing a human genome sequence is approaching the $1,000 mark. This is extremely important, because as long as the cost is very high, the ability to do this for many genomes is next to impossible. But once the cost of a human genome reaches $1,000, it enables a whole variety of clinical practices based upon human genome sequencing. And if it was some $200 million 12 years ago and we are approaching $1,000 today, it will not stop there. The cost of sequencing a genome will keep falling, and then we face a world that looks like science fiction today but is fast becoming reality, where every child born will have their genome sequenced at birth and we can monitor them throughout their lives.

What Have We Learned From Genome Sequences?

Well, what have we learned from human genome sequences? Surprisingly, only 2% of our genome actually codes for protein. In addition, about 45% of our genome is highly repetitive, and we are not quite sure what the function of that is. But that still leaves about 53% of the genome that is not coding for protein and is nonrepetitive. What is the purpose of this "junk"? It turns out that if one compares the human sequence with that of other organisms at a close evolutionary distance from us, one finds many highly conserved regions, and conservation is generally a very good indication of important function that must be preserved. Interestingly, not all of the conserved regions are genes. This suggests that maybe the "junk" is not junk at all, but important sequence with functions other than coding for protein.
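
The idea of spotting conserved regions by comparison can be illustrated with a toy sliding-window identity calculation in Python. The two "aligned" sequences below are invented; real comparative genomics uses proper alignment tools:

```python
def window_identity(a: str, b: str, size: int = 10):
    """Percent identity of two aligned sequences in consecutive windows."""
    assert len(a) == len(b)
    scores = []
    for start in range(0, len(a) - size + 1, size):
        wa, wb = a[start:start + size], b[start:start + size]
        matches = sum(x == y for x, y in zip(wa, wb))
        scores.append(matches / size)
    return scores

# Toy aligned stretch: the middle window is perfectly conserved,
# hinting at important function even though it need not be a gene.
human = "ATCGGATAGCTTAGGCCATAACGTGGATCA"
mouse = "ATTGGACAGCTTAGGCCATAGCCTGGTTCA"
print(window_identity(human, mouse))
```

Windows where identity stays high across species are the candidates for conserved, functional "junk."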

What Can We Do With Sequenced Genomes?

Well, what can we do with sequenced genomes? The very first thing is that a list of genes really provides a list of suspects. We discovered that there are some 27,000 human genes, and one of the first things we can do is analyze each gene for changes and alterations. One of the key things the Human Genome Project did was change the way we think about and do science: instead of thinking about things on a gene-by-gene or sequence-by-sequence basis, people started to develop high-throughput technologies to look at things on a very large scale. So a couple of technologies were developed. One lets you measure the expression of all of the genes simultaneously; this is transcriptional profiling, which I will describe shortly. In addition, one can begin to characterize all the proteins. Remember that the 27,000 human genes actually encode information for over a million different proteins, so there is much greater complexity at the level of the proteins in the cell, but there are technologies to characterize all the proteins in the cell, and this is called proteomics.

Transcriptional Profiling (TP)

So first… Transcriptional profiling refers to technologies that can be used to simultaneously measure the messenger RNA concentrations of many or all expressed genes. If we are talking about cancer, it is very important that one analyzes samples that are homogeneous: if you take a heterogeneous sample, a mixture of different populations of cells, and use this technology, you will get a number average of what is occurring in that mixed population, and you will not really have an indication of what is occurring in specific cells. Another very important component is that you need a suitable normal control. If you have a cancer and you have the precursor cells, hopefully from the same individual before the cancer, one can ask what the alterations are; if we are comparing a cancer in one person to a normal sample from another person, are they actually comparable or not? The advantage of transcriptional profiling is that one is using the messenger RNA concentrations as a surrogate for what is occurring in terms of protein production. It is a much simpler way to do this, but there are some important limitations, one being the fact that a messenger RNA can make more than one protein, which I will discuss a little later.

Different Technologies to Produce Microarrays

Well, there are various technologies that can be used to produce microarrays; a microarray is just a small matrix upon which one generates probes to measure the transcripts from the various genes, and three different technologies are currently in use. The first is called photolithography. This is based on the same technology used in producing computer chips, but instead of etching silicon circuits on a microcircuit, one uses light to direct the synthesis of oligonucleotides; Affymetrix utilized this to synthesize millions of oligonucleotides, 25 nucleotides long, that could hybridize to individual transcripts. A second technology is ink-jet printing, utilized by Agilent. Agilent was a company that came out of Hewlett-Packard, and they used Hewlett-Packard printers, which normally contain four different colored dyes; if those dyes are replaced instead with a solution of A, a solution of C, a solution of G, or a solution of T, one can synthesize up to 2 million oligonucleotides that are considerably longer, between 100 and 200 nucleotides in length. The third process uses the same technology found in some rear-projection televisions: digital light processors. A digital light processor is a tiny controlled area that has over one million tiny mirrors on it, and these mirrors can be tuned to whatever angle you want; instead of making beautiful pictures on a large-screen, high-definition TV set, the digital light processor is used to synthesize oligos too, and the company NimbleGen has utilized this to synthesize up to 2 million 100-200-mers. With any of these technologies you produce a tiny array, and this array can then be hybridized with the samples you want.

Utilizing Microarrays to Measure Gene Expression

So to utilize microarrays to measure gene expression, the very first step is to isolate the RNA from the samples you wish to measure, for example, tumor samples and matched normal samples. That RNA is converted into fluorescently labeled DNA, which can then be hybridized to the array. After hybridization, one washes away nonspecific hybridization, then takes a scanner and scans for the fluorescent intensities. If there has been a lot of hybridization to a specific set of oligonucleotides, there must be a lot of the corresponding message; hence, more fluorescence means more message for a gene. One compares a tumor with a matched normal, and one can thus identify genes with increased or decreased expression in the cancer versus the normal. One can also get a "signature" of gene expression: if one takes a series of cancers with known outcomes, for example a group of patients who responded to chemotherapy and whose cancer was eradicated versus a group where chemotherapy did not work at all, one can look to see whether certain expression signatures correlate with certain clinical phenotypes. And one can also compare different tumors for relatedness.
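
The tumor-versus-normal comparison boils down to ratios of fluorescence intensities per gene. A minimal Python sketch, using real gene symbols but entirely invented intensities, purely for illustration:

```python
import math

# Hypothetical per-gene fluorescence intensities (arbitrary units).
normal = {"TP53": 500, "MYC": 200, "GAPDH": 1000, "BRCA1": 400}
tumor  = {"TP53": 120, "MYC": 1600, "GAPDH": 1050, "BRCA1": 410}

def log2_ratios(tumor, normal):
    """log2(tumor/normal) per gene: >0 is up in tumor, <0 is down."""
    return {g: math.log2(tumor[g] / normal[g]) for g in normal}

ratios = log2_ratios(tumor, normal)
up   = [g for g, r in ratios.items() if r >= 1]   # at least 2-fold up
down = [g for g, r in ratios.items() if r <= -1]  # at least 2-fold down
print("up in tumor:", up, "| down in tumor:", down)
```

Genes that pass a fold-change threshold in many tumors, but not in normals, become candidates for a signature.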

Hybridization to an Affymetrix Array

When one does that, one generates basically a very nice, pretty picture. This is an example of what the hybridization looks like on an Affymetrix array, but it is very similar for the other 2 platforms too. You see little squares where the oligonucleotides are all the same sequence; in this bottom corner, you can see two squares where there has been considerable hybridization, so when one shines a laser light on them, one sees a considerable amount of signal. Right next to them are 2 additional squares where there has been very little hybridization and one does not see much signal at all, and one takes a scan of the entire slide. This slide actually contains 500,000 of these tiny forests of oligonucleotides, and from that one can infer the expression of all 27,000 genes. One then takes the expression seen in a matched group of normal and tumor samples and asks what has been occurring.

Gene Expression Comparison Between Samples

And basically, by comparing gene expression, you can cluster different specimens as more or less similar according to which genes have higher and lower expression, and one generates something called a heat-map. In a heat-map, one assigns an artificial color to different levels: a lot of hybridization might be red, very little hybridization might be green. One then clusters samples together to see whether they are similar or not, and one can compare things like patient survival, response to treatment, and other parameters.
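
Clustering samples "together" typically starts from a similarity measure between their expression profiles. A minimal sketch using Pearson correlation on invented log-ratio profiles (real analyses use hierarchical clustering over thousands of genes):

```python
def pearson(x, y):
    """Pearson correlation of two expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical log-ratio profiles over five genes.
samples = {
    "normal_1": [0.1, -0.2, 0.0, 0.1, -0.1],
    "normal_2": [0.0, -0.1, 0.1, 0.2, -0.2],
    "tumor_1":  [2.1, -1.8, 1.5, -2.0, 1.9],
    "tumor_2":  [1.9, -2.0, 1.4, -1.8, 2.1],
}

# Samples whose profiles correlate strongly "cluster" together.
r_tumors = pearson(samples["tumor_1"], samples["tumor_2"])
r_mixed  = pearson(samples["normal_1"], samples["tumor_1"])
print(f"tumor vs tumor:  r = {r_tumors:.2f}")
print(f"normal vs tumor: r = {r_mixed:.2f}")
```

High within-group correlation and low between-group correlation is what puts samples on the same arm of the clustering tree.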

Gene Expression Map

When one does that, one generates a map like this, and you see several things. You see colors on the heat-map; the scale on the right tells you the amount of hybridization, with red a lot of hybridization and green very little. At the top the various samples are clustered together, and you can see relatedness: color patterns seem to show different groupings, and the clustering at the top shows how your samples are related. So, for example, the extreme samples on the left might correspond to your normal samples, and all the samples on the other arm of the tree correspond to tumor samples. Within these tumor samples, you can see they cluster into 3 distinct groups that differ by gene expression, and those 3 distinct groups may also be distinct in their clinical parameters. So this changes the paradigm for how one treats cancer: instead of taking a patient in and doing a standard set of treatments, one takes the patient in and obtains a signature first, and that signature tells you something about the clinical behavior of the sample, which can inform clinicians of better strategies for treating specific cancers depending on which of these groupings the cancer falls into. These are groupings you may not be able to see with pathology or other existing technologies, so the resolution of this technology is very high.

Proteomic-Based Strategies

In addition to the microarray technologies, there are also proteomic-based strategies. One of the limitations of working with messenger RNA is that one is using the messenger RNA as a surrogate for the amount of protein produced. Sometimes that correlation is very good: the more messenger RNA you have, the more protein is produced. But that is not always the case. In addition, one messenger RNA may produce more than one protein, so when you see that a messenger RNA has gone up, that does not mean that all of the different isoforms of the protein have also gone up. High-resolution proteomic analysis is much better at looking at the amounts of the individual proteins, and its strength is that the million proteins in the cell really tell you what is occurring in the cell. An important limitation is that we do not yet have the technology to examine all million proteins in a cell; at best we can look at just a few thousand. However, this is an ideal strategy for identifying things like early markers for cancer, and in many different cancers, a better strategy for detecting the cancer at an earlier stage means we will be able to treat it, because those cancers are more treatable at an earlier stage.

Example of a Single Gene

Now, again, here is a very good example of the problem with a single gene. The gene is shown with its exons in blue, but that one gene does not make just one transcript. Alternative splicing enables multiple distinct transcripts to be made from the same gene, and in this instance this gene could be making 50 different proteins. If we try to measure what is going on with microarrays, it is very difficult to tell which of these isoforms is produced. If we go to the protein itself, and we discover that one of these proteins is an important marker for how a patient is going to do, we could possibly detect it.
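
The combinatorics of alternative splicing can be sketched with a toy cassette-exon model in Python. This is a simplification: real splicing is constrained, and this assumes every internal exon can be independently skipped:

```python
from itertools import combinations

def cassette_isoforms(n_exons: int):
    """Enumerate transcripts under a simple cassette-exon model:
    the first and last exons are always kept, and each internal
    exon is either included or skipped."""
    internal = range(2, n_exons)  # exons 2 .. n-1
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            isoforms.append((1,) + kept + (n_exons,))
    return isoforms

# A toy 7-exon gene: 2^5 = 32 possible cassette transcripts.
iso = cassette_isoforms(7)
print(len(iso), "isoforms")
print("shortest:", iso[0], "longest:", iso[-1])
```

Even a modest exon count yields dozens of possible isoforms, which is why a single-probe measurement cannot say which one is actually made.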


Proteomics

Now, proteomics is the study of the proteins produced within the cell. It complements genomics and transcriptional profiling but has many more orders of complexity because of the large number of proteins produced in a cell. Traditionally, it is performed with some type of protein purification followed by mass spectrometry, a set of technologies that can measure the masses of the peptides that are produced.

How Do We Quantify Proteins?

Well, how does one quantify proteins? Classic proteomics is performed with 2-dimensional gel electrophoresis, in which one does 2 different types of separation. The first is isoelectric focusing: a tiny tube gel is run, and the proteins separate along a pH gradient according to their isoelectric points. That tube is then placed onto a slab gel, and the second separation is based on molecular weight, so the low-molecular-weight proteins run to the bottom and the high-molecular-weight proteins stay toward the top. By this 2-dimensional separation, one can pull apart the proteins, and what you see here are two samples: a normal state and a diseased state. In the little square you can see just one area where one can look for what is distinct between the 2. At the arrows you see 2 positions where there is one level of protein produced in the normal sample and a distinct level in the tumor sample. The other thing to note is that only a couple of thousand spots are discernible on this gel; hence, the vast majority of the proteins present in these cells are not seen with this technology at all.

Differentiate Between Control and Disease State

So if we look into that small area at the difference between the control and the disease, you can see that there are two spots there. One spot shows a low level of expression in the control, but its expression is greatly increased in the diseased state. The second is hardly detectable at all in the control, but there is much greater expression in the disease. On the bottom is a densitometric scan, so you can see that both of these proteins have increased expression in the disease. This increased expression could be indicative of the clinical behavior of the tumor, but it could also be used for the detection of the cancer at a very early stage: if these proteins turn out to be secreted into the bloodstream, they can be detected there for early detection.
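
The comparison described above amounts to computing a fold change between the densitometry readings of matched spots. A minimal sketch, using hypothetical intensity values for the two spots on the slide (the `floor` guard handles spots that are barely detectable in the control):

```python
def fold_change(control, disease, floor=1.0):
    """Ratio of disease to control spot intensity; `floor` protects
    against dividing by a near-zero control reading."""
    return disease / max(control, floor)

# Hypothetical densitometry values (arbitrary units) for the two spots:
spots = {"spot_1": (120.0, 480.0),  # low in control, increased in disease
         "spot_2": (2.0, 300.0)}    # barely detectable in control

for name, (ctrl, dis) in spots.items():
    print(name, round(fold_change(ctrl, dis), 1))
```

In practice such ratios are computed over many gels and normalized for loading differences, but the underlying arithmetic is just this.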

Mass Spectrometry

To identify a protein by mass spectrometry, you cut the spot out of the 2-dimensional gel, take the protein present in that spot, and fragment it into peptides; the peptides can then be run on different types of mass spectrometers. All of them resolve the different peptides on the basis of their mass-to-charge ratio, and the measured masses can then be used to infer the amino acid sequence of each peptide.
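
The first half of that workflow can be sketched computationally. In a typical digest the protease trypsin cuts after lysine (K) or arginine (R), except when the next residue is proline, and each peptide's monoisotopic mass is the sum of its residue masses plus one water. A minimal sketch (the example sequence is arbitrary):

```python
import re

# Standard monoisotopic residue masses in daltons; a peptide's mass is
# the sum of its residues plus one water (18.01056 Da).
RESIDUE = {'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
           'V': 99.06841, 'T': 101.04768, 'C': 103.00919, 'L': 113.08406,
           'I': 113.08406, 'N': 114.04293, 'D': 115.02694, 'Q': 128.05858,
           'K': 128.09496, 'E': 129.04259, 'M': 131.04049, 'H': 137.05891,
           'F': 147.06841, 'R': 156.10111, 'Y': 163.06333, 'W': 186.07931}
WATER = 18.01056

def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is proline."""
    return [p for p in re.split(r'(?<=[KR])(?!P)', protein) if p]

def peptide_mass(peptide):
    return sum(RESIDUE[aa] for aa in peptide) + WATER

for pep in tryptic_digest("MKWVTFISLLR"):
    print(pep, round(peptide_mass(pep), 3))
```

Matching a list of masses like this against a database of predicted digests is the basis of peptide mass fingerprinting; inferring the sequence itself additionally requires fragmenting each peptide inside the instrument.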

Electrospray Ionization FT-ICR Mass Spectrometer

So one of the state-of-the-art machines in mass spectrometry is the Electrospray Ionization FT-ICR Mass Spectrometer, which is shown here. Your sample is introduced on the left, and the different peptides move through the instrument. The key component is a very large magnet, currently a 12 tesla magnet, which resolves the different peptides based upon their mass-to-charge ratio. A detector at the other end then records the peptides coming off.
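
The physics of how the magnet resolves peptides is simple to state: an ion in the field orbits at its cyclotron frequency, f = zeB / (2πm), so the instrument converts a measured frequency spectrum into masses. A small worked example (the 1000 Da singly charged peptide is an arbitrary illustration):

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge, C
AMU      = 1.66053906660e-27  # atomic mass unit, kg

def cyclotron_freq_hz(mass_da, charge, b_tesla=12.0):
    """Cyclotron frequency f = z*e*B / (2*pi*m) for an ion of `mass_da`
    daltons carrying `charge` elementary charges in a B-tesla field."""
    return charge * E_CHARGE * b_tesla / (2 * math.pi * mass_da * AMU)

# A singly charged 1000 Da peptide in a 12 T magnet orbits at ~184 kHz;
# halving m/z doubles the frequency, which is how ions are told apart.
print(round(cyclotron_freq_hz(1000.0, 1) / 1e3, 1), "kHz")
```

Because frequency can be measured extremely precisely, FT-ICR achieves the very high mass resolution described on the next slide.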

LC-ESI-TOF vs LC-FT-ICR Mass Spectrometry

For comparison, if one goes back 10 years, time-of-flight mass spectrometry was the state-of-the-art machine for this, and the two top traces show a normal sample and a cancer sample with the different peptides coming off. You can see that there are indeed some peaks, and there are indeed some differences between the cancer and the normal. On the bottom, however, one sees the resolution that comes with an FT-ICR mass spectrometer: there is less noise, and the individual peptides appear as distinct, sharp peaks. With this, one can determine at much higher resolution which peptides differ between the normal and tumor samples. Once a difference has been identified in this way, those individual peptides can be detected very quickly. It is not necessary to look at all the proteins within a cell, but just those important proteins that are indicative of disease or indicative of behavior.
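
Finding the peptides that differ between two spectra comes down to comparing peak lists within a mass tolerance; higher instrument resolution lets that tolerance shrink, so fewer unrelated peaks are confused with each other. A crude sketch with hypothetical m/z values:

```python
def unmatched_peaks(sample, reference, tol_ppm=10.0):
    """Return m/z values in `sample` with no counterpart in `reference`
    within `tol_ppm` parts per million -- a simple way to flag peptides
    present in one spectrum but not the other."""
    out = []
    for mz in sample:
        tol = mz * tol_ppm / 1e6
        if not any(abs(mz - r) <= tol for r in reference):
            out.append(mz)
    return out

# Hypothetical peak lists (m/z) from a normal and a tumor sample:
normal = [512.282, 721.330, 1024.551]
tumor  = [512.283, 721.330, 888.402, 1024.551]

print(unmatched_peaks(tumor, normal))  # [888.402]
```

Here the 888.402 peak appears only in the tumor spectrum; with time-of-flight-era tolerances of hundreds of ppm, small real differences like this were much easier to miss.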

What's the Short-term Payoff?

So what’s the short-term payoff? The short-term payoff is that we’ll have very powerful markers for the early detection of cancer. Take a cancer like ovarian cancer, which is called the “silent killer of women.” The reason it is the silent killer of women is that the vast majority of women are diagnosed only after the cancer has spread outside the ovaries. If we had better protein markers, produced by the cancer cells and detectable with mass spectrometry, we could detect ovarian cancer at a much earlier stage. In addition, we are going to start seeing signatures that tell us something about the clinical behavior of tumors: this is an aggressive cancer based upon the signature we’ve seen, so perhaps we need an alternate type of therapy; or this is a cancer that will be very well treated by this chemotherapeutic agent. We are also going to determine important genes involved in the development of cancer, and as we find those genes, we are going to start to find the Achilles heel of the cancer cells. A new era of treatments is going to be based on novel technologies that do more than simply kill the most aggressively growing cells. Chemotherapy is an effective way of treating many cancers, but it is a poison that kills growing cells. If we understand the important genes that are altered in cancer, maybe we can target those genes and have therapies that are less harsh on the patients and more specific to the cancer.

What's the Long-term Payoff?

What are we going to see in the long term? We are going to see increased survival of cancer patients. We are already starting to see this for a number of cancers, and a very good example is the early detection of cervical cancer with the Pap smear. Prior to the Pap smear, most women were diagnosed with much more advanced cancer, and cervical cancer used to be, and in under-developed countries still is, a significant killer of women. So we are going to see increased survival of cancer patients as they are detected earlier. We are also going to have a better understanding of the underlying biology of cancer development. In addition, as we learn which genes are altered in cancer, we’ll have targets to selectively attack just the cancer cell. The first generation of drugs based upon this is already coming into clinical practice. For example, if a breast cancer overexpresses the HER2/neu gene, there are specific drugs that can target it, and for those types of cancers those drugs are effective. We’re also going to have better prevention strategies: as we understand the targets in cancer, we’ll find ways to inhibit those specific cancers. And we are going to have genetically targeted treatment, treatment tailored to that specific cancer.

Diagram of Pathways Involved in Steroid Metabolism

Now, even though we have impressive technology, this is a diagram of just a few of the pathways involved in the metabolism of some of the steroids, and if we were to superimpose on it all the pathways that operate in a cell, it would be next to impossible to draw a cartoon showing them all. So we have impressive technologies, but we are using them to try to understand a very complex organism and the very complex alterations of that organism that give rise to cancer. With these types of technologies, in the next couple of years we are going to see profound changes in the way we understand cancer and the way we treat cancer.