Section 1: Basic Concepts in Cancer Genetics


Chapter 21.1: Gene Expression Profiling in Cancer

TECHNIQUES FOR EXPRESSION PROFILING

Many techniques have been developed to find transcripts whose expression level changes between two samples. The first techniques to be widely used for this purpose were subtractive hybridization and differential display. Both can identify differentially expressed transcripts, but neither has the capacity of DNA arrays to assay many samples, nor do they provide the in-depth transcriptome characterization of sequencing-based techniques. For this reason, DNA arrays and serial analysis of gene expression (SAGE) are currently the most widely used techniques for transcript profiling of malignant cells. This is, however, a rapidly evolving field. The overview of the features of common RNA profiling techniques (Table 21.1-1) will likely require significant updating in the not-so-distant future.

Table 21.1-1: Key Features of Common RNA Profiling Techniques


Feature | Differential Display | DNA Arrays | SAGE
Basis of assay | cDNA fragments compared on gel | Hybridization to spotted DNA | Sequence ligated cDNA tags
Detection method | Electrophoresis of labeled fragments | Optical imaging of hybridization signal | Automated sequencer
Gene identification | Excise band and sequence | DNA probes preidentified | Match SAGE tag to database(s)
Transcript quantification | Comparison of band intensities | Analogue fluorescent signal from DNA spot | Digital counts of SAGE tags
Probe requirements | Starting RNA only | Requires set of arrayed DNA probes | Starting RNA only
Starting amounts of RNA (approx.) | >5 μg of total RNA | >5 μg of total RNA | >1 μg of total RNA
Number of RNA samples that can be processed per month | Few | Many | Few
Number of genes assayed per sample | Few | Equal to the number of genes on the array | Nearly all genes expressed above the detection limit
Sensitivity of transcript detection | Higher levels easier to detect | ~10 mRNA copies/cell | Dependent on number of tags sequenced



Subtraction Methods

Various methods have been devised to find transcripts that are differentially expressed between two cell populations.6 Subtractive hybridization is used to produce a cDNA library containing sequences that are present in one sample of RNA but not in another.7,8 A typical example is to subtract tumor mRNA from normal mRNA (or vice versa) to find transcripts that may have been deleted or amplified in the process of tumor formation. The general approach is to hybridize in solution the two samples (normally cDNA) that are to be subtracted. An excess of one sample (the “driver”) hybridizes to nearly all of the unwanted common sequences from the other sample (the “tester” or “tracer”). Typically the driver is labeled in such a way that molecules containing one or both strands in common with the driver are removed or otherwise not cloned. The remaining cDNA, consisting mostly of tracer, can be cloned to form a library for further analysis such as sequencing.


Other subtractive methods used to find differentially expressed genes include suppression subtractive hybridization9 and representation difference analysis (RDA).10 These newer techniques incorporate polymerase chain reaction (PCR) amplification steps in order to work from smaller quantities of starting material. RDA is an effective way to compare two sets of DNA by hybridization and subtraction, frequently either genomic DNA or cDNA. Overall, the subtractive techniques have been used to locate many important cancer-related genes, but these approaches necessitate a pair-wise analysis of samples and a time-consuming cloning step that make them unsuitable for automated high-throughput gene expression profiling.



Differential Display

In 1992, differential display was described as a method to locate differentially expressed transcripts.11,12 Differential display works by first producing a set of cDNA fragments that have been identically prepared from each RNA sample, usually based on restriction enzyme digestion of the cDNA or by producing PCR products with arbitrary primers. Next, the fragments are resolved on a gel, producing a characteristic pattern of bands for each sample. The bands from each sample are compared to reveal those bands that differ in intensity between lanes. The cDNA fragment within the band can be excised for further analysis.


The advantage of differential display is that it can be performed by one person using equipment available in most molecular biology laboratories. The disadvantage is that, to identify most genes, the corresponding bands must be excised and sequenced, which requires significant labor for the gene identification step. The technique can also be prone to false positives that arise from various factors, including PCR-induced amplification biases. Although there are many successful variations of the differential display approach,13 none allows the rapid and efficient measurement of expression levels en masse that would make it suitable as a transcript “profiling” technique. It has, however, been very useful for identifying differentially expressed genes.



DNA Arrays

A method to detect nucleic acids of a specific sequence supported by a solid surface was developed over 25 years ago by Edwin M. Southern.14 In 1992, cDNA fragments were arrayed on a solid surface in large numbers and used for parallel gene expression profiling.5 The idea of large-scale transcript profiling captured the imagination of scientists starting in the mid-1990s, when methods were used to miniaturize DNA arrays,3,4,15 introducing “chip technology” to biological research. DNA arrays have enormous potential and carry an implicit promise that a reliable, low-cost, and standardized format for gene expression profiling will eventually be available to cancer and other researchers. By way of introduction, this section describes basic concepts; readers interested in applying chip technology should consult relevant publications16 and Web sites (Table 21.1-2).

Table 21.1-2: Human Transcript Profiling Databases and Resources

Web Site | URL | Description

cDNA Library Sequencing
Body Map | bodymap.ims.u-tokyo.ac.jp | Expression resources for normal tissues based on cDNA library sequencing.
CGAP cDNA xProfiler | cgap.nci.nih.gov/CGAP/Tissues/xProfiler | Expression between pools of cDNA libraries can be compared based on extensive database.

DNA Arrays
Affymetrix | www.affymetrix.com | Vendor of expression chips and other products for expression profiling via DNA arrays.
Brown Lab Homepage | cmgm.stanford.edu/pbrown | Contains useful information on custom cDNA arrays.
Developmental Therapeutics | www.dtp.nci.nih.gov | Microarray and drug response data for NCI 60 cell lines.
Genexpress - CNRS | idefix.upr420.vjf.cnrs.fr/EXPR/ | Expression profile of 5,058 human genes by cDNA array.
Microarray Project | www.nhgri.nih.gov/DIR/Microarray/ | Protocols, descriptions and resources for cDNA Microarray technology from the NHGRI.
Molecular Oncology and Development | chroma.mbt.washington.edu/mod_www/ | Protocols and links for DNA arrays from Hood lab at University of Washington.
Molecular Pattern Recognition | waldo.wi.mit.edu/MPR | Protocols, links, software and downloads for DNA arrays from Whitehead/MIT.
UCSD Array Science | array.ucsd.edu | Information and Bioinformatics Tools for expression information.

SAGE
SAGEmap | www.ncbi.nlm.nih.gov/SAGE | Large RNA expression database from CGAP based on SAGE profiles of malignant and normal cells.
SAGEnet (Johns Hopkins) | www.sagenet.org | SAGE database, protocols, references and links.
Genzyme Molecular Oncology | www.genzyme.com/sage/welcome.htm | SAGE information and applications for commercial users of the technology.

Other
Cancer Genome Anatomy Project (CGAP) | cgap.nci.nih.gov/ | CGAP homepage with links to expression databases and cancer research resources.
Digital Gene Expression Displayer (DGED) | cgap.nci.nih.gov/CGAP/Tissues/GXS | CGAP tool that compares gene expression between pools of SAGE and/or cDNA libraries.
Gene Expression Omnibus (GEO) | www.ncbi.nlm.nih.gov/geo | NCBI repository and comparison interface for all types of expression data.
Tissue Microarray Project | www.nhgri.nih.gov/DIR/CGB/TMA | Protocols and information on tissue microarrays from the NHGRI.



cDNA Arrays.

There are many variations of DNA arrays, but they can be viewed in two groups: those that array a fragment of cDNA and those that array a shorter synthetic DNA oligonucleotide. Arraying cDNAs on a membrane for hybridization with a labeled sample was the first DNA array approach (filter arrays), and it is still widely used today.5,23 Typically, hundreds to thousands of cDNA fragments are amplified by PCR and spotted densely onto a membrane. An RNA or cDNA test sample is then radioactively labeled and hybridized to the targets on the membrane. Expression levels are assessed from the signal intensity produced by the amount of radioactivity hybridized to each spot on the membrane. Several molecular biology companies sell membranes to researchers for use in their studies, as well as services for doing the hybridization and/or analyzing the results.


The technology of cDNA arrays took another leap forward when researchers at Stanford University started spotting cDNA onto glass slides at densities much greater than could be achieved with nylon membranes.4 The introduction of cDNA “microarrays” has opened the minds of scientists to the possibility that gene expression patterns could be routinely measured.24 Robotics are employed for making these arrays and can reproducibly spot well over 5000 cDNAs on a single slide.25 An additional advancement is the use of two-color hybridization (Fig. 21.1-1). Two fluorescent probes of different colors, typically red and green, are made from the test and control samples and hybridized to the same array. Each spot on the array is measured in terms of the expression ratio between the two probes, rather than an absolute level of expression. This approach helps to normalize array-to-array variations in hybridization or printing and provides a more accurate means of comparing expression between chips. However, because a ratio is being measured, the absolute expression levels are lost. A variety of commercial enterprises make and sell cDNA arrays or services. Additionally, most universities have, or are developing, some type of service that provides cDNA array technology to investigators.

Fig. 21.1-1
Approach for expression profiling using a two-color cDNA chip. A. Two RNA samples are converted to cDNA and are labeled with different fluorescent dyes; the tumor sample is labeled red and the normal reference is labeled green. The labeled cDNA is hybridized to the DNA on...
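
The ratio measurement used in two-color hybridization can be illustrated with a minimal Python sketch. It is not the analysis pipeline of any particular array system: the function name, the intensity floor, the example intensities, and the choice of global median centering (which assumes that most spots are unchanged between samples) are all assumptions made for illustration.

```python
import math
from statistics import median

def log2_ratios(red, green, floor=1.0):
    """Return median-centered log2(red/green) for each spot.

    red, green: dicts mapping spot name -> background-subtracted intensity.
    floor: hypothetical lower bound that avoids taking the log of zero.
    """
    raw = {
        spot: math.log2(max(red[spot], floor) / max(green[spot], floor))
        for spot in red
    }
    # Global median centering: assumes most spots do not change between samples.
    center = median(raw.values())
    return {spot: value - center for spot, value in raw.items()}

# Hypothetical intensities: tumor labeled red, normal reference labeled green.
tumor_red = {"geneA": 4000.0, "geneB": 1000.0, "geneC": 250.0}
normal_green = {"geneA": 1000.0, "geneB": 1000.0, "geneC": 1000.0}

ratios = log2_ratios(tumor_red, normal_green)
for spot, value in sorted(ratios.items()):
    print(f"{spot}: {value:+.2f}")
```

Multiplying both channels of a spot by the same factor leaves its ratio unchanged, which is why a ratio-based readout is robust to spot-to-spot printing differences but carries no information about absolute expression.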



Oligonucleotide Arrays.

Oligonucleotides built on a glass support by photolithography and phosphoramidite DNA synthesis chemistry are commonly known as “DNA Chips.”15,26 This process builds a chip for DNA analysis by a method analogous to the mass production of semiconductor chips for the electronics industry. DNA microchip technology has been developed and commercialized primarily by Affymetrix Corporation. Normally about 20 different oligonucleotides of approximately 20 base pairs (bp) in length are used to represent each gene on mRNA expression chips. The oligonucleotide sequences that represent a particular gene are chosen carefully using algorithms that have been designed to minimize cross-hybridization between different genes. After hybridization with a specially prepared and fluorescently labeled cDNA probe, the chip is read using a laser scanner. Currently, Affymetrix has produced a series of chips that cover up to a total of 12,000 different known genes plus 48,000 sequences derived from expressed sequence tag (EST) clusters. Thus far, oligonucleotide chips have delivered a more standardized product than the cDNA spotted chips, along with a preoptimized and working infrastructure for array analysis, but at a cost. Although costs for chips have declined, they are still considerably more expensive than glass slide systems.


Oligonucleotides for use in DNA arrays are not limited to Affymetrix chips. Longer oligonucleotides, typically more than 50 bp, can be arrayed by robotic spotting onto glass slides and used in place of PCR fragments amplified from a cDNA template. The choice of arrayed material and support is usually based on what is locally available. It is expected that as the technology continues to evolve, market competition will produce one or more dominant DNA array technologies that will deliver reliability and convenience at a reasonable cost.



cDNA Library Sequencing

Large-scale sequencing of cDNA libraries was first proposed as a rapid means to access transcribed regions from the human genome.27 Random transcribed sequences generated by cDNA library sequencing are known as expressed sequence tags (ESTs). The Merck/Washington University EST project made one of the first large-scale efforts to disseminate EST sequence data.28 The Cancer Genome Anatomy Project (CGAP)29 succeeded this effort with its Tumor Gene Index, contributing more than one million ESTs from normal, premalignant, and malignant cells. The data from these projects has greatly reduced the time and effort necessary for many gene-cloning projects, but also serves to reveal which tissues express which transcripts.


Counting transcripts by EST sequencing is a very accurate way of assessing the fractional representation of each transcript, but it is a very expensive and laborious approach. Consequently, expression levels based on EST data are normally derived from the large public EST sequencing projects. EST-based expression data can be accessed from many of these projects via the World Wide Web as described in the Bioinformatics section below (see Table 21.1-2 for Web sites). The main advantage of these data is that they are free and easily accessed. The main disadvantages are that individual experimenters cannot practically generate their own EST data and that the sensitivity of detection is low, because often only a few thousand transcripts are assayed for each tissue or cell type, out of the tens of thousands expressed. One must also keep in mind that cDNA libraries used to generate EST data are frequently normalized or subtracted, and that data derived from such libraries can reveal only the presence of a transcript, not quantitative expression levels.
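
The dependence of detection on sequencing depth can be made concrete with a short Python sketch. It assumes simple independent random sampling from an unbiased (non-normalized, non-subtracted) library, which is an idealization; the function name and the example abundance of 1 transcript in 10,000 are hypothetical.

```python
def detection_probability(fraction, n_sampled):
    """Probability of observing a transcript at least once.

    fraction: the transcript's share of all mRNA molecules (0 to 1).
    n_sampled: number of ESTs (or tags) sequenced from the library.
    Assumes independent random sampling from an unbiased library.
    """
    return 1.0 - (1.0 - fraction) ** n_sampled

# Hypothetical transcript making up 1 in 10,000 mRNA molecules.
rare = 1e-4
for n in (1_000, 5_000, 50_000):
    p = detection_probability(rare, n)
    print(f"{n:>6} sequences: P(detect at least once) = {p:.3f}")
```

Under these assumptions, a few thousand sequences will usually miss a transcript of this abundance, whereas several tens of thousands will almost always sample it at least once.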



Serial Analysis of Gene Expression

SAGE was first developed in 1995,32 as a means for efficient counting of mRNA transcripts in large numbers.33 SAGE increases the number of genes that can be counted per sequencing reaction, as compared to cDNA library sequencing, by minimizing the portion of the transcript sequenced. The method (Fig. 21.1-2) works by cloning and sequencing a 10-bp portion of the cDNA at a defined position near the 3′ end of the transcript. These 10 base pairs, normally next to the last Nla III restriction site, are known as the transcript “tags.” The transcript tags from a particular RNA sample are linked together and are cloned into a sequencing vector forming a SAGE library. Automated sequencing then produces tag sequences rapidly in large numbers by the sequencing of many clones simultaneously. Typically, more than 50,000 transcript tags can be counted, with about 2000 sequencing reactions. Although sequencing costs increase proportionally with the number of tags assayed, automated sequencing has increased in efficiency and speed. The SAGE transcript profile from various types of cells can be archived on a computer database and electronically compared to find statistically significant differences in gene expression between cell types. The gene responsible for the differentially expressed tag is identified using informatics or, in rare instances, cloned using the tag sequence. The majority of tags can be matched to a list of possibilities extracted from transcript databases such as the cDNA portion of GenBank,34 the EST clusters forming the NCBI UniGene database,39 and coding sequence extracted from the human genome sequence.

Fig. 21.1-2
Approach for expression profiling using SAGE. Gene expression is quantified in a population of cells by isolating a transcript tag from the expressed genes. These tags are paired into ditags, ligated to form concatamers, and cloned into a sequencing vector for efficient counting on an automated sequencer. Tag counts from each tissue type are stored electronically and used for comparison to other cell populations. A relative fraction of each transcript can be calculated as well. Informatics are used to match the SAGE tag to a known gene or expressed sequence tag.
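
The tag-definition step can be sketched in Python as follows. This is only an in silico illustration of where a tag sits in a transcript, not the laboratory protocol; it assumes the sequence is the sense strand written 5′ to 3′, uses CATG (the Nla III recognition sequence) as the anchoring site, and uses made-up transcript sequences and a hypothetical function name.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # Nla III recognition sequence
TAG_LENGTH = 10       # tag length described in the text

def extract_sage_tag(cdna):
    """Return the 10-bp tag that follows the 3'-most CATG, or None.

    cdna: transcript sequence, sense strand, written 5' to 3'.
    """
    seq = cdna.upper()
    site = seq.rfind(ANCHOR_SITE)  # last (3'-most) anchoring site
    if site == -1:
        return None  # no anchoring site, so no tag is produced
    start = site + len(ANCHOR_SITE)
    tag = seq[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None

# Made-up transcripts; the first two are copies of the same mRNA.
transcripts = [
    "GGCATGAAACCCGGGTTTACATGTTGACCTAGGCAAAAAA",
    "GGCATGAAACCCGGGTTTACATGTTGACCTAGGCAAAAAA",
    "TTTCATGGGGCTTTAAACCCAAAAAA",
]

tag_counts = Counter(
    tag for tag in map(extract_sage_tag, transcripts) if tag is not None
)
for tag, count in tag_counts.most_common():
    print(tag, count)
```

Matching a tag back to a gene then amounts to applying the same extraction to every sequence in a transcript database and looking the observed tag up in the resulting table.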


Because SAGE counts transcripts by sequencing and avoids the errors inherent in hybridization-based assays, it is often regarded as a very accurate means of expression profiling. SAGE transcript levels are expressed as a fraction of the total transcripts counted, not relative to another experiment or a housekeeping gene, which avoids error-prone normalization between experiments. The absolute nature of SAGE data makes cumulative data sets useful and historical comparisons valid.39,40 An additional strength of SAGE is that it determines expression levels directly from an RNA sample. It is not necessary to have a gene-specific fragment of DNA arrayed to assay each gene. This allows SAGE to identify genes that are not included in an array,41 and avoids the infrastructure necessary to create and read large DNA arrays. This flexibility has a downside. The number of samples that can be processed using SAGE is small compared with DNA arrays because it takes 2 weeks or more of skilled labor to construct a SAGE library. Analyzing hundreds of samples by SAGE for a single experiment is not a practical option for the technology in its present form. However, when an in-depth and quantitative profile is desired for a small number of samples, the extra work involved in creating a SAGE library can be justified. To date, SAGE has been successful in determining the differentially expressed transcripts in well-controlled experimental systems.41 The type of data generated by SAGE is often complementary to the typical use of DNA arrays in cancer research, which is a wide survey of many patient tumor samples.
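
Because the readout is a digital count, expressing SAGE data as fractions and pooling libraries are simple operations. The following Python sketch illustrates both; the tag sequences, the counts, and the function name are hypothetical, and the sketch deliberately omits the statistical test that would be needed to call a difference significant.

```python
from collections import Counter

def tag_fractions(counts):
    """Convert raw tag counts to fractions of all tags counted."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

# Hypothetical tag counts from three SAGE libraries.
tumor_1 = Counter({"TTGACCTAGG": 30, "GGGCTTTAAA": 10, "ACGTACGTAC": 960})
tumor_2 = Counter({"TTGACCTAGG": 50, "GGGCTTTAAA": 30, "ACGTACGTAC": 1920})
normal = Counter({"TTGACCTAGG": 5, "GGGCTTTAAA": 20, "ACGTACGTAC": 975})

# Counts are absolute, so libraries from the same cell type can be summed.
tumor_pooled = tumor_1 + tumor_2
tumor_frac = tag_fractions(tumor_pooled)
normal_frac = tag_fractions(normal)

for tag in sorted(tumor_frac):
    print(
        tag,
        f"tumor={tumor_frac[tag]:.4f}",
        f"normal={normal_frac.get(tag, 0.0):.4f}",
    )
```

No reference sample or housekeeping gene appears anywhere in the calculation; each library is normalized only by its own total tag count.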



Follow-up Techniques

After a gene expression profile has been obtained on a set of RNA samples, it is desirable to confirm the expression differences experimentally and to extend the analysis to other samples. Normally, a small set of interesting genes is identified by using DNA arrays or SAGE, and several other techniques are more efficient for assaying this smaller set. In addition, each gene expression technique has inherent errors, so an independent method is required for validating the original expression levels.


Northern blotting has been the gold standard for gene expression analysis for many years. Because the transcript being assayed is identified by both molecular weight and by a long hybridization probe, there is normally a low error rate. Although northern blotting is a time-consuming approach, it is still a useful way to confirm profiling data for a limited number of genes.49 When a good antibody is available for the gene of interest, western blotting and immunohistochemistry are reliable methods for confirming expression changes. This approach is particularly advantageous when the end point is knowledge of protein levels rather than mRNA levels.


Real-time PCR, sometimes called quantitative or fluorescent PCR, has gained popularity for rapid follow-up and confirmation of profiling data.50,51 Expression determination by real-time PCR is based on continuous fluorescent monitoring of PCR products52 from a cDNA template. Under the right conditions, the number of cycles required to amplify a product to a given threshold level decreases linearly with the logarithm of the amount of starting template. Different real-time PCR systems are available from at least four molecular biology vendors. Each of these systems has software for plotting and analyzing the accumulation of fluorescently labeled PCR products in order to determine the starting concentration. Normally a serially diluted known sample is used to construct a standard curve from which the concentrations of unknown samples are interpolated.
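
The standard-curve calculation can be sketched in Python as follows. The sketch assumes that the cycle number at which each reaction crosses the detection threshold (commonly called the threshold cycle, Ct) falls linearly with log10 of the starting amount; the dilution series, the Ct values, and the function names are hypothetical, and vendor software may use a different fitting procedure.

```python
import math

def fit_standard_curve(amounts, ct_values):
    """Least-squares fit of Ct = slope * log10(amount) + intercept."""
    xs = [math.log10(a) for a in amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def interpolate_amount(ct, slope, intercept):
    """Invert the standard curve to estimate the starting amount."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical tenfold dilution series (arbitrary units) and measured Ct values.
standard_amounts = [1000.0, 100.0, 10.0, 1.0]
standard_cts = [20.0, 23.3, 26.6, 29.9]

slope, intercept = fit_standard_curve(standard_amounts, standard_cts)
unknown_amount = interpolate_amount(24.95, slope, intercept)

print(f"slope = {slope:.2f} cycles per tenfold dilution")
print(f"unknown sample is about {unknown_amount:.1f} units")
```

The negative slope reflects the relationship described above: the more template a reaction starts with, the fewer cycles it needs to reach the threshold.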


There are a variety of methods for detecting the accumulation of PCR products during real-time PCR. A simple method is to incorporate a fluorescent dye directly into the PCR product during amplification. A double-stranded DNA binding dye, SYBR green I (Molecular Probes, Eugene, OR), is effective for this purpose.52,53 To increase the specificity of PCR product detection, an additional oligonucleotide that hybridizes to an internal portion of the PCR product can be employed in the assay. There are a variety of systems for this purpose marketed by different vendors: TaqMan Assay (PE Biosystems), Hybridization Probes (Roche), and Molecular Beacons (Stratagene). Real-time PCR allows a quick and low-cost assessment of the expression pattern of several genes in many tumors and can be automated. It is becoming a popular method for the follow-up of profiling data.


To look at protein levels in many samples simultaneously, a tissue microarray system has been developed.55 This system allows up to 1000 small tissue samples, taken with a narrow-gauge biopsy needle, to be arrayed in a single block. This block can then be used to produce hundreds of slides that can be probed by immunohistochemistry or other means. In this way, a standard set of samples can be probed for the expression levels of many different genes. A digital imaging system is used to record and read the data. Although robotics are now employed to array the tissues, many good-quality samples must be collected, and a pathologist must orient each one so that the biopsy is taken from the region of interest. The results must also be scored in some fashion by signal intensity, which is done manually at this point in the technology's development. Finally, a good antibody that works in the normally available formalin-fixed tissue is needed for each gene of interest. However, this approach has the potential to make gene expression correlations with a vast archive of preserved tumor material.


