Regon Scientific Journal
Volume 105, Issue 1, Article 3
Improved Methodology for Affymetrix GeneChip Sample Processing and Analysis
Hristem M. Dyanov1,2, Miglena M. Petkova3, Peter Domer4 and Richard Quigg1
Key Words: GeneChip arrays, microarray methodology, hybridization, reverse-transcription, transcription.
1 Section of Nephrology, Dept. of Medicine, The University of Chicago, Chicago, IL 60637, USA.
2 Regon Molecular Systems, Inc., Touhy Dental Center, 7238 W. Touhy Street, Chicago, Illinois 60630.
3 Department of Pathology, The University of Chicago, Chicago, IL 60637, USA.
The author to whom the correspondence should be addressed:
Hristem M. Dyanov, Ph.D., Senior Research Professional, Director, Gene Arrays and Expression Laboratory,
The University of Chicago, Dept. of Medicine, Section of Nephrology, MC5100, 5841 S. Maryland Ave., Room S-511,
Chicago, IL 60637; E-mail: cdyanov@medicine.bsd.uchicago.edu; Phone/Fax: (773) 834-5289.
Current Address: Regon Molecular Systems, Inc., Touhy Dental Center, 7238 W. Touhy Street, Chicago, Illinois 60630;
Phone: (773) 282-8020; Fax: (773) 282-4292; E-mails: dyanov@regon-inc.com; dyanov_regon@yahoo.com
Manuscript received on 16 December 2004, 12.51; Published 17 December 2004
ABSTRACT
The Affymetrix GeneChip hybridization system has become a widely used standard for expression genomics investigations because it offers both high reproducibility and high sensitivity when analyzing an extremely large number of genes simultaneously. Nevertheless, because processing expenses are quite high and detection limits are often an obstacle, there is a strong incentive to modify the standard protocols in order to achieve better target quality and higher detection sensitivity at lower cost. We have modified and optimized more than 80% of the original methodology and created a set of customized protocols and kits. These lead to approximately 50% savings in processing reagents and time while significantly increasing enzymatic reaction yields (5-10-fold) and lowering detection limits (10-100-fold). Every modification was carefully monitored and compared to the original Affymetrix-recommended protocol, and was adopted as a standard procedure only when proven to deliver a significant (at least 2-fold) advantage over the original. Our own reaction conditions and reagent compositions allowed a dramatic increase in reverse-transcription yield (2-10-fold or more when starting with total RNA amounts of 1 microgram and below) and an increased IVT reaction yield after a single-round linear amplification. A double RT-IVT amplification procedure allowed microarray applications with starting total RNA amounts well below 100 ng, or with direct laser-capture-microdissection RNA isolates from fewer than 10,000 cells, down to 1,000 cells and possibly fewer.
Soon after the discovery of messenger RNA in the 1960s and the elucidation of the genetic code and of the theory of genetic regulation of protein synthesis, the first attempts at global surveys of gene expression were undertaken in the 1970s. The first studies involved radioactive hybridization experiments and the discovery of highly abundant housekeeping genes, later grouped into several classes of structural, functional, cell-type-specific and other genes. By the late 1990s, gene expression studies had generated a large body of data based on comprehensive cDNA-library surveys of entire genomes and on major advances in technology development. Large-scale DNA sequencing projects of different types resulted in the creation of a broad range of sequence-related databases such as GenBank and others.
DNA microarrays are the latest biotechnology technique to take advantage of two major structural features of the DNA double helix: the sequence complementarity of its two strands and their affinity to form hydrogen-bond-mediated duplexes. In recent years a turning point in gene research was reached with the ability to investigate and publish data on thousands of genes simultaneously rather than publishing articles on single genes. These large-scale studies of gene expression (Dyanov H.M. 1994; Dyanov H.M. 2004; Chee M. 1996; Milosavljevic A. 1996; Eickhoff H. 2000), also called “parallel approaches” to gene expression investigation, are the basis for the transition from "structural" to "functional" genomics. The parallel approaches are based upon hybridization to DNA fragments of different sizes (from 8-12-mer oligonucleotides to 1.5-2.5 kb PCR fragments) immobilized on a solid support (usually plastic, glass slides or silica wafers), the so-called macro- and microarrays or, in short, "chips" (also termed "probe arrays"). In the early stages of development radioactive detection methods were applied, while in recent years several original approaches for non-radioactive hybridization-target preparation (mostly fluorescent or metal-particle-based) have become applicable to gene expression analyses on microarrays.
There are two common types of DNA microarrays. The first contains DNA (usually in the form of a cDNA, full-length ORF, PCR product or pre-synthesized oligonucleotide) that is attached post-synthetically to a glass support; the second contains DNA (in the form of a single-stranded oligonucleotide) synthesized in situ. The first process typically requires full-length ORFs or cDNAs to be printed post-synthetically on coated microscope slides. This type of microarray is used for the analysis of gene expression, but has certain limitations. The first limitation is that ORFs are extremely variable in length and Tm, making any thermodynamics-based comparison between two genes on the array virtually impossible. Because of this limitation, experiments are designed using a dual fluorescent label approach, in which the cDNA prepared from the control RNA is labeled with one fluorophore and the cDNA synthesized from the experimental RNA is labeled with a distinct, contrasting fluorophore. Both labeled cDNAs (or RNAs) are hybridized to the same array, and the results are tallied by comparing the ratios of the fluorescent emissions of the two fluorophore-labeled cDNAs. The second, and probably more important, limitation of cDNA-based microarrays is cross-hybridization of sequence-related or overlapping genes. Different genes may encode common domains and can share some degree of sequence identity or homology with other genes; most organisms contain a number of gene families, and in many cases the genes within a family have a great degree of sequence identity. These genes can only be distinguished from each other by the design and use of gene-specific hybridization probes. The third limitation is that many organisms have overlapping genes on complementary strands of the DNA duplex; in many cases 100% sequence identity over a distance of 100 or more nucleotides has been found when comparing GenBank™ sequence data.
The expression levels of these genes cannot be accurately measured using full- or partial-length ORF-based microarrays. Lastly, many organisms employ alternative RNA splicing of genes in response to differentiation and other signals, and these alternative forms of gene expression cannot be distinguished on ORF-based arrays.
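The ratio comparison used with dual-label arrays can be illustrated with a minimal sketch. It assumes positive, background-corrected intensities for the two fluorophores and applies a log2 transform, which is a common convention rather than something specified in this article; the function name and the example values are hypothetical.

```python
import math


def log2_ratio(experimental: float, control: float) -> float:
    """Return log2(experimental / control) for a single array spot.

    Both arguments are assumed to be positive, background-corrected
    fluorescence intensities (an assumption of this sketch).
    """
    if experimental <= 0 or control <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(experimental / control)


# Hypothetical spot: experimental channel four times brighter than control.
print(log2_ratio(400.0, 100.0))
```

On this scale a positive value means a stronger signal in the experimental channel and a negative value a stronger signal in the control channel, so equal fold changes in opposite directions have equal magnitude.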
The second type of microarray is useful for expression analysis, which requires the use of many relatively short oligonucleotides to represent a single gene. The length of these synthesized oligonucleotides is usually limited by the relatively poor coupling efficiency of in situ DNA synthesis and by the denaturation-renaturation thermodynamics that govern the specificity of probe-target duplex formation. Over the past ten years, oligonucleotides ranging in size from 10-12 bases up to 80 bases have been used for microarray preparation. It is well known that the size of the oligonucleotide affects hybridization duplex stability, gene detection specificity, the frequency of its occurrence within a gene and across the genome and, most importantly, the ability to discriminate single-nucleotide mismatches under particular hybridization conditions. Generally, the shorter the oligonucleotide, the more significant the impact of a single mismatch on duplex stability, and therefore the higher the hybridization discrimination; as the size of the oligonucleotide increases, this impact decreases, and so does the discrimination ability. Moreover, if within a long oligonucleotide only a 25-35-base duplex is formed with its complementary target sequence, the stability of that duplex (which depends on G+C content) is high enough, even at high hybridization temperatures, to make the rest of the oligonucleotide sequence irrelevant for duplex stability under the particular hybridization conditions.
Given this “conflict of interests” between the desire for higher sequence specificity (longer oligonucleotide) and better hybridization mismatch discrimination, or higher hybridization specificity (shorter oligonucleotide), as well as some other practical considerations in microarray preparation, different groups have made different design decisions, resulting in a number of DNA-microarray choices on the biotechnology market.
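The effect of oligonucleotide length on mismatch discrimination can be made concrete with a rough sketch. It uses the Wallace rule (about 2 °C per A/T pair plus 4 °C per G/C pair), a textbook approximation for short oligonucleotides that is not part of the methodology described in this article, together with a simple count of the fraction of duplex positions disrupted by a mismatch. The function names are illustrative assumptions.

```python
def wallace_tm(seq: str) -> float:
    """Approximate melting temperature (deg C) by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C); intended only for short oligonucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return 2.0 * at + 4.0 * gc


def mismatch_fraction(length: int, mismatches: int = 1) -> float:
    """Fraction of duplex positions disrupted by the given number of mismatches."""
    if length <= 0 or not 0 <= mismatches <= length:
        raise ValueError("need length > 0 and 0 <= mismatches <= length")
    return mismatches / length


# One mismatch weighs far more in a 12-mer than in an 80-mer.
for n in (12, 25, 80):
    print(n, round(100 * mismatch_fraction(n), 2), "% of positions")
```

The printed percentages only show the direction of the trend; real discrimination also depends on mismatch position, sequence context and hybridization conditions.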
Affymetrix and its research team are among the pioneers in microarray design and preparation (Fodor S.P. 1991; Lipshutz R.J. 1995), along with others (Bains W. 1988; Drmanac R. 1989; Khrapko K. R. 1989; Craig A.G. 1990; Lennon G.G. 1991; Pevzner P. 1991; Markos U. 1992; Drmanac 1993; Broude S. D. 1994; Dyanov H.M. 1994 and 2004; Milosavljevic A. 1996). The Affymetrix GeneChip® hybridization system has in recent years become a widely used standard for expression genomics investigations because it offers both high reproducibility and high sensitivity when analyzing an extremely large number of genes simultaneously. Nevertheless, because processing expenses are quite high and detection limits are often an obstacle, there is a strong incentive to modify the standard protocols in order to achieve better target quality and higher detection sensitivity at lower cost. In this paper we present a large number of modifications of the standard GeneChip expression analysis protocol, adopted in our experimental practice after investigating many products and experimental conditions, in order to achieve better experimental performance at lower cost.
MATERIALS AND METHODS
Nucleic acids purification
For RNA isolation, tissue should be flash-frozen in liquid nitrogen immediately upon dissection from the organism in order to minimize RNA degradation. Frozen tissue should preferably be stored under liquid nitrogen or on dry ice; it can also be stored in a freezer at -80 °C after the initial nitrogen freezing. RNA was preferably prepared by the TRIzol™ method according to the protocol, with one additional phenol-chloroform and one additional chloroform extraction step. To reduce the amount of tRNA, an additional, final RNA precipitation step in 4 M LiCl (without alcohol) overnight can be recommended. Column-based purification of RNA was performed as described by the manufacturers; more details are given in the text. RNA was aliquoted in 50-100 µg portions (or appropriately less for small-scale isolations) and stored long-term under alcohol at -60 to -80 °C, or short-term (up to 1-3 months) as a solution in DEPC-treated water; no more than 5-6 freeze-thaw cycles were tolerated per aliquot because of a possible increase of ribozyme effects at high concentrations. Reverse-transcription, IVT and cRNA fragmentation reactions were performed according to the GeneChip protocol or with modifications as described in the text. cDNA was purified by phenol-chloroform/PhaseLock gel extraction, and cRNA was purified on different columns or by precipitation in 3-4 M aqueous LiCl. Nucleic acids were precipitated with ammonium acetate/80% alcohol (or LiCl) in the presence of at least 1 µg of carrier per 0.1 ml of nucleic acid solution; we recommend adding up to 10 µg of glycogen-based carrier for any amount of nucleic acid below 10 µg dissolved in 0.1-1.0 ml of aqueous solution. If LiCl precipitation is performed without alcohol, colorless glycogen should not be added, because it precipitates poorly in aqueous LiCl; nevertheless, if an excess of dye-carrying glycogen (such as Pellet Paint) is added, its colored traces in the precipitate make the RNA pellet easier to see.
LiCl precipitation is highly recommended for RNA isolation from glycogen-rich tissues where removal of excess glycogen is desired.
RNA precipitation in the presence of 4 M LiCl (final concentration)
RNA precipitation in 0.5-4 M aqueous LiCl is the only known method for quantitative and qualitative purification of total RNA (Walterscheid J.). To 40-80 µl of IVT reaction mixture, 500 µl of 4 M LiCl was added and mixed. RNA was precipitated by incubation at -20 °C for 1 hour (if the cRNA amount was ≥ 20 µg) to overnight (if the cRNA amount was ≤ 20 µg). RNA was pelleted in a microcentrifuge at maximum speed for 30-60 min (depending on the g-force), and the supernatant was very carefully discarded by inverting the tube (slowly but completely, without tilting it back). Matching the centrifugation time to the g-force is essential for effective cRNA recovery; the RNA precipitate is gel-like and almost invisible, and for this reason removing any liquid left on the tube bottom and walls is not recommended. If any loss of RNA is observed, the centrifugation time and/or speed should be increased. The pellet was washed once with 1 ml of 80% ethanol and sedimented for 5-10 min at maximum speed. After removal of the ethanol (the RNA pellet is now visible), the RNA was dried for 10-15 min at 37-42 °C on a heating block (the tube opening and cap were covered with sterile tissue held by a rubber ring). RNA was dissolved and stored in DEPC-treated water or 0.5 mM EDTA as follows: when starting with 0.1 µg of total RNA with single-step IVT amplification, in 20 µl of H2O (super-extended IVT reaction; expected cRNA yield 15-30 µg); when starting with 1 µg of total RNA, in 40 µl of H2O (extended IVT reaction; expected cRNA yield 70-150 µg); when starting with 10 µg of total RNA, in 80-100 µl of H2O (standard IVT reaction; expected cRNA yield 150-300 µg).
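The resuspension volumes and expected yields listed above imply a cRNA concentration range for each starting amount, which can be worked out with a short sketch. The numbers are transcribed from the text; the table layout and function name are illustrative assumptions, and only the three stated starting amounts are covered.

```python
# starting total RNA (ug) -> ((min, max) volume in ul, (min, max) expected cRNA yield in ug)
RESUSPENSION = {
    0.1: ((20.0, 20.0), (15.0, 30.0)),     # super-extended IVT reaction
    1.0: ((40.0, 40.0), (70.0, 150.0)),    # extended IVT reaction
    10.0: ((80.0, 100.0), (150.0, 300.0)), # standard IVT reaction
}


def expected_concentration(total_rna_ug: float) -> tuple:
    """Return the (lowest, highest) expected cRNA concentration in ug/ul.

    Lowest = minimum yield in the largest volume;
    highest = maximum yield in the smallest volume.
    """
    (vol_min, vol_max), (yield_min, yield_max) = RESUSPENSION[total_rna_ug]
    return yield_min / vol_max, yield_max / vol_min


for amount in sorted(RESUSPENSION):
    low, high = expected_concentration(amount)
    print(f"{amount} ug total RNA: {low:.2f}-{high:.2f} ug/ul cRNA")
```

Under these figures the stock concentration stays within roughly 0.75 to 3.75 µg/µl across a 100-fold range of input RNA, which is the practical purpose of scaling the resuspension volume.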
Enzymatic reactions
All enzymatic reactions were performed according to the GeneChip® Expression Analysis Technical Manual, with or without modifications as described in the text below. In our work scheme, when starting with 5 µg of total RNA or more, the cDNA product was resuspended in a total volume of 12 µl (divided as follows: 1 µl for electrophoresis, 1 µl for Agilent Bioanalyzer analysis, 5 µl for the IVT template and 5 µl for storage); when starting with 1 µg of total RNA or less, the cDNA was resuspended in 10 µl and all of it was used as IVT template. cRNA was resuspended in 20-100 µl depending on the yield, and an appropriate portion was fragmented for array hybridization.
Data Analysis
Microarrays were generally scanned as recommended by Affymetrix, at the lower voltage setting with the double-scan option, and analyses were performed by scaling to a Target Intensity value of 500 (Normalization: 1) against “All Probe Sets”. All other parameters were left at their defaults. Data sorting was performed in Microarray Suite ver. 5.0, followed by transfer into MS Excel for additional calculation and graphical presentation.
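Scaling to a target intensity can be sketched as follows, assuming a global scaling of the kind used by Microarray Suite 5.0, in which a trimmed mean of all probe-set signals is brought to the target value. The 2% trim on each side is an assumption about the software default and should be checked against the Affymetrix documentation; the function name and the example signals are hypothetical.

```python
def scale_to_target(signals, target=500.0, trim=0.02):
    """Scale signals so that their trimmed mean equals `target`.

    `trim` is the fraction of values dropped from each end before averaging
    (assumed default; not stated in this article).
    Returns (scaling_factor, scaled_signals).
    """
    if not signals:
        raise ValueError("signals must not be empty")
    ordered = sorted(signals)
    k = int(len(ordered) * trim)           # values dropped at each end
    kept = ordered[k:len(ordered) - k] or ordered
    trimmed_mean = sum(kept) / len(kept)
    if trimmed_mean <= 0:
        raise ValueError("trimmed mean must be positive")
    factor = target / trimmed_mean
    return factor, [s * factor for s in signals]


# Hypothetical probe-set signals from one array.
factor, scaled = scale_to_target([100.0, 200.0, 300.0])
print(factor, scaled)
```

Because every array is brought to the same trimmed-mean intensity, signals from arrays hybridized with different amounts of target become directly comparable.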
RESULTS AND DISCUSSION
RNA purification
RNA purification is probably the most critical stage of any microarray-related investigation. Moreover, ensuring well-preserved biological material for RNA isolation is often extremely challenging. It is well known that the mRNA half-life in vivo is regulated by the size of its poly-A tail and by other signals, and normally ranges from 2 to 20 minutes, although mRNAs with a half-life as short as 20 seconds have been observed. Recent microarray expression experiments (including our own) have shown that even extremely brief alterations in oxygen supply, temperature and other vital factors can immediately influence mRNA expression levels, and the longer the alteration lasts, the larger the effect, owing to the mechanisms of cascade regulation of gene expression. The logical conclusion is that, to ensure maximum preservation of expression levels, biological material should be fixed and all enzymatic activity stopped within 0.5 to 5 minutes of the moment any of the factors supporting life or normal function has been altered. Nevertheless, such quick manipulation is often impossible; for example, during surgery or biopsy it is very difficult to freeze the sample in liquid nitrogen or on dry ice within seconds of the blood supply being altered. On the other hand, while efficient RNA recovery without altering the mRNA representation from a well-preserved source material is not critically important for a detection-type microarray experiment, it is essential for any investigation of gene-expression-level patterns. Depending on the goal of the expression investigation, different levels of RNA loss and mRNA misrepresentation, ranging from 50% down to 5%, can be tolerated.
We tested the performance of several methods for RNA purification with different starting amounts of cells or tissues, containing from 2 × 10³ to 10⁶ cells. We tested the TRIzol method, QIAGEN's RNeasy Mini Kit (cat. # 74104), Stratagene's RT-PCR Miniprep Kit (cat. # 400800), the ARCTURUS PicoPure™ RNA Isolation Kit (cat. # KIT0202) and PROMEGA's Wizard DNA purification resin, which we applied to RNA purification. All tests were performed in at least 3 independent purification experiments, and some of the results are presented in figures 1A and 1B. Other methods for RNA isolation were investigated previously (Dyanov H.M. 1995).

We found that all kits and protocols tested performed within the limits described by their manufacturers, although two main problems were found with most commercial kits: low reproducibility when recovering RNA from a limited number of cells (below 2 × 10³ cells) and systematic loss of RNA fractions of certain sizes (short or long RNAs). Both have to be considered when designing a microarray analysis study. For RNA isolation starting from 2 × 10³ up to 5 × 10⁶ cells, or corresponding amounts of tissue, the most reproducible results were obtained with the TRIzol™ Reagent method, within the limitations discussed in earlier articles (Dyanov H.M. 1995). Somewhat surprisingly, RNA isolation and/or purification using PROMEGA's Wizard DNA purification resin (combined with DNase digestion when DNA removal was needed) was very successful when the RNA amount was in the range from 1 µg to 100 µg per 1 ml of resin; no more than 10% of the isolation experiments showed reproducibility problems of varying degree in the amount of RNA recovered (similar to our experience with DNA purification using the resin; data not shown).
QIAGEN's RNeasy Mini Kit showed the best performance in RNA isolation from 10⁴ down to 10² cells, within the limitations discussed below (figure 2). All other kits failed to show reproducible RNA recovery when starting with fewer than 10³ cells. We also recommend QIAGEN's RNeasy Mini Kit for RNA isolation from laser capture microdissection (LCM) samples; the only disadvantage of this kit that we found was a size-dependent loss of short RNAs (below 300 bases) (figure 1). We have no reasonable explanation for why the loss of the short-RNA fraction is less pronounced when an mRNA ladder is processed (figure 2) and much more pronounced for native RNA isolates (figure 1A), but the latter observation (figure 1A) has been made routinely for mouse tissues, human cell lines and bacterial cells by three independent investigators from three independent labs at the University of Chicago. The only explanation we can offer is speculative: ribozyme activities present in the samples (but absent from the RNA ladder) and triggered by or on the column. Both the Stratagene and the ARCTURUS kits lacked practically acceptable reproducibility when isolating RNA from LCM samples corresponding to 10³ cells or fewer. Since the Arcturus kit showed the best performance in the direct RNA recovery experiment (figure 1B), we suggest that the problems experienced when isolating RNA from LCM samples are due to difficulties in extracting RNA from fixed tissue; improving LCM tissue disruption might lead to better yield reproducibility. More recently, a novel lysis buffer substitute for all of the above-mentioned kits has become available from Regon Molecular Systems, Inc. (LysisMAX buffer, cat. # L101) and can be recommended. A significant problem found with the STRATAGENE kit was a size-dependent loss in the recovery of large RNAs (above 5 kb), exceeding 50% for RNAs larger than 7-9 kb.
The maximum RNA load within the range of high reproducibility was 20 µg for QIAGEN's RNeasy Mini Kit and 40 µg for STRATAGENE's RT-PCR Miniprep Kit; for both, the maximum load stated by the manufacturer was 100 µg. We found that both the amount and the reproducibility of RNA recovery from the columns improved when the column was incubated with the elution buffer for 2-3 minutes rather than for a shorter time; a two-step elution (20-50 µl each) instead of a single one is also highly recommended.

RNA quality and quantity estimation
Because gene expression analysis requires the highest possible RNA quality, approaches to RNA quality validation have become an essential issue. For many years, the standard procedures for estimating RNA quantity and quality were spectrophotometry and RNA electrophoresis. During the past 2-3 years many investigators involved in microarray data analysis have observed poor microarray hybridization results obtained from RNA samples validated as “excellent” by both of these methods. In our practice we also found that estimating RNA quantity and quality by spectrophotometry is very sensitive to the condition and maintenance of the instrument, as well as to the presence of substances containing aromatic rings (or similar groups), such as NTPs, genomic-DNA contaminants or phenols. After re-evaluating the concentration of many RNA samples (supplied to us from outside sources with spectrophotometrically pre-estimated concentrations), we found electrophoresis to be a practically more reliable and reproducible method for quantitative and qualitative RNA evaluation than spectrophotometry, at the cost of some precision in quantitation.
On the other hand, very low levels of exonuclease RNA degradation (which are impossible to detect by the above means) can ruin an entire expression analysis experiment by impairing the ability of reverse transcriptase to read from the 3'-end of mRNAs with digested poly-A tails. More recently, a new methodology based on the Agilent Bioanalyzer was introduced for quantitative and qualitative analysis of RNA by micro-capillary electrophoresis. The technology offers almost everything desirable: low detection limits and relatively high reproducibility in qualitative and quantitative RNA evaluation. For practical reassurance we adopted, and recommend, evaluating every RNA sample by all three methods (where possible).
GeneChip Reagent composition and replacement
A complete microarray-based experiment involves a large number of reagents and components, many of which are available only at considerable cost. Their contribution to the overall cost of the microarray experiment is significant, reaching up to 30% of the sample preparation expenses (not including RT and IVT reagents). In our practice we were able to replace up to 80% of all brand-name reagents recommended in the Affymetrix GeneChip protocol with custom-prepared ones (refer to the Materials and Methods section or contact the authors for detailed information), with an expense reduction of up to 10-fold for a particular reagent. The outcome of every single reagent replacement was carefully compared to that of the original reagent by monitoring its influence throughout the entire sample-preparation and array hybridization process (data not shown). Only those modifications were adopted as standard that offered at least a 50% increase in reaction yield and quality, or a decrease in time and effort, with no other influence on the microarray result; most replacements offered more than a two-fold improvement (Table 1).

The reaction products obtained with our standard Affymetrix GeneChip® protocol are shown in figure 3A. Our first observation while using the Affymetrix technology was that nucleic acid losses during the purification stages could be equal to or greater than those resulting from inefficient or suboptimal enzymatic reactions. This focused our attention on improving all purification/precipitation steps, especially since we intended to extend the technology to much lower starting amounts of total RNA, possibly down to 0.1 µg and lower. Generally, replacing column purification with salt/alcohol precipitation and improving the addition of precipitation carrier increased yields two-fold or more without much delay, since all sample concentration steps were thereby omitted. We found that increasing the amount of precipitation carrier and/or replacing it with a glycogen-based co-precipitant, up to 10-20 µg per reaction (in a volume of up to 1 ml), greatly improved nucleic acid recovery even at relatively short precipitation times (down to 1 hour), without negative influence on the subsequent enzymatic reactions (figure 3B). For cRNA, on the other hand, LiCl precipitation became the preferred choice (as described below and shown in Table 2).

RT Reaction Results
The reverse-transcription (RT) reaction is one of the most critical steps in any microarray-based gene expression experiment, because any inhibition of RT priming or synthesis can seriously mislead the interpretation of the experimental data by generating artifactual results. In routine microarray-processing practice, the overall rate of RT reaction failures can be as high as 50%, owing to varying degrees of reaction inhibition resulting from multi-component performance errors or from inhibitors in the RNA source; this rate is actually very difficult to estimate without significant experimental and computational effort. In any case, improving the quality and reducing the cost of the RT reaction is a very valuable task. For GeneChip expression experiments, the Affymetrix protocol recommends Invitrogen's RT reaction kits (for example, the SuperScript™ Double-Stranded cDNA Synthesis Kit; Invitrogen cat. # 11917-010). Using these or some other commercial kits results in a cost of about $60 per reaction. After testing a large number of reagents and suppliers, we assembled a custom kit replacing almost all of the Invitrogen reagents except the SuperScript II RTase; by purchasing bulk reagents in large quantities we were able to decrease the cost of a single RT reaction to approximately $5 while doubling the RT yield. Replacing the Invitrogen-brand RT buffers with custom-prepared ones not only decreased the cost from approximately $2 per reaction to approximately 2 cents per reaction, but also served as a starting point for customizing the entire RT reaction (figure 4). After testing all RT reaction components from different vendors and comparing the results to those obtained with the original Invitrogen reagents, we found that almost all suppliers offer high performance without much variation in the final cDNA yield (data not shown), leaving freedom of choice based on economic criteria.
Nevertheless, we recommend Invitrogen's SuperScript II RTase because of its almost unique ability to yield predominantly full-size transcripts, from 0.26 to 10 kb in size, instead of shortened fragments resulting from RT interruption (figure 4). Otherwise, NEBiolabs enzymes offered the best performance at the lowest price. We found that the most critical step for the outcome of the RT reaction is RT primer annealing. The 2 min primer annealing recommended in the GeneChip® protocol is insufficient for reverse transcription of the entire mRNA population, since annealing times of 20 minutes and above increase the cDNA yield at least 2-3-fold. Based on our investigation, we modified the Affymetrix GeneChip® protocol and recommend the following for any type of microarray gene expression study with a starting amount of total RNA of 10 µg or below: 200-300 pmoles of RT primer instead of 100 pmoles (especially critical when total RNA is less than 1 µg) (figure 5-C, positions 1 to 4); 100 U of SuperScript RTase per 5 µg of total RNA instead of 100 U per 8 µg of total RNA (but not less than 100 U of RTase if the total RNA is less than 1 µg); primer annealing for at least 20 min prior to RTase addition; and, for total RNA amounts below 1 µg, annealing for 40 minutes and first-strand synthesis extended to up to 2 hours instead of 1 hour. In all cases we found better RT reaction performance in reaction volumes of 20 µl than in 10 µl, probably because of inhibitory effects such as those known for glycerol (a stabilizing component of the enzyme storage buffer) at final concentrations above 5%, or even lower.
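The modified RT recommendations can be collected into a small lookup, shown here as a minimal sketch. The numbers come from the paragraph above; reading "100 U per 5 µg" as 100 U for every started 5 µg (which also gives the 100 U minimum for small inputs) is an interpretation made for this sketch, and the function and key names are hypothetical.

```python
import math


def rt_conditions(total_rna_ug: float) -> dict:
    """Suggested RT settings for a given amount of total RNA (0 < amount <= 10 ug)."""
    if not 0 < total_rna_ug <= 10:
        raise ValueError("recommendations cover total RNA amounts up to 10 ug")
    low_input = total_rna_ug < 1.0
    return {
        "primer_pmol_range": (200, 300),    # instead of 100 pmol
        # Assumption: 100 U for every started 5 ug of total RNA.
        "rtase_units": 100 * math.ceil(total_rna_ug / 5.0),
        "annealing_min": 40 if low_input else 20,
        "first_strand_max_min": 120 if low_input else 60,
        "reaction_volume_ul": 20,           # performed better than 10 ul
    }


for amount in (0.1, 5.0, 10.0):
    print(amount, rt_conditions(amount))
```

Keeping the settings in one function makes it easy to print a per-sample worksheet when a batch contains samples with very different RNA amounts.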


IVT Reaction Results
A very important component of the microarray-based gene expression experiment is the in vitro transcription (IVT) reaction; its efficiency dramatically affects the hybridization outcome when starting with low amounts of total RNA. We tested IVT reaction kits from different vendors (data not shown) and found that the Enzo Diagnostics IVT reaction kit recommended in the Affymetrix GeneChip protocol yields at least twice as much cRNA as its competitors. Nevertheless, several inconveniences are associated with its use for linear amplification of the RNA pool in a studied sample. First, it contributes significantly ($70-80 per reaction) to the final cost of the experiment. Second, since the type of biotinylated NTPs and the concentrations and ratio of labeled to non-labeled NTPs are undisclosed, it is impossible to assess bio-NTP incorporation rates. Third, for the same reason, it is possible neither to improve the IVT reaction as supplied nor to investigate how variations in the type and/or total number of biotinylated NTPs may affect the hybridization result or kinetics. For reasons of economy and detection sensitivity we investigated the efficiency of the IVT reaction further and introduced several improvements.
Our first observation was that, although the Qiagen's RNeasy purification column is excellent performer for purifying limited amounts of RNA, using it for cRNA purification significantly decreases the cRNA yield - because of the column's inability to attach more than 20 µg of RNA. If two passages of cRNA solution were performed on the column, the RNA amount bound increases up to 40 µg, but resuspending the carry-over cRNA is highly challenging - only large amounts of elution buffer and multiple elution steps did allow RNA recovery of more than 20 µg but never exceeding 80% of the amount attached on the column (data not shown). Second, we found that approximately 10-90% (size-dependant) of the short RNA fragments (shorter than 350 bases) could be lost, probably due to inefficient attachment to the column (as reported above; figure 1 ). Comparing the Stratagene's kit to the Qiagen's RNeasy column, we found that the Stratagene kit demonstrates a 40 m g-RNA column purification limit; but with significant looses (exceeding 50%) in large-RNA (larger than 5-6 kb) population recovered while demonstrating no significant looses recovering the short RNAs. These observations led to a preference of replacing the column purification methods with a LiCl-based precipitation procedure, which usually consumes about 30 minutes more time; nevertheless the work-protocol design makes it very convenient to perform this precipitation overnight without actual time-loss. We found the precipitation in 3-4 M LiCl for more than 1 hour at -20 ° C followed by one-hour centrifugation at more than 10,000 rpm to offer quantitative RNA recovery (Walterscheid J.) , leading to significantly high IVT reaction yield ( Table 2 ); the relation of centrifugation speed and time is essential for quantitative-efficient precipitation (data not shown). 
As is well known, LiCl precipitation offers a major advantage over other RNA precipitation methods in that it does not efficiently precipitate DNA, protein or carbohydrate (Barlow J.J. 1963).
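The precipitation conditions given above (3-4 M final LiCl, centrifugation at more than 10,000 rpm) can be turned into bench numbers with two standard relations: the mixing equation for reaching a final concentration from a stock, and the conversion of rotor speed to relative centrifugal force. The following is only a minimal sketch; the function names, the 7.5 M stock concentration, the 40 µl sample volume and the 7 cm rotor radius are illustrative assumptions and are not taken from the protocol described here.

```python
def licl_stock_volume(sample_ul, final_molar, stock_molar):
    """Volume of LiCl stock (in µl) to add so that the mixture reaches final_molar.

    Derived from C_stock * V_add = C_final * (V_sample + V_add).
    """
    if not 0 < final_molar < stock_molar:
        raise ValueError("final concentration must be positive and below the stock concentration")
    return sample_ul * final_molar / (stock_molar - final_molar)


def rcf_from_rpm(rpm, rotor_radius_cm):
    """Relative centrifugal force (x g) for a given rotor speed and radius."""
    return 1.118e-5 * rotor_radius_cm * rpm ** 2


if __name__ == "__main__":
    # Hypothetical inputs: 40 µl sample, 7.5 M LiCl stock, 7 cm rotor radius.
    print(f"LiCl stock to add: {licl_stock_volume(40.0, 3.0, 7.5):.1f} µl")
    print(f"RCF at 10,000 rpm: {rcf_from_rpm(10_000, 7.0):.0f} x g")
```

Because the same rpm value corresponds to different forces on different rotors, recording the force in x g alongside the spin time is one way to keep the speed and time combination comparable between instruments.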


We found two easy ways to improve the sensitivity of the standard (Enzo kit-based) IVT reaction when using total RNA amounts of 1 µg and below: prolonged incubation and addition of more T7 RNA polymerase. The results of our investigation of IVT reaction processivity are presented in figure 6 and table 2. Adding more T7 polymerase with the highest possible enzymatic activity at the earliest stages of the polymerase reaction increased the reaction yield dramatically, in contrast to addition at later stages. No inhibition of the reaction by the additional enzyme was observed at any stage during 3 to 48 hours of IVT incubation in the presence of RNase inhibitor. Data analysis with Affymetrix Microarray Analysis Suite demonstrated a high level of consistency when comparing human cell line total RNA processed from different starting amounts (0.1, 1.0 and 10.0 µg) (table 2 and figure 7). We observed that only the 3'/5' mRNA hybridization ratios were inaccurately calculated when starting with 0.1 µg of total RNA, probably due to a low 5'-end hybridization signal, possibly a result of less efficient template transcription closer to the 5'-end, an increased impact of possible ribozyme activities at low RNA concentrations, or other causes. Nevertheless, transcription of the 3'-end seems to be very efficient, since the comparison results were highly similar (figure 7).

Sample detection limits were improved further by implementing double-step IVT linear amplification. An improved double RT-IVT-reaction amplification procedure (utilizing the Arcturus RiboAmp™ RNA Amplification Kit) allowed microarray applications with starting total RNA amounts much below 100 ng or with direct Laser-Capture-Microdissection RNA isolates from 10,000 cells and below, reaching as few as 1,000 cells (and, possibly, fewer; data not shown). A detailed analysis of the results will be published elsewhere and referred to at www.regon-inc.com.

In order to improve sensitivity and the IVT reaction efficiency, we investigated the influence of different reaction components on the final cRNA yield. The most significant influence was observed for the T7 RNA polymerase used, depending on the vendor and the enzymatic activity. We tested the performance of T7 polymerases (with enzymatic activities ranging from 40 to 100 U per microliter) from seven different vendors by replacing the T7 polymerase within the Enzo IVT reaction kit (for comparison accuracy). Only those with activity above 80 U/µl (activity always according to the supplier statements) proved to be more or less applicable to a high-output IVT reaction (figure 8), and only two of them performed more efficiently than the T7 polymerase supplied with the Enzo kit. Since the reagent quantities supplied within the Enzo kit are tightly limited and highly priced, the only option for improving the IVT reaction while remaining tied to the Enzo kit is to add external T7 polymerase. Because our goal was to achieve more flexibility in improving the IVT reaction sensitivity, we developed our own custom IVT reaction composition. As a result we found that the ribo-NTP (rNTP) composition is the other highly important factor for the reaction efficiency (data not shown). Tris derivatives of the rNTPs perform better than any others, probably in connection with their pH adjustment, since even a slight decrease from the T7 polymerase reaction optimum of pH 7.9 significantly influences the IVT yield. We recommend final rNTP concentrations of at least 7 mM, which is especially important for rGTP. We suspect that high rGTP concentrations are important because rGTP is utilized by the guanylyl-transferase activity reported for all RNA polymerases, which may promote initial 5'-cap synthesis (known to greatly increase the IVT reaction efficiency) immediately followed by the RNA transcription reaction (Schenborn E.; Melton D.A. 1984; Krieg P.A. 1987; Bujnicki J.M. 2001).
Our IVT reaction investigations tested on Affymetrix microarrays were performed earlier with USB T7 polymerase (figure 9, positions 2 and 3, and figure 8, position 2), before we obtained more efficient T7 polymerase enzymes (figure 8, positions 3 and 4). The IVT reaction with USB T7 polymerase is less efficient than that with the Enzo T7 polymerase; nevertheless, it allowed us to use the custom IVT reaction kit to investigate the influence of the number and type of bio-rNTPs on the Affymetrix microarray hybridization result. Both the Enzo IVT reaction kit and our custom kit yielded equal amounts of product with the USB T7 polymerase (figure 9B, positions 2 and 3; please note that, because the gel is non-denaturing, the result shown in figure 9A cannot accurately serve quantitative purposes). Comparing the Affymetrix Test-2 array hybridization results (figure 9C), we found no major differences in the expression patterns originating from the same cDNA product but processed by two different T7 polymerases (Enzo and USB) in two different reaction conditions (Enzo and our custom), with gene expression variations within the range of 10% of genes affected, as usually reported for the Affymetrix arrays. Moreover, because of the very similar 3'/5' ratios (sequence-dependent) obtained in these experiments for the corresponding mouse housekeeping and other genes, it could be concluded that the two bio-NTPs included within the Enzo kit are also bio-UTP and bio-CTP.

The influence of the ratio of biotinylated to non-biotinylated rNTPs was also investigated (figure 10). A certain inhibitory effect on the IVT reaction was observed when increasing the proportion of non-labeled NTP obtained as a sodium-salt powder from Sigma. We did not perform a similar investigation for the Tris-rNTP derivatives; nevertheless, such an effect could be related to the influence of pH (insufficient neutralization), the presence of an unknown inhibitor, or the mono- and/or bivalent metal ion composition (known to influence the IVT efficiency; for details refer to Ambion's US patent # 5,256,555). Both the Affymetrix Test-2 and U74A array hybridizations gave highly similar results for the Enzo kit and our custom IVT reaction kit based on the USB T7 RNA polymerase.
We specifically investigated how the type and total number of bio-rNTPs affect the IVT reaction and expression microarray data. For this purpose we first assembled a custom IVT reaction kit based on USB T7 polymerase and compared its performance to the Enzo Diagnostics kit (figures 9 and 10). Later we improved its efficiency to match or outperform the Enzo kit by introducing highly efficient T7 polymerases (figures 8 and 11). The origin of the biotinylated rNTPs was found to be critical, especially for the intensity and quality of the hybridization result (data not shown). Bio-rNTPs obtained from Enzo Diagnostics and Invitrogen consistently demonstrated the best performance. All four bio-rNTPs supplied by Perkin Elmer, from three different shipments over a period of one year, gave comparably high cRNA yield but approximately ten times lower hybridization signal than those from the above-mentioned suppliers (overall and gene-to-gene signal comparison), which we believe is a result of the improperly high (ambient) temperature used for storage and shipment. The bio-rNTP/rNTP molar ratios required adjustment and optimization when changing the total number of bio-rNTPs in the reaction from 2 to 4.

Generally, introducing more bio-rNTPs decreases the IVT efficiency (cRNA yield), which was compensated by decreasing the corresponding bio-rNTP/rNTP molar ratios (data not shown). Because of the undisclosed kit composition it was not possible to adjust the Enzo kit to perform well with more than 2 bio-rNTPs. In contrast, we successfully formulated our custom kit to yield equal amounts of cRNA with different numbers of bio-rNTPs (from 1 to 4 in total) while outperforming the Enzo kit in cRNA yield (figure 11, A and B).
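The trade-off between the number of labeled nucleotide types and the bio-rNTP/rNTP molar ratio can be illustrated with simple bookkeeping. The sketch below is not the composition of our custom kit or of the Enzo kit (the actual optimized ratios are not reported here); it only assumes, for illustration, that a fixed "label budget" is shared evenly among the labeled nucleotide types, so that the per-nucleotide labeled fraction falls as more types are labeled. The 7 mM total per nucleotide follows the recommendation given earlier in the text; the budget value of 0.5 and all function names are hypothetical.

```python
NUCLEOTIDES = ("A", "C", "G", "U")


def even_label_fractions(labeled_nts, label_budget):
    """Share a fixed label budget evenly among the labeled nucleotide types.

    This even split is an illustrative assumption, not a reported recipe.
    """
    per_nt = label_budget / len(labeled_nts)
    return {nt: per_nt for nt in labeled_nts}


def rntp_mix(total_mm, labeled_fraction):
    """Split each nucleotide's total concentration (mM) into labeled and plain parts."""
    mix = {}
    for nt in NUCLEOTIDES:
        frac = labeled_fraction.get(nt, 0.0)
        if not 0.0 <= frac < 1.0:
            raise ValueError(f"labeled fraction for {nt} must be in [0, 1)")
        bio = total_mm * frac
        plain = total_mm - bio
        mix[nt] = {"bio": bio, "plain": plain, "ratio": bio / plain}
    return mix


if __name__ == "__main__":
    for labeled in (["C", "U"], ["A", "C", "U"], ["A", "C", "G", "U"]):
        mix = rntp_mix(7.0, even_label_fractions(labeled, 0.5))
        summary = {nt: round(v["ratio"], 3) for nt, v in mix.items()}
        print(len(labeled), "labeled types, bio/plain ratios:", summary)
```

Under this assumption the bio/plain ratio for each labeled nucleotide drops as labeled types are added, which is the direction of adjustment described above; the magnitude that actually restores cRNA yield has to be determined experimentally.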
Correlating with the increase in the total number of bio-rNTPs, the Affymetrix Test-3 microarray results showed, on the one hand, decreased 3'/5'-end hybridization ratios for the corresponding mouse housekeeping genes and, on the other hand, 3'/5'-end ratios of different genes that were closer to one another. This corresponds perfectly to the theoretical expectation, since increasing the number of bio-labeled rNTPs makes label incorporation less sequence-dependent. A surprising finding was that, as the total number of bio-rNTPs increased from 2 to 4, the number of genes detected as “expressed” on the Test-3 array also decreased, finally matching exactly the mouse housekeeping genes expressed in our sample: only two genes, precisely the two that are represented on the Test-3 array as mouse housekeeping genes and were found to be expressed in our sample. When examining the hybridization images (figure 12), it seems that the incorporation of more bio-rNTPs interferes with duplex stability, probably resulting in weakened duplexes, which favors perfect matches and results in highly specific discrimination (figure 12, B and C images). In figure 12C, the gene-specific hybridization signals decreased approximately 3-fold with three bio-rNTPs and approximately 10-fold with four bio-rNTPs compared to those obtained with only two bio-rNTPs; in contrast, the average signal for the expressed genes increased 3- and 6-fold, respectively, and the all-genes average array signal increased to a similar extent (3x and 6x) (figure 11). Hybridization signal differences were observed for the same oligonucleotide sequence when hybridized to its perfect-match target labeled with a different number and/or type of bio-rNTPs (example in the figure 12C images for Gene 3, oligonucleotide cell position 7).
No such differences were observed for the pre-biotinylated control mRNA spikes (figure 12A images), suggesting the absence of any influence on the hybridization kinetics by factors other than the labeling itself. These observations suggest the existence of an effect with a possibly significant role in the design not only of microarray-based gene expression experiments but also of single-base polymorphism and related experiments; we did not investigate this effect further because it was outside the main focus of our research plans.
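The theoretical expectation mentioned above, that labeling more nucleotide types makes label incorporation less sequence-dependent, can be illustrated by counting the positions in a transcript that are able to carry a label. The sketch below uses three made-up toy sequences (not probe or target sequences from the Test-3 array) and hypothetical function names; it counts only positions where a labeled nucleotide could be incorporated and does not model the actual incorporation rate or the effect on duplex stability.

```python
# Toy RNA fragments with deliberately different base compositions (hypothetical).
TOY_SEQS = ("AAGGAAGGAG", "CUCUUCCUUC", "ACGUACGUAC")


def labelable_fraction(seq, labeled_nts):
    """Fraction of positions in seq occupied by a nucleotide type that is labeled."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(seq.count(nt) for nt in labeled_nts) / len(seq)


def spread(seqs, labeled_nts):
    """Difference between the most and least labelable sequence in seqs."""
    fractions = [labelable_fraction(s, labeled_nts) for s in seqs]
    return max(fractions) - min(fractions)


if __name__ == "__main__":
    for labeled in ("CU", "ACU", "ACGU"):
        print(labeled, "spread between sequences:", spread(TOY_SEQS, labeled))
```

With two labeled types a purine-rich fragment may carry almost no label while a pyrimidine-rich one is heavily labeled; with all four types labeled every position is labelable and the composition bias disappears, which is consistent with the 3'/5' ratios of different genes converging.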
We tested how the scanner voltage and the amount of fragmented cRNA hybridized may affect the GeneChip expression analysis result. For this purpose, the quality and concentration of total RNA isolated from mouse kidney were estimated precisely by electrophoresis, spectrophotometry and Agilent Bioanalyzer, and two reverse-transcription reactions were carried out (each with 10 µg of total RNA), followed by an IVT reaction for 20 hours. The resulting cRNA samples were combined, precipitated and fragmented, and the final concentration was estimated on the Agilent Bioanalyzer. Two double-amount hybridization cocktails were prepared: one calculated for two U74A arrays containing 20 µg of fragmented cRNA each and another for two arrays containing 60 µg of fragmented cRNA each. One array hybridized with 20 µg of cRNA and one with 60 µg of cRNA were scanned at the “High” voltage scanner setting, and the other pair of arrays (with 20 µg and 60 µg of cRNA) were scanned at the “Low” voltage setting. Samples were stained by the double-step “sandwich” streptavidin-phycoerythrin/antibody protocol. After scanning, the arrays were re-stained the same way and scanned once more in order to investigate how re-staining may affect the expression analysis output.
The results of the comparison analyses are presented in figure 13. After extremely precise comparisons of the analysis results, of hundreds of single-gene hybridization images, and of different types of single-gene signal and local-background calculations over a period of two weeks, we came to very interesting conclusions. In general, increasing the amount of hybridized cRNA from 20 µg (standard Affymetrix protocol) to 60 µg decreases the background, increases the signal resolution, reveals the presence of more gene transcripts (figure 14) and improves the expression analysis data overall. Increasing the scanner voltage does not significantly influence the analysis result or image quality, especially for genes with very high or very low signal values. A slight improvement in p-values was observed for genes with moderate signal values in the range of 1000-3000. At higher cRNA amounts (60 µg) the impact of the higher voltage is greater (a larger increase in present genes) than at lower cRNA amounts (20 µg) (figure 13A and C): at low cRNA amounts the higher voltage leads to almost no increase in the number of genes detected as present in the sample compared to the low-voltage scan, whereas when high cRNA amounts were hybridized, scanning at high voltage does increase the number of genes detected as present. In conclusion, we highly recommend using larger amounts of fragmented cRNA (60 µg or more, but constant for all hybridizations within each experimental study and within studies considered for comparison) than the 20 µg recommended by the standard protocol.
Additional re-staining is not recommended, since the calculated hybridization signal seems to be less linear and more genes change from present to absent (table 4), probably due to the influence of the increased background values detected. The bleaching effect observed for the fluorescent dye used is very significant and slightly voltage-dependent (data not shown): the signal is approximately 1-2% lower between the first and the second scan, the decrease reaches 10% after the second scan and grows after each following scan, dramatically decreasing the number of genes detected as “expressed in the sample”. As a result, a single scan may be the better recommendation, especially in cases of limited samples or weak labeling efficiencies. The impact of the bleaching is so dramatic that it is important to point out that if, as a result of any failure, an array scan in progress is aborted, it becomes meaningless to repeat the scan, since the expression array data are going to be misrepresentative. Repeating the hybridization on another array with another hybridization cocktail is the recommended choice; once hybridized, a hybridization cocktail cannot be used on another array, since the oligonucleotides on the array surface are in unlimited excess compared to any possible mRNA fraction in the sample and capture almost all available RNA molecules. Finally, some unexpected and possibly highly disturbing findings were observed regarding the performance of the GeneChip Suite (version 5) software in calculating gene signals and local backgrounds, which will be discussed in detail elsewhere.
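The way cumulative bleaching pushes weakly expressed genes from "present" to "absent" can be illustrated with a simple multiplicative model. The sketch below is an assumption-laden illustration only: the per-scan loss values are placeholders loosely in the range described above, the fixed detection threshold is a simplification (the Affymetrix detection call is not a plain signal cutoff), and the function names and gene signals are hypothetical.

```python
def signal_after_scans(initial_signal, losses_per_scan):
    """Signal remaining after successive scans, each removing a fraction of the signal."""
    signal = initial_signal
    for loss in losses_per_scan:
        if not 0.0 <= loss <= 1.0:
            raise ValueError("each per-scan loss must be between 0 and 1")
        signal *= 1.0 - loss
    return signal


def still_detected(signals, losses_per_scan, threshold):
    """Initial signals that remain at or above the threshold after all scans."""
    return [s for s in signals if signal_after_scans(s, losses_per_scan) >= threshold]


if __name__ == "__main__":
    gene_signals = [100.0, 120.0, 500.0]  # hypothetical starting signals
    losses = [0.02, 0.10]                 # illustrative losses for two extra scans
    print(still_detected(gene_signals, losses, threshold=100.0))
```

In this toy case a gene that starts exactly at the threshold is lost after two additional scans while stronger genes survive, which is the pattern expected when bleaching acts proportionally on all signals.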


Figure 12. Comparison of oligonucleotide hybridization specificity (mismatch-discrimination ability) between mouse cRNA target pools labeled with different numbers of bio-NTPs. Test-3 microarray hybridization images representing control and housekeeping genes. The images (top to bottom) of each group represent biotinylated cRNA samples containing two (Affymetrix standard protocol, Enzo kit), three and four (Affymetrix standard protocol, our custom kit) different bio-rNTPs.


Finally, the very attractive idea of designing washing conditions that would allow the microarrays to be re-used was tested. After unsuccessful testing of many different solution compositions and washing conditions, we applied the most extreme wash we could design: a one-hour rotation wash at 80°C against three changes of 80% formamide in a 1% aqueous solution of SDS or lauroyl sarcosine, followed by a single-pass flow wash using 50 ml of non-stringent buffer per array at room temperature. Approximately 20 arrays were tested. We found (data not shown) that all streptavidin-dye and antibodies were washed off, since no signal was detected after scanning. Nevertheless, after re-staining the array without any re-hybridization, the original signal was restored almost completely. Based on this observation we concluded that, first, array staining is a highly accurate and reproducible process and, second, a covalent bond must be formed between the biotinylated target cRNA and the oligonucleotide probe on the array. We suspect that the energy from the scanner laser beam could be responsible for inducing a cross-link between the target and the probe. A very gentle RNase treatment was found to remove the residual target completely, preventing any signal from being obtained after re-staining, which to some extent supports the above hypothesis. Based on all these findings we recommend that re-use of any type of array that has been scanned on a laser-based scanner be approached extremely carefully, after evaluating the effects of the scanning type and laser voltage (the so-called dye “bleaching” effects). Other types of scanner sensors allowing lower-energy illumination (such as CCD or CMOS sensor cameras) could be a better alternative, allowing re-use of microarrays.
CONCLUSION
Microarray-based technologies are now a significant part of modern research and biotechnological practice. Nevertheless, they all share some common problems related to reagent costs, detection sensitivity, reaction efficiency, data quality and interpretation, etc. In this article we presented our routine findings from exploring the Affymetrix GeneChip methodology, together with improvements applicable to a wide range of microarray-based protocols beyond the Affymetrix microarrays. This exploratory study was designed to investigate a broad range of factors and components involved in microarray technology in order to serve as an initial experimental base for further technology improvements. Because of the high cost of the arrays, most of the microarray results were not statistically supported; we recommend such statistical support in future investigations.
REFERENCES
Bains W., a. S. G. C. (1988). "A novel method for nucleic acid sequence determination." J. Theor. Biology 135 : 303-307.
Barlow J.J., M. A. P., Williamson R., and Gammack D.B. (1963). "A simple method for the quantitative isolation of undegraded high molecular weight ribonucleic acid." Biochem. Biophys. Res. Commun. 13 : 61-66.
Broude S. D., S. T., Smith C. S., and Cantor C. R. (1994). "Enhanced DNA sequencing by hybridization." Proc. Nat. Acad. Sci. USA 91 : 3072-3076.
Bujnicki J.M., F. M., Radlinska M., and Rychlewski L. (2001). "mRNA:guanine-N7 cap methyltransferases: identification of novel members of the family, evolutionary analysis, homology modeling, and analysis of sequence-structure-function relationships." BMC Bioinformatics 1 (2): 2.
Chee M., Y. R., Hubbell E., Berno A., Huang X.C., Stern D., Winkler J., Lockhart D.J., Morris M.S., and Fodor S.P. (1996). "Accessing genetic information with high-density DNA arrays." Science 274 : 610-614.
Craig A.G., N. D., Hoheisel J.D., Zehetner G., and Lehrach H. (1990). "Ordering of cosmid clones covering the Herpes simplex virus type I (HSV-I) genome: a test case for fingerprinting by hybridization." NAR 18 (9): 2653-2660.
Drmanac, R., Drmanac, S., Strezoska, Z., Paunesku, T., Labat., I., Zeremski, M., Snoddy, J., Funkhouser, W.K., Koop, B., Hood, L., Crkvenjakov, R. (1993). "DNA sequence determination by hybridization: A strategy for efficient large-scale sequencing." Science 260 : 1649-1652.
Drmanac R., L. I., Brukner I., and Crkvenjakov R. (1989). "Sequencing of megabase plus DNA by hybridization: Theory of the method." Genomics 4 : 114-128.
Dyanov H.M., a. D. S. G. (1995). "Isolation of DNA-free RNA from a very small number of cells." BioTechniques 18 : 558-562.
Dyanov H.M., S. D. a. C. R. (1994). A strategy for arrayed cDNA-library characterization by oligonucleotide hybridization . Impact of nucleic Acid-Based Technology: Revolution in Clinical Diagnosis, Applications and Research, Amsterdam, The Netherlands.
Dyanov H.M. and Salbego, D. (2004) Detection and Search for New Genes: Arrayed cDNA Library Characterization by Oligonucleotide “Fingerprinting“ Hybridization. Regon Scientific Journal , 113 [Array Technology]: 1.
Eickhoff H., S. J., Ivanov I., Meier-Ewert S., O'Brien J., Malik A., Tandon N., Wolski E.W., Rohlfs E., Nyarsik L., Reinhardt R., Nietfeld W., Lehrach H. (2000). "Tissue gene expression analysis using arrayed normalized cDNA libraries." Genome Research 10 (8): 1230-1240.
Fodor S.P., R. J. L., Pirrung M.C., Stryer L., Lu A.T. and Solas D. (1991). "Light-directed, spatially addressable parallel chemical synthesis." Science 251 : 767-73.
Khrapko K. R., L. Y. P., Khorlyn A. A., Shick V. V., Florentiev V. L., and Mirzabekov A. D. (1989). "An oligonucleotide hybridization approach to DNA sequencing." FEBS letters 256 : 118-122.
Krieg P.A., a. M. D. A. (1987). Recombinant DNA, Part F . J. A. a. M. Simon, Academic Press. 155, page 397: 628.
Lennon G.G., a. L. H. (1991). "Hybridization analyses of arrayed cDNA libraries." Trends In Genetics 7 (10): 314-317.
Lipshutz R.J., M. D., Chee M., Hubbell E., Kozal M.J., Shah N., Shen N., Yang R. and Fodor S.P. (1995). "Using oligonucleotide probe arrays to access genetic diversity." Biotechniques 19 : 442-447.
Markos U., a. S. E. M. (1992). "Parallel analysis of oligodeoxyribonucleotide (oligonucleotide) interactions. I. Analysis of factors influencing oligonucleotide duplex formation." NAR 20 (7): 1675-1678.
Melton D.A., K. P. A., Rebagliati M.R., Maniatis T., Zinn K., and Green M.R. (1984). "Efficient in vitro synthesis of biologically active RNA and RNA hybridization probes from plasmids containing a bacteriophage SP6 promoter." Nucl. Acids Res 12 : 7035-7056.
Milosavljevic A., Z. M., Strezoska Z., Grujic D., Dyanov H., Batus S., Salbego D., Paunesku T., Soares M.B., Crkvenjakov R. (1996). "Discovering distinct genes represented in 29,570 clones from infant brain cDNA libraries by applying sequencing by hybridization methodology." Genome Research 6 (2): 132-141.
Pevzner P., L. Y. P., Khrapko K. R., Belyavsky A. V., Florentiev V. L., and Mirzabekov A. D. (1991). "Improved chips for sequencing by hybridization." J. Biomolecular Structure and dynamics 9 : 399-410.
Schenborn E., a. S., P. "Ribo m7G Cap Analog: A Reagent for Preparing In Vitro Capped Transcripts." Promega Notes 74 : 18-20.
Walterscheid J., a. M. S. "The Use of LiCl Precipitation for RNA Purification." Ambion TechNotes 1 (3): 3-10.
---------------------------------------------------------------------------------------------------------------------------
© 2004-2005 REGON Molecular System, Inc. All Rights Reserved.