Regon Scientific Journal |
Volume 105, Issue 1, Article 2 |
In a Search for New Genes: Arrayed cDNA Library Characterization by Oligonucleotide “Fingerprinting”-Hybridization
H.M. Dyanov1,2, D. Salbego, S. Savcovic, H. Kreuzer, H. Serrato, S. Batus, D. Grujic, M. Zeremski, Z. Strezovska, T. Paunesku, S. Little, M. Koutrev1 and P. Dyanova1
Center for Mechanistic Biology and Biotechnology, Bldg. 202, Room A-249, Argonne National Laboratory Argonne, Illinois 60439-4833.
1 Iriston Corporation, 6037 W. Giddings St., Chicago, IL 60625.
2 Corresponding author. Present address: Regon Molecular Systems, Inc., Touhy Dental Center, 7238 W. Touhy Street, Chicago, Illinois 60630 dyanov@regon-inc.com, dyanov_regon@yahoo.com.
Manuscript received on 18 November 2004, 15.48; Published 19 November 2004
ABSTRACT
This successful strategy for large-scale cDNA library clustering and characterization, based on hybridization with more than 200 oligonucleotide probes and computerized data analysis, was developed, tested and presented in 1995. It creates oligonucleotide “fingerprints” that allow unique identification of a cDNA clone among many thousands of other clones. Our proof-of-concept methodology was designed to test the ability of the approach to provide gene identification fingerprints and comprehensive catalogues of genes, and to reveal gene expression patterns. Our final results, obtained with only 217 computationally designed probes, demonstrated very precise clustering: all of the analyzed cDNA clusters contained cDNA clones that corresponded to the same or a highly similar cDNA/mRNA type, some of which were identified by GenBank similarity search. Briefly, two research teams had previously separated about 132 000 cDNA clones from infant brain cDNA libraries, amplified their inserts by PCR, and arrayed them onto GeneScreen™ nylon membranes. The cDNAs were hybridized with oligonucleotides under mismatch-eliminating conditions and clustered computationally based on their hybridization signatures. The quality of the cDNA clustering results from several small- and large-scale clustering experiments (including from 27 600 to 60 000 clone-dots and from 107 to 217 oligonucleotide probes) was tested by cluster-cluster and insert-inserts hybridization comparison. We observed that, although 107 probes produced a correct clustering, for many clusters an increase in the number of oligonucleotide probes from 107 to 217, as well as an increase of the threshold parameter from 60 or 80 to 100, significantly increased the precision of clustering. Data comparison demonstrated that the clustering based on 217 probes was even more precise than the cDNA insert-inserts hybridization. In addition, all 200 control clones with known sequences were clustered correctly according to their overlapping order.
To partially characterize the expression patterns of non-normalized and normalized libraries, about 120 clones that represent more than 100 of all distinct clusters have been partially sequenced and identified by GenBank search. In this paper we demonstrate the ability of the method to partially characterize both non-normalized and normalized cDNA libraries and to estimate the effect of the normalization procedure. Based on our data with 217 probes, we propose that the oligonucleotide hybridization-based approach with 200-1000 specially designed probes allows almost complete characterization of any cDNA library. Such an approach gives a unique opportunity to study and compare total expression patterns and to identify many new transcriptionally active genes by computational library subtraction. Our technology allows the creation of a clone-characterized library containing up to 1-2 x 10^5 separated cDNA clones ready for commercial use. Moreover, almost any clone may have a specific oligonucleotide-based signature that allows its precise detection or identification in different biological specimens.
INTRODUCTION
Stable gene libraries that represent messenger RNA harvested from a certain type of cells or tissues are actually cDNA libraries. Interest in obtaining mRNA sequence information has accelerated in the past few years, primarily in order to provide more precise information for all new cell- and tissue specific genes, and their expression. Because an mRNA serves as a template for synthesis of any particular protein in a cell, cDNA sequence can be used to locate the gene or coding sequence associated with the synthesis of specific cellular proteins. In this sense the cDNA-sequence information can be used to obtain gene sequence information, which in turn can aid in the recognition and mapping of genomic DNA sequences.
The total number of mammalian genes is usually estimated to be 50,000 to 150,000 [Chaudhari, 1983 #7], [Drmanac, 1994 #17], [Fields, 1994 #19], [Milner, 1983 #31]. However, this number might be higher for three reasons. First, both strands of DNA are informative and can be transcribed [Dyanov, 1992 #18], [Dyanov, H. M., Bakaeva, T. G., Ludwig, M. Z., Dzitoeva, S. G., and Korochkin, L. I., in preparation]. Second, most eukaryotic genes have a mosaic (intron-exon) structure, and one exon (or intron) can play a role in creating more than one protein owing to several types of post-transcriptional modification. Finally, the total number of the wide variety of short-lived, low-copy-number poly-A-minus mRNAs, siRNAs, etc. in different cell types has not been studied very well [Chaudhari, 1983 #7]. In addition, many gene expression studies show that the number of mRNA molecules transcribed from a gene can vary by up to a few thousand-fold. For these reasons, the identification of new tissue-specific genes by screening cDNA libraries can be extremely difficult, especially if their mRNAs have a low copy number and/or a short life.
The use of large-scale oligonucleotide hybridization to define and catalogue tissue-specific genes has been proposed before [Drmanac, 1991 #13], [Lennon, 1991 #28]. The advantages of such a strategy are that short oligonucleotide probes are specific, highly discriminative in hybridization, and evenly spaced throughout the genome, which provides the opportunity for specific fingerprinting of any genome fragment, even in mixed genome samples. The latter is extremely important for the detection and sequence identification of bacterial or viral infections, or of commensal co-existences, which are almost impossible to evaluate by other means, especially in such a large-scale analysis. Moreover, with this approach characterized cDNA libraries can be created and future redundant analysis of existing contigs is avoided. In a variation of this approach, small cDNA fragments from clones in cDNA libraries can be used as probes, supplementing the genomic clone ordering with localization of transcriptionally active sequences (similar to SAGE analysis). An additional advantage of the oligonucleotide hybridization-based approach is that exon localization can be performed computationally by simultaneous oligonucleotide hybridization analysis of both cDNA and genomic DNA libraries. Furthermore, data obtained from oligonucleotide hybridization with motif-specific and/or randomly generated oligonucleotides provide high-resolution DNA mapping, gene structure information, and positioning of specific regulatory elements [Bruzik, 1992 #5], [Melmer, 1992 #30]. A complete characterization of arrayed total cDNA libraries by a large number of oligonucleotide hybridizations could also be useful for the identification and analysis of gene expression patterns in tissue- or stage-specific libraries.
It can also provide information for coding specific clones by oligonucleotides, which is applicable to DNA fragment identification in medical diagnostics, to detecting contamination of populations and ecosystems with bacterial and viral genomes, and to the construction of DNA arrays for investigating expression patterns. In this direction, the technology presented here could have many applications in fundamental research, clinical and epidemiological practice, and biotechnological marketing.
MATERIALS AND METHODS
Briefly, about 150,000 single recombinant clones were isolated (see acknowledgments) from non-normalized and normalized (subtracted) brain cDNA libraries constructed from a three-month-old post-natal human brain with a lafmid BA vector and placed at our disposal by Dr. M.B. Soares (Columbia University). The E. coli recombinant clones were grown on 1.2-1.5% LB agar or in LB media (SIGMA Chem. Co., St. Louis, MO; Difco Laboratories, Detroit, MI) and arrayed into 96-well (Fisher Sci., Pittsburgh, PA, cat.# 08-757-155, 1994) or 864-well (General Atomics, Helix; San Diego, cat.# HE864-PC-50, 1994) micro-titre plates by single-colony transfer or by the method of limiting dilution (the library, which contained a very low percentage of non-recombinants, was precisely diluted in LB media and tested to provide a single cell in each 10 µl of media). The 96-well plates with 100-200 µl/well of media were incubated overnight at 37°C with intensive (300 rpm) shaking. The 864-well plates, with 10 µl of LB media and a 1 µl mineral oil layer (to prevent evaporation), were incubated for 48 h at 37°C stationary or overnight at 37°C on a vibrating platform (at about 60 horizontal vibrations/s). All "master-plates" containing arrayed cDNA clones were labeled with specific bar-codes, so that a master-plate (arrayed-cDNA-clone) library was created. Master-plates were used for PCR amplification of the inserted fragments and, after addition of sterile glycerol (Sigma) to a 20-30% concentration, the separated-clone library was stored at -70°C. Each corresponding insert was amplified by PCR; each 96-well plate contained the following reagents at final concentration: 1 x Tfl-PCR buffer (Epicentre Technologies Co., Madison, WI), 3 mM MgCl2, 210 µM of each of the four dNTPs, 0.4 µM of each of the two M13 extended sequencing primers (5'-GGGTTTTCCCAGTCACGACG-3' and 5'-CACAGGAAACAGCTATGACG-3'), and 0.3 U of Tfl polymerase per 25 µl reaction (Epicentre Technologies Co., Madison, WI).
Each well contained 25 µl of PCR mixture and 25 µl of mineral oil, into which 2-4 µl of cell suspension was transferred from a 25- to 50-fold-diluted master-plate cell suspension using a 1-mm metal pin array (3 repetitions of the transfer procedure per plate) (Figure 1). The 864-well plates contained 10 µl of PCR mixture and 5 µl of mineral oil per well. Each well was inoculated by 3 transfers using a 0.5-mm pin array directly from the master-plate cell suspension. For single-strand PCR amplification, only one of the primers was used, with about 4 µl of a 25-fold-diluted overnight cell suspension transferred by 3 procedures with the 1-mm pin array or by 10 procedures with the 0.5-mm pin array. All PCR amplifications were performed in a BioOven II and a BioOven III (BioTherm, Fairfax, VA) under the following conditions (for 6-10 plates):
For 96-well plates: for the BioOven II, initial denaturation of 5 min at 94°C, followed by 27 cycles of denaturation (1.5 min at 94°C), annealing (0.5 min at 48°C), and elongation (2 min at 74°C), and a final elongation of 5 min. For the BioOven III, initial denaturation of 5 min at 96°C, followed by 25 cycles of denaturation (2.5 min at 94°C), annealing (2.5 min at 48°C), and elongation (2.5 min at 74°C), and a final elongation of 5 min at 72°C. For 864-well plates: initial denaturation of 5 min at 96°C, followed by 25 cycles of denaturation (5 min at 96°C), annealing (4 min at 50°C), and extended elongation (3 min at 78°C), and a final elongation of 5 min at 74°C. Finally, a library of PCR-amplified inserts was created and stored at -20°C. The top oil layer served to prevent evaporation of the liquid during spotting and (especially) during freezing/thawing cycles.
High-density arrayed clone membranes were prepared using GeneScreen™ nylon membrane on a Biomek 1000 (Beckman Inc., Fullerton, CA) adapted for this purpose (I. Labat, unpublished information), as described in detail elsewhere (Dyanov, H.M. and Salbego, D., (1995) unpublished).
The radioactive labeling reaction (20 µl volume) contained the following: 9 µl of the corresponding oligonucleotide solution (10 ng/µl), 20 U of T4 polynucleotide kinase (NE BioLabs, cat. # 201L, 1994), 2 µl of 10x kinase buffer, and 60 µCi of γ-33P-ATP (Amersham Life Sci. or DuPont NEN) for oligonucleotides with 0-1 G/C content, 30 µCi for oligonucleotides with 2-6 G/C content, or 15 µCi for oligonucleotides with 7 or higher G/C content. Reaction mixtures were incubated at 37°C for 60 min, and the kinase was inactivated by incubation at 60-65°C for 10 min. The labeling efficiency was checked by depositing 0.5 µl of the labeling mixture on PEI paper (E. Merck, Darmstadt, Germany; cat.# 5579., 1992) and running chromatography for 30-50 min in 0.5-1 M KH2PO4. In a later modification we preferred a GF/C glass membrane (Whatman LabSales, Hillsboro, Oregon) as the chromatography carrier, because it allows a much higher speed (only about 30 s) and a very clear separation of the signals of labeled DNA and non-incorporated 33P-dNTP.
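The dependence of the label amount on probe composition described above can be written as a small lookup. The following Python function is only an illustrative sketch: the function name is hypothetical, and it assumes that "G/C content" means the number of G and C bases in the oligonucleotide, which the text does not state explicitly.

```python
def labeling_activity_uci(oligo: str) -> int:
    """Return the gamma-33P-ATP activity (in µCi) for one 20 µl labeling reaction.

    Assumption: "G/C content" is the count of G and C bases in the probe.
    """
    gc_count = sum(base in "GC" for base in oligo.upper())
    if gc_count <= 1:
        return 60  # probes with 0-1 G/C
    if gc_count <= 6:
        return 30  # probes with 2-6 G/C
    return 15      # probes with 7 or more G/C


print(labeling_activity_uci("ATTATAAT"))  # no G/C bases
print(labeling_activity_uci("ATGCATTA"))  # two G/C bases
```

A table of this kind, kept together with the washing-time data file, would let the labeling step be planned from the probe list alone.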
The hybridization mixtures were prepared by adding all 20 µl of each labeling mixture to 3-7 ml of hybridization buffer. All arrayed membranes were initially prehybridized for 5 min in hybridization buffer containing 7% sodium lauroylsarcosine and 0.3 M sodium phosphate, pH 6.8-7.5. Then each membrane was placed between two polyethylene shields (Fisher Sci. cat. # 01-812-10, 1994). If necessary, two membranes were placed back-to-back between the same shields so that the DNA-containing side of each membrane was free for probe diffusion. Each hybridization mixture was then placed between the two nylon shields so that, after the shields were closed, capillary force distributed the mixture evenly over the top (DNA-containing side) of each membrane. In our case all 20 membrane replicas were arranged as a sandwich [membrane(s) between nylon shields/Whatman separating shield/membrane between nylon shields] in a plastic box for radiation safety, and the hybridization was performed in a refrigerator. This experimental design allows hybridization of up to 100 probes (each probe with two membranes) per person per day in 2-3 plastic boxes. The hybridization was performed for 1-1.5 h at 8-13°C for oligonucleotides with more than 1 G/C content and at 0-3°C for oligonucleotides with zero G/C content. After the hybridization, each membrane was rinsed three times in cold (0-5°C) 6x SSC buffer (500 ml of SSC for 20 membranes) in a plastic box and transferred into another box with 1 L of 6x SSC, where all 20 membranes were collected before washing. Each membrane was washed for a different length of time in 100 ml of 6x SSC at 10-15°C (for oligonucleotides with 1 or more G/C content) or at 0°C (for oligonucleotides with zero G/C content).
Because the strength of hybridization depends on many factors that are difficult to predict, the washing process for each membrane was monitored with a Geiger counter: the wash was considered optimal when the total sector radioactivity was in the range of 50-150 cpm per 50-mm-diameter sector, measured 10 mm from the membrane surface. The washing-time data were collected in a data file so that subsequent experiments with the same probe(s) did not need monitoring of counts.
After removal of the washing solution by pressing the membrane between Whatman shields, the membranes were exposed in a Phosphoimager cassette at 5-15°C for 6-12 h and then scanned on the Phosphoimager SP (Molecular Dynamics, Sunnyvale, CA). Screen images were transferred into a database for further processing using our image analysis program DOTS (J. Jarvis) (13).
To remove the hybridized oligonucleotide probe(s) from the targets and to prepare the membranes for the next hybridization, all membranes were immersed one by one in 500 ml of hybridization buffer and incubated at 65°C for 2 h with slow shaking. After two additional washes in 1 L of 6x SSC for 2 min each, the membranes were stored in a plastic box with 100 ml of 6x SSC at 4°C until the next hybridization. Membranes can be reused more than 40 times.
The membrane hybridization images were processed by the DOTS program (Jonathan Jarvis, in [Drmanac, 1992 #14]), and the DOTS-optimized image data were processed by a clustering program package [Milosavljevich, 1993 #31], which creates clone signatures consisting of the hybridization intensities of a particular clone with a set of oligonucleotide probes. These signatures (used for comparison and clustering) were created using two types of intensity scaling to ensure both reproducibility and trustworthiness: 1) Mass-probe scaling, to measure the DNA molarity relations between all clones on each membrane, as well as between the target-site molarities of each probe hybridized to any DNA insert. In practice, the dot/probe intensities were divided by the dot/mass-probe intensities (the mass-probe is an oligonucleotide complementary to the unique vector-specific site at the end of every insert). 2) Rank scaling of dot intensities (within a membrane and within a signature), to measure the intensity of any dot by its rank among the intensities of all other dots on the same membrane. Such scaling within a membrane is based on the assumption that, because each membrane contains a very large number of randomly picked clones, the rank-scaled intensities are reproducible across different membranes, replicas, and experiments. In other words, if after rank scaling one clone has a higher signal than another clone, the same relation between them will exist on any membrane replica where they are hybridized with the same probe. Thus the rank-scaled intensities keep the same value, whereas the non-rank-scaled intensities may vary non-linearly as a result of fluctuations in hybridization conditions and time, probe concentration, and exposure time. In practice, the intensities were rank scaled and transformed to a scale of 0 to 255.
Additionally, a second rank scaling (within the signature) was performed between the rank-scaled dot intensities forming a signature, so that each rank-scaled intensity is replaced by its new rank among all other intensities within the same signature. For example, on a scale from 0 to 1, the newly scaled dot intensity with a particular probe is assigned a value of 0.9 if its initially scaled intensity is greater than the initially scaled intensities of 90% of the probes within the same signature. The mass-probe scaling ensures reproducibility within any particular membrane, the dot rank scaling within a membrane ensures reproducibility across different membranes, and the rank scaling within a signature eliminates false clone similarity that arises when signatures resemble each other only because of closely similar (usually weak) hybridization signals with one or more probes. Dots with mass-probe intensities below the threshold value were discarded from the analysis.
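The three scaling steps described above can be illustrated with a short Python sketch. It is not the clustering package cited in the text; all function names and the data layout (one dictionary of dot intensities per probe) are assumptions made for illustration. The rank of a value is taken here as the fraction of the other values that it strictly exceeds, so that the scale runs from 0 to its maximum; ties receive the same rank.

```python
from bisect import bisect_left


def rank_scale(values, top=255.0):
    """Replace each value by the fraction of the other values it exceeds, scaled to 0..top."""
    n = len(values)
    if n < 2:
        return [0.0 for _ in values]
    ordered = sorted(values)
    # bisect_left gives the number of values strictly smaller than v
    return [top * bisect_left(ordered, v) / (n - 1) for v in values]


def mass_probe_scale(probe, mass, threshold):
    """Divide each dot's probe intensity by its mass-probe intensity.

    Dots whose mass-probe intensity is below the threshold (or zero) are discarded.
    """
    scaled = {}
    for dot, intensity in probe.items():
        m = mass.get(dot, 0.0)
        if m >= threshold and m > 0:
            scaled[dot] = intensity / m
    return scaled


def build_signatures(probe_images, mass, threshold):
    """Return {dot: signature}, one rank-scaled value (0..1) per probe, in sorted probe order."""
    probes = sorted(probe_images)
    if not probes:
        return {}
    per_probe = {}
    for p in probes:
        ratios = mass_probe_scale(probe_images[p], mass, threshold)
        dots = sorted(ratios)
        # rank scaling within a membrane, on the 0-255 scale
        per_probe[p] = dict(zip(dots, rank_scale([ratios[d] for d in dots], 255.0)))
    common = set.intersection(*(set(per_probe[p]) for p in probes))
    # second rank scaling, within each signature, on the 0-1 scale
    return {d: rank_scale([per_probe[p][d] for p in probes], 1.0) for d in sorted(common)}


images = {
    "p1": {"a": 10, "b": 40, "c": 90, "d": 5},
    "p2": {"a": 30, "b": 20, "c": 10, "d": 50},
    "p3": {"a": 20, "b": 20, "c": 50, "d": 1},
}
mass = {"a": 10, "b": 10, "c": 10, "d": 0.5}
signatures = build_signatures(images, mass, threshold=1.0)
print(signatures)
```

In this toy input, dot "d" has a mass-probe intensity below the threshold and therefore does not receive a signature.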
RESULTS
The cDNA inserts of 132 000 recombinant clones (55 000 from the non-subtracted library and 77 000 from the subtracted library) isolated from infant brain libraries (kindly placed at our disposal by Dr. Marcelo Bento Soares, Columbia University) [Adams, 1991 #1], [Adams, 1993 #2], [Adams, 1993 #3] were amplified by PCR and arrayed onto GeneScreen nylon membranes. Three types of membranes were prepared, containing 3 456, 7 776 and 31 104 clone-dots spotted using different array designs. A complete strategy for clone separation, growth, separated-clone-library storage, insert amplification and spotting onto nylon membranes was developed. The final optimized variant is shown in figure 1. This newest methodology allows faster clone growth, storage and insert amplification, all in 864-well micro-plates. The new experimental design allows a very compact and rapid procedure for performing 20-200 membrane-probe hybridizations per day per person (corresponding to about 3 x 10^5 - 1 x 10^6 sequenced bases per day), which is about a 10-fold increase in sequencing data output compared with anything published.
In a small-scale experiment, one high-density membrane (15 x 22 cm; 31,104 dots) containing 27,500 PCR-amplified inserts from the normalized (subtracted) infant brain cDNA library was hybridized under strong mismatch-eliminating conditions with 270 oligonucleotide probes. The membrane pattern was specially designed to provide experimental information on the representativity and reproducibility of the hybridization data and on the evenness of probe distribution over the membrane, as well as measurements of hybridization background and computational trustworthiness. For these reasons, the membrane pattern includes several empty positions, a dot-duplication of all clone inserts, and also dot-duplications of several 1-2-kb control DNA sequences (192 dots in total) that cover, in overlapping order, a part of a known cosmid-inserted sequence representing the 12 kb dystrophin gene [Pizzuti, 1992 #35]. Twenty replicas of the same pattern were prepared using a Biomek-1000 (Beckman Instruments, Fullerton, CA, USA) robot and our modification of the spotting program (Ivan Labat, not published, patent pending). Ten replicas contained eight spots of the same cDNA fragment on each dot and another ten contained 10 spots on each dot (in order to deliver the desired amount of cDNA per dot location); all were spotted with a 0.3 mm flat metal pin array designed to spot simultaneously from all 864 wells of each plate. All 20 membrane replicas were hybridized first with a mass-probe, highly specific and complementary to the vector-specific linker part of all inserts, in order to measure the DNA concentration relations (or absence) of all the dots. Then, large-scale hybridization experiments were performed with about 270 oligonucleotide probes with the sequence formula N(0-2)OLIGO(6-10)N(0-2) (where N is a random dNTP and "OLIGO" is a 6-10-mer oligonucleotide). In total, 450 hybridizations were performed; 370 of them showed a high-quality result (i.e. uniform probe signal distribution on the membrane surface and strong visual signal/background discrimination). Only one membrane image for each hybridized probe was chosen for the computerized analysis. During the experiment the whole data-production cycle was performed with a significant (2-3-fold) decrease in time and effort and an about 3-4-fold increase in data production. In our investigation a single researcher was able to produce hybridization data on 40 membrane replicas with 40 oligonucleotide probes per day. This corresponded to about 622 080 single dot-hybridizations, or about 3 x 10^5 - 1 x 10^6 determined (sequenced) bases per day. In a large-scale variant this can allow at least a 10-fold increase in sequencing data production, or up to 3-3.5 x 10^6 single dot-hybridizations, corresponding to at least 2-5 x 10^6 sequenced bases per day (with 8-mer oligonucleotide probes).
The data from all experimental stages were transferred to and stored in a computer database in a manner that allows easy access and processing [I. Labat, B. Hauser, D. Salbego and A. Milosavljevic, unpublished]. Data were analyzed in two steps. First, the hybridized membranes were scanned on a Phosphoimager SP (Molecular Dynamics, Sunnyvale, CA), and the images were transferred into the database and analyzed by the DOTS program [J. Jarvis, in [Drmanac, 1992 #14]]. DOTS is a complex software package for precise image analysis and measurement which measures and exactly localizes the signals and their peaks on the corresponding membrane formats and also performs image-file transfer, image printing, and final reporting. In the second step, the optimized membrane image data were processed by a clustering program package [Milosavljevich, 1993 #31], which creates clone signatures consisting of the hybridization intensities of a particular clone with a set of oligonucleotide probes.
Finally, based on pair-wise comparison of their signatures, clones were clustered by their mutual similarity. Under this assumption, the clustering algorithm was used to cluster all clones based on a similarity threshold and a list of their signatures. In practice, the clustering algorithm was run twice to avoid false separation of identical clones into distinct clusters: in the second run each original signature was replaced by the rank-scaled average of all the signatures from the same cluster obtained in the first run. The input data for each clustering analysis were the membrane type and replica name, its pattern, all data connecting each dot position with the corresponding PCR and cell-culture plate, the complete list of oligonucleotide probes used and their sequences, and the membrane images processed by the DOTS program. The final output data were (figures 3 and 7): the type of clustering experiment (membranes, total number of dots, and total number of probes used), a list of the cluster types (total number of clusters and corresponding number of clones in each of them), the total number of dots included in these clusters, and some more technical information, including the threshold used. Moreover, additional information can be obtained for any particular cluster: its clone members and their location, the probes used, the intensity value of any hybridization result, and the relative variability of the scaled dot intensities of all clones in the cluster (on the basis of which clusters of similar size can be compared by their clone variability) (figure 1).
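The two-run, threshold-based clustering described above can be sketched as follows. This is a toy illustration and not the algorithm of the cited clustering package, whose similarity measure is not given in this paper. The similarity score used here (100 for identical signatures, decreasing with the mean absolute difference), the greedy assignment to the first sufficiently similar cluster seed, and all function names are assumptions; in particular, the thresholds of 60, 80 and 100 reported in this paper are not comparable with the scale of this hypothetical score. Signatures are assumed to be non-empty and of equal length.

```python
def similarity(a, b):
    """Hypothetical score: 100 for identical signatures, lower as they diverge."""
    return 100.0 * (1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a))


def rank_scale(values):
    """Rank-scale values to 0..1 (fraction of the other values strictly exceeded)."""
    n = len(values)
    if n < 2:
        return [0.0 for _ in values]
    return [sum(w < v for w in values) / (n - 1) for v in values]


def cluster(signatures, threshold):
    """Greedy clustering: a clone joins the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of clone ids; the first member is the seed
    for clone in sorted(signatures):
        for members in clusters:
            if similarity(signatures[clone], signatures[members[0]]) >= threshold:
                members.append(clone)
                break
        else:
            clusters.append([clone])
    return clusters


def cluster_twice(signatures, threshold):
    """Second run: each signature is replaced by the rank-scaled average of its first-run cluster."""
    averaged = {}
    for members in cluster(signatures, threshold):
        width = len(signatures[members[0]])
        mean = [sum(signatures[m][i] for m in members) / len(members) for i in range(width)]
        scaled = rank_scale(mean)
        for m in members:
            averaged[m] = scaled
    return cluster(averaged, threshold)


toy = {"a": [0.0, 0.5, 1.0], "b": [0.0, 0.5, 1.0], "c": [1.0, 0.5, 0.0]}
print(cluster_twice(toy, threshold=90.0))
```

In the toy input, clones "a" and "b" have identical signatures and fall into one cluster, while "c" forms its own cluster.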

Figure 1. SBH production line: scheme of an optimized procedure for high-density (31,104 dots) membrane preparation and data analysis. RP indicates the BIOMEK robot platform; PF, the plastic frame for the double 3MM Whatman (DW)/GeneScreen membrane (GM) "sandwich" deposition; C, the position of the membrane corner; MS, the position of the membrane signature; 1, the position of the first-dotted DNA insert; and DS, the denaturation solution. Numbers indicate different clones; letters indicate different types of oligonucleotide probes. The mass-probe scaling measures the DNA molarity relations between all inserts on each membrane; the rank scaling within a membrane ensures reproducibility across different membranes; the rank scaling within a signature eliminates false similarities that arise when signatures resemble each other only because of similar hybridization signals with one or more probes.
To optimize the input clustering parameters and to verify the correctness and reproducibility of the clustering results, four basic approaches were used for data analysis.
First, all control clones of known sequence and overlapping order were investigated. Different threshold parameters (60, 80 and 100) were used for clustering to check that the control clones clustered according to their overlapping order. We found that a threshold of 80 ensured correct clustering on the low-density membranes (3,456 and 7,776 dots). On the high-density membranes (31,104 dots), only a threshold of 100 gave clustering as precise as on the low-density membranes. Figure 2 shows the results from the clustering of control clones. Only 7 dots, representing 3 different control inserts (out of all 192 control clone dots, which represent 46 different control cDNAs), were clustered incorrectly. Later experiments demonstrated that these had been contaminated.
Second, an insert-inserts hybridization approach was used to test the fidelity of our clustering results. In this approach, one insert, a member of a chosen computationally generated cluster, was labeled and used as a hybridization probe on a high-density membrane containing all other members of the same cluster. The results from 50 hybridization experiments indicated that our clustering approach produced more precise clustering and detection of similar clones than the insert-inserts hybridization (figures 4, 5, and 6). In fact, for only a few clusters were the highly discriminative insert-inserts hybridizations able to precisely separate the similar clones, which had already been easily separated by computational clustering of the oligonucleotide hybridization data (figure 4). Through our investigation we detected a very interesting cluster, where the insert-inserts hybridization approach revealed 20 similar cDNA inserts but failed to separate these inserts into different cDNA groups. The oligonucleotide clustering separated these cDNA inserts into 5 different clusters (figure 5). A member of one of the clusters (cluster # 968, containing 8 members) was identified by sequencing and GenBank search as a cDNA complementary to the Human elongation factor mRNA. We found two other inserts (members of two other clusters) to have high insert-inserts hybridization similarity, but they were not analyzed by GenBank search. By comparing the insert-inserts hybridization and clustering results, we found that decreasing the threshold parameter to 80 and 60 (figure 5), which decreases the accuracy of the clustering, brings most of these clones into one common cluster. Therefore, the clustering approach (threshold = 100) separated all these similar clones more precisely into several distinct clusters, which could not be achieved by insert-inserts hybridization or by lower threshold values.
The same results were obtained for another group of clones, where a single member was sequenced but the GenBank search failed to identify it.




Third, a computational analysis including cluster comparison was used to analyze the clone clustering in different clustering experiments with different numbers of oligonucleotide probes (106, 108 and 217 probes in total, respectively). This experiment was based on the assumption that, for some of the clusters, the basic number of cDNA clones will remain the same in different clustering experiments if the oligonucleotide probes used cover their sequences well enough to produce a precise hybridization “fingerprint”. In other words, we investigated the heterogeneity of the corresponding clusters from different clustering experiments (including different numbers of clones and probes) and analyzed the percentage of members that remained the same in all corresponding clusters among the different clustering experiments, and of those that moved into other cluster(s). Additionally, we used the approach presented here to investigate how the number of hybridization probes (oligonucleotides) and the total number of analyzed clones affect the clustering results (figure 7), in order to investigate brain transcriptional activity and to check whether the library normalization affected transcript abundance (figure 6).
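The cluster comparison described above, that is, how many members of a cluster from one experiment stay together in another experiment, can be sketched in Python. The function name, the data layout (cluster identifier mapped to a list of clone identifiers) and the toy clusters are hypothetical; the sketch only illustrates the bookkeeping, not the program actually used.

```python
from collections import Counter


def compare_clusters(first, second):
    """For each cluster of the first experiment, find where most of its members went.

    Returns {cluster_id: (best_second_cluster, n_together, fraction)}, where the
    fraction is taken over the members that were also analyzed in the second experiment.
    """
    clone_to_second = {clone: cid for cid, members in second.items() for clone in members}
    report = {}
    for cid, members in first.items():
        counts = Counter(clone_to_second[c] for c in members if c in clone_to_second)
        if not counts:
            report[cid] = (None, 0, 0.0)
            continue
        shared = sum(counts.values())
        best, n_together = counts.most_common(1)[0]
        report[cid] = (best, n_together, n_together / shared)
    return report


# Toy data: four clones of cluster "c1" are analyzed again; three stay together in "k1".
ce_first = {"c1": ["a", "b", "c", "d"], "c2": ["e", "f"]}
ce_second = {"k1": ["a", "b", "c"], "k2": ["d", "e"]}
print(compare_clusters(ce_first, ce_second))
```

A low fraction for many clusters would correspond to the high cluster heterogeneity discussed for figure 6.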

Figure 6. Comparison of clustering results obtained from the clustering experiment including BH-, DL-, BSM- and ZBS membranes (clustering experiment CE-1) against the clustering experiment including the BH-membrane only (clustering experiment CE-3). CE-1 included about 59,000 dot-cDNA-inserts (42,000 from the normalized cDNA library and 17,000 from the non-normalized library) and 106 oligonucleotide probes (each hybridized with all membranes). CE-3 included about 27,500 dot-cDNA-inserts and 217 oligonucleotide probes. The table presents the following: in row 1, the total number of cDNA clones in each particular cluster; in row 2, the total number of all clusters containing the same total number of cDNA clones as given in row 1; in row 3, c# indicates the specific cluster number and N= indicates the normalization (subtraction) factor, calculated as N = A/B x 2.7, or N = A x 2.7 if B = 0, where A is the total number of clones in a particular cluster that come only from the non-normalized library; B is the total number of clones in the same cluster that come only from the normalized library; and 2.7 is the ratio of the total number of all analyzed clones from the normalized library to the total number of all analyzed cDNA inserts from the non-normalized library. In row 4, the GenBank accession number and part of the name of the GenBank sequence corresponding to the sequence of one or more sequenced cluster members (as identified by the FASTA program). A solid arrow of the type <*─m(n)─* indicates the direction of the clustering comparison, as well as the location of the major pool of clones located together in the new cluster (in CE-3). The total number of cDNAs in a particular cluster is represented by m; n represents the total number of cDNA inserts (clones) from the original cluster (from CE-1) clustered together by the CE-3 clustering experiment.
Such number is presented on the figure only if the second clustering experiment (CE-3) outputs reduced total number of inserts; With another words, n indicates how many are the inserts from the compared cluster (CE-1) that are presented into the cluster compared to from the other clustering experiment (CE-3). The m and n are presented on the figure as numbers. The <─m(n)─* as broken arrow indicates the location of the second (high-number) pool of clones located together into an another cluster of the clustering experiment compared to (CE-3). The blue spread-lines indicate the observation, that many single clones or small-number insert clusters (from CE1) are located in many other clusters of CE-3; in fact this is an indication for very-high cluster heterogeneity. The “total clones=“ indicates the total number of clones used by the program for clustering after a discarding of those of the dots, which produced hybridization intensities below the threshold used. The “total distinct =“ indicates the total number of clusters created in the corresponding clustering experiment. Note, that in the left part of the figure (in rows 1 and 2) the data from CE-3 are duplicated to allows a better data visualization. †) Additionally to this clusters' group belong: Cluster #210 (N=5); M19311 Human calmodulin mRNA,complete ... ††) Additionally to this clusters' group belong: Cluster #881; (N=19) Human adenosine deaminase mRNA.... †††) Additionally to this clusters' group belong: Cluster #397 (N=3); M79136 Human expressed sequence tag...
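The normalization factor N defined in the caption of figure 6 can be written out as a short Python sketch. The function and parameter names are hypothetical, the default library ratio of 2.7 is the value given in the caption, and the clone counts in the example are invented for illustration only.

```python
import math


def normalization_factor(a, b, library_ratio=2.7):
    """Normalization (subtraction) factor N for one cluster.

    a: clones in the cluster that come from the non-normalized library
    b: clones in the cluster that come from the normalized library
    library_ratio: ratio of analyzed normalized to non-normalized clones
    """
    if a < 0 or b < 0:
        raise ValueError("clone counts must be non-negative")
    if b == 0:
        # Cluster found only in the non-normalized library: N = A x 2.7
        return a * library_ratio
    # General case: N = A/B x 2.7
    return a / b * library_ratio


# Invented counts: 10 non-normalized and 27 normalized clones in one cluster.
n_balanced = normalization_factor(10, 27)
# Invented counts: 5 clones, all from the non-normalized library.
n_only_non_normalized = normalization_factor(5, 0)
print(round(n_balanced, 2), round(n_only_non_normalized, 2))
```

Under this definition a cluster whose two libraries are represented in the same proportion as the libraries themselves has N close to 1, and larger values indicate stronger depletion by normalization.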
Finally, 120 inserts, members of several different clusters, were partially sequenced for identification (average sequenced length 0.2-1.0 kb). The GenBank identification results for the 120 inserts were compared with our clustering data to assess the reliability of our clustering approach, as well as the nature of the corresponding clusters (that is, cDNA families). About 70 of the tested inserts were recognized by GenBank similarity search (FASTA program). Our approach placed them in 44 distinct clusters (figure 6). The remaining inserts (50 in total) were not recognized and probably represent new mRNAs.
Based on the results of our primary data analysis, we optimized and fixed the parameters for accurate data analysis. Only a threshold of 100 is appropriate for generating accurate clustering on both high- and low-density membranes. Moreover, in a few cases one member of a pair of duplicate clones was placed among the single-clone clusters as a result of overly precise signature discrimination. In all such cases the second duplicate was located correctly. This setting therefore generates some percentage of “trash” clusters (clusters inappropriate for gene discovery investigation) within the population of single-member clusters. (Note that in our experimental design all inserts were spotted in duplicate, so each cluster should contain a minimum of two cDNA members unless the hybridization or clustering procedure failed.) Another fraction consists of single-clone clusters whose second duplicate was computationally discarded from the analysis as having an incorrect signature. Nevertheless, the final goal is extremely accurate clustering with a quite low total number (<220) of oligo probes used in the fingerprinting approach. In our clustering analysis, we took basic cluster homogeneity across clustering experiments (how many clones remain members of the same cluster) as a relative criterion for correct clustering, especially for those clusters for which we had not generated insert-inserts hybridization data. We also took into account that some percentage of clone transfer arises when experiments with 107 (106) probes are compared with experiments with 217 probes, owing to the increase in clustering accuracy.
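Because every insert was spotted in duplicate, the fate of each duplicate pair offers a simple consistency check on a clustering result. The following Python sketch shows one way such a check could be organized; it is an illustration only, not the software used here, and the names `duplicate_status`, `dot_to_cluster` and `insert_to_dots` as well as the toy data are hypothetical.

```python
def duplicate_status(dot_to_cluster, insert_to_dots):
    """Classify each insert by what happened to its two duplicate dots."""
    status = {}
    for insert, (dot1, dot2) in insert_to_dots.items():
        # A dot missing from dot_to_cluster is treated as discarded
        # (for example, because its signature was rejected).
        c1 = dot_to_cluster.get(dot1)
        c2 = dot_to_cluster.get(dot2)
        if c1 is None and c2 is None:
            status[insert] = "both discarded"
        elif c1 is None or c2 is None:
            status[insert] = "one duplicate discarded"
        elif c1 == c2:
            status[insert] = "together"
        else:
            status[insert] = "split"
    return status


# Hypothetical toy data: five clustered dots; dot d6 was discarded.
dot_to_cluster = {"d1": "c7", "d2": "c7", "d3": "c7", "d4": "c19", "d5": "c3"}
insert_to_dots = {
    "insertA": ("d1", "d2"),
    "insertB": ("d3", "d4"),
    "insertC": ("d5", "d6"),
}
status = duplicate_status(dot_to_cluster, insert_to_dots)
print(status)
```

Inserts reported as "split" or "one duplicate discarded" are the ones that can give rise to the single-member clusters discussed above.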
Our initial expectation was that the clusters containing the highest number of clones from the non-normalized library contain relatively correctly clustered clones, because such a library is expected to represent abundant transcripts and because most of our oligonucleotide probes were specifically designed to represent the coding regions of the most abundant transcripts deposited in GenBank.
We compared the lists of clones in hundreds of clusters from four types of clustering experiments against each other (figures 7 and 8), including clones from the normalized library alone and from the normalized and non-normalized libraries together. To assess the accuracy of clustering in the bigger clusters, we compared these clusters in the direction BH → BH+DL → BH+DL+BSM+ZBS membranes, as well as in the reverse direction. (The capital letters are membrane-type codes corresponding to the initials of the investigator who prepared them.)
Analyzing the experimental data for clones from the normalized library only, we observed an increased level of clone divergence (transfer) that was directly proportional to the increase in the total clone number as well as in the total number of oligo probes used (figure 7). Moreover, as the total probe number increased, this "transfer" affected a growing number of clusters of two types: very small (1-2 members) and very big (>20 members). In contrast, most of the moderately small clusters (with a few cDNA members) remained the same, "losing" only 1 or 2 of their members. We compared these data with the completely opposite pattern obtained from a cluster comparison of the clustering that included clones from the non-normalized library against the clustering of clones from the normalized library. Based on this comparison, we propose that most of the bigger clusters (containing from 17 to about 350 members) of the non-normalized clone library clustering contain incorrectly clustered clones, probably as a result of insufficient probe coverage of their sequences. For several clusters this speculation has been verified by insert-inserts hybridization results. For example (compare figures 3 and 7), for the cluster containing a clone identified as Human Guanine Nucleotide Binding Protein (HGNBP) mRNA (the biggest cluster in figure 6), it was shown that only 4 clones from the BH membrane hybridized positively as members of the HGNBP cDNA family; all four were later clustered precisely in a separate cluster when the probe number was increased to 217 oligo probes. All of the analyzed small clusters showed complete agreement with the insert-inserts hybridization data, which allows us to speculate that, in our experimental situation, most of the small clusters (2-8 clones) from the normalized library clustering were clustered highly accurately, especially when 217 probes were used (figure 7).
Moreover, only 13% of the small clusters (106 clusters analyzed, with up to 5 members each) showed high divergence; in contrast, 90% of the big clusters analyzed (26 in total) showed high divergence (BH membrane clustering in figure 7).

Figure 7. Comparison of clusters and clustering results obtained from three different clustering experiments, each containing a different number of cDNA inserts from the normalized (subtracted) library only (spotted onto the BH-, DL-, BSM- and ZBS membranes) and a different number of oligonucleotide hybridization probes. The type of comparison and the clustering parameters are presented on the figure. All indications and abbreviations are the same as those given under figure 6. Note that the two rows of comparison data in the left part of the figure, from clustering experiments #1 (BH membranes) and #2 (BH and DL membranes), are duplicated (top and bottom rows) for better data visualization.
In contrast, the analysis of the clustering experiment that included clones from the non-normalized library (figure 6) against the clustering experiment that included clones from the normalized library only (BH membrane) demonstrates significantly lower divergence for all big clusters (only 60 of them were analyzed, containing 14-2011 members each). Twenty-four of them (40%) contained clones present only in the non-normalized library; they represented cDNAs completely eliminated by the subtraction procedure. The other sixty percent of the analyzed clusters demonstrated different levels of normalization (N; figure 6), and seven of them (12%) transferred most of their clones into the big clusters of the normalized-library experiment; these seemingly represent highly expressed genes clustered correctly in the big clusters, with no divergence of members between different clustering experiments. Only four (6.7%) of all analyzed clusters (from the clustering that included clones from the non-normalized library) showed higher clone divergence, and all of them represent clones only from the normalized library. Based on the above, we concluded that most of the bigger clusters from the non-normalized library were clustered correctly, which was to be expected on common biological grounds, as discussed above (since the library is not normalized, these clusters contain all the highly expressed genes in the sample).

Figure 8. High-density membranes hybridized with oligonucleotide probes containing regulatory sequences. Panel A shows one half of the membrane; panels B, C, D, E and F each show one quarter of the membrane. A: membrane image after hybridization with the N21 oligonucleotide probe (NTTATAAAAN), corresponding to the TATA-box promoter (TATAAAA); because the arrayed cDNA inserts are double-stranded, the positively detected TATA-box sites are located in the antisense cDNA strand, or in the sense cDNA strand as the additional intra-gene transcription initiation signals commonly present in different mRNAs. B: membrane image after hybridization with the E29 oligonucleotide probe (NNCTGACCAN), complementary to the docking site for regulatory sequences recognized by the TFIIIA zinc-finger transcription factor (AGGTCA). C: membrane image after hybridization with the E41 oligonucleotide probe (GCCGCCC), corresponding to the constitutive enhancer for metal-responsive promoters (CCGCCC). D: membrane image after hybridization with the E15 oligonucleotide probe (NNCAAGGAGN), corresponding to the 16S rRNA Shine-Dalgarno region (AAGGAGG) for initiation of translation. E: membrane image after hybridization with the E5 oligonucleotide probe (NNTGGTGGAN), complementary to the additional binding site for 18S rRNA in eukaryotes (CCACC). F: membrane image after hybridization with the 7-32 oligonucleotide probe (NNTTTATTAN), complementary to the polyadenylation signal site (AATAAA).
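Because the arrayed inserts are double-stranded, a probe sequence may occur on either strand of an insert. The Python sketch below illustrates this point in silico for probes written with N at degenerate positions, as in figure 8. It is only an illustration: the function names and the two short test sequences are hypothetical, N is assumed to stand for any of the four bases, and exact string matching stands in for hybridization under mismatch-eliminating conditions.

```python
import re

# Complement table for DNA; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_pattern(probe):
    """Compile a probe into a regular expression; N matches any base."""
    return re.compile(probe.replace("N", "[ACGT]"))


def strands_with_site(probe, sense_strand):
    """Report on which strand(s) of a double-stranded insert the probe's own
    sequence occurs. The probe would anneal to the opposite strand."""
    pattern = probe_pattern(probe)
    hits = []
    if pattern.search(sense_strand):
        hits.append("sense")
    if pattern.search(reverse_complement(sense_strand)):
        hits.append("antisense")
    return hits


# Probe 7-32 from figure 8; both insert sequences are invented.
probe = "NNTTTATTAN"
insert_1 = "GGCATTTATTAGCC"   # carries TTTATTA on the sense strand
insert_2 = "CCGTAATAAAGCTT"   # carries AATAAA on the sense strand
hits_1 = strands_with_site(probe, insert_1)
hits_2 = strands_with_site(probe, insert_2)
print(hits_1, hits_2)
```

In the second example the polyadenylation signal AATAAA lies on the sense strand, so the sequence of the complementary probe is found on the antisense strand.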
Oligonucleotide hybridization fingerprinting allows, at a very early stage of library investigation, the identification and screening of gene families, based on positive hybridization results with probes complementary to any of the known regulatory sequences or translated DNA regions corresponding to common or specific protein motifs (figure 8). The data obtained from oligonucleotide hybridization with motif-specific and randomly generated oligonucleotides, in addition to providing high-resolution gene mapping, also yield gene structural information, such as the position and putative function of specific structural and regulatory elements in the genes and among the genomes. This serves not only genome mapping purposes but also the precise mapping of novel genes. To investigate transcript representation in the infant brain libraries, about 80 inserts from the non-normalized library and 40 from the normalized library were chosen randomly or as members of different distinct clusters, partially sequenced (95 of them commercially) and tested for similarity recognition against GenBank (as of July 1994). They represented 85 distinct cDNA clusters, 65 of them present in the non-normalized library (figure 6). Seventy-five percent of all sequenced clones from the non-normalized library and only ten percent of the clones from the normalized library were recognized. This observation clearly shows that the infant brain library normalization (subtraction) performed in Dr. Soares's laboratory increases the probability of detecting new types of cDNA, basically by 5-10-fold (much higher than proposed before [Drmanac, 1994 #17]). Moreover, it demonstrates the highly significant role of the library subtraction (normalization) approach for gene investigation and for the detection of new types of mRNAs (genes). Such an approach, combined with our strategy for library fingerprinting, significantly increases the opportunity for discovery and characterization of novel cDNAs from any particular library.
Here we demonstrated that our clustering approach, applied to non-normalized libraries, in effect provides a computerized library normalization: most of the bigger clusters (with more than 15-20 members) represented known (highly abundant) transcripts and, conversely, most of the small clusters (with 2-10 members) were clusters that failed to be recognized by GenBank similarity search (i.e. they represent unknown genes). Additionally, when non-normalized libraries are analyzed, our approach allows full characterization of transcription patterns and expression activity by simple calculations (not explored in the present study, but the software has such capabilities, which have been tested in detail elsewhere). This presents a unique opportunity for characterizing relations in transcription activity between any transcripts (genes) in any particular library from any particular tissue or organ, and between different libraries. The advantage of this approach over any other known array-based strategy is its ability to cover completely any genetic content (including unknown content), even within mixed genomes.
DISCUSSION
The need for high-density DNA maps has become more apparent in human molecular genetics for the analysis and characterization of many genetic diseases for which no direct cloning methods exist. Moreover, in many closely related areas, such as new drug discovery research, the need for investigation of gene expression levels [Schena, 1995 #42] and for tissue transcription pattern analysis has increased, including complex gene (expression) interactions involving multiple genomes (host↔infectious agents↔commensal resident microorganisms). Finally, the Human Genome Project was created to stimulate a much faster development of human genetic medicine and biotechnological marketing by stimulating the development of new approaches and methodologies for large-scale genome mapping, sequencing, and gene analysis. Much of the anticipated beneficial outcome of the Human Genome Project is related to the mapping and sequencing of new or known defective alleles that result in disease. Many new large-scale strategies for complex genome analysis and/or sequencing have been developed [Hooft van Huijsduijnen, 1992 #23], [Jayaraman, 1992 #24], [Jones, 1992 #25], [Khudyakov, 1994 #26], [Meier-Ewert, 1993 #29], [Milosavljevich, 1993 #32], [Saiki, 1986 #36], [Southern E.M., 1992 #38], [Southern, 1994 #39]. The tag-sequencing approach (typically sequencing 200-500 bases from one or both ends of a clone insert) has been used to generate contiguous sequence information [Adams, 1991 #1], [Adams, 1993 #2], [Adams, 1993 #3]. However, this approach, as well as many closely related modifications [Carrano, 1989 #6], [Church, 1988 #8], has difficulty discriminating members of homologous gene families, because the 5'- and 3'-ends of mRNA contain untranslated or regulatory sequences, which supply only limited information about the specific gene function. Moreover, preparing arrays that contain highly specific gene signatures is too costly in money, time and effort.
Sequencing-by-hybridization (SBH) and large-scale hybridization on arrayed libraries, combined with computerized data analysis, offer an alternative for generating both sequencing data and a transcription pattern characterization of a particular library. Many research groups independently started development of this DNA-sequencing approach [Bains, 1988 #4], [Church, 1988 #8], [Drmanac, 1987 #10], [Drmanac, 1993 #15], [Craig, 1990 #9], [Lehrach, 1990 #27], [Hoheisel, 1994 #22], [Southern, 1994 #39], [Southern, 1988 #37], [Southern E.M., 1992 #38], [Mirzabekov, 1994 #33]. Basically, a set of oligonucleotides (from 7-12 up to 30 bases in length) is used in match-specific hybridization experiments against DNA target sequence(s), and the positive results are detected, collected and analyzed. DNA fragments that range in length from hexamers to megabases can be used both as probe and as target, and useful results can be obtained even with complex DNA mixtures [Hoheisel, 1993 #21]. In fact, this is a fingerprinting approach that provides many unique and desirable features for genome analysis, such as expression pattern analysis and transcript-type recognition for practically any particular cell group. In contrast to other methods, the DNA hybridization technique provides partial sequence information and, depending on the number and sequence of the oligomer probes used, can be applied to mapping, clone characterization, sequence comparison and complete sequence determination. The oligonucleotide hybridization technique shares with STS mapping the advantage of relying on well-defined markers. Moreover, it produces an exceptionally high data output, and the amount of experimental information obtained is independent of both the genome size and the content of the studied system. The probes, although some of them are anonymous, are relatively evenly spaced throughout the genome, allowing high-quality genome mapping. Finally, assays that use oligomer probes are much less subject to artifacts caused by repeat sequences, or produced in the experimental procedure, than assays that use larger DNA probes.
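The idea of treating a set of hybridization results as a clone "fingerprint" can be illustrated with a deliberately simple Python sketch. It assumes that each clone is described by a binary signature (1 = positive, 0 = negative for each probe) and groups clones greedily by the number of probes on which their signatures differ. This is a generic illustration chosen for brevity, not the clustering algorithm or the threshold parameter used in this work; the function names and the toy signatures are hypothetical.

```python
def hamming(a, b):
    """Number of probes on which two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(signatures, max_mismatches):
    """Assign each clone to the first cluster whose seed signature differs
    from it in at most max_mismatches probes; otherwise start a new cluster."""
    clusters = []  # list of (seed_signature, [clone names])
    for clone, signature in signatures.items():
        for seed, members in clusters:
            if hamming(seed, signature) <= max_mismatches:
                members.append(clone)
                break
        else:
            # No existing cluster is close enough: this clone seeds a new one.
            clusters.append((signature, [clone]))
    return [members for _, members in clusters]


# Hypothetical signatures over eight probes.
signatures = {
    "clone1": (1, 0, 1, 1, 0, 0, 1, 0),
    "clone2": (1, 0, 1, 1, 0, 0, 1, 1),  # differs from clone1 in one probe
    "clone3": (0, 1, 0, 0, 1, 1, 0, 0),
    "clone4": (0, 1, 0, 0, 1, 1, 0, 0),
}
tolerant = greedy_cluster(signatures, max_mismatches=1)
strict = greedy_cluster(signatures, max_mismatches=0)
print(tolerant)
print(strict)
```

With no mismatches tolerated, clone1 and clone2 fall into separate single-member clusters, which mirrors how overly strict signature discrimination can split clones that belong together.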
Our approach is based on the oligonucleotide hybridization fingerprinting technique combined with computerized data analysis. By this approach, a large number of clones were computationally sorted into clusters according to their hybridization pattern with several hundred oligonucleotide probes, each containing a core sequence 6-10 bases long. This method revealed several significant advantages. It allows clone clustering in an overlapping manner and the construction of a linear map of oligonucleotide hybridization sites in the genome. In an extended variant this should lead in practice to both clone and genome mapping (in the case of genomic library analysis) and/or to complete transcription pattern characterization (in the case of cDNA library analysis), independent of genome content and size. As a result, in both cases sequence catalogues of the separated library clones could be prepared. The most significant advantages of the technology presented here are as follows. First, computer analysis makes it possible to analyze cDNA and genomic libraries simultaneously, leading (without additional effort or cost!) to simultaneous mapping of the transcriptionally active genome regions, that is, to gene mapping. Moreover, this approach technically allows (given a sufficiently high number of oligo probes) exon expression to be recorded in parallel with mRNA expression, yielding significantly more valuable and precise expression data; that is, mRNAs resulting from different splicing events should be placed in different clusters even if they contain common exons. A limited proof of this concept has already been obtained even with so limited a number of fingerprinting oligo probes (<200). A particular modification of the approach presented here [Dyanov, 1994, #42], [Dyanov, 1994 #43] (Regon Molecular Systems, Inc. intellectual property rights) was later utilized for modification of Affymetrix GeneChip microarray data interpretation, yielding significantly higher informational content than originally anticipated by the vendor [Scott, 2003 #45]. Data obtained from oligonucleotide hybridization with motif-specific and randomly generated oligonucleotides not only provide high-resolution DNA mapping, but also yield gene structure information, such as the position and putative function of structural and regulatory elements (figure 7). Second, oligonucleotide hybridization fingerprinting allows the identification and screening of gene families, creating family "passports" based on positive hybridization results with probes complementary to any of the known regulatory sequences or translated DNA regions corresponding to common or specific protein motifs. By simultaneous analysis of two or more cDNA libraries, this approach allows a computerized cDNA library subtraction in a manner that leads to characterization of the specific transcription pattern in any particular tissue or organ and to the finding of a large number of new genes by analyzing clones from the small clusters (most of them representing DNAs unrecognized by GenBank similarity search). The use of a new generation of highly representative, random-primed, directionally cloned cDNA libraries [Dyanov, H.M., 1994, #43] would stimulate the collection of information on poly-A-minus mRNAs. Since the computer analysis allows data from several libraries to be compared simultaneously, investigators can use this approach for comparative analyses of gene expression or mutant-allele expression in one or many libraries. Chips prepared from different cDNA libraries (containing at least 1-2 x 10^5 clones) will be commercialized very soon, to answer the needs not only of genome research laboratories but also of clinical diagnostics and drug discovery laboratories.
ACKNOWLEDGMENT
We would like to acknowledge Dr. Radomir Crkvenjakov for supervising a part of this work and for critical suggestions; Dr. Radoje Drmanac for helpful discussions and remarks; Dr. M.B. Soares for providing the infant brain cDNA libraries; Dr. R.A. Gibbs for providing sequencing information and clones representing the human dystrophin gene, used as controls in our experiments; Z. Strezoska, M. Zeremski, T. Paunesku, D. Grujic, S. Batus, J. Meyer, A. Gemmell, K. Nadas, S. Little and H. Kreuzer for cDNA clone separation, growth and PCR amplification of inserts; I. Labat and group for developing robot, data storage and data analysis programs; and Dr. J. Jarvis and Dr. A. Milosavljevic for developing data storage and analysis programs. This work was supported by the U.S. Department of Energy, Office of Health and Environmental Research, under contract No. W-31-109-ENG-38.
ABBREVIATIONS
mRNA - messenger RNA.
cDNA or copy-DNA - a DNA synthesized from an mRNA by reverse transcription (RNA-mediated DNA polymerization).
CE - clustering experiment
REFERENCES
1. Adams,M.D., Kelley,J.M., Gocayne,J.D., Dubnik,M., Polymeropoulos,M.H., Xiao,H., Merril,C.R., Wu,A., Olde,B., Moreno,R.F., Kerlavage,A.R., McCombie,W.R, Venter,J.C. (1991) Complementary DNA sequencing: expressed sequence tags and Human Genome Project. Science, 252, 1651-1656.
2. Adams,M.D., Kerlavage,A.R, Fields,C., and Venter,J.C. (1993) 3,400 new expressed sequence tags identify diversity of transcripts in human brain. Nature Genetics, 4, 256-267.
3. Adams,M.D., Soares,M.B., Kerlavage,A.R., Fields,C., and Venter,J.C. (1993) Rapid cDNA sequencing (new expressed sequence tags) from a directionally cloned human infant brain cDNA library. Nature Genetics, 4, 373-380.
4. Bains,W. and Smith,G.C. (1988) A novel method for nucleic acid sequence determination. J. Theor. Biol., 135, 303-307.
5. Bruzik,J.P. and Maniatis,T. (1992) Spliced leader RNAs from lower eukaryotes are trans-spliced in mammalian cells. Nature, 360, 692-699.
6. Carrano,A.V., Lamerdin,J., Ashworth,L.K., Watkins,B., Branscomb,E., Slezak,T., Raff,M., de Jong,P.J., Keith,D., McBride,L., Meister,S., and Kronick,M. (1989) A high-resolution, fluorescence-based, semiautomated method for DNA fingerprinting. Genomics, 4, 129-136.
7. Chaudhari,N. and Hahn,W.E. (1983) Genetic expression in the developing brain. Science, 220, 924-928.
8. Church,G.M. and Kieffer-Higgins,S. (1988) Multiplex DNA sequencing. Science, 240, 185-188.
9. Craig,A.G., Nizetic,D., Hoheisel,J.D., Zehetner,G., and Lehrach,H. (1990) Ordering of cosmid clones covering the Herpes simplex virus type I (HSV-I) genome: a test case for fingerprinting by hybridization. Nucl. Ac. Res., 18, 2653-2660.
10. Drmanac,R. and Crkvenjakov,R. (1987) Yugoslav Patent Application 570/87; US Patent Application # 723,712 (Jun. 18, 1991).
11. Drmanac, R., Labat,I., Brukner,I., and Crkvenjakov,R. (1989) Sequencing of megabase plus DNA by hybridization: Theory of the Method. Genomics, 4, 114-128.
12. Drmanac,R., Lennon,G., Drmanac,S., Labat,I., Crkvenjakov,R., and Lehrach,H. (1991) Partial sequencing by oligonucleotide hybridization: concept and applications in genome analysis. In Cantor,C.R. and Lim,H.A. (ed.), Electrophoresis, Supercomputing and The Human Genome. World Scientific, Singapore.
13. Drmanac,R., Drmanac,S., Labat,I., Vicentic,A., Gammell,A., Stavropoulos,N., and Jarvis,J. (1992) SBH and the integration of complementary approaches in the mapping, sequencing, and understanding of complex genomes. In Lim,H.A., Fickett,J.W., Cantor,C.R. and Robbins,R.J. (ed.), Proceedings of Second International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis. World Scientific, Singapore.
14. Drmanac,R., Drmanac,S., Strezoska,Z., Paunesku,T., Labat,I., Zeremski,M., Snoddy,J., Funkhouser,W.K., Koop,B., Hood,L. and Crkvenjakov,R. (1993) DNA sequence determination by hybridization: a strategy for efficient large-scale sequencing. Science, 260, 1649-1652.
15. Drmanac,S. and Drmanac,R. (1994) Processing of cDNA and genomic kilobase-size clones for massive screening, mapping and sequencing by hybridization. BioTechniques, 17, 328-336.
16. Drmanac,R., Drmanac,S., Labat,I., and Stavropoulos,N. (1994) Requirements in screening cDNA libraries for new genes and solutions offered by SBH technology. Proceedings of the 3rd International Workshop of Transcribed Sequences. In press.
17. Dyanov,Ch.M., Korochkin,L.I., Bakaeva,T.G., and Lyozin,G.T. (1992) Evidence for two different transcription ways of genome DNA sequences containing PEBme-II protein gene in Drosophila melanogaster. Transcriptional Regulation in Cell Differentiation and Development Conference, Brighton.
18. Fields,C., Adams,M.D., White,O., and Venter,C. (1994) How many genes in the human genome? Nature Genetics, 7, 345-346.
19. Grujic,D., Strezoska,Z., and Crkvenjakov,R. (1994) High throughput PCR procedure for up to 6-kb lengths of DNA. BioTechniques, 17, 291-294.
20. Hoheisel,J.D., Maier,E., Mott,R., McCarthy,L., Grigoriev,A.V., Schalkwyk,L.C., Nizetic,D., Francis,F., and Lehrach,H. (1993) High resolution cosmid and P1 maps spanning the 14 Mb genome of the fission yeast S. pombe. Cell, 73, 109-120.
21. Hoheisel,J.D. (1994) Application of hybridization techniques to genome mapping and sequencing. TIG, 10, 79-83.
22. Hooft van Huijsduijnen,R.A.M. (1992) PCR-generated probes for study of DNA-protein interactions. BioTechniques, 12, 830-832.
23. Jayaraman, K. and Puccini,A.J. (1992) A PCR-mediated gene synthesis strategy involving the assembly of oligonucleotides representing only one of the strands. BioTechniques, 12, 392-398.
24. Jones,P., Watson,A., Davies,M. and Stubbings,S. (1992) Integration of image analysis and robotics into a fully automated colony picking and plate handling system. Nucleic Acid Res., 20, 4599-4606.
25. Khudyakov,Yu.E., Gaur,L., Singh,J., Petel,P. and Fields,H.A. (1994) Primer specific solid-phase detection of PCR products. Nucl.Ac. Res., 22, 1320-1321.
26. Lehrach,H., Drmanac,R., Hoheisel,J., Larin,Z., Lennon,G., Monaco,A.P., Nizetic,D., Zehetner,G., and Poustka,A. (1990) Hybridization fingerprinting in genome mapping and sequencing. In Genome Analysis Volume 1: Genetic and Physical Mapping, pp. 39-81.
27. Lennon,G.G. and Lehrach,H. (1991) Hybridization analyses of arrayed cDNA libraries. TIG, 7, 314-317.
28. Meier-Ewert,S., Maier,E., Ahmadi,A., Curtis,J., and Lehrach,H. (1993) An automated approach to generating expressed sequence catalogues. Nature, 361, 375-376.
29. Melmer,G. and Buchwald,M. (1992) Identification of genes using oligonucleotides corresponding to splice site consensus sequence. Hum. Mol. Genet., 1, 433-438.
30. Milner,R.J. and Sutcliffe,J.G. (1983) Gene expression in rat brain. Nucleic Acid Res., 11, 5497-5520.
31. Milosavljevich, A. (1993) Discovering sequence similarity by the algorithmic significance method. In Hunter,L., Searls,D., and Shavlik,J. (ed.), Proceedings of the First International Conference on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, California.
32. Mirzabekov,A.D. (1994) DNA sequencing by hybridization - a megasequencing method and a diagnostic tool?. TIBTECH, 12, 27-32.
33. Nilsson,M., Malmgren,H., Samiotaki,M., Kwiatkowski,M., Chowdhary,B.P., and Landegren,U. (1994) Padlock probes: circularizing oligonucleotides for localized DNA detection. Science, 265, 2085-2088.
34. Pizzuti,A., Pieretti,M., Fenwick,R.G., Gibbs,R.A., and Caskey,C.T. (1992) A transposon-like element in the deletion-prone region of the dystrophin gene. Genomics, 13, 594-600.
35. Saiki,R.K., Bugawan,T.L., Horn,G.T., Mullis,K.B., and Erlich,H.A. (1986) Analysis of enzymatically amplified β-globin and HLA-DQα DNA with allele-specific oligonucleotide probes. Nature, 324, 163-166.
36. Southern,E. (1988) United Kingdom Patent Application GB 88/0400; International patent application PCT GB 89/00460.
37. Southern,E.M., Maskos,U., and Elder,J.K. (1992) Analyzing and comparing nucleic acid sequences by hybridization to arrays of oligonucleotides: Evaluation using experimental models. Genomics, 13, 1008-1017.
38. Southern,E.M., Case-Green,S.C., Elder,J.K., Johnson,M., Mir,K.U., Wang,L., and Williams,J.C. (1994) Arrays of complementary oligonucleotides for analyzing the hybridization behavior of nucleic acids. Nucleic Acid Res., 22, 1368-1373.
39. Urdea,M.S., Running,J.A., Horn,T., Clyne,J., Ku,L., and Warner,B.D. (1987) A novel method for the rapid detection of specific nucleotide sequences in crude biological samples without blotting or radioactivity; application to the analysis of hepatitis B virus in human serum. Gene, 61, 253-264.
40. Zhang,Y., Coyne,M.Y., Wil,S.G., Levenson,C.H., and Kawasaki,E.S. (1991) Single-base mutational analysis of cancer and genetic diseases using membrane bound modified oligonucleotides. Nucl. Ac. Res., 19, 3929-3933.
41. Dyanov,H.M., Salbego,D., and Crkvenjakov,R. (1994) A strategy for arrayed cDNA-library characterization by oligonucleotide hybridization. Impact of Nucleic Acid-Based Technology: Revolution in Clinical Diagnosis, Applications and Research, Amsterdam, The Netherlands.
42. Dyanov,H.M. (1994) DNA-library characterization and clone coding by oligonucleotide-based clone finger-printing. Argonne Natl. Lab., ANL-IN-94-109, invention report; US patent pending.
43. Dyanov,H.M. (1994) A method for efficient construction of highly representative cDNA libraries. Argonne Natl. Lab., ANL-IN-94-074, invention report; US patent pending.
44. Dyanov,H.M. (2004) SBH: A new experimental design with a ten-fold increase of sequencing data collection. Regon Scientific Journal, 105 [Array Technology]: 1; www.regonjournal.com.
45. Scott,R., Wright,S.J., Kurtz,S.A., Clark,T., Dyanov,H., and Quigg,R. Method for determining biological expression levels by linear programming; patent filed July 18, 2002; Ref: 220391US.
---------------------------------------------------------------------------------------------------------------------------
© 2004-2005 REGON Molecular System, Inc. All Rights Reserved.