WO2001009310A1 - 5'ENRICHED cDNA LIBRARIES AND A YEAST SIGNAL TRAP - Google Patents


Info

Publication number
WO2001009310A1
Authority
WO
WIPO (PCT)
Prior art keywords
pcr
cdna
cdnas
primer
restriction enzyme
Prior art date
Application number
PCT/US2000/020541
Other languages
French (fr)
Inventor
Peter J. Kretschmer
May M. Luke
Pamela Toy Van Heuit
Yifan Xu
Original Assignee
Berlex Laboratories, Inc.
Priority date
Filing date
Publication date
Application filed by Berlex Laboratories, Inc.
Priority to AU63863/00A (AU6386300A)
Publication of WO2001009310A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1096: Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79: Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/80: Vectors or expression systems specially adapted for eukaryotic hosts for fungi
    • C12N15/81: Vectors or expression systems specially adapted for eukaryotic hosts for fungi for yeasts
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K2319/00: Fusion polypeptide
    • C07K2319/01: Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/02: Fusion polypeptide containing a localisation/targetting motif containing a signal sequence

Definitions

  • This invention relates, e.g., to methods for generating cDNA libraries which are enriched for 5' sequences.
  • the 5' enhanced libraries can be used, e.g., to select sequences that encode signal peptides.
  • TdT-mediated RACE: terminal deoxynucleotidyl transferase (TdT)-mediated Rapid Amplification of cDNA Ends.
  • In TdT-mediated RACE, cDNA is synthesized from an RNA sample using either an oligonucleotide primer complementary to a known sequence in a gene of interest or oligo-dT.
  • the 3' end of the first strand cDNA is then modified by the addition of a homopolymer tail (generally about 10 to about 25 nucleotides in length), using terminal deoxynucleotidyl transferase (TdT); and the cDNA second strand is then synthesized using a primer comprising a cloning site adaptor and a 3'-homopolymer tail which is complementary to the appended cDNA homopolymer.
  • These double stranded cDNAs are amplified using PCR (polymerase chain reaction), employing as the first (5') primer the cloning adaptor (anchor)/homopolymer tail primer which was used to make the second strand.
  • the second (3' end) PCR primer is an oligonucleotide complementary to a known sequence internal to the cDNA primer.
  • Alternatively, the second primer is oligo-dT (see, e.g., Frohman et al. (1988), PNAS 85, 8998-9002; Loh et al. (1989), Science
  • In RNA ligase-mediated RACE (RLM-RACE), an RNA oligonucleotide containing an anchor sequence is ligated to the 5' ends of mRNAs in a population, using RNA ligase; the oligonucleotide sequence is incorporated by reverse transcriptase into a first cDNA strand; and a DNA oligonucleotide which comprises sequences of the RNA oligonucleotide or a portion thereof is used as a PCR primer for formation of a second cDNA strand and for subsequent PCR amplification thereof (see, e.g., Maruyama et al.
  • RLM-RACE has been used in conjunction with first strand priming with oligonucleotides complementary to known sequences (in attempts to generate cDNA clones of known genes of interest) and with oligo-dT primers (in attempts to generate full-length cDNA libraries).
  • In ligation-anchored PCR (LA-PCR), a DNA anchor oligonucleotide is ligated to the 3' ends of first strand cDNAs in a population, using T4 RNA ligase.
  • the anchor-ligated cDNA is then used directly in a PCR reaction (see, e.g., Troutt et al. (1992), PNAS 89,
  • the invention encompasses the use of novel oligonucleotide primers (which comprise both defined sequences that can be used as PCR primers and random oligonucleotide sequences that can be used to prime reverse transcription), a method to preferentially select cDNAs which have copied substantially completely to the 5' ends of their mRNA templates, a single step procedure to both generate second strand cDNA and amplify it by PCR, novel separation steps which enhance the unidirectionality of the libraries, and/or a novel yeast vector for use in selecting signal peptides.
  • oligonucleotide primers used in the method preferably comprise, 5' to 3', 1) a defined sequence which can be used as a PCR primer and, optionally, which contains at least a portion of one or more restriction enzyme site(s), and 2) a random oligonucleotide sequence.
  • Such oligonucleotide primers are used to prime first strand cDNAs from an RNA sample of interest. Of the cDNAs so generated, those which contain the 5' ends of genes are preferentially selected. As used herein, the terms 5'-enhanced, 5'-enriched and 5'-specific cDNAs or cDNA fragments refer to cDNA populations which disproportionately represent the 5' termini and nearby sequences of expressed mRNAs.
  • By “nearby sequences” is meant, e.g., sequences which lie within about 100 or more nucleotides of the 5' terminus of the mRNA template.
  • Nearby sequences can encompass, e.g., sequences which are at the 5' terminus, or which lie about 10, 20, 30, 40, 50, 60, 70, 80,
  • the 5'-termini plus nearby sequences are sometimes referred to as 5'-sequences, 5'-portions, 5'-regions, or 5'-ends of expressed genes.
  • the 5' enhanced cDNAs are optionally amplified by PCR (Polymerase Chain Reaction) and are directionally cloned into a vector of choice, to form a cDNA library. See Example 1 for a schematic representation of part of the method.
  • the procedure does not depend upon ligation of linkers, adaptors, or oligonucleotides, yet provides great flexibility in allowing for directional cloning into a variety of vectors.
  • intermediates generated during the formation of a library are substantially separated from residual, unbound, oligonucleotide primers, thereby enhancing the unidirectional character of the library.
  • the cloning vector is a novel yeast vector which can be used to select for cDNA clones that encode signal sequences (signal peptides, leader sequences or peptides, secretory leader sequences).
  • Advantages of the method include its simplicity, flexibility, efficiency and reproducibility, and the ability to generate cDNA libraries which are complex, representative, highly enriched for the 5' ends of genes and highly unidirectional.
  • Low amounts of RNA can be used (e.g., as little as 25 ng of poly(A)+ RNA), thereby allowing for in-depth screens for low-abundance RNAs.
  • Because cDNAs are primed internally from mRNA templates, the invention allows one readily to generate cDNA fragments of an optimal size for cloning into vectors.
  • Among the numerous applications of the method is the generation of EST (expressed sequence tag) databases which complement existing EST databases derived from oligo-dT-primed cDNA libraries, many of which are deficient in 5'-specific sequences.
  • Libraries of 5' enhanced cDNAs (ESTs) can be used, e.g., to identify and/or quantify levels of expressed genes, to detect changes in the pattern of mRNA expression in a cell or tissue associated with a physiological or pathological change, to identify potential gene targets for drugs, or to screen for the action of a drug or to detect its side effects.
  • 5' enhanced cDNAs are cloned directionally into a novel yeast vector which allows for the selection of inserted “leader” sequences that permit secretion of a “leaderless” selectable marker derived from yeast invertase.
  • a cloning region comprising at least two different restriction enzyme sites, preferably rare ones, which allow for directional cloning of cDNA inserts, and three stop codons located upstream of the cloning region, in different reading frames, which prevent undesired translation from upstream sequences.
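The requirement for stop codons in all three reading frames upstream of the cloning region can be checked mechanically. The following is a minimal Python sketch, not the authors' procedure. It scans a candidate upstream sequence in each of the three forward reading frames for TAA, TAG or TGA. The function names and the example sequence are hypothetical, and the sketch does not model where the stops sit relative to the cloning sites.

```python
# Hypothetical helper: does an upstream sequence carry a stop codon in every frame?
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frames_with_stop(seq: str) -> set:
    """Return the forward reading frames (0, 1, 2) of `seq` that contain a stop codon."""
    seq = seq.upper()
    frames = set()
    for frame in range(3):
        # Step through complete codons only, starting at this frame's offset.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                frames.add(frame)
                break
    return frames


def blocks_all_frames(upstream: str) -> bool:
    """True if each of the three forward frames of `upstream` contains a stop codon."""
    return frames_with_stop(upstream) == {0, 1, 2}


# Hypothetical example: TAA in frame 0, TAG in frame 1, TGA in frame 2.
print(frames_with_stop("TAATTAGTTGA"))   # {0, 1, 2}
print(blocks_all_frames("ATGGCC"))       # False
```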
  • By “rare restriction enzyme sites” is meant herein those restriction enzyme recognition sites that occur rarely in DNA, e.g., recognition sequences of 8 or more bases. The use of rare restriction sites allows one to clone long DNA sequences, e.g., intact genes.
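Longer recognition sequences are rarer because, in random sequence, the expected spacing between sites is the inverse of the per-position match probability. At equal base composition this spacing is 4^n for an n-base site. The sketch below assumes independent bases with an adjustable GC fraction and ignores dinucleotide effects such as CpG depletion in mammalian DNA. EcoRI (GAATTC) and NotI (GCGGCCGC) appear only as familiar examples of 6- and 8-base sites, and the function names are hypothetical.

```python
def site_probability(site: str, gc: float = 0.5) -> float:
    """Probability that a given position starts `site` in random DNA.

    Assumes independent bases: G and C each occur at gc/2, A and T at (1 - gc)/2.
    Only one strand is scanned, which suffices for palindromic sites like the
    examples below.
    """
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for base in site.upper():
        prob *= p[base]
    return prob


def expected_spacing(site: str, gc: float = 0.5) -> float:
    """Average number of bases between occurrences of `site` (1 / probability)."""
    return 1.0 / site_probability(site, gc)


print(expected_spacing("GAATTC"))                   # 4096.0  (6-base site)
print(expected_spacing("GCGGCCGC"))                 # 65536.0 (8-base site)
print(round(expected_spacing("GCGGCCGC", gc=0.4)))  # 390625
```

Under these assumptions, an 8-base site is expected about 16 times less often than a 6-base site. This is why such sites are less likely to cut inside a cloned insert.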
  • One embodiment of the invention is a method to generate first strand cDNAs from an mRNA sample of interest, comprising, a) hybridizing said mRNA with first primer oligonucleotides, each comprising, 5' to 3',
  • the separated first strand cDNAs described above which are effective for unidirectional cloning are substantially free of unbound first primer.
  • Another embodiment of the invention is a method of generating first strand cDNAs as above, wherein said reverse transcriptase adds terminal C's to the 3' ends of first strand cDNAs, said method further comprising, a) hybridizing separated first strand cDNAs containing 3' terminal C's to a second PCR primer oligonucleotide which comprises, 5' to 3',
  • the preceding method further comprises separating (substantially all) molecules of said second primer oligonucleotide from said extended first strand cDNAs, under conditions effective to produce extended first strand cDNAs which are effective for unidirectional cloning.
  • first strand cDNAs described above are substantially free of second primer molecules.
  • first strand cDNAs generated from an mRNA sample of interest wherein the nucleotide sequence of the 5' end of each cDNA molecule is, 5' to 3',
  • Another embodiment is a method for constructing a directional EST library enriched for the 5' ends of expressed genes, comprising, a) generating double stranded cDNA molecules which are enriched for the 5' ends of expressed genes (mRNA), wherein each of said double stranded cDNAs is bounded by two different restriction enzyme sites, A and B, and wherein the B sites lie (substantially) at the ends of cDNA sense strands corresponding to the 5' end of said expressed genes, and the A sites lie (substantially) at the opposite ends of the sense cDNA strands, and b) ligating said double stranded cDNA molecules directionally into a vector.
  • Sense strand cDNA corresponds to the strand of double strand template DNA which is expressed (transcribed) into mRNA. Therefore, the B restriction site lies at or close to the portion of cDNA which corresponds to the 5' end of the transcribed mRNA, which can encode the N-terminus of a protein.
  • Another embodiment is a method for constructing a directional cDNA library enriched for cDNAs coding for signal peptides, comprising, a) generating double stranded cDNA molecules which are enriched for the 5' ends of expressed genes (mRNA), wherein each of said double stranded cDNAs is bounded by two different restriction enzyme sites, A and B, and wherein the B sites lie (substantially) at the ends of cDNA sense strands corresponding to the 5' end of said expressed genes, and the A sites lie (substantially) at the opposite ends of the sense cDNA strands, b) ligating said double stranded cDNA molecules into a yeast vector, wherein said vector comprises, 5' to 3',
  • yeast cloning vector which comprises, 5' to 3', 1) stop codons in each of three reading frames, just upstream of 2) a cDNA insertion region wherein a site for restriction enzyme B lies 5' of one for restriction enzyme A, and
  • RNA samples from which cDNA libraries of the invention can be made include, e.g., biological samples derived from a human or other animal source, such as cells, tissues or organs, encompassing, among many others, blood cells, primary cells taken from developing embryos or mature animals, biopsy samples, histology tissue samples, or cell lines.
  • Other possible sources include bacterial, fungal, viral or parasite preparations (including cells, tissues or organs infected by such entities) and plants.
  • the first step in the method of the invention is isolation or provision of an mRNA population.
  • mRNA can be obtained from any source. Methods of extraction of RNA are well-known in the art and are described, for example, in J. Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd ed.), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989, Vols. 1-3, especially Vol. 1, Ch. 7, “Extraction, Purification, and Analysis of Messenger RNA from Eukaryotic Cells.” Other isolation and extraction methods are also well-known. Typically, isolation is performed in the presence of chaotropic agents such as guanidinium chloride or guanidinium thiocyanate, although other detergents and extraction agents can alternatively be used.
  • the mRNA is isolated from the total extracted RNA by chromatography over oligo-dT-cellulose or other chromatographic media that have the capacity to bind the polyadenylated 3'-portion of mRNA molecules.
  • Alternatively, total RNA can be used.
  • “priming” or “to prime” refers to the apposition (or pairing) of an oligonucleotide or nucleic acid to a template nucleic acid, whereby said apposition enables a transcriptase to polymerize nucleotides into a nucleic acid which is complementary to the template nucleic acid.
  • transcriptase refers to any enzyme which can copy a nucleic acid sequence into its complement, e.g., reverse transcriptase, DNA-dependent DNA polymerase, etc.
  • “primer oligonucleotide” or “oligonucleotide primer” refers to any oligonucleotide, containing any number of nucleotides, which can prime the synthesis by a transcriptase of a complementary nucleic acid from a nucleic acid template.
  • a primer possesses a free 3' OH group which upon apposition to the nucleic acid template is recessed relative to the 5' end of the template and thus is capable of acting as a site of initiation of the synthesis or polymerization of a nucleic acid polymer, the sequence of which is complementary to the template strand, in the presence of nucleotides and a transcriptase and at a suitable temperature and pH.
  • a primer oligonucleotide can be used, e.g., to prime the synthesis of first strand cDNA, using a reverse transcriptase, to prime the synthesis of second strand cDNA from first strand cDNA template, using a DNA-dependent DNA polymerase, or to amplify DNA by PCR, using a thermostable DNA polymerase.
  • Oligonucleotide primers can be any type of oligonucleotide, e.g., ribonucleotide, deoxyribonucleotide, PNA, or chimeras or mixtures thereof.
  • the primer is composed of deoxyribonucleotides.
  • Primer sequences can include one or more of the bases A, T, U, C, or G, or non-naturally occurring nucleosides such as, e.g., inosine.
  • the universal nucleotide phosphoramidite (Glen Research) and the universal base phosphoramidite (Clontech) can introduce modified bases into primers that can pair equally well with all four natural bases.
  • Primers may be derivatized with chemical groups to optimize their performance or to facilitate characterization of extension or amplification products.
  • primers can be substituted with biotin, using known synthetic techniques.
  • the nucleic acid backbone can comprise one or more known linkages, such as, e.g., phosphodiester bonds, or sulfamate, sulfamide, phosphorothioate, methylphosphonate, or carbamate linkages.
  • both the probe and the template are in solution.
  • methods of the invention may be performed wherein the primers are attached to a solid phase such that attachment does not interfere with their ability to prime nucleic acid synthesis.
  • the advantage of this embodiment is that all the products are covalently bound to a solid phase support, thus simplifying their isolation, characterization and/or cloning.
  • a first primer oligonucleotide is used to generate first strand cDNA.
  • This first primer oligonucleotide comprises two moieties,
  • PCR-A: PCR primer sequence
  • RE-A: restriction enzyme region
  • PCR primer sequence encompasses a sequence which is capable of serving as a primer in a PCR reaction but which is not necessarily used in such a reaction.
  • restriction enzyme site refers to a sequence of nucleotides which is recognized by a restriction endonuclease. It can form a part of the cleavage site or it can be adjacent to the cleavage site as it is for type IIS enzymes (e.g., Alw I or Fok I).
  • restriction site end refers to the portion of the molecule produced by restriction enzyme cleavage. Restriction site ends can be blunt or can have 5' or 3' overhangs.
  • PCR-A primer sequence moiety of a first oligonucleotide primer is of sufficient length and composition to be able to prime synthesis during subsequent PCR amplification, while maintaining a low level of undesired background synthesis.
  • oligonucleotide primers for PCR amplification, and the design factors relevant for such a use, are discussed below.
  • a PCR-A primer can comprise a restriction enzyme region, RE-A, to facilitate insertion of cDNA generated with that PCR-A primer into a cloning vector.
  • An RE-A region can be positioned internal to, or 5' or 3' of, non-restriction site-containing sequences of a PCR-A primer.
  • An RE-A region can contain one or more restriction enzyme sites. If an RE-A is located in the 5' portion of a PCR-A primer, it can also contain, located at the 5'-most end of the RE-A, a partial restriction enzyme site (portion of a restriction site). Such a partial restriction site can be completed to a full restriction site in a subsequent step(s) (e.g., during PCR amplification, or by ligation to an adaptor) by appending a contiguous sequence bearing the remaining nucleotides required for restriction enzyme recognition.
  • an RE-A can contain 6 bases of an 8-base recognition site, and the remaining 2 bases can be added in a subsequent reaction.
  • 4 bases of a recognition site can be completed with 4 more bases, 5 bases with 3 bases, and so on.
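Completing a partial site is simple string arithmetic. The partial site lies at the 5'-most end of RE-A, so the bases already present are presumably the 3' portion of the recognition sequence, and the missing bases are added on their 5' side. The sketch below works under that assumption. The 8-base NotI site (GCGGCCGC) is an illustrative choice, not one named in the source, and the function name is hypothetical.

```python
def bases_to_append(partial: str, site: str) -> str:
    """Return the 5' bases needed to turn `partial` into the full `site`.

    Assumes `partial` is the 3'-terminal portion of the recognition sequence,
    as when it sits at the 5'-most end of an RE-A region.
    """
    partial, site = partial.upper(), site.upper()
    if not site.endswith(partial):
        raise ValueError("partial is not a 3'-terminal portion of site")
    return site[: len(site) - len(partial)]


NOTI = "GCGGCCGC"  # 8-base site, used here only as an example
print(bases_to_append("GGCCGC", NOTI))  # GC   (6 bases present, 2 added)
print(bases_to_append("CCGC", NOTI))    # GCGG (4 bases present, 4 added)
```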
  • a PCR-A primer (or any of the oligonucleotide primers described herein) can contain any number of complete
  • the “random sequence” moiety of a first oligonucleotide primer is a sequence which is not designed to be directed to a specific sequence in the nucleic acid sample to be copied.
  • By primers of random sequence is meant that the positions of apposition of the primers to the nucleic acid template are substantially indeterminate with respect to the nucleic acid sequence of the template under the reaction conditions used in the methods of the invention.
  • Methods for estimating the frequency at which an oligonucleotide will appear in a nucleic acid polymer are described, e.g., in Volinia et al. (1989), Comput. Appl. Biosci. 5, 33-40. It is recognized that the sequences of random primers may not be random to the extent that physical and chemical efficiencies of the synthetic procedure will allow.
  • random primers designed to prime synthesis from defined sources may be less than random in order to compensate for favored arrangements of bases, e.g., percentage GC content in certain organisms, etc.
  • Oligonucleotides having defined or arbitrary sequences can be considered “random” if their use causes the locations of their apposition to the template to be indeterminate. All these examples of primer types are defined to be random so long as the positions along the template nucleic acid strand at which the primed extensions occur are largely indeterminate. It is not necessary that apposition of the random primer to the template be at the site of a sequence identical to that of the primer.
  • a primer which apposes to the template with some mismatch is within the scope of the invention if the mismatched primer-template structure can still serve as a site from which to enzymatically synthesize extension products of the primer which are complementary to the template.
  • One of ordinary skill in the art, without undue experimentation, will be able to design many reaction conditions, both stringent (allowing only a perfect complementary sequence match between the primer and the template) and nonstringent (allowing some mismatch in the primer-template pairing) within the scope of the methods of the invention (see, e.g., Nucleic Acid Hybridization, a Practical Approach, B.D. Hames and S.J. Higgins, eds., IRL
  • Random primers can be generated by a variety of methods, e.g., by random digestion or physical disruption of natural sources (for example, by digestion with endonucleases or by sonication or shearing), or, preferably, by synthesis with a commercial oligonucleotide synthesizer. Primers can be synthesized with an oligonucleotide synthesizer using conventional methods, and are available from a number of commercial sources.
  • the random primer sequences within a first primer oligonucleotide can be of any length effective to prime synthesis from a nucleic acid template, e.g., 4-mer, 5-mer, 6-mer, and up to about 50 bases or more.
  • the optimal length of the primers will depend upon many factors, including base composition, chemical composition of the sugar moieties, nature of the backbone, and hybridization conditions.
  • the random primer has a length of between about 5 and about 40 bases.
  • the random primer has a length of between about 6 and 20 bases, preferably 6, 7, 8, 9, 10 or 11 bases, more preferably 7, 8 or 9 bases.
  • a mixture of random primers comprising 7, 8 and 9 random bases is used to prime cDNA synthesis. Examples 2 and 3 describe conditions in which priming by random primers gives rise to substantially random priming of first strand cDNAs.
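A first primer of this kind can be written as a fixed PCR-A sequence followed by N (any base) positions. A pool that mixes 7-, 8- and 9-base random moieties therefore contains 4^7 + 4^8 + 4^9 = 344,064 possible random sequences. The sketch below assumes a hypothetical PCR-A sequence, since none is given in this section, and uses Python's `random` module only to model a single synthesized molecule. The function names are hypothetical.

```python
import random

# Hypothetical PCR-A sequence; the source does not specify one here.
PCR_A = "GTCGACTACGATCGTAGC"


def degenerate_primer(pcr_a: str, n_random: int) -> str:
    """IUPAC notation for a first primer: PCR-A followed by n random (N) positions."""
    return pcr_a + "N" * n_random


def pool_diversity(lengths) -> int:
    """Number of distinct random moieties in a pool mixing the given lengths."""
    return sum(4 ** n for n in lengths)


def sample_primer(pcr_a: str, n_random: int, rng: random.Random) -> str:
    """One molecule from the pool: PCR-A plus n bases drawn uniformly from ACGT."""
    return pcr_a + "".join(rng.choice("ACGT") for _ in range(n_random))


rng = random.Random(0)
print(degenerate_primer(PCR_A, 8))   # PCR-A followed by NNNNNNNN
print(pool_diversity((7, 8, 9)))     # 344064
print(sample_primer(PCR_A, 9, rng))
```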
  • Conditions of hybridization of a first oligonucleotide primer to mRNA can be readily optimized by a skilled worker, using well known methods in the art. See, e.g., Sambrook et al. (1989), Molecular Cloning: A Laboratory Manual (2d ed.), Vols. 1-3, Cold Spring Harbor Press, New York; Hames et al. (1985), Nucleic Acid Hybridization, IRL Press; Davis et al.
  • By choosing specific ratios of the first oligonucleotide primers to mRNA templates, a skilled worker can design reaction mixtures that will generate first strand cDNAs of approximately any desired length. Useful sizes of first strand cDNAs range from about 200 nucleotides (which is long enough to encompass the 5' UTR (untranslated region) of a gene of interest, and short enough to ensure cDNA production from short genes represented in the mRNA population) to about 1,000 nucleotides (a length which would not be expected to comprise stop codons of most genes).
  • Example 2 shows that, under conditions of the invention, primer:template ratios ranging from about 7:1 to about 25:1 can result in cDNA lengths of about 200 to 1,000 nucleotides, whereas ratios of 62:1 give rise to first strand cDNAs which are shorter. Under the conditions tested, ratios of between about 5:1 and about 60:1 yield a cDNA product which can be used in the methods of the invention.
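The ratios above are molar, so converting a mass of mRNA into moles requires an assumed average transcript length and mass per nucleotide. The sketch below assumes roughly 340 g/mol per ribonucleotide residue and a hypothetical 1,500-nt average mRNA length. Both values are placeholders, and the 5:1 to 60:1 window is simply the range reported for the conditions of Example 2. The function names are hypothetical.

```python
NT_MASS = 340.0  # approximate g/mol per ribonucleotide residue (assumption)


def mrna_pmol(ng_mrna: float, avg_length_nt: float, nt_mass: float = NT_MASS) -> float:
    """Picomoles of mRNA molecules in `ng_mrna` nanograms, for an assumed average length."""
    # ng divided by g/mol gives nmol; multiply by 1000 for pmol.
    return ng_mrna * 1000.0 / (avg_length_nt * nt_mass)


def primer_pmol_for_ratio(ratio: float, ng_mrna: float, avg_length_nt: float) -> float:
    """Primer amount (pmol) giving the requested primer:template molar ratio."""
    return ratio * mrna_pmol(ng_mrna, avg_length_nt)


def in_reported_range(ratio: float) -> bool:
    """True for ratios inside the ~5:1 to ~60:1 window reported for Example 2."""
    return 5.0 <= ratio <= 60.0


# 25 ng of poly(A)+ RNA with a hypothetical 1,500-nt average length:
print(round(mrna_pmol(25, 1500), 4))                 # 0.049
print(round(primer_pmol_for_ratio(7, 25, 1500), 3))  # 0.343
print(in_reported_range(25), in_reported_range(62))  # True False
```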
  • Another factor in choosing the ratio of first primer oligonucleotide to mRNA template is that too large a ratio of primer to template can result in undesirable priming of second strand synthesis, using the first strand cDNA as template.
  • Such second strand priming creates second strand cDNA which contains, at its 5' end, PCR-A sequences; if such cDNA molecules are inserted into a directional cloning vector designed to insert first strand cDNAs with PCR-A sequences at their 5' ends, the degree of directionality in a resulting library will be compromised.
  • That too high a ratio can lead to undesirable second strand priming is illustrated in Example 3, which indicates that, under the reaction conditions employed in the preceding experiment, ratios of 7:1 or 25:1 result in substantially unidirectional cDNAs, whereas a ratio of 62:1 does not.
  • Following hybridization of first primer oligonucleotide to mRNA, the first primer is extended in the presence of deoxynucleotide substrates by a transcriptase.
  • Many types of transcriptase can be used in such a reaction, e.g., enzymes from Moloney murine leukemia virus (MoMuLV), avian myeloblastosis virus, etc.
  • In one embodiment, Superscript II, a modified reverse transcriptase from MoMuLV (Life Technologies), is used.
  • reverse transcriptases contain a terminal transferase activity which adds uncoded-for C's to the 3' ends of reverse transcripts that have been copied completely (or substantially completely) to the 5' end of their mRNA templates.
  • By “substantially completely” is meant herein that the transcriptase has, e.g., copied to (reached) within about 100 or more nucleotides of the 5' end of an mRNA template.
  • the reverse transcriptase can copy to the 5' terminus of the mRNA template, or to a position which lies, e.g., about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160 or more bases from that 5' terminus.
  • the reverse transcriptase can copy to a position which lies substantially downstream of the 5' terminus.
  • the reverse transcriptase need copy only far enough that the cDNA encompasses the AUG region, including, e.g., sequences required for efficient and/or specific translation.
  • Such a cDNA can extend, for example, about 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100 bases 5' of the AUG.
  • the complex of template and first strand cDNA is substantially separated (e.g., purified) from unbound first oligonucleotide primer (i.e., first oligonucleotide primer that has not been incorporated at the 5' end of first strand cDNA).
  • a variety of methods are available for performing such a separation including, e.g., dialysis or microdialysis (see, e.g., Bauer et al. (1993), Nucleic Acids Research 21, 4272), separating columns, and preparative gel electrophoresis (e.g., on agarose or polyacrylamide gels).
  • By “separating column” is meant any column which can separate single or double strand cDNAs from unincorporated oligonucleotides or from short oligonucleotides (e.g., an oligonucleotide whose synthesis has been primed by an oligonucleotide primer but which has been prematurely terminated and released from its template).
  • separating columns are also known as exclusionary columns or sizing columns.
  • examples include gravity-based or spin columns made of Sephadex or Sepharose (e.g., G-50, G-75), chromaspin™ columns, Pharmacia Microspin S400 HR columns, or QIAGEN (or other silica gel membrane-based) columns.
  • first strand cDNA is separated from unbound molecules of first primer oligonucleotide by a two-step procedure: binding, washing and eluting bound material on a QIAGEN™ column, followed by preparative electrophoresis on a 4% agarose gel.
  • First strand cDNAs which are substantially free of unbound first oligonucleotide primer are cDNAs which, when subjected to subsequent steps of this invention, give rise to a substantially unidirectional population of cDNA clones.
  • By “substantially unidirectional” is meant that at least about 80%, e.g., about 80, 82, 85, 87, 90 or 92%, and preferably at least about 95%, e.g., about 95, 97 or 99%, of the cDNAs are oriented in a single orientation (direction).
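The unidirectionality thresholds above can be expressed as a small calculation over clone counts. In this sketch the numbers of clones in the intended and reversed orientations are hypothetical inputs, for example from sequencing a sample of clones. The 80% and 95% cut-offs come from the definition above, and the function names are hypothetical.

```python
def percent_in_orientation(intended: int, reversed_: int) -> float:
    """Percentage of clones whose insert lies in the intended orientation."""
    total = intended + reversed_
    if total == 0:
        raise ValueError("no clones counted")
    return 100.0 * intended / total


def classify(percent: float) -> str:
    """Apply the approximate cut-offs from the definition of substantially unidirectional."""
    if percent >= 95.0:
        return "substantially unidirectional (preferred level)"
    if percent >= 80.0:
        return "substantially unidirectional"
    return "not substantially unidirectional"


# Hypothetical counts from sequencing 200 clones.
pct = percent_in_orientation(190, 10)
print(pct, classify(pct))  # 95.0 substantially unidirectional (preferred level)
```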
  • Example 4 demonstrates a two-step purification procedure of the invention.
  • TdT-mediated RACE takes advantage of the fact that completely extended cDNA is preferentially (at least to some extent) acted upon by terminal deoxynucleotidyl transferase;
  • RLM-RACE requires ligation of defined RNA oligonucleotides to the 5' ends of mRNA templates, and only cDNA which has incorporated those oligonucleotide sequences into its 3' end is amplified in subsequent PCR reactions; and LA-PCR relies on the ligation of defined oligonucleotides with T4 RNA ligase to the 3' end of first strand cDNA.
  • a second PCR primer oligonucleotide can be added to the 3' ends of first strand cDNAs by any of these end-specific methods, and can be used in the generation of
  • a first strand cDNA which has substantially reached the 5' end of its mRNA template and comprises 3' terminal, uncoded-for, C's is further extended at its 3' end by 1) hybridization to a second PCR primer oligonucleotide which comprises PCR-B sequences and, at its 3' terminus, a sequence of about 2 to 10 G's, preferably 3 G's, and which serves as a short extended template for reverse transcription, and 2) incubation with a transcriptase.
  • the reverse transcriptase extends the 3' end of the cDNA by appending to it the reverse complement of the sequence of PCR-B.
  • the cDNA product obtained by such a procedure is termed “extended first strand cDNA.”
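The extension step can be modeled as string operations on sequences written 5' to 3'. A first strand cDNA ending in uncoded-for C's pairs with the G's at the 3' end of the second primer. The reverse transcriptase then copies the primer's PCR-B portion and appends its reverse complement. This is an idealized sketch. The PCR-B sequence and function names are hypothetical, and requiring at least as many terminal C's as the primer has G's is a simplification of the real pairing.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_first_strand(cdna: str, pcr_b: str, n_g: int = 3):
    """Idealized 3' extension of a first strand cDNA (both sequences written 5' to 3').

    The second primer is modeled as pcr_b followed by n_g G's. If the cDNA ends in
    at least n_g C's, those C's pair with the primer's 3' G's and the reverse
    transcriptase copies PCR-B, appending reverse_complement(pcr_b). Otherwise the
    cDNA is treated as not having reached the 5' end of its mRNA, and None is returned.
    """
    if not cdna.upper().endswith("C" * n_g):
        return None
    return cdna + reverse_complement(pcr_b)


PCR_B = "GATTACA"  # hypothetical; not the source's PCR-B sequence
print(extend_first_strand("TTTTGCACCC", PCR_B))  # TTTTGCACCCTGTAATC
print(extend_first_strand("TTTTGCAAAA", PCR_B))  # None
```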
  • Optimization of the reaction conditions, e.g., hybridization conditions in which the G's of the second primer oligonucleotide (e.g., GGG) can hybridize to 3'-terminal C's of first strand cDNA
  • annealing and reverse transcriptase steps at a temperature which is slightly lower than that for most reverse transcriptase reactions, e.g., 37 °C instead of 42 °C.
  • a PCR-B sequence is of sufficient length and base composition to be able to prime synthesis during subsequent PCR amplification, and it optionally can comprise a restriction enzyme region, RE-B, which has the same characteristics as RE-A, described above. Factors pertaining to the design of a PCR-B sequence, including the choice of restriction enzyme sites in an (optional) RE-B region within it, are discussed below.
  • a non-restriction-enzyme-site-containing portion of a PCR-B primer is contiguous with, and 5' or 3' to, an RE-B region.
  • the second PCR primer oligonucleotide is 5'-
  • the reverse transcriptase used to extend first strand cDNAs can be the same as or different from that used to generate the first portion of the cDNA. In a preferred embodiment, the same type of reverse transcriptase is used in both steps.
  • Example 6 shows that in extended first strand cDNA prepared by the methods of the invention, the second oligonucleotide primer interacts predominantly with the 3' ends of first strand cDNAs that have been copied substantially to the 5' ends of their mRNA templates.
  • extended first strand cDNA molecules are substantially flanked at the ends corresponding to the 5' ends of expressed genes by PCR-B sequences, and at the opposite ends by PCR-A sequences. This asymmetry allows the cDNAs to be cloned in any desired orientation in a vector, and thereby allows a skilled worker to generate a unidirectional library.
  • the complex of template and extended first strand cDNA (or, optionally, extended first strand cDNA which has been separated from the template, e.g., by treatment with RNase H) is substantially separated (e.g., purified) from second oligonucleotide primer.
  • Any of the methods described above for separation of first strand cDNA can be used for the separation of extended first strand cDNA.
  • a combination of a QIAGEN™ column and a 4% agarose gel is employed, as described herein.
  • first strand cDNAs which are substantially free of second primer oligonucleotide, when subjected to the subsequent steps of this invention, give rise to a substantially unidirectional population of cDNA clones. That is, a substantial number of the cDNA fragments are flanked by different, distinguishable restriction sites at each end (substantially all of the 5' ends have one restriction site and substantially all of the 3' ends have a different restriction site), enabling directional insertion.
  • By "substantially" is meant at least about 80%, e.g., about 80, 82, 85, 87, 90 or 92%, and preferably at least about 95%, e.g., about 95, 97 or 99%.
  • Extended first strand cDNA can be converted to double stranded cDNA by conventional procedures in the art.
  • second strand cDNA can be generated by hybridizing extended first strand cDNA to a primer oligonucleotide which comprises PCR-B sequences, followed by elongation of the primer oligonucleotide with a DNA-dependent DNA polymerase.
  • double stranded cDNA can optionally be amplified by PCR.
  • Double stranded cDNA, whether or not it is PCR amplified, can be cloned into a vector of choice by any of a variety of art-recognized procedures.
  • one or both of the DNA ends can be "polished" (e.g., by treatment with T4 DNA polymerase) and inserted into a vector by blunt-end ligation; linkers or linker adaptors can be added to the ends of a cDNA fragment to provide a desired restriction site end (or different restriction site ends at each end of the cDNA fragment) which is compatible with a cloning restriction site(s) in the vector; or double strand cDNA which contains an RE restriction region at either or both ends can be digested with appropriate restriction enzyme(s) and inserted into compatible restriction site(s) in a cloning vector.
  • restriction enzyme sites can be added to one or both ends of a double stranded cDNA during PCR amplification, and the cDNA can then be inserted into a cloning vector via those restriction enzyme ends.
  • a second cDNA strand is both generated and amplified by PCR in a single reaction. After extended first strand cDNA has been generated and separated, it is converted to second strand cDNA and amplified by PCR in the same reaction mixture.
  • the 5' primer in the reaction (corresponding to the 5' end of the mRNA), herein designated as a "third PCR primer oligonucleotide," comprises PCR-B sequences; and the 3' primer, herein designated as a "fourth PCR primer oligonucleotide," comprises PCR-A sequences.
  • each of the third and fourth primer oligonucleotides also comprises all or a portion of at least one restriction enzyme site.
  • restriction enzyme sites provide for great flexibility, allowing a skilled worker to flank cDNAs with a wide variety of restriction enzyme sites, compatible with any desired cloning vector.
  • the third PCR primer oligonucleotide comprises, 5' to 3', an Asc I restriction site and PCR-B sequences; and the fourth PCR primer oligonucleotide comprises, 5' to 3', an Fse I restriction site and PCR-A sequences.
  • additional spacer sequences can be added to the 5' terminus of the primer. Amplified, double stranded cDNAs can be cloned into vectors by any of the methods described above.
  • amplified cDNAs flanked by RE-A and RE-B sites are cleaved, simultaneously or sequentially, by enzymes which cut within restriction sites in the RE sequences, and are then cloned into compatible restriction sites in a cloning vector.
  • if the RE cutting sites are positioned between PCR-A and/or PCR-B and cDNA sequences, one can excise substantially all information extraneous to the cDNA by cleaving the fragments with appropriate restriction enzymes before cloning.
  • amplified cDNAs flanked by restriction sites that have been introduced during PCR amplification are cleaved at those sites, simultaneously or sequentially, by appropriate restriction enzymes and are cloned into compatible site(s) in a cloning vector.
  • the design of the oligonucleotide primers (5' and 3' primers, e.g., the first and second oligonucleotide primers, or the third and fourth oligonucleotide primers, as described herein) is governed by factors relevant both to the ability of the fragments to prime PCR amplification and to the nature of the (optional) restriction enzyme sites included therein.
  • the restriction site ends at each end of a double strand cDNA fragment which is to be cloned are different, in order to facilitate directional cloning.
  • one of these restriction site ends has a 5' overhang and the other a 3' overhang; the two enzymes prefer the same reaction buffers and temperature; and extensive double digestion is possible.
  • the fragment is bounded by restriction sites of 8 or more nucleotides, and in a most preferred embodiment, the sites are Asc I and Fse I.
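The preference for two different sites, one leaving a 5' overhang and one leaving a 3' overhang, can be checked from each enzyme's recognition sequence and cut position. This sketch encodes Asc I as GG^CGCGCC and Fse I as GGCCGG^CC, which are the enzymes' standard cut positions. The `Enzyme` class and `directional_pair` helper are hypothetical. The expected-spacing figure assumes random DNA with equal base frequencies and shows why 8-base sites rarely cut inside an insert. Real cDNA composition will differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    site: str     # palindromic recognition sequence, top strand 5'->3'
    top_cut: int  # top-strand cut position, counted from the 5' end of the site

    def overhang(self) -> tuple[str, str]:
        """Return (overhang type, overhang sequence read on the top strand)."""
        n = len(self.site)
        # For a palindromic site the bottom-strand cut mirrors the top-strand cut.
        bottom_cut = n - self.top_cut
        if self.top_cut < bottom_cut:
            return "5'", self.site[self.top_cut:bottom_cut]
        if self.top_cut > bottom_cut:
            return "3'", self.site[bottom_cut:self.top_cut]
        return "blunt", ""

    def expected_spacing_bp(self) -> int:
        """Mean distance between sites in random DNA with 25% of each base."""
        return 4 ** len(self.site)


ASC_I = Enzyme("Asc I", "GGCGCGCC", top_cut=2)  # GG^CGCGCC
FSE_I = Enzyme("Fse I", "GGCCGGCC", top_cut=6)  # GGCCGG^CC


def directional_pair(a: Enzyme, b: Enzyme) -> bool:
    """For palindromic sites: True if the two ends differ and so cannot anneal to each other."""
    return a.overhang() != b.overhang()


if __name__ == "__main__":
    for enzyme in (ASC_I, FSE_I):
        kind, seq = enzyme.overhang()
        print(enzyme.name, kind, seq, enzyme.expected_spacing_bp())
    print(directional_pair(ASC_I, FSE_I))  # True
```

Under these assumptions Asc I leaves a 5' CGCG overhang and Fse I leaves a 3' CCGG overhang. This matches the 5'/3' pairing described for the preferred embodiment.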
  • a skilled worker can design the remaining sequences (e.g., non-restriction enzyme-containing PCR-A sequences and random sequences of a "first oligonucleotide primer" as used herein; non-restriction enzyme-containing PCR-B sequences of a "second oligonucleotide primer" as used herein; or non-restriction enzyme-containing sequences of a third or fourth oligonucleotide primer as used herein), taking into account a number of variables.
  • primers can be designed such that undesirable codons or longer sequences can be excised, e.g., by one or more restriction endonucleases, before or after the fragments are cloned into a cloning vector.
  • sequences which could unfavorably bias a subsequent screening reaction include, e.g., codons for hydrophobic amino acids that could falsely mimic signal sequences.
  • a further factor in designing primer ohgonucleotides is that a pair of primers to be used for PCR amplification should contain approximately equal AT and GC contents, and be as close to the GC-AT ratio of the template as possible. Therefore, when, for example, a PCR-A or PCR-B element comprises a GC-rich sequence, such as a restriction enzyme site for Asc I or Fse I, other sequences in the primer can be adjusted so as to be relatively richer in AT content.
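The GC-balancing rule above can be checked with a short helper. This is a minimal sketch: `gc_fraction` and `gc_gap` are hypothetical helper names, and the AT-rich filler `"TATAATAT"` is an invented example rather than a sequence from the patent. It shows how AT-rich flanking sequence lowers the overall GC fraction of a primer that carries a 100% GC Asc I site.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc_gap(primer_a: str, primer_b: str) -> float:
    """Absolute difference in GC fraction between the two primers of a PCR pair."""
    return abs(gc_fraction(primer_a) - gc_fraction(primer_b))


if __name__ == "__main__":
    asc_site = "GGCGCGCC"       # Asc I site: 100% GC
    at_rich_tail = "TATAATAT"   # hypothetical AT-rich filler
    print(gc_fraction(asc_site))                 # 1.0
    print(gc_fraction(asc_site + at_rich_tail))  # 0.5
```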
  • Other considerations, including ways to reduce undesirable background during PCR amplification, are well-known to one of skill in the art and are discussed, e.g., in "Selection of primers for polymerase chain reaction" in Methods in Molecular Biology, Vol. 15, B.A.
  • PCR reaction conditions can be optimized by one of skill in the art, using routine, art-recognized conditions (see, e.g., Newton, C.R. and Graham, A., eds. Polymerase Chain Reaction, BIOS Scientific Publishers Limited, 1997, especially Chapter 5; also, Innis,
  • thermophilic polymerases, including enzymes with 3'→5' proofreading exonuclease activity, are commercially available. Thermocycling parameters, including denaturing and annealing conditions, the temperature and length of incubation, and the number of cycles, can readily be optimized by the skilled worker.
  • Example 7 illustrates the formation of second strand cDNA and its amplification by PCR.
  • Example 8 and Fig. 4 show that the double strand cDNAs made by the methods of the invention retain the unidirectional character observed in the preceding steps.
  • amplified double strand cDNA fragments flanked by two different restriction sites are inserted unidirectionally into the insertion region of a cloning vector, i.e., into a vector which comprises two different, separable, restriction enzyme insertion sites.
  • Cloning vectors which have been cleaved so as to form two different, non-compatible, restriction site ends are less likely to self-ligate (without insertion of a fragment) than are vectors cleaved by a single restriction enzyme, particularly if one of the restriction site ends has a 5' overhang (e.g., Asc I) and the other has a 3' overhang (e.g., Fse I).
  • Vectors can be designed to grow in a variety of hosts, including various bacteria, yeast, insect cells or higher eukaryotic cells, or to shuttle between two different host organisms. A wide variety of markers are available for the selection of those vectors which have inserted foreign DNA. For some applications, the inserted sequences need not be expressed. For other applications, it is desirable that the inserted sequences are expressed. Appropriate vectors of both types are plentiful and well-known to those skilled in the art. Examples include, e.g., pBR322, pUC vectors, pET vectors, etc.
  • cDNA-containing fragments are inserted into a cloning vector which is designed to select for (to trap) sequences that encode functional signal peptides which direct secretion of polypeptides to which they are attached.
  • yeast vector described in U.S. Pat. No. 5,536,637, in which a cloning region (with a single restriction enzyme site) is positioned upstream of a non-secreted yeast invertase gene.
  • cDNA fragments that contain functional signal peptides (secretory leader sequences) allow the "non-secreted" yeast gene to be secreted following transformation of the recombinant vector into a yeast cell.
  • Yeast which secrete invertase can then be selected on medium containing sucrose or raffinose, allowing one to identify and isolate cDNAs that encode functional signal peptides.
  • selectable markers can be used, including known secretory genes, e.g., IL-2 receptor ⁇ (see, e.g., U.S. Pat.
  • a vector of the invention contains one or two of the following properties: 1) The cloning region contains at least two restriction enzyme sites, thereby allowing directional cloning of inserts. As noted above, the two restriction sites are preferably 8-base or longer restriction enzyme recognition sites and, most preferably, are Asc I and Fse I. 2) Upstream of the cloning region are three stop codons, in different reading frames.
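Property 2) above can be checked in silico: does a candidate upstream sequence contain a stop codon in each of the three forward reading frames? The source does not state the rationale. A plausible one, offered here as an assumption, is that vector-derived translation cannot read through into the insert, so secretion must be driven by a signal encoded in the cDNA. The test sequence `"TAGCTGAATAA"` is hypothetical and is not SEQ ID No: 27. The helper names are also hypothetical.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def frames_with_stop(seq: str) -> set[int]:
    """Forward reading frames (0, 1, 2) that contain at least one stop codon."""
    seq = seq.upper()
    found = set()
    for frame in range(3):
        # Step through complete codons only.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                found.add(frame)
                break
    return found


def stops_in_all_frames(seq: str) -> bool:
    """True if every forward reading frame is closed by a stop codon."""
    return frames_with_stop(seq) == {0, 1, 2}


if __name__ == "__main__":
    # Frame 0 reads TAG, frame 1 reads TGA, frame 2 reads TAA.
    print(stops_in_all_frames("TAGCTGAATAA"))  # True
    print(stops_in_all_frames("AAAAAAAAA"))    # False
```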
  • Example 9 describes the construction of a cloning vector which exhibits a high degree of efficiency for selecting signal peptides, and Example 10 describes the generation of a directional cDNA library, using the vector.
  • cDNA libraries generated by the methods of the invention are complex and representative.
  • Example 11 demonstrates by colony hybridization analysis that such libraries are representative, and
  • Example 12 demonstrates by PCR analysis that they are complex.
  • Example 13 shows that at all steps of the generation of cDNAs by methods of the invention, the cDNAs exhibit unidirectional characteristics.
  • Fig. 1 shows a schematic representation of the method of cDNA synthesis.
  • Fig 2A schematically illustrates PCR primer pairs which can be used to distinguish
  • Fig. 2B shows a gel which illustrates the degree of "correct" first strand PSA-specific cDNA synthesis under several conditions.
  • Fig. 3A schematically illustrates PCR primer pairs which can be used to distinguish “correct” from “incorrect” extension of first strand PSA-specific cDNA.
  • Fig. 3B shows a gel which illustrates "correct" extension of first strand PSA-specific cDNA.
  • Fig. 4 shows a gel which evaluates the unidirectional character of PCR-amplified cDNA.
  • Fig. 5 is a Table summarizing colony hybridization of 12 cDNA libraries to probes for three differently expressed genes.
  • First strand cDNAs are generated from poly A+ mRNA templates by priming with a reverse transcription mixture comprising three first oligonucleotide primers, each of which comprises a common PCR-A sequence.
  • the three primers contain stretches of either 7, 8 or 9 random nucleotides.
  • the primers are made by conventional means in an oligonucleotide synthesizer and/or are obtained from a commercial source.
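The three first primers can be written as a defined PCR-A sequence followed by 7, 8 or 9 random positions. This sketch uses the IUPAC code N for the random positions and counts how many distinct random 3' ends each pool can contain, assuming all four bases are incorporated equally at every position. The PCR-A placeholder `"ACGTACGTAC"` and the function names are hypothetical and do not reproduce the patent's SEQ ID Nos.

```python
import random

BASES = "ACGT"


def degenerate_primer(pcr_a: str, n_random: int) -> str:
    """Notation for a first primer: defined PCR-A, then n random positions (N)."""
    return pcr_a + "N" * n_random


def random_sequence_space(n_random: int) -> int:
    """Number of distinct random 3' ends for a stretch of n_random positions."""
    return 4 ** n_random


def sample_primer(pcr_a: str, n_random: int, rng: random.Random) -> str:
    """Draw one concrete primer molecule from the degenerate pool."""
    return pcr_a + "".join(rng.choice(BASES) for _ in range(n_random))


if __name__ == "__main__":
    pcr_a = "ACGTACGTAC"  # hypothetical placeholder, not the patent's PCR-A sequence
    for n in (7, 8, 9):
        print(degenerate_primer(pcr_a, n), random_sequence_space(n))
    print(sample_primer(pcr_a, 9, random.Random(0)))
```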
  • First strand cDNA which reaches the 5' terminus of its mRNA template is tailed with C's by the reverse transcriptase.
  • the first strand cDNAs, preferably in the form of cDNA/mRNA complexes as shown here, are separated from, e.g., unincorporated first primer oligonucleotides by column chromatography and gel purification.
  • a second oligonucleotide primer which comprises 5' to 3' a defined PCR-B sequence followed by GGG, is hybridized to a separated first strand cDNA/mRNA complex, possibly by virtue of hybridization of the GGG sequence of the primer to the C-tails of the cDNA, and the hybridized complex is incubated with reverse transcriptase. The reverse complement of PCR-B is thereby appended to the C-tailed cDNAs.
  • the extended cDNA/mRNA complexes are separated from, inter alia, unincorporated second oligonucleotide primers by column chromatography and gel purification.
  • PCR amplification and addition of restriction enzyme sites: separated extended first strand cDNA/mRNA complexes are subjected to 15-22 cycles of PCR amplification, using as primers a) a 5' primer (third primer oligonucleotide) which comprises, 5' to 3', an Asc I restriction site and PCR-B sequences, and b) a 3' primer (fourth primer oligonucleotide) which comprises, 5' to 3', an Fse I restriction site and PCR-A sequences.
  • the amplified double strand cDNA is separated from unincorporated third and fourth primer oligonucleotides by column chromatography.
  • the double strand cDNA is digested with the restriction enzymes Asc I and Fse I, and the digested, double strand cDNA is separated and size-selected by gel electrophoresis.
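Two practical points about the Asc I / Fse I digestion and size selection can be sketched in code. A cDNA whose own sequence happens to contain an Asc I or Fse I site would also be cut internally during the double digestion. Size selection then keeps only fragments within a window. The default window of 0.2 to 1 kb mirrors the range excised in Example 10. The function names and test sequences are hypothetical. Both sites are palindromic, so scanning one strand covers both.

```python
from __future__ import annotations

ASC_I_SITE = "GGCGCGCC"
FSE_I_SITE = "GGCCGGCC"


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def has_internal_cloning_site(insert: str) -> bool:
    """True if the cDNA portion itself contains an Asc I or Fse I site.

    Both sites are palindromic, so scanning one strand also covers the other.
    """
    return bool(find_sites(insert, ASC_I_SITE) or find_sites(insert, FSE_I_SITE))


def size_select(fragments: list[str], min_bp: int = 200, max_bp: int = 1000) -> list[str]:
    """Keep fragments inside a gel-excision window (defaults mirror 0.2 to 1 kb)."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]
```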
  • First strand cDNA is synthesized using reagents from a SMART™ PCR cDNA Library Construction Kit (Clontech). Poly A+ mRNA from normal human prostate is primed with an equimolar mixture of three first primer oligonucleotides, SEQ ID Nos: 1,
  • Each first primer oligonucleotide also contains, 5' to the random sequences, a common PCR-A sequence which has, at its 5' end, a partial Fse I site.
  • 0.5 µg of poly A+ mRNA is mixed with primers in a total volume of 5 µl.
  • the mixture is heated at 72 °C for two minutes and placed on ice for two minutes.
  • First strand cDNA synthesis buffer (1x): 50 mM Tris, pH 8.3; 6 mM MgCl2; 75 mM KCl
  • dNTPs: deoxynucleoside triphosphates
  • Both the ratio of primers to mRNA template and the length of time during which reverse transcriptase is allowed to elongate cDNA can affect the size of cDNAs produced during the reaction. For example, if one assumes that the length of the mRNAs is about 1.5 kb, that the desirable average distance from the 5' end at which to put down a primer is 0.6 kb, and that every random primer in the first strand reaction can anneal and prime a cDNA, then one would use primers at a 2.5-fold molar excess over the mRNAs. If not all primers participate in priming, a larger ratio would be needed.
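The back-of-the-envelope estimate above can be reproduced as a small function. This sketch follows the text's own arithmetic: primers per mRNA equals mRNA length divided by target distance. It then scales the result by a `priming_fraction` parameter, which is a hypothetical stand-in for the observation that "not all primers participate in priming". The function name is also hypothetical.

```python
def primer_molar_excess(mrna_length_bp: float, target_distance_bp: float,
                        priming_fraction: float = 1.0) -> float:
    """Primer:mRNA molar ratio giving, on average, one priming event per target_distance_bp.

    priming_fraction (hypothetical) is the fraction of primers that actually prime.
    """
    if mrna_length_bp <= 0 or target_distance_bp <= 0:
        raise ValueError("lengths must be positive")
    if not 0 < priming_fraction <= 1:
        raise ValueError("priming_fraction must be in (0, 1]")
    primers_per_mrna = mrna_length_bp / target_distance_bp
    return primers_per_mrna / priming_fraction


if __name__ == "__main__":
    print(primer_molar_excess(1500, 600))       # 2.5, as in the text
    print(primer_molar_excess(1500, 600, 0.5))  # 5.0 if only half the primers prime
```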
  • primer/mRNA ratios of 7:1, 25:1 and 62:1, and elongation times of 20, 40 and 60 minutes are examined.
  • Nine samples are prepared, using each possible combination of ratios and elongation times. Samples 1-3 are incubated with 7:1 ratios, samples 4-6 with 25:1, and samples 7-9 with 62:1.
  • Samples 1, 4, and 7 are incubated for twenty minutes, samples 2, 5 and 8 for forty minutes and samples 3, 6 and 9 for sixty minutes.
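The nine-sample design is a full 3 x 3 grid, with ratio as the outer factor and time as the inner factor. The sketch below regenerates the sample numbering described above so that any lane or sample number can be mapped back to its conditions. The helper name `sample_grid` is hypothetical.

```python
from itertools import product

RATIOS = (7, 25, 62)       # primer:mRNA molar ratios
TIMES_MIN = (20, 40, 60)   # elongation times in minutes


def sample_grid() -> dict:
    """Map sample number (1-9) to (ratio, minutes); ratio varies slowest."""
    return {i: combo for i, combo in enumerate(product(RATIOS, TIMES_MIN), start=1)}


if __name__ == "__main__":
    for sample, (ratio, minutes) in sample_grid().items():
        print(f"sample {sample}: {ratio}:1, {minutes} min")
```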
  • the radioactively labeled cDNAs are analyzed by electrophoresis on an alkaline agarose denaturing gel and autoradiography, following art-recognized procedures (see, e.g., Sambrook et al., ibid., Chapter 6.20). For each sample, the cDNAs appear as a smear of about 0.2 to about 7 kb with no apparent banding. This indicates that priming with the random primers is, indeed, random.
  • the molecular weight range of the smears increases for samples which are incubated for longer times, and decreases as the ratio of primers/mRNA goes up, as expected from the theoretical considerations noted above.
  • first oligonucleotide primers, as discussed in Example 1, prime first strand cDNA synthesis from mRNA, but do not also prime subsequent cDNA synthesis using first strand cDNA as a template.
  • the cDNAs of Example 2 are analyzed by PCR, with particular attention to cDNAs corresponding to the highly expressed prostate-specific gene, prostate specific antigen (PSA).
  • Fig. 2A shows schematically that PCR amplification with primers PSA-up (SEQ. ID No: 4; a unique sequence within PSA) and PCR-A' (SEQ. ID No: 25; PCR-A and an Fse I restriction enzyme site) would be expected to detect and amplify first strand PSA cDNAs of the "correct" or desirable orientation.
  • amplification with PCR-A' and PSA-down (SEQ. ID No: 5)
  • Fig. 2B, top panel, depicts PCR products generated by PSA-up and PCR-A'.
  • Lanes 1-3 contain PCR products generated from first strand cDNAs made with 7-fold ratios of primer to mRNA, lanes 4-6, 25-fold, and lanes 7-9, 62-fold.
  • for lanes 1, 4 and 7, the incubation time is twenty minutes; for lanes 2, 5 and 8, 40 minutes; and for lanes 3, 6 and 9, 60 minutes.
  • Samples in all subsequent Examples herein are loaded onto gels in this order.
  • significant amounts of PSA-specific first strand cDNA are generated under all reaction conditions.
  • the heterogeneous sizes of the bands confirm the finding in Example 1 that the cDNA synthesis results from random priming.
  • Fig. 2B, bottom panel, depicts PCR products generated by PCR-A' and PSA-down.
  • sequence analysis of a number of PN44 inserts indicates that they have different 3' ends, and sequences of random clones show no pattern of preferred priming, further confirming that cDNA priming with the methods of the invention is random.
  • the first strand cDNA/mRNA complexes from Example 2 are substantially separated (e.g., purified) from unincorporated first primer oligonucleotides, using a QIAquick™ PCR purification kit (i.e., chromatography on a QIAGEN™ column), followed by preparative electrophoresis on a 4% agarose gel (FMC).
  • Nucleic acid larger than a double strand 200 bp marker is isolated and extracted from the gel with a QIAquick™ Gel Extraction Kit (QIAGEN™). This process encompasses at least one cycle of binding the extracted nucleic acid to the QIAquick™ column and eluting it from the column. If first strand cDNA is not separated by such a stringent, two-step procedure, residual first primer oligonucleotide interferes with subsequent steps in the genesis of double stranded cDNA,
  • first oligonucleotide primer is incorporated into cDNAs at positions other than the 5' end of first strand cDNA
  • a second oligonucleotide primer which comprises, 5' to 3', a defined PCR-B sequence followed by GGG is hybridized to first strand cDNA (preferably in the form of a cDNA/mRNA complex) which has substantially reached the 5' end of its mRNA template and has been modified by the addition of terminal C's by the terminal transferase activity of the reverse transcriptase.
  • the second oligonucleotide primer is a "SMART™ Oligo" obtained from Clontech.
  • any primer with this sequence (SEQ. ID No: 26), or any derivative or modification thereof comprising, e.g., naturally or non-naturally occurring bases or sugars, can be used.
  • Second oligonucleotide primer (2 µM), first strand cDNA synthesis buffer (1x), dNTPs (1 mM), DTT (2 mM) and Superscript II Reverse Transcriptase (200 U) are added to 5 µl of each of the separated, size-selected, first strand cDNAs from Example 4 (0.25 µg) to a final volume of 10 µl. The mixture is incubated at 37°C for 1.5 hours.
  • the extended first strand cDNAs are substantially separated (e.g., purified) from residual second oligonucleotide primer and other oligomers of less than 200 bp with a QIAquick™ PCR Purification column followed by preparative electrophoresis on a 4% agarose gel. Nucleic acids of 0.2 to 12 kb are excised and extracted from the gel.
  • Example 6: ANALYSIS OF EXTENDED FIRST STRAND cDNA BY PCR. It is desirable that the second oligonucleotide primer, as discussed in Example 5, serves as a template for extension synthesis predominantly at the 3' ends of first strand cDNAs that have reached the 5' ends of their mRNA templates. To show that this is, indeed, achieved under the conditions of the invention, the cDNAs of Example 5 are analyzed by PCR (see Figures 3A and 3B).
  • Fig. 3A shows schematically that PCR amplification with primers PCR-B' (SEQ. ID No: 24; PCR-B and an Asc I restriction enzyme site) and PSA-down would be expected to detect and amplify "correct" or "sense" strand extended first strand PSA cDNA in which the reverse complement of PCR-B has been appended to the 3' end of cDNA that has substantially reached the 5' end of its mRNA template.
  • Fig. 3B, left panel, depicts PCR products generated with PCR-B' and PSA-down.
  • cDNA samples used in lanes 1-6 are as described in Example 2 and are loaded onto the gel in the order described in Example 3.
  • a single band of about 220 bp is observed for all of the samples. This band is of the length expected for the extended first strand PSA cDNA of the desirable orientation.
  • Similar experiments with a PN44-specific primer confirm that PN44 extended first strand cDNA also appends the reverse complement of PCR-B to its 3' end, and that the extension also is appended substantially only to cDNAs which are copied completely to the 5' end of the mRNA template.
  • Amplification products with PSA-up and PCR-B' are shown in the right panel of Figure 3B. Only very short products can be detected, similar to those in the negative control lanes 7 and 8, indicating that under all conditions examined, the reverse complement of PCR-B is appended predominantly at the 3' ends of first strand cDNA, as desired, rather than at the 3' ends of second strand cDNA.
  • Double stranded cDNA is made by PCR amplification of the six separated extended first strand cDNAs (samples 1-6) discussed in Examples 2 and 5.
  • Duplicate samples of the template are mixed with dNTPs (200 µM), 1x PCR reaction buffer (40 mM Tricine-KOH (pH 9.2); 15 mM KOAc; 3.5 mM Mg(OAc)2; 75 µg/ml bovine serum albumin), 0.2 µM 5' end PCR primer (SEQ.
  • third oligonucleotide primer which comprises, 5' to 3', an Asc I site and PCR-B sequences), 0.2 µM 3' end primer (SEQ ID No: 25; fourth oligonucleotide primer, which comprises, 5' to 3', an Fse I site and PCR-A sequences), ddH2O and Advantage® cDNA Polymerase Mix (1x) (0.8 mM Tris-HCl (pH 7.5); 1% glycerol; 1.0 mM KCl; 0.5 mM (NH4)2SO4; 2.0 µM EDTA; 0.1 mM β-mercaptoethanol; 1.1 µg/ml TAQSTART antibody; 1-5 units of Klen-Taq-1 DNA polymerase) to a total volume of 100 µl in 0.2 ml PCR tubes.
  • the stock solutions and conditions for performing PCR are provided by an Advantage® cDNA PCR Kit (#K 1905-
  • the optimum number of PCR cycles is the lowest number which generates a sufficient quantity of cDNA inserts to make a large library.
  • samples amplified for eight cycles do not generate enough material for gel purification; samples amplified for fifteen cycles generate enough cDNA to generate a large library (over one million clones).
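Under an idealized exponential model, a PCR amplifies by (1 + efficiency) ** cycles. This gives a rough sense of why 15 cycles yield far more material than 8. The model ignores the plateau phase and template- or length-dependent efficiency, and the source determines the cycle number empirically. The efficiency values are assumptions, and `fold_amplification` and `cycles_to_reach` are hypothetical helper names.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential PCR gain: (1 + efficiency) ** cycles."""
    return (1.0 + efficiency) ** cycles


def cycles_to_reach(start_copies: float, target_copies: float,
                    efficiency: float = 1.0, max_cycles: int = 40) -> int:
    """Smallest cycle count whose idealized yield reaches target_copies."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    copies = start_copies
    for n in range(max_cycles + 1):
        if copies >= target_copies:
            return n
        copies *= 1.0 + efficiency
    raise ValueError("target not reached within max_cycles")


if __name__ == "__main__":
    print(fold_amplification(8), fold_amplification(15))  # 256.0 32768.0
```

The iterative loop avoids the floating-point rounding that a closed-form logarithm can introduce at exact powers of two.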
  • the optimal number of cycles can be routinely determined by one of skill in the art.
  • Control reactions show that the amplification reaction is dependent upon the presence of both primers and the template.
  • Amplified cDNA is separated by using a QIAquickTM PCR Purification Kit (QIAGEN).
  • sample numbering in this Example corresponds to the cDNA samples in Example 2.
  • Fig. 4A shows that "sense" PCR products are generated from all six samples when the PSA-down and PCR-B' primer pair is used.
  • the right panel shows that, by contrast, no products larger than primer-dimers are generated after amplification with the primer pair PCR-B' and PSA-up. Similar results are obtained using PN44-specific primers.
  • The parental plasmid for this construction, pEHB3, was obtained from Chris A. Kaiser, Department of Biology, M.I.T. It was constructed in Kaiser's laboratory in the following manner: The 4.5kb EcoRI fragment of pRB58 (Carlson and Botstein, (1982), Cell 28, 145-154) was cloned into the yeast vector pRS316 (Sikorski and Hieter, (1989), Genetics 122, 19-27) to form a precursor of pEHB3. This precursor was converted to pEHB3 by site-directed mutagenesis (Sambrook, et al, Molecular Cloning: A Laboratory Manual (2nd ed.). Cold Spring Harbor, N.Y., 1989, Vols. 1-3, especially Vol. 2, Ch.
  • pEHB3 is a yeast shuttle vector which comprises an E. coli origin of replication and an Amp gene for growth and selection in bacteria; CEN 4/ARS1 and URA3 sequences for growth and selection in S. cerevisiae; and, inserted into a polylinker site, a 4.5 kb EcoRI fragment of yeast genomic DNA that contains the invertase gene, SUC2.
  • the SUC2 invertase signal sequence of pEHB3 was deleted and replaced by SEQ ID No: 27 using standard mutagenesis and recombinant methods.
  • This sequence comprises, 5' to 3', stop codons in three different reading frames, an Asc I site, and an Fse I site.
  • the resulting signal trap vector was called pSST22.
  • Double strand cDNAs, flanked at the 5' and 3' ends by Asc I and Fse I sites, respectively, as described in Example 8 are digested by Asc I and Fse I using the manufacturer's recommended conditions, extracted with phenol/CHCl3, and subjected to preparative electrophoresis on a 4% agarose gel. Fragments in the size range of 0.2 to 1 kb are excised and extracted as described above, and are ligated to the cloning vector described in Example 9, which has been prepared for ligation by cleavage at its Asc I and Fse I cloning sites, followed by dephosphorylation using alkaline phosphatase (e.g., Cat.
  • the ligation mixture is transformed into XL10 Gold competent cells following the manufacturer's instructions (Stratagene). Transformants are pooled from ampicillin (100 µg/ml) plates, and DNA "minipreps" or "megapreps" are prepared. Plasmids taken from E. coli can, if desired, be transfected into yeast, so that clones containing leader sequences or transmembrane regions can be isolated. Selection methods are described, e.g., in U.S. Patent No. 5,536,637. Modifications of previously disclosed selection methods can, of course, be used. For example, instead of step g) of the method disclosed in U.S. Patent No.
  • 5,536,637 purifying DNA from the yeast cells
  • ligated DNA can be transfected directly into yeast cells.
  • clones which can grow on the selective medium are cloned and verified by restreaking on the selection medium; and the inserted cDNA that confers the ability to grow on the selective medium is directly amplified from the yeast colony by PCR before it is analyzed further.
  • Double strand cDNA is made as described in the preceding examples, and samples 1-6 of both 15-cycle and 22-cycle amplifications are cloned into the pSST22 vector via Asc I and Fse I insertion sites to produce a total of 12 libraries.
  • the fact that each of these libraries is representative is shown by colony hybridization to probes for genes of known expression levels (see Figure 5). Routine, art-recognized colony hybridization methods (see, e.g., Sambrook et al., ibid.) are used. Three genes whose expression levels in normal prostate have been estimated are examined: PN44 (highly abundant), UBA (medium abundance) and Cl (low abundance). Using the PCR primer pairs, SEQ. ID Nos: 6 and 7 for PN44, SEQ. ID Nos: 8 and 9 for UBA, and
  • PCR amplification is performed, using the PCR primer sets specific for each of the ten genes as described above. All but Hevin are detectable in the amplified cDNA inserts (Hevin is not detected in all experiments tried). By this measurement, the amplified cDNAs are complex enough to contain the 5' end of six out of six genes expressed at
  • E. coli clones from three libraries constructed using the above method have been analyzed by randomly picking colonies and sequencing their inserts.
  • each sequence that is the same or significantly homologous to a known and transcribed sequence (including untranslated RNAs such as rRNAs and Alu-like RNAs) is examined to see if the sequence is in the sense orientation, and if so, at the 5' end.
  • the other two sequences contain 18S rRNA, a template likely to be sheared or high in secondary structures.
  • a second library from prostate tumor mRNA has 9 out of 9 in the sense orientation, and 3 out of 9 contain the 5' ends; the 6 remaining include one 18S rRNA and three very long genes, so that shearing of the template could be a factor.
  • a third library, also from prostate tumor mRNA, has the worst statistics, with 7 out of 9 in the sense orientation and 2 out of those 7 containing the 5' ends. Again, two rRNAs are among those that are not at the 5' end. The difference in quality between these libraries may be due to the fact that the primer-to-mRNA molar ratio could not be ascertained for the latter two, owing to the scarcity of the prostate tumor mRNA.
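The clone counts reported above for the two prostate tumor libraries can be turned into fractions. The sketch below does this with exact fractions, counting 5' ends among sense-orientation clones as the text does. The dictionary layout and function name are hypothetical. With only nine clones sequenced per library, these percentages are coarse estimates.

```python
from __future__ import annotations

from fractions import Fraction

# Counts as reported in the text for the two prostate tumor libraries.
LIBRARIES = {
    "second library": {"picked": 9, "sense": 9, "with_5prime_end": 3},
    "third library": {"picked": 9, "sense": 7, "with_5prime_end": 2},
}


def summarize(counts: dict) -> tuple[Fraction, Fraction]:
    """(fraction of picked clones in sense orientation, fraction of sense clones with the 5' end)."""
    sense = Fraction(counts["sense"], counts["picked"])
    five_prime = Fraction(counts["with_5prime_end"], counts["sense"])
    return sense, five_prime


if __name__ == "__main__":
    for name, counts in LIBRARIES.items():
        sense, five_prime = summarize(counts)
        print(f"{name}: sense {float(sense):.0%}, 5' end among sense {float(five_prime):.0%}")
```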

Abstract

A method is disclosed for generating cDNA libraries which are enhanced for 5' sequences. First strand cDNA synthesis is initiated with oligonucleotide primers which comprise, 5' to 3', 1) a defined sequence which can be used as a PCR primer and, optionally, which contains at least a portion of one or more restriction enzyme site(s), and 2) a random oligonucleotide sequence. cDNAs which have copied their mRNA templates substantially to the 5' ends are preferentially selected, optionally amplified by PCR, and cloned directionally into a vector. A novel cloning vector is described which allows for the selection of cDNAs that encode signal sequences.

Description

5' ENRICHED cDNA LIBRARIES AND A YEAST SIGNAL TRAP
This application claims the benefit of the filing date of U.S. Provisional Application Serial No. 60/145,974 filed July 29, 1999.
Field of Invention
This invention relates, e.g., to methods for generating cDNA libraries which are enriched for 5' sequences. The 5'-enhanced libraries can be used, e.g., to select sequences that encode signal peptides.
Background of the Invention
Currently, most methods for generating cDNA libraries require that reverse transcriptase mediates first strand cDNA synthesis from mRNA templates by priming with oligo-dT (oligomer of thymidylic acid), which is complementary to the 3' poly A tail of mRNAs. Because of the difficulty of achieving full-length synthesis of this first strand, due, e.g., to poor processivity of reverse transcriptases (a particular problem for copying very long genes) or to secondary structure of the template which impedes the progress of the transcriptase, the 5' ends of genes tend to be under-represented in such libraries.
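A simple way to see why long templates lose their 5' ends is a geometric drop-off model. In this model, a reverse transcriptase has a constant, independent probability of dissociating at each nucleotide, so the chance of copying a distance d without stopping is (1 - p) ** d. This model is an illustrative assumption, not one given in the source, and it ignores template secondary structure. The drop-off rate `5e-4` and the 3 kb template length are hypothetical numbers. The 0.6 kb distance echoes the primer placement discussed for random priming.

```python
def p_full_length(distance_nt: int, p_drop_per_nt: float) -> float:
    """Probability a reverse transcriptase copies distance_nt without dissociating.

    Assumes a constant, independent per-nucleotide drop-off probability.
    """
    if distance_nt < 0 or not 0 <= p_drop_per_nt < 1:
        raise ValueError("invalid arguments")
    return (1.0 - p_drop_per_nt) ** distance_nt


if __name__ == "__main__":
    p = 5e-4  # hypothetical drop-off rate, for illustration only
    for label, d in [("oligo-dT on a 3 kb mRNA", 3000), ("primer 0.6 kb from the 5' end", 600)]:
        print(f"{label}: {p_full_length(d, p):.2f}")
```

Under this model, the probability of reaching the 5' end falls exponentially with distance. Priming closer to the 5' end therefore raises the fraction of cDNAs that copy it.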
Several methods have been described for generating 5' end-enhanced cDNAs for specific genes of interest or in cDNA libraries (for reviews, see, e.g., Schaeffer, B., "RNA Ligase-Mediated RACE: An Effective Method for the Cloning of Full-Length cDNA Ends", in Gene Cloning and Analysis: Current Innovations, 1997, Horizon Scientific Press, Wymondham, U.K.; and Innis, Michael A., et al., eds., PCR Protocols: a guide to methods and applications, 1990, Academic Press, San Diego, CA).
TdT-Mediated RACE (Rapid Amplification of cDNA Ends) has been used to generate cDNA clones of known genes of interest, or in attempts to generate full length cDNA libraries. cDNA is synthesized from an RNA sample using either an oligonucleotide primer complementary to a known sequence in a gene of interest, or oligo-dT, respectively. The 3' end of the first strand cDNA is then modified by the addition of a homopolymer tail (generally about 10 to about 25 nucleotides in length), using terminal deoxynucleotidyl transferase (TdT), and the cDNA second strand is then synthesized using a primer comprising a cloning site adaptor and a 3' homopolymer tail which is complementary to the appended cDNA homopolymer. These double stranded cDNAs are amplified by PCR (polymerase chain reaction), employing as the first (5') primer the cloning adaptor (anchor)/homopolymer tail primer which was used to make the second strand cDNA. In attempts to generate a cDNA clone of a known gene, the second (3' end) PCR primer is an oligonucleotide complementary to a known sequence internal to the cDNA primer. In attempts to generate a full length cDNA library, the second primer is oligo-dT (see, e.g., Frohman et al. (1988), PNAS 85, 8998-9002; Loh et al. (1989), Science 243, 217-220).
In a method sometimes known as RLM-RACE (RNA ligase-mediated RACE), an RNA oligonucleotide containing an anchor sequence is ligated to the 5' ends of mRNAs in a population, using RNA ligase; the oligonucleotide sequence is incorporated by reverse transcriptase into a first cDNA strand; and a DNA oligonucleotide which comprises sequences of the RNA oligonucleotide, or a portion thereof, is used as a PCR primer for formation of a second cDNA strand and for subsequent PCR amplification thereof (see, e.g., Maruyama et al. (1994), Gene 138, 171-174; Kawasaki et al. (1987), Proc. Natl. Acad. Sci. USA 85, 5698-5702; Veres et al. (1987), Science 237, 415-417; Belyavsky et al. U.S. Pat. No. 5,814,445; Schaeffer, B. (1997), "RNA Ligase-Mediated RACE: An Effective Method for the Cloning of Full-Length cDNA Ends", in Gene Cloning and Analysis: Current Innovations, Horizon Scientific Press, Wymondham, U.K., pp. 101-103; and Fromont-Racine et al. (1993), Nucleic Acids Res. 21, 1683-1684). RLM-RACE has been used in conjunction with first strand priming with oligonucleotides complementary to known sequences (in attempts to generate cDNA clones of known genes of interest) and with oligo-dT primers (in attempts to generate full-length cDNA libraries).
In a method sometimes known as LA-PCR (ligation-anchored PCR) or SLIC (single strand ligation to ss-cDNA ends), a DNA anchor oligonucleotide is ligated to the 3' ends of first strand cDNAs in a population, using T4 RNA ligase. The anchor-ligated cDNA is then used directly in a PCR reaction (see, e.g., Troutt et al. (1992), PNAS 89, 9823-9825; Edwards et al. (1991), NAR 19, 5227-5232; and Apte et al. (1993), BioTechniques 15, 890-893).
Currently available methods to generate cDNAs and/or cDNA libraries which are enhanced for 5' sequences rely on complex procedures, often including one or more ligation steps which are difficult to carry out in a controlled and efficient manner. Methods which require priming from poly A tails generally produce cDNAs which are deficient in sequences corresponding to the 5' ends of the mRNA templates, and cannot be used with RNAs that lack poly A tails (e.g., histone mRNAs) or with RNAs that have not been selected on oligo-dT columns. There remains a need for an alternative method to produce 5' enhanced cDNA libraries which, e.g., eliminates the need for ligation steps and allows for priming from internal template sequences.
Detailed Description of the Invention
This invention relates, e.g., to methods for generating cDNA libraries which are enriched for 5' sequences. The 5' enhanced libraries can be used, e.g., to select sequences that encode signal peptides. The invention encompasses the use of novel oligonucleotide primers (which comprise both defined sequences that can be used as PCR primers and random oligonucleotide sequences that can be used to prime reverse transcription), a method to preferentially select cDNAs which have copied substantially completely to the 5' ends of their mRNA templates, a single step procedure to both generate second strand cDNA and amplify it by PCR, novel separation steps which enhance the unidirectionality of the libraries, and/or a novel yeast vector for use in selecting signal peptides. Some oligonucleotide primers used in the method preferably comprise, 5' to 3', 1) a defined sequence which can be used as a PCR primer and, optionally, which contains at least a portion of one or more restriction enzyme site(s), and 2) a random oligonucleotide sequence. Such oligonucleotide primers are used to prime first strand cDNAs from an RNA sample of interest. Of the cDNAs so generated, those which contain the 5' ends of genes are preferentially selected. As used herein, the terms 5'-enhanced, 5'-enriched and 5'-specific cDNAs or cDNA fragments refer to cDNA populations which disproportionately represent the 5' termini and nearby sequences of expressed mRNAs. By nearby sequences is meant, e.g., sequences which lie within about 100 or more nucleotides of the 5' terminus of the mRNA template. Nearby sequences can encompass, e.g., sequences which are at the 5' terminus, or which lie about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, or at least 160 bases from the 5' terminus. The 5' termini plus nearby sequences are sometimes referred to as 5' sequences, 5' portions, 5' regions, or 5' ends of expressed genes.
The 5' enhanced cDNAs are optionally amplified by PCR (Polymerase Chain Reaction) and are directionally cloned into a vector of choice, to form a cDNA library. See Example 1 for a schematic representation of part of the method. In a most preferred embodiment, the procedure does not depend upon ligation of linkers, adaptors, or oligonucleotides, yet provides great flexibility in allowing for directional cloning into a variety of vectors. Notably, intermediates generated during the formation of a library are substantially separated from residual, unbound oligonucleotide primers, thereby enhancing the unidirectional character of the library. In one embodiment, the cloning vector is a novel yeast vector which can be used to select for cDNA clones that encode signal sequences (signal peptides, leader sequences or peptides, secretory leader sequences).
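The working definition of "nearby sequences" (within about 100 nucleotides of the mRNA 5' terminus) lends itself to a simple tally of how 5'-enriched a set of clones is. The sketch below is illustrative only and assumes, hypothetically, that each sequenced clone has already been mapped to its mRNA and reduced to a single number: the distance from the mRNA 5' terminus to the position matching the clone's 5' end. The 100-nt default window follows the text; the function name and the example offsets are invented, not data from this disclosure.

```python
def fraction_near_5prime(start_offsets, window=100):
    """Fraction of clones whose 5' end maps within `window` nt of the
    mRNA 5' terminus (offset 0 = the terminus itself).

    start_offsets: hypothetical per-clone distances in nucleotides,
    measured 3'-ward from the mRNA 5' terminus.
    """
    if not start_offsets:
        raise ValueError("no clones supplied")
    near = sum(1 for d in start_offsets if 0 <= d <= window)
    return near / len(start_offsets)


# Illustrative offsets for six mapped clones (not real data).
offsets = [0, 3, 12, 85, 240, 900]
print(f"{fraction_near_5prime(offsets):.2f}")  # 0.67
```

Because the text also allows wider windows (e.g., 120, 140 or 160 bases), the window is left as a parameter rather than fixed.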
Advantages of the method include its simplicity, flexibility, efficiency and reproducibility, and the ability to generate cDNA libraries which are complex, representative, highly enriched for the 5' ends of genes and highly unidirectional. Low amounts of RNA can be used (e.g., as little as 25 ng of poly A+ RNA), thereby allowing for in-depth screens for low abundance RNAs. Because cDNAs are primed internally from mRNA templates, the invention allows one readily to generate cDNA fragments of an optimal size for cloning into vectors. Among the numerous applications of the method is the generation of EST (Expressed Sequence Tag) databases which complement existing EST databases derived from oligo-dT primed cDNA libraries, many of which are deficient in 5' specific sequences. Libraries of 5' enhanced cDNAs (ESTs) can be used, e.g., to identify and/or quantify levels of expressed genes, to detect changes in the pattern of mRNA expression in a cell or tissue associated with a physiological or pathological change, to identify potential gene targets for drugs, or to screen for the action of a drug or to detect its side effects. Other uses include, e.g., assembling full-length genomic or cDNA clones from libraries that are deficient in 5' end sequences, and isolating sequences of interest that reside in the 5' portions of genes, such as signals that regulate gene expression or protein processing (e.g., leader peptidase cleavage sites, Kozak sequences, or promoter sequences) or sequences which code for signal peptides or pro-peptides.
In one embodiment of the invention, 5' enhanced cDNAs are cloned directionally into a novel yeast vector which allows for the selection of inserted "leader" sequences that permit secretion of a "leaderless" selectable marker derived from yeast invertase. Among the features of the vector are a cloning region (cloning site, insertion site or region) comprising at least two different restriction enzyme sites, preferably rare ones, which allow for directional cloning of cDNA inserts, and three stop codons located upstream of the cloning region, in different reading frames, which prevent undesired translation from upstream sequences. By "rare restriction enzyme sites" is meant herein those restriction enzyme recognition sites that occur rarely in DNA, e.g., recognition sequences of 8 or more bases. The use of rare restriction sites allows one to clone long DNA sequences, e.g., intact genes.
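To make "rare" concrete: under the simplifying assumption that the four bases occur with equal and independent frequencies, a fully specified, non-degenerate recognition sequence of n bases is expected about once every 4^n base pairs. The short calculation below is an illustration added for clarity, not part of the disclosure; the function name is hypothetical.

```python
def expected_site_spacing(site_length: int) -> int:
    """Mean spacing (bp) between occurrences of one fully specified,
    non-degenerate recognition sequence in random DNA, assuming the four
    bases are equally frequent and independent: 4 ** site_length."""
    return 4 ** site_length


for n in (4, 6, 8):
    print(f"{n}-base site: about 1 per {expected_site_spacing(n):,} bp")
# 4-base site: about 1 per 256 bp
# 6-base site: about 1 per 4,096 bp
# 8-base site: about 1 per 65,536 bp
```

Real genomes are not random sequences, so these are only order-of-magnitude figures; in mammalian DNA, for example, GC-rich sites containing CpG dinucleotides are typically rarer than this estimate.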
One embodiment of the invention is a method to generate first strand cDNAs from an mRNA sample of interest, comprising, a) hybridizing said mRNA with first primer oligonucleotides, each comprising, 5' to 3',
1) a PCR primer sequence, PCR-A, and
2) one of a set of random sequences having a length of 5 to 40 nucleotides, which are effective for binding to mRNA and priming the synthesis of a cDNA copy, b) synthesizing first strand cDNAs by extending said random sequences hybridized to said mRNA with a reverse transcriptase, and c) separating (substantially all) unbound molecules of said first primer from first strand cDNAs, under conditions effective to produce first strand cDNAs which are effective for unidirectional cloning.
In another embodiment, the separated first strand cDNAs described above which are effective for unidirectional cloning are substantially free of unbound first primer.
Another embodiment of the invention is a method of generating first strand cDNAs as above, wherein said reverse transcriptase adds terminal C's to the 3' ends of first strand cDNAs, said method further comprising, a) hybridizing separated first strand cDNAs containing 3' terminal C's to a second PCR primer oligonucleotide which comprises, 5' to 3',
1) a second PCR primer sequence, PCR-B, and
2) the sequence GGG, wherein said GGG is effective to hybridize to said 3' terminal C's, and b) incubating with reverse transcriptase, thereby generating extended first strand cDNAs which comprise at their 3' ends CCC followed by the reverse complement of PCR-B. Another embodiment is the preceding method, wherein the reverse transcriptase adds terminal C's to the 3' ends of first strand cDNAs which have substantially reached the 5' ends of their mRNA templates (i.e., the cDNAs comprise a substantial portion of the reverse complement of the 5' portion of the mRNA templates).
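The two reactions described above can be followed at the level of sequence strings. The sketch below is a hypothetical toy model, not a protocol: the mRNA, PCR-A and PCR-B sequences are invented placeholders, the random segment is assumed to pair perfectly with its template, and exactly three untemplated C's are assumed. It shows why only a cDNA that reached the mRNA 5' end acquires the PCR-B complement, and that the complementary (sense) strand then reads PCR-B, GGG, and the mRNA 5' sequence.

```python
# Toy, string-level model of the two first-strand steps described above.
# All sequences are hypothetical; real reactions involve mismatched priming,
# variable numbers of added C's, and incomplete extension.

# Complement table; RNA U pairs with DNA A.
_COMP = str.maketrans("ACGTU", "TGCAA")


def revcomp_dna(seq: str) -> str:
    """Reverse complement as DNA (works on DNA or RNA input)."""
    return seq.upper().translate(_COMP)[::-1]


def first_strand(mrna: str, prime_at: int, pcr_a: str, n_random: int) -> str:
    """First strand cDNA (5'->3') from a PCR-A + random N-mer primer.

    The random segment is assumed to pair perfectly with
    mrna[prime_at:prime_at + n_random]; reverse transcriptase then copies
    toward the mRNA 5' end and, on reaching it, adds untemplated C's
    (three are assumed here, to match the GGG of the second primer).
    """
    return pcr_a + revcomp_dna(mrna[: prime_at + n_random]) + "CCC"


def extend_with_pcr_b(cdna: str, pcr_b: str) -> str:
    """Extension templated by the second primer, PCR-B + GGG: the GGG pairs
    with the terminal C's and the 3' end gains the reverse complement of PCR-B."""
    if not cdna.endswith("CCC"):
        raise ValueError("no terminal C's: cDNA did not reach the mRNA 5' end")
    return cdna + revcomp_dna(pcr_b)


mrna = "GGCAUGACCUUAGCAAGGCUUAAACGGAUCC"  # hypothetical mRNA, 5'->3'
pcr_a = "ACGTACGTAA"                      # hypothetical PCR-A
pcr_b = "TTGGCCAACG"                      # hypothetical PCR-B
extended = extend_with_pcr_b(first_strand(mrna, 20, pcr_a, 8), pcr_b)

# The complementary (sense) strand reads PCR-B, GGG, then the mRNA 5' end.
print(revcomp_dna(extended)[:13])  # TTGGCCAACGGGG
```

In this model the "extended first strand cDNA" carries PCR-A at its 5' end and the reverse complement of PCR-B at its 3' end, which is what later allows the two ends to be told apart for directional cloning.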
In another embodiment, the preceding method further comprises separating (substantially all) molecules of said second primer oligonucleotide from said extended first strand cDNAs, under conditions effective to produce extended first strand cDNAs which are effective for unidirectional cloning.
In another embodiment, the separated extended first strand cDNAs described above are substantially free of second primer molecules. Another embodiment is first strand cDNAs generated from an mRNA sample of interest, wherein the nucleotide sequence of the 5' end of each cDNA molecule is, 5' to 3',
1) a PCR primer sequence, PCR-A, and
2) one of a set of two or more random sequences having a length of about 5 to about 40 nucleotides, which are effective for binding to mRNA and priming the synthesis of a cDNA copy, wherein said first strand cDNAs are substantially free of unbound molecules of primer oligonucleotide and are effective for unidirectional cloning.
Another embodiment is a method for constructing a directional EST library enriched for the 5' ends of expressed genes, comprising, a) generating double stranded cDNA molecules which are enriched for the 5' ends of expressed genes (mRNA), wherein each of said double stranded cDNAs is bounded by two different restriction enzyme sites, A and B, and wherein the B sites lie (substantially) at the ends of cDNA sense strands corresponding to the 5' end of said expressed genes, and the A sites lie (substantially) at the opposite ends of the sense cDNA strands, and b) ligating said double stranded cDNA molecules directionally into a vector.
"Sense strand" cDNA corresponds to the strand of double strand template DNA which is expressed (transcribed) into mRNA. Therefore, the B restriction site lies at or close to the portion of cDNA which corresponds to the 5' end of the transcribed mRNA, which can encode the N-terminus of a protein.
Another embodiment is a method for constructing a directional cDNA library enriched for cDNAs coding for signal peptides, comprising, a) generating double stranded cDNA molecules which are enriched for the 5' ends of expressed genes (mRNA), wherein each of said double stranded cDNAs is bounded by two different restriction enzyme sites, A and B, and wherein the B sites lie (substantially) at the ends of cDNA sense strands corresponding to the 5' end of said expressed genes, and the A sites lie (substantially) at the opposite ends of the sense cDNA strands, b) ligating said double stranded cDNA molecules into a yeast vector, wherein said vector comprises, 5' to 3',
1) stop codons in each of three reading frames, just upstream of
2) a cDNA insertion region wherein a site for restriction enzyme B lies 5' of one for restriction enzyme A, and 3) a sequence encoding a non-secreted yeast invertase, and c) transfecting said vectors comprising ligated DNA into yeast and selecting for clones which secrete yeast invertase.
Another embodiment is a yeast cloning vector which comprises, 5' to 3', 1) stop codons in each of three reading frames, just upstream of 2) a cDNA insertion region wherein a site for restriction enzyme B lies 5' of one for restriction enzyme A, and
3) a sequence encoding a non-secreted yeast invertase.
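The requirement for stop codons "in each of three reading frames" upstream of the insertion region can be checked mechanically. The sketch below is illustrative only; the spacer sequence and function names are hypothetical and are not the vector sequence of this disclosure. It reports which frames, counted from the first base of the given sequence, contain at least one stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frames_with_stop(seq: str) -> set:
    """Reading frames (0, 1, 2, relative to the first base of seq) that
    contain at least one stop codon on the given strand, read 5'->3'."""
    seq = seq.upper()
    return {i % 3 for i in range(len(seq) - 2) if seq[i:i + 3] in STOP_CODONS}


def stops_in_all_frames(seq: str) -> bool:
    return frames_with_stop(seq) == {0, 1, 2}


spacer = "TAGATAAGTGA"  # hypothetical spacer placed upstream of the cloning region
print(stops_in_all_frames(spacer))    # True: TAG (frame 0), TAA (1), TGA (2)
print(stops_in_all_frames("TAGTAG"))  # False: both stops are in frame 0
```

Since any three consecutive frame offsets cover every possible frame, a spacer that passes this check terminates translation arriving from upstream in any frame before the cDNA insert.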
Sources for RNA samples from which cDNA libraries of the invention can be made include, e.g., biological samples derived from a human or other animal source, such as cells, tissues or organs, encompassing, among many others, blood cells, primary cells taken from developing embryos or mature animals, biopsy samples, histology tissue samples, or cell lines. Other possible sources include bacterial, fungal, viral or parasite preparations (including cells, tissues or organs infected by such entities) and plants.
The first step in the method of the invention is isolation or provision of an mRNA population. mRNA can be obtained from any source. Methods of extraction of RNA are well-known in the art and are described, for example, in J. Sambrook et al, Molecular Cloning: A Laboratory Manual (2nd ed.). Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989, Vols. 1-3, especially Vol. 1, Ch. 7, "Extraction, Purification, and Analysis of Messenger RNA from Eukaryotic Cells." Other isolation and extraction methods are also well-known. Typically, isolation is performed in the presence of chaotropic agents such as guanidinium chloride or guanidinium thiocyanate, although other detergents and extraction agents can alternatively be used.
Typically, the mRNA is isolated from the total extracted RNA by chromatography over oligo-dT-cellulose or other chromatographic media that have the capacity to bind the polyadenylated 3'-portion of mRNA molecules. Alternatively, total RNA can be used. However, it is generally preferred to isolate poly A+ RNA. As used herein, "priming" or "to prime" refers to the apposition (or pairing) of an oligonucleotide or nucleic acid to a template nucleic acid, whereby said apposition enables a transcriptase to polymerize nucleotides into a nucleic acid which is complementary to the template nucleic acid. As used herein, the term "transcriptase" refers to any enzyme which can copy a nucleic acid sequence into its complement, e.g., reverse transcriptase, DNA-dependent DNA polymerase, etc. As used herein, the term "primer oligonucleotide" or "oligonucleotide primer" refers to any oligonucleotide, containing any number of nucleotides, which can prime the synthesis by a transcriptase of a complementary nucleic acid from a nucleic acid template. In accord with the invention, a primer possesses a free 3' OH group which upon apposition to the nucleic acid template is recessed relative to the 5' end of the template and thus is capable of acting as a site of initiation of the synthesis or polymerization of a nucleic acid polymer, the sequence of which is complementary to the template strand, in the presence of nucleotides and a transcriptase and at a suitable temperature and pH. A primer oligonucleotide can be used, e.g., to prime the synthesis of first strand cDNA, using a reverse transcriptase, to prime the synthesis of second strand cDNA from first strand cDNA template, using a DNA-dependent DNA polymerase, or to amplify DNA by PCR, using a thermostable DNA polymerase.
Oligonucleotide primers can be any type of oligonucleotide, e.g., ribonucleotide, deoxyribonucleotide, PNA, or chimeras or mixtures thereof. Preferably, the primer is a deoxyribonucleotide oligomer. Primer sequences can include one or more of the bases A, T, U, C, or G, or non-naturally occurring nucleosides such as, e.g., inosine. The universal nucleotide phosphoramidite (Glen Research) and the universal base phosphoramidite (Clontech) can introduce modified bases into primers that can pair equally well with all four natural bases. Primers may be derivatized with chemical groups to optimize their performance or to facilitate characterization of extension or amplification products. For example, primers can be substituted with biotin, using known synthetic techniques. The nucleic acid backbone can comprise one or more known linkages, such as, e.g., phosphodiester bonds, or sulfamate, sulfamide, phosphorothioate, methylphosphonate, or carbamate linkages.
In a preferred embodiment, both the primer and the template are in solution. In another embodiment, methods of the invention may be performed wherein the primers are attached to a solid phase such that attachment does not interfere with their ability to prime nucleic acid synthesis. The advantage of this embodiment is that all the products are covalently bound to a solid phase support, thus simplifying their isolation, characterization and/or cloning.
In a preferred embodiment of the invention, a first primer oligonucleotide is used to generate first strand cDNA. This first primer oligonucleotide comprises two moieties,
1) a PCR primer sequence (PCR-A), which optionally comprises a restriction enzyme region (RE-A) that comprises at least a portion of one or more restriction enzyme sites, and
2) a random sequence.
As used herein, the term "PCR primer sequence" encompasses a sequence which is capable of serving as a primer in a PCR reaction but which is not necessarily used in such a reaction. The term "restriction enzyme site" ("restriction site") refers to a sequence of nucleotides which is recognized by a restriction endonuclease. It can form a part of the cleavage site or it can be adjacent to the cleavage site as it is for type IIS enzymes (e.g., Alw I or Fok I). The term "restriction site end" refers to the portion of the molecule produced by restriction enzyme cleavage. Restriction site ends can be blunt or can have 5' or 3' overhangs. The PCR-A primer sequence moiety of a first oligonucleotide primer is of sufficient length and composition to be able to prime synthesis during subsequent PCR amplification, while maintaining a low level of undesired background synthesis. The use of oligonucleotide primers for PCR amplification, and the design factors relevant for such a use, are discussed below. Optionally, a PCR-A primer can comprise a restriction enzyme region, RE-A, to facilitate insertion of cDNA generated with that PCR-A primer into a cloning vector. An RE-A region can be positioned internal to, or 5' or 3' of, non-restriction site-containing sequences of a PCR-A primer. An RE-A region can contain one or more restriction enzyme sites. 
If an RE-A is located in the 5' portion of a PCR-A primer, it can also contain, located at the 5'-most end of the RE-A, a partial restriction enzyme site (portion of a restriction site). Such a partial restriction site can be completed to a full restriction site in a subsequent step(s) (e.g., during PCR amplification, or by ligation to an adaptor) by appending a contiguous sequence bearing the remaining nucleotides required for restriction enzyme recognition. For example, an RE-A can contain 6 bases of an 8 base recognition site, and the remaining 2 bases can be added in a subsequent reaction. Alternatively, 4 bases of a recognition site can be completed with 4 more bases, 5 bases with 3 bases, and so on. Of course, a PCR-A primer (or any of the oligonucleotide primers described herein) can contain any number of complete and/or partial recognition sites. One of skill in the art can readily choose appropriate restriction enzyme site(s) to be included in an RE-A region, using well known methods in the art. In a preferred embodiment, the restriction enzyme sites are recognized by "rare-cutters", e.g., the sites comprise 8 base or longer recognition sequences. In a most preferred embodiment, an RE-A region contains partial or complete sequences of either the restriction site GGCCGGCC, which is recognized by the enzyme Fse I, or the site GGCGCGCC, which is recognized by the enzyme Asc I.
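The partial-site idea can be illustrated with the two recognition sequences named above. In the sketch below, which is illustrative only, a hypothetical primer begins with the last 6 of the 8 Asc I bases; prepending the missing 2 bases reconstitutes the full site. The primer sequence and function names are invented for the example, and "completion" here is purely string-level: whether an enzyme cuts efficiently at a site close to a DNA end also depends on flanking bases, which this sketch ignores.

```python
FSE_I = "GGCCGGCC"  # Fse I recognition sequence (from the text)
ASC_I = "GGCGCGCC"  # Asc I recognition sequence (from the text)

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMP)[::-1]


def find_sites(seq: str, site: str) -> list:
    """0-based start positions of every match of `site` in `seq`.
    Both sites above are palindromic (equal to their own reverse
    complement), so scanning one strand also finds them on the other."""
    seq, site = seq.upper(), site.upper()
    k = len(site)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == site]


def completes_site(added_5prime: str, primer: str, site: str) -> bool:
    """True if appending `added_5prime` to the 5' end of `primer`, whose
    5'-most bases are a 3' portion of `site`, reconstitutes the full site."""
    return (added_5prime + primer).upper().startswith(site.upper())


assert revcomp(FSE_I) == FSE_I and revcomp(ASC_I) == ASC_I
primer = "CGCGCCACTGAC"  # hypothetical: 6 of the 8 Asc I bases, then other PCR-A bases
print(find_sites(primer, ASC_I))            # []
print(completes_site("GG", primer, ASC_I))  # True
print(find_sites("GG" + primer, ASC_I))     # [0]
```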
The "random sequence" moiety of a first oligonucleotide primer is a sequence which is not designed to be directed to a specific sequence in the nucleic acid sample to be copied. By primers of random sequence is meant that the positions of apposition of the primers to the nucleic acid template are substantially indeterminate with respect to the nucleic acid sequence of the template under the reaction conditions used in the methods of the invention. Methods for estimating the frequency at which an oligonucleotide will appear in a nucleic acid polymer are described, e.g., in Volinia et al. (1989), Comput. Appl. Biosci. 5, 33-40. It is recognized that random primers may be only as random as the physical and chemical efficiencies of the synthetic procedure allow.
Furthermore, random primers designed to prime synthesis from defined sources may be less than random in order to compensate for favored arrangements of bases, e.g., percentage GC content in certain organisms. Oligonucleotides having defined or arbitrary sequences can be considered "random" if their use causes the locations of their apposition to the template to be indeterminate. All these examples of primer types are defined to be random so long as the positions along the template nucleic acid strand at which the primed extensions occur are largely indeterminate.
It is not necessary that apposition of the random primer to the template be at the site of a sequence exactly complementary to that of the primer. A primer which apposes to the template with some mismatch is within the scope of the invention if the mismatched primer-template structure can still serve as a site from which to enzymatically synthesize extension products of the primer which are complementary to the template. One of ordinary skill in the art, without undue experimentation, will be able to design many reaction conditions, both stringent (allowing only a perfect complementary sequence match between the primer and the template) and nonstringent (allowing some mismatch in the primer-template pairing), within the scope of the methods of the invention (see, e.g., Nucleic Acid Hybridization, a Practical Approach, B.D. Hames and S.J. Higgins, eds., IRL Press, Washington, 1985).
Random primers can be generated by a variety of methods, e.g., by random digestion or physical disruption of natural sources (for example, by digestion with endonucleases or by sonication or shearing), or, preferably, by synthesis with a commercial oligonucleotide synthesizer. Primers can be synthesized with an oligonucleotide synthesizer using conventional methods, and are available from a number of commercial sources.
The random primer sequences within a first primer oligonucleotide can be of any length effective to prime synthesis from a nucleic acid template, e.g., 4-mer, 5-mer, 6-mer, and up to about 50 bases or more. The optimal length of the primers will depend upon many factors, including base composition, chemical composition of the sugar moieties, nature of the backbone, and hybridization conditions. In a preferred embodiment, the random primer has a length of between about 5 and about 40 bases. In a more preferred embodiment, the random primer has a length of between about 6 and 20 bases, preferably 6, 7, 8, 9, 10 or 11 bases, more preferably 7, 8 or 9 bases. In a most preferred embodiment, a mixture of random primers comprising 7, 8 and 9 random bases is used to prime cDNA synthesis. Examples 2 and 3 describe conditions in which priming by random primers gives rise to substantially random priming of first strand cDNAs.
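The most preferred mixture of 7, 8 and 9 random bases can be put in numbers. The sketch below, an illustration rather than part of the disclosure, assumes full degeneracy (any of the four bases at every random position) and an idealized random template with equiprobable, independent bases; the function names are hypothetical.

```python
def pool_complexity(random_lengths) -> int:
    """Distinct sequences in a fully degenerate random-primer mixture
    (any of the four bases at every random position)."""
    return sum(4 ** n for n in random_lengths)


def expected_exact_sites(n: int, template_len: int) -> float:
    """Expected exact-match sites for one particular n-mer on one strand of
    a random template (equiprobable, independent bases)."""
    return max(template_len - n + 1, 0) / 4 ** n


print(pool_complexity((7, 8, 9)))               # 344064
print(round(expected_exact_sites(8, 1000), 4))  # 0.0152
```

The two numbers are complementary views: any one primer species is expected to match a 1,000-nt template exactly only about 0.015 times, yet because the pool contains every 7-, 8- and 9-mer, every template position has an exactly complementary member, and mismatched pairing (noted above) adds further priming sites.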
Conditions of hybridization of a first oligonucleotide primer to mRNA, such as, e.g., temperature, time of incubation, nucleic acid concentrations, pH, salt concentration, and amount and type of denaturant such as formamide, can be readily optimized by a skilled worker, using well known methods in the art. See, e.g., Sambrook et al. (1989), Molecular Cloning: A Laboratory Manual (2d ed.), Vols. 1-3, Cold Spring Harbor Press, New York; Hames et al. (1985), Nucleic Acid Hybridization, IRL Press; Davis et al. (1986), Basic Methods in Molecular Biology, Elsevier Science Publishing, Inc., New York. By choosing specific ratios of the first oligonucleotide primers to mRNA templates, a skilled worker can design reaction mixtures that will generate first strand cDNAs of approximately any desired length. Useful sizes of first strand cDNAs range from about 200 nucleotides (which is long enough to encompass the 5' UTR (untranslated region) of a gene of interest, and short enough to ensure cDNA production from short genes represented in the mRNA population) to about 1,000 nucleotides (a length which would not be expected to comprise the stop codons of most genes). Example 2 shows that, under conditions of the invention, ratios ranging from about 7:1 to about 25:1 can result in cDNA lengths of about 200 to 1,000 nucleotides, whereas a ratio of 62:1 gives rise to shorter first strand cDNAs. Under the conditions tested, ratios of between about 5:1 and about 60:1 yield a cDNA product which can be used in the methods of the invention.
Another factor in choosing the ratio of first primer oligonucleotide to mRNA template is that too large a ratio of primer to template can result in undesirable priming of second strand synthesis, using the first strand cDNA as template. Such second strand priming creates second strand cDNA which contains PCR-A sequences at its 5' end; if such cDNA molecules are inserted into a directional cloning vector designed to accept first strand cDNAs with PCR-A sequences at their 5' ends, the directionality of the resulting library will be compromised. That too high a ratio can lead to undesirable second strand priming is illustrated in Example 3, which indicates that, under the reaction conditions employed in the preceding experiment, ratios of 7:1 or 25:1 result in substantially unidirectional cDNAs, whereas a ratio of 62:1 does not.
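Because the ratios above are molar, setting up a reaction from masses requires a conversion. The sketch below is illustrative only and rests on stated assumptions: approximate average masses per nucleotide of about 330 g/mol for single-stranded DNA and about 340 g/mol for RNA, and an assumed mean mRNA length, which for a real poly A+ sample has to be estimated. The input amounts and all names are hypothetical; only the 7:1, 25:1, 5:1 and 60:1 reference points come from the text.

```python
# Approximate average masses per nucleotide (assumed; exact values depend on
# base composition and on the counter-ion form).
SS_DNA_G_PER_MOL_NT = 330.0
SS_RNA_G_PER_MOL_NT = 340.0


def picomoles(mass_ng: float, length_nt: float, g_per_mol_nt: float) -> float:
    """Convert a mass in ng of a single-stranded nucleic acid to pmol."""
    return mass_ng * 1e-9 / (length_nt * g_per_mol_nt) * 1e12


def primer_to_mrna_ratio(primer_ng, primer_len, mrna_ng, mean_mrna_len):
    """Molar ratio of first primer to mRNA; mean_mrna_len is an assumed average."""
    primer_pmol = picomoles(primer_ng, primer_len, SS_DNA_G_PER_MOL_NT)
    mrna_pmol = picomoles(mrna_ng, mean_mrna_len, SS_RNA_G_PER_MOL_NT)
    return primer_pmol / mrna_pmol


def describe(ratio: float) -> str:
    """Place a ratio against the reference points given in the text."""
    if 7 <= ratio <= 25:
        return "between the 7:1 and 25:1 ratios tested in Examples 2 and 3"
    if 5 <= ratio <= 60:
        return "within the ~5:1 to ~60:1 usable range"
    return "outside the ranges described"


# Hypothetical inputs: 9.9 ng of a 30-nt primer, 51 ng of mRNA averaging 1,500 nt.
r = primer_to_mrna_ratio(9.9, 30, 51.0, 1500)
print(f"{r:.1f}:1 -> {describe(r)}")
# 10.0:1 -> between the 7:1 and 25:1 ratios tested in Examples 2 and 3
```

Since the mRNA term scales with the assumed mean length, an error in that estimate shifts the computed ratio proportionally, which is one reason the ratio is hard to pin down when little mRNA is available for characterization.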
Following hybridization of first primer oligonucleotide to mRNA, the first primer is extended in the presence of deoxynucleotide substrates by a transcriptase. Many types of reverse transcriptase can be used in such a reaction, e.g., enzyme from Moloney murine leukemia virus (MoMuLV), avian myeloblastosis virus, etc. Typically, SuperScript II (a modified reverse transcriptase from MoMuLV (Life Technologies)) is used. Conditions for performing reverse transcription reactions are well known in the art (see, e.g., Sambrook et al., ibid.). It is known that many reverse transcriptases contain a terminal transferase activity which adds uncoded-for C's to the 3' ends of reverse transcripts that have been copied completely (or substantially completely) to the 5' end of their mRNA templates. By "substantially completely" is meant herein that the transcriptase has, e.g., copied to (reached) within about 100 or more nucleotides of the 5' end of an mRNA template. For example, the reverse transcriptase can copy to the 5' terminus of the mRNA template, or to a position which lies, e.g., about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160 or more bases from that 5' terminus. In some embodiments, the reverse transcriptase can copy to a position which lies substantially downstream of the 5' terminus. For example, if an mRNA comprises a long untranslated region, the reverse transcriptase need copy only far enough that the cDNA encompasses the AUG region, including, e.g., sequences required for efficient and/or specific translation. Such a cDNA can extend, for example, about 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100 bases 5' of the AUG.
Following addition of the terminal C's, the complex of template and first strand cDNA (or, optionally, first strand cDNA which has been separated from the template) is substantially separated (e.g., purified) from unbound first oligonucleotide primer (i.e., first oligonucleotide primer that has not been incorporated at the 5' end of first strand cDNA). A variety of methods are available for performing such a separation, including, e.g., dialysis or microdialysis (see, e.g., Bauer et al. (1993), Nucleic Acids Research 21, 4272), separating columns, and preparative gel electrophoresis (e.g., on agarose or polyacrylamide gels). By "separating column" is meant any column which can separate single or double strand cDNAs from unincorporated oligonucleotides or from short oligonucleotides (e.g., an oligonucleotide whose synthesis has been primed by an oligonucleotide primer but which has been prematurely terminated and released from its template). Examples of separating columns (also known as exclusionary columns or sizing columns) are gravity-based or spin columns made of Sephadex or Sepharose, e.g., G-50, G-75, chromaspin™ columns, or Pharmacia Microspin S400 HR columns, or QIAGEN (or other silica gel membrane-based) columns. In a most preferred embodiment, first strand cDNA is separated from unbound molecules of first primer oligonucleotide by a two step procedure: binding, washing and eluting bound material on a QIAGEN™ column, followed by preparative electrophoresis on a 4% agarose gel. "First strand cDNAs which are substantially free of unbound first oligonucleotide primer" are cDNAs which, when subjected to subsequent steps of this invention, give rise to a substantially unidirectional population of cDNA clones. By "substantially unidirectional" is meant that at least about 80%, e.g., about 80, 82, 85, 87, 90 or 92%, and preferably at least about 95%, e.g., about 95, 97 or 99%, of the cDNAs are oriented in a single orientation (direction).
Example 4 demonstrates a two-step purification procedure of the invention.
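The definition of "substantially unidirectional" amounts to a simple proportion over sequenced clones. The sketch below is illustrative only: it assumes, hypothetically, that each clone has already been called "sense" or "antisense" (e.g., by sequencing), and it treats the text's "about 80%" and "about 95%" as hard cut-offs, which is a simplification. The function names and example calls are invented.

```python
from collections import Counter


def major_orientation_fraction(orientations) -> float:
    """Fraction of clones that share the most common insert orientation."""
    counts = Counter(orientations)
    if not counts:
        raise ValueError("no clones supplied")
    return max(counts.values()) / sum(counts.values())


def directionality(fraction: float) -> str:
    # Cut-offs follow the text's "at least about 80%" and "preferably at
    # least about 95%"; treating them as hard thresholds is a simplification.
    if fraction >= 0.95:
        return "substantially unidirectional (preferred level)"
    if fraction >= 0.80:
        return "substantially unidirectional"
    return "not substantially unidirectional"


clones = ["sense"] * 19 + ["antisense"]  # hypothetical sequencing calls
f = major_orientation_fraction(clones)
print(f, directionality(f))  # 0.95 substantially unidirectional (preferred level)
```

For a directional library one would normally count the intended (sense) orientation specifically; using the most common orientation keeps the sketch agnostic about which orientation the vector was designed for.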
A variety of methods can be used to select and/or amplify first strand cDNA that has been extended completely, or substantially completely, to the 5' end of its mRNA template. For example, TdT-mediated RACE takes advantage of the fact that completely extended cDNA is preferentially (at least to some extent) acted upon by terminal deoxynucleotidyl transferase; RLM-RACE requires ligation of defined RNA oligonucleotides to the 5' ends of mRNA templates, and only cDNA which has incorporated those oligonucleotide sequences into its 3' end is amplified in subsequent PCR reactions; and LA-PCR relies on the ligation of defined oligonucleotides with T4 RNA ligase to the 3' end of first strand cDNA. A second PCR primer oligonucleotide can be added to the 3' ends of first strand cDNAs by any of these end-specific methods, and can be used in the generation of second strand cDNA and/or for PCR amplification.
In a preferred embodiment of the invention, another end-specific method is used (see Example 5). A first strand cDNA which has substantially reached the 5' end of its mRNA template and comprises 3'-terminal, non-templated C's is further extended at its 3' end by 1) hybridization to a second PCR primer oligonucleotide which comprises PCR-B sequences and, at its 3' terminus, a run of about 2 to 10 G's, preferably 3 G's, and which serves as a short extension template for reverse transcription, and 2) incubation with a reverse transcriptase. As is illustrated schematically in Fig. 1, the reverse transcriptase extends the 3' end of the cDNA by appending to it the reverse complement of the PCR-B sequence. The cDNA product obtained by such a procedure is termed "extended first strand cDNA." Optimization of the reaction conditions (e.g., hybridization conditions in which the G's of the second primer oligonucleotide (e.g., GGG) can hybridize to the 3'-terminal C's of first strand cDNA) can be achieved routinely by one of skill in the art. For example, to minimize breathing or melting of the first strand cDNA/mRNA complexes and to encourage annealing of the three G's of a second oligonucleotide primer to the terminal C's of first strand cDNA, one can perform the annealing and reverse transcriptase steps at a temperature slightly lower than that used for most reverse transcriptase reactions, e.g., 37 °C instead of 42 °C.
Like a PCR-A sequence, a PCR-B sequence is of sufficient length and base composition to prime synthesis during subsequent PCR amplification, and it optionally can comprise a restriction enzyme region, RE-B, which has the same characteristics as RE-A, described above. Factors pertaining to the design of a PCR-B sequence, including the choice of restriction enzyme sites in an (optional) RE-B region within it, are discussed below. In one preferred embodiment, a portion of a PCR-B primer that does not contain a restriction enzyme site is contiguous with, and 5' or 3' to, an RE-B region. In a most preferred embodiment, the second PCR primer oligonucleotide is 5'-TACGGCTGCGAGAAGACGACAGAAGGG-3', which has the same sequence as the "SMART™ Oligo" sold by Clontech.
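The template-switching extension described above can be modeled as simple string manipulation. The sketch below is an illustrative assumption, not part of the described method: it treats the primer's 3'-terminal G-run as the part that pairs with the C-tail, and assumes that the remainder of the quoted primer (TACGGCTGCGAGAAGACGACAGAA) is the PCR-B portion copied by reverse transcriptase. The function names are hypothetical, and all sequences are written 5' to 3'.

```python
# Illustrative model of template switching (hypothetical helper names).
# All sequences are written 5'->3' and restricted to A, C, G, T.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Second PCR primer oligonucleotide quoted in the text.
SECOND_PRIMER = "TACGGCTGCGAGAAGACGACAGAAGGG"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_first_strand(first_strand: str, primer: str = SECOND_PRIMER) -> str:
    """Append the reverse complement of the primer portion 5' of its G-run.

    Assumption: the primer's 3'-terminal G-run pairs with an equally long run of
    non-templated C's at the cDNA 3' end, and reverse transcriptase then copies
    the rest of the primer (taken here to be the PCR-B portion).
    """
    g_run = len(primer) - len(primer.rstrip("G"))
    if g_run == 0 or not first_strand.endswith("C" * g_run):
        raise ValueError("cDNA 3' end has no C-tail matching the primer's G-run")
    pcr_b_portion = primer[:-g_run]
    return first_strand + reverse_complement(pcr_b_portion)


if __name__ == "__main__":
    # Toy cDNA ending in three non-templated C's (sequence is made up).
    extended = extend_first_strand("ATGGCATTACCC")
    print(extended)
    # The new 3' end equals the reverse complement of the whole primer.
    print(extended.endswith(reverse_complement(SECOND_PRIMER)))
```

Because the C-tail pairs with the G-run, the 3' end of the modeled extended strand is the reverse complement of the entire second primer, which is why a primer containing PCR-B sequences can later prime second strand synthesis from it.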
The reverse transcriptase used to extend first strand cDNAs can be the same as or different from that used to generate the first portion of the cDNA. In a preferred embodiment, the same type of reverse transcriptase is used in both steps.
Example 6 shows that in extended first strand cDNA prepared by the methods of the invention, the second oligonucleotide primer interacts predominantly with the 3' ends of first strand cDNAs that have been copied substantially to the 5' ends of their mRNA templates. Thus, extended first strand cDNA molecules are substantially flanked by PCR-B sequences at the ends corresponding to the 5' ends of expressed genes, and by PCR-A sequences at the opposite ends. This asymmetry allows the cDNAs to be cloned in any desired orientation in a vector, and thereby allows a skilled worker to generate a unidirectional library.
Following the generation of extended first strand cDNA, the complex of template and extended first strand cDNA (or, optionally, extended first strand cDNA which has been separated from the template, e.g., by treatment with RNase H) is substantially separated (e.g., purified) from second oligonucleotide primer. Any of the methods described above for separation of first strand cDNA can be used for the separation of extended first strand cDNA. In a most preferred embodiment, a combination of a QIAGEN™ column and a 4% agarose gel is employed, as described herein. Separated (e.g., purified) extended first strand cDNAs which are substantially free of second primer oligonucleotide, when subjected to the subsequent steps of this invention, give rise to a substantially unidirectional population of cDNA clones. That is, a substantial number of the cDNA fragments are flanked by different, distinguishable restriction sites at each end (substantially all of the 5' ends have one restriction site and substantially all of the 3' ends have a different restriction site), enabling directional insertion. In this context, by "substantially" is meant at least about 80%, e.g., about 80, 82, 85, 87, 90 or 92%, and preferably at least about 95%, e.g., about 95, 97 or 99%. Therefore, upon ligation into an appropriate cloning vector, at least about 80%, e.g., about 80, 82, 85, 87, 90 or 92%, and preferably at least about 95%, e.g., about 95, 97 or 99%, of the cDNAs are aligned in the same orientation with respect to the direction of transcription of the mRNA templates.
Extended first strand cDNA can be converted to double stranded cDNA by conventional procedures in the art. For example, second strand cDNA can be generated by hybridizing extended first strand cDNA to a primer oligonucleotide which comprises PCR-B sequences, followed by elongation of the primer oligonucleotide with a DNA-dependent DNA polymerase. After second strand cDNA is generated, the double stranded cDNA can optionally be amplified by PCR. Double stranded cDNA, whether or not it is PCR amplified, can be cloned into a vector of choice by any of a variety of art-recognized procedures.
For example, one or both of the DNA ends can be "polished" (e.g., by treatment with T4 DNA polymerase) and inserted into a vector by blunt-end ligation; linkers or linker adaptors can be added to the ends of a cDNA fragment to provide a desired restriction site end (or different restriction site ends at each end of the cDNA fragment) which is compatible with a cloning restriction site(s) in the vector; or double strand cDNA which contains an RE restriction region at either or both ends can be digested with appropriate restriction enzyme(s) and inserted into compatible restriction site(s) in a cloning vector. Alternatively, restriction enzyme sites can be added to one or both ends of a double stranded cDNA during PCR amplification, and the cDNA can then be inserted into a cloning vector via those restriction enzyme ends.
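The 80% and 95% orientation thresholds defined above can be applied to a sample of scored clones. This is a minimal sketch, assuming that each clone's insert orientation has already been determined (for example, by sequencing) and recorded with hypothetical labels such as "sense" and "antisense"; the function name is illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple

# Thresholds taken from the text: "at least about 80%", preferably "at least about 95%".
MINIMUM_FRACTION = 0.80
PREFERRED_FRACTION = 0.95


def classify_library(orientations: Iterable[str]) -> Tuple[str, float, str]:
    """Return (dominant orientation, its fraction, category) for scored clones."""
    counts = Counter(orientations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no clones were scored")
    dominant, n = counts.most_common(1)[0]
    fraction = n / total
    if fraction >= PREFERRED_FRACTION:
        category = "preferred (>= 95%)"
    elif fraction >= MINIMUM_FRACTION:
        category = "substantially unidirectional (>= 80%)"
    else:
        category = "not substantially unidirectional"
    return dominant, fraction, category


if __name__ == "__main__":
    # Hypothetical scoring of 100 clones.
    print(classify_library(["sense"] * 96 + ["antisense"] * 4))
```

Reporting the dominant orientation alongside the fraction keeps the check independent of which end of the insert was scored.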
In a most preferred embodiment of the invention, a second cDNA strand is both generated and amplified by PCR in a single reaction: separated, extended first strand cDNA is converted to double stranded cDNA and amplified by PCR in the same reaction mixture. The 5' primer in the reaction (corresponding to the 5' end of the mRNA), herein designated a "third PCR primer oligonucleotide," comprises PCR-B sequences; and the 3' primer, herein designated a "fourth PCR primer oligonucleotide," comprises PCR-A sequences. Of course, the third and fourth primers can, if desired, contain only portions of the PCR-B and PCR-A sequences, respectively, provided that the length and composition of the sequences in the third and fourth primers are sufficient to allow for hybridization and PCR amplification. In a preferred embodiment, each of the third and fourth primer oligonucleotides also comprises all or a portion of at least one restriction enzyme site. The inclusion of restriction enzyme sites in the third and fourth primers provides great flexibility, allowing a skilled worker to flank cDNAs with a wide variety of restriction enzyme sites, compatible with any desired cloning vector. In a most preferred embodiment, as illustrated in Example 1 and Fig. 1, the third PCR primer oligonucleotide comprises, 5' to 3', an Asc I restriction site and PCR-B sequences; and the fourth PCR primer oligonucleotide comprises, 5' to 3', an Fse I restriction site and PCR-A sequences. To facilitate efficient digestion at a restriction site which lies close to the 5' end of the third or fourth primer (or of any primer), additional spacer sequences can be added to the 5' terminus of the primer. Amplified, double stranded cDNAs can be cloned into vectors by any of the methods described above.
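The layout of the third primer (spacer, then Asc I site, then PCR-B sequences, all 5' to 3') can be sketched as below. Assumptions: the Asc I (GGCGCGCC) and Fse I (GGCCGGCC) recognition sequences are the standard ones; the 4-nt spacer "ACTG" is hypothetical; and the PCR-B sequence is inferred by removing the 3'-terminal GGG from the second primer quoted above, so it may not match SEQ ID No: 24 exactly. The PCR-A sequence is not given in this section, so a fourth primer would be built the same way from the actual PCR-A sequence.

```python
# Standard 8-bp recognition sequences (5'->3').
SITES = {"AscI": "GGCGCGCC", "FseI": "GGCCGGCC"}

# Assumed PCR-B: the quoted second primer minus its 3'-terminal GGG.
PCR_B_ASSUMED = "TACGGCTGCGAGAAGACGACAGAA"


def build_flanking_primer(enzyme: str, pcr_sequence: str, spacer: str = "ACTG") -> str:
    """Return spacer + recognition site + PCR sequence (5'->3').

    The (hypothetical) spacer gives the enzyme room to cut near the amplicon end.
    Raises if the result would contain a second copy of the chosen site, or any
    copy of the other enzyme's site, since either would spoil directional cloning.
    """
    site = SITES[enzyme]
    primer = spacer + site + pcr_sequence
    if primer.count(site) != 1:
        raise ValueError(f"{enzyme} site occurs more than once")
    for other, other_site in SITES.items():
        if other != enzyme and other_site in primer:
            raise ValueError(f"primer also contains a {other} site")
    return primer


if __name__ == "__main__":
    print(build_flanking_primer("AscI", PCR_B_ASSUMED))
```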
In a preferred embodiment, amplified cDNAs flanked by RE-A and RE-B sites are cleaved, simultaneously or sequentially, by enzymes which cut within restriction sites in the RE sequences, and are then cloned into compatible restriction sites in a cloning vector. Of course, if the RE cutting sites are positioned between PCR-A and/or PCR-B and cDNA sequences, one can excise substantially all information extraneous to the cDNA by cleaving the fragments with appropriate restriction enzymes before cloning. In another preferred embodiment, amplified cDNAs flanked by restriction sites that have been introduced during PCR amplification are cleaved at those sites, simultaneously or sequentially, by appropriate restriction enzymes and are cloned into compatible site(s) in a cloning vector.
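Cleavage at the flanking sites can be illustrated on the top strand of a toy amplicon. This sketch assumes the standard cut positions (Asc I: GG^CGCGCC; Fse I: GGCCGG^CC) and models only the top strand; the example amplicon is made up and the function name is hypothetical.

```python
ASC_I, ASC_I_CUT = "GGCGCGCC", 2  # top-strand cut after position 2 (GG^CGCGCC)
FSE_I, FSE_I_CUT = "GGCCGGCC", 6  # top-strand cut after position 6 (GGCCGG^CC)


def excise_top_strand(amplicon: str) -> str:
    """Return the top-strand fragment left after cutting at Asc I (5') and Fse I (3')."""
    start = amplicon.find(ASC_I)
    end = amplicon.rfind(FSE_I)
    if start == -1 or end == -1 or end < start + len(ASC_I):
        raise ValueError("amplicon must contain Asc I upstream of Fse I")
    return amplicon[start + ASC_I_CUT : end + FSE_I_CUT]


if __name__ == "__main__":
    # spacer + Asc I + toy insert + Fse I + spacer (all made up)
    toy = "ACTG" + ASC_I + "AAATTT" + FSE_I + "CAGT"
    print(excise_top_strand(toy))  # CGCGCCAAATTTGGCCGG
```

A real digest would also cut any internal Asc I or Fse I sites within the cDNA itself; such sites are rare for 8-bp recognition sequences, but the sketch does not model them.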
The design of oligonucleotide primers is governed by factors relevant both to the ability of the fragments to prime PCR amplification and to the nature of the (optional) restriction enzyme sites included therein.
A skilled worker can design a pair of oligonucleotide primers (5' and 3' primers, e.g., the first and second oligonucleotide primers, or the third and fourth oligonucleotide primers, as described herein) to contain any desired restriction enzyme site(s). Preferably, the restriction site ends at each end of a double strand cDNA fragment which is to be cloned are different, in order to facilitate directional cloning. Most preferably, one of these restriction site ends has a 5' overhang and the other a 3' overhang; the two enzymes prefer the same reaction buffers and temperature; and extensive double digestion is possible. A variety of appropriate enzyme pairs will be evident to one of skill in the art. As noted above, in a preferred embodiment, the fragment is bounded by restriction sites of 8 or more nucleotides, and in a most preferred embodiment, the sites are Asc I and Fse I.
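Whether a palindromic site leaves a 5' overhang, a 3' overhang, or a blunt end follows from where the top strand is cut. The sketch below assumes a palindromic recognition site, so the bottom-strand cut mirrors the top-strand cut; the cut positions listed are the standard ones for Asc I, Fse I and EcoRV (EcoRV is included only for contrast). The names are illustrative.

```python
from typing import Tuple

# name: (recognition site, top-strand cut position within the site)
ENZYMES = {
    "AscI": ("GGCGCGCC", 2),   # GG^CGCGCC
    "FseI": ("GGCCGGCC", 6),   # GGCCGG^CC
    "EcoRV": ("GATATC", 3),    # GAT^ATC
}


def overhang(enzyme: str) -> Tuple[str, int]:
    """Return (overhang type, length in nt) for a palindromic recognition site."""
    site, top_cut = ENZYMES[enzyme]
    bottom_cut = len(site) - top_cut  # mirrored cut, in top-strand coordinates
    if top_cut < bottom_cut:
        return ("5'", bottom_cut - top_cut)
    if top_cut > bottom_cut:
        return ("3'", top_cut - bottom_cut)
    return ("blunt", 0)


def has_opposite_overhangs(enzyme_a: str, enzyme_b: str) -> bool:
    """True when one enzyme leaves a 5' overhang and the other a 3' overhang."""
    return {overhang(enzyme_a)[0], overhang(enzyme_b)[0]} == {"5'", "3'"}


if __name__ == "__main__":
    for name in ENZYMES:
        print(name, overhang(name))
    print(has_opposite_overhangs("AscI", "FseI"))
```

For the preferred pair, Asc I leaves a 4-nt 5' overhang and Fse I a 4-nt 3' overhang, which satisfies the preference stated above for one 5' and one 3' overhang.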
Once one or more partial or complete restriction enzyme sites have been selected to reside in each of a pair of oligonucleotide primers, a skilled worker can design the remaining sequences (e.g., non-restriction enzyme-containing PCR-A sequences and random sequences of a "first oligonucleotide primer" as used herein; non-restriction enzyme-containing PCR-B sequences of a "second oligonucleotide primer" as used herein; or non-restriction enzyme-containing sequences of a third or fourth oligonucleotide primer as used herein), taking into account a number of variables. For example, it is desirable to avoid stop codons, especially those in frame with a reporter gene in the vector, and codons which could bias a subsequent screening reaction, if such codons are not to be excised before the screen is begun. Of course, primers can be designed such that undesirable codons or longer sequences can be excised, e.g., by one or more restriction endonucleases, before or after the fragments are cloned into a cloning vector. Examples of sequences which could unfavorably bias a subsequent screening reaction include, e.g., codons for hydrophobic amino acids that could falsely mimic signal sequences.
A further factor in designing primer oligonucleotides is that a pair of primers to be used for PCR amplification should contain approximately equal AT and GC contents, as close to the GC:AT ratio of the template as possible. Therefore, when, for example, a PCR-A or PCR-B element comprises a GC-rich sequence, such as a restriction enzyme site for Asc I or Fse I, other sequences in the primer can be adjusted so as to be relatively richer in AT content. Other considerations, including ways to reduce undesirable background during PCR amplification, are well known to one of skill in the art and are discussed, e.g., in "Selection of primers for polymerase chain reaction" in Methods in Molecular Biology, Vol. 15, B.A. White ed., Humana Press Inc., Totowa, NJ, pp. 31-40. Among the relevant factors are: avoidance of primer pairs with 3'-terminal complementarity, so as to prevent the formation of primer-dimers; and avoidance of primers with self-complementarity, since such primers can be extended using themselves as a template instead of the target molecule.
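Two of the checks listed above (matched GC content within a primer pair, and absence of 3'-terminal complementarity or self-complementarity) are easy to screen computationally. This is a simplified sketch: the 4-nt window, the perfect-match rule and the 10% GC tolerance are assumptions, and real primer-design tools also weigh melting temperature, mismatches and internal hairpins.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    """Fraction of G + C bases in a primer."""
    return sum(base in "GC" for base in primer) / len(primer)


def three_prime_dimer(primer_a: str, primer_b: str, window: int = 4) -> bool:
    """True if the 3'-terminal `window` bases of the two primers can pair perfectly.

    Strands anneal antiparallel, so the 3' ends pair when the last bases of
    primer_a equal the reverse complement of the last bases of primer_b.
    """
    return primer_a[-window:] == reverse_complement(primer_b[-window:])


def self_dimer(primer: str, window: int = 4) -> bool:
    """Special case of three_prime_dimer in which a primer pairs with itself."""
    return three_prime_dimer(primer, primer, window)


def screen_pair(primer_a: str, primer_b: str, max_gc_difference: float = 0.10) -> List[str]:
    """Return warnings for a primer pair (an empty list means no flags)."""
    warnings = []
    if abs(gc_fraction(primer_a) - gc_fraction(primer_b)) > max_gc_difference:
        warnings.append("GC contents differ")
    if three_prime_dimer(primer_a, primer_b):
        warnings.append("3'-terminal complementarity between primers")
    for name, p in (("first", primer_a), ("second", primer_b)):
        if self_dimer(p):
            warnings.append(f"{name} primer is self-complementary at its 3' end")
    return warnings
```

Note that a GC-rich, palindromic restriction site placed at a primer's 3' end would trip the self-complementarity check, which is one reason such sites sit at the 5' end of the primers described here.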
PCR reaction conditions can be optimized by one of skill in the art, using routine, art-recognized conditions (see, e.g., Newton, C.R. and Graham, A., eds., Polymerase Chain Reaction, BIOS Scientific Publishers Limited, 1997, especially Chapter 5; also Innis, Michael A., et al., eds., PCR Protocols: A Guide to Methods and Applications, 1990, Academic Press, San Diego, CA). A number of thermophilic polymerases, including enzymes with 3'→5' proofreading exonuclease activity, are commercially available. Thermocycling parameters, including denaturing and annealing conditions, the temperature and length of incubation, and the number of cycles, can readily be optimized by the skilled worker. Example 7 illustrates the formation of second strand cDNA and its amplification by PCR. Example 8 and Fig. 4 show that the double strand cDNAs made by the methods of the invention retain the unidirectional character observed in the preceding steps.
In a preferred embodiment, amplified double strand cDNA fragments flanked by two different restriction sites are inserted unidirectionally into the insertion region of a cloning vector, i.e., into a vector which comprises two different, separable restriction enzyme insertion sites. Cloning vectors which have been cleaved so as to form two different, non-compatible restriction site ends are less likely to self-ligate (without insertion of a fragment) than are vectors cleaved by a single restriction enzyme, particularly if one of the restriction site ends has a 5' overhang (e.g., Asc I) and the other has a 3' overhang (e.g., Fse I). Vectors can be designed to grow in a variety of hosts, including various bacteria, yeast, insect cells or higher eukaryotic cells, or to shuttle between two different host organisms. A wide variety of markers are available for the selection of vectors which have inserted foreign DNA. For some applications, the inserted sequences need not be expressed; for other applications, it is desirable that they be expressed. Appropriate vectors of both types are plentiful and well known to those skilled in the art. Examples include, e.g., pBR322, pUC vectors, pET vectors, etc.
(Sambrook et al., ibid.). In a most preferred embodiment, cDNA-containing fragments are inserted into a cloning vector which is designed to select for (to trap) sequences that encode functional signal peptides, which direct secretion of the polypeptides to which they are attached. One such vector is the yeast vector described in U.S. Pat. No. 5,536,637, in which a cloning region (with a single restriction enzyme site) is positioned upstream of a non-secreted yeast invertase gene. When cDNA fragments that encode functional signal peptides (secretory leader sequences) are inserted into the cloning site, they allow the normally non-secreted invertase to be secreted following transformation of the recombinant vector into a yeast cell. Yeast which secrete invertase can then be selected on medium containing sucrose or raffinose, allowing one to identify and isolate cDNAs that encode functional signal peptides. Many other such selectable markers can be used, including known secretory genes, e.g., IL-2 receptor α (see, e.g., U.S. Pat. No. 5,525,486), or proteins whose expression can be detected on the cell surface by immunostaining, such as Tac (see, e.g., Tashiro et al. (1993), Science 261, 600-603 and U.S. Pat. No. 5,753,462). Preferably, a vector of the invention has one or both of the following properties: 1) The cloning region contains at least two restriction enzyme sites, thereby allowing directional cloning of inserts. As noted above, the two restriction sites are preferably 8-base or longer restriction enzyme recognition sites and, most preferably, are Asc I and Fse I. 2) Upstream of the cloning region are three stop codons, in different reading frames. The presence of these stop codons prevents translation initiated in upstream sequences from reading into the insert, and therefore requires that translation of the selectable marker (e.g., the invertase fusion protein described in U.S. Pat. No. 5,536,637) begin from a site within the inserted cDNA fragment; this reduces false positives in the selection. Example 9 describes the construction of a cloning vector which exhibits a high degree of efficiency for selecting signal peptides, and Example 10 describes the generation of a directional cDNA library, using the vector.
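Property 2) above, stop codons in all three reading frames upstream of the cloning region, can be checked with a short scan. This is a sketch under the assumption that frames are counted from the first base of the supplied sequence; the example sequence is made up and is not the vector sequence of Example 9.

```python
from typing import Set

STOP_CODONS = {"TAA", "TAG", "TGA"}


def frames_with_stop(seq: str) -> Set[int]:
    """Return the reading frames (0, 1, 2) of `seq` that contain at least one stop codon."""
    frames = set()
    for frame in range(3):
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                frames.add(frame)
                break
    return frames


def blocks_upstream_translation(seq: str) -> bool:
    """True if every reading frame through `seq` is interrupted by a stop codon."""
    return frames_with_stop(seq) == {0, 1, 2}


if __name__ == "__main__":
    toy_upstream = "TAAGTAGCTGA"  # made-up: TAA in frame 0, TAG in frame 1, TGA in frame 2
    print(frames_with_stop(toy_upstream), blocks_upstream_translation(toy_upstream))
```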
In some cases it may be preferable to clone certain sequences in the opposite orientation from that of normal transcription. Antisense RNAs transcribed from such clones can be useful, e.g., for downregulating expression of one or more targeted genes, e.g., in methods of gene therapy. cDNA libraries generated by the methods of the invention are complex and representative. Example 11 demonstrates by colony hybridization analysis that such libraries are representative, and Example 12 demonstrates by PCR analysis that they are complex. Example 13 shows that at all steps of the generation of cDNAs by the methods of the invention, the cDNAs exhibit unidirectional characteristics.
Brief Description of the Drawings
Fig. 1 shows a schematic representation of the method of cDNA synthesis.
Fig. 2A schematically illustrates PCR primer pairs which can be used to distinguish "correct" from "incorrect" priming of first strand PSA (prostate specific antigen)-specific cDNA.
Fig. 2B shows a gel which illustrates the degree of "correct" first strand PSA-specific cDNA synthesis under several conditions.
Fig. 3A schematically illustrates PCR primer pairs which can be used to distinguish "correct" from "incorrect" extension of first strand PSA-specific cDNA.
Fig. 3B shows a gel which illustrates "correct" extension of first strand PSA-specific cDNA.
Fig. 4 shows a gel which evaluates the unidirectional character of PCR-amplified cDNA.
Fig. 5 is a table summarizing colony hybridization of 12 cDNA libraries to probes for three differently expressed genes.
Examples
Example 1 - SCHEMATIC ILLUSTRATION OF cDNA SYNTHESIS
Preferred steps of the invention, as illustrated in Fig. 1, are described below.
1. First strand cDNAs are generated from poly A+ mRNA templates by priming with a reverse transcription mixture comprising three first oligonucleotide primers, each of which comprises a common PCR-A sequence. The three primers contain stretches of 7, 8 or 9 random nucleotides, respectively. The primers are made by conventional means in an oligonucleotide synthesizer and/or are obtained from a commercial source. First strand cDNA which reaches the 5' terminus of its mRNA template is tailed with C's by the reverse transcriptase. The first strand cDNAs, preferably in the form of cDNA/mRNA complexes, as shown here, are separated from, e.g., unincorporated first primer oligonucleotides by column chromatography and gel purification.
2. Extended first strand cDNA. A second oligonucleotide primer, which comprises 5' to 3' a defined PCR-B sequence followed by GGG, is hybridized to a separated first strand cDNA/mRNA complex, possibly by virtue of hybridization of the GGG sequence of the primer to the C-tails of the cDNA, and the hybridized complex is incubated with reverse transcriptase. The reverse complement of PCR-B is thereby appended to the C-tailed cDNAs. The extended cDNA/mRNA complexes are separated from, inter alia, unincorporated second oligonucleotide primers by column chromatography and gel purification.
3. PCR amplification and addition of restriction enzyme sites. Separated extended first strand cDNA/mRNA complexes are subjected to 15-22 cycles of PCR amplification, using as primers: a) a 5' primer (third primer oligonucleotide) which comprises, 5' to 3', an Asc I restriction site and PCR-B sequences, and b) a 3' primer (fourth primer oligonucleotide) which comprises, 5' to 3', an Fse I restriction site and PCR-A sequences. The amplified double strand cDNA is separated from unincorporated third and fourth primer oligonucleotides by column chromatography.
4. Preparation for cloning. The double strand cDNA is digested with the restriction enzymes Asc I and Fse I, and the digested, double strand cDNA is separated and size-selected by gel electrophoresis.
Example 2 - SYNTHESIS OF FIRST STRAND cDNA
First strand cDNA is synthesized using reagents from a SMART™ PCR cDNA Library Construction Kit (Clontech). Poly A+ mRNA from normal human prostate is primed with an equimolar mixture of three first primer oligonucleotides, SEQ ID Nos: 1, 2 and 3, which contain random sequences of 7, 8 or 9 nucleotides in length, respectively. Each first primer oligonucleotide also contains, 5' to the random sequences, a common PCR-A sequence which has, at its 5' end, a partial Fse I site.
For each reaction, 0.5 μg of poly A+ mRNA is mixed with primers in a total volume of 5 μl. The mixture is heated at 72 °C for two minutes and placed on ice for two minutes. First strand cDNA synthesis buffer (1x) (50 mM Tris, pH 8.3; 6 mM MgCl2; 75 mM KCl), dNTPs (deoxynucleoside triphosphates) (1 mM), DTT (2 mM), [32P]dCTP (5 μCi) and Superscript II Reverse Transcriptase (200 U, BRL) are added to the mixture in a total volume of 10 μl. The mixture is then incubated at 42 °C to allow first strand cDNA synthesis to proceed. Both the ratio of primers to mRNA template and the length of time during which reverse transcriptase is allowed to elongate cDNA can affect the size of the cDNAs produced during the reaction. For example, if one assumes that the mRNAs are about 1.5 kb long, that the desired average distance from the 5' end at which a primer anneals is 0.6 kb, and that every random primer in the first strand reaction can anneal and prime a cDNA, then one would use primers at a 2.5-fold molar excess over the mRNAs. If not all primers participate in priming, a larger ratio is needed. Here, primer/mRNA ratios of 7:1, 25:1 and 62:1, and elongation times of 20, 40 and 60 minutes, are examined. Nine samples are prepared, using each possible combination of ratio and elongation time. Samples 1-3 are incubated with a 7:1 ratio, samples 4-6 with 25:1, and samples 7-9 with 62:1. Samples 1, 4 and 7 are incubated for twenty minutes, samples 2, 5 and 8 for forty minutes, and samples 3, 6 and 9 for sixty minutes.
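The primer-excess estimate and the nine-sample design above can be reproduced as follows. The `priming_efficiency` parameter is an assumption added to express the remark that a larger ratio is needed when not all primers prime; with efficiency 1.0 the calculation gives the 2.5-fold excess stated in the text.

```python
from itertools import product
from typing import Dict, Tuple


def primer_excess(mrna_length_kb: float, spacing_kb: float,
                  priming_efficiency: float = 1.0) -> float:
    """Molar excess of primer over mRNA for one priming event per `spacing_kb`.

    priming_efficiency (0 < e <= 1) is the assumed fraction of primers that prime.
    """
    if not 0 < priming_efficiency <= 1:
        raise ValueError("priming_efficiency must be in (0, 1]")
    return (mrna_length_kb / spacing_kb) / priming_efficiency


def sample_design() -> Dict[int, Tuple[int, int]]:
    """Map sample number -> (primer:mRNA ratio, elongation minutes), as in the text."""
    ratios = (7, 25, 62)
    minutes = (20, 40, 60)
    return dict(enumerate(product(ratios, minutes), start=1))


if __name__ == "__main__":
    print(primer_excess(1.5, 0.6))        # 2.5-fold, the figure given in the text
    print(primer_excess(1.5, 0.6, 0.5))   # twice that if only half the primers prime
    for sample, (ratio, mins) in sample_design().items():
        print(f"sample {sample}: {ratio}:1, {mins} min")
```

Iterating over ratios in the outer loop reproduces the numbering used in the text: samples 1-3 share the 7:1 ratio, and samples 1, 4 and 7 share the 20-minute elongation time.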
The radioactively labeled cDNAs are analyzed by electrophoresis on a denaturing alkaline agarose gel and autoradiography, following art-recognized procedures (see, e.g., Sambrook et al., ibid., Chapter 6.20). For each sample, the cDNAs appear as a smear of about 0.2 to about 7 kb with no apparent banding, indicating that priming with the random primers is, indeed, random. The molecular weight range of the smears increases for samples incubated for longer times, and decreases as the primer/mRNA ratio goes up, as expected from the theoretical considerations noted above.
Example 3 - ANALYSIS OF FIRST STRAND cDNA BY PCR
It is desirable that first oligonucleotide primers, as discussed in Example 1, prime first strand cDNA synthesis from mRNA, but do not also prime subsequent cDNA synthesis using first strand cDNA as a template. To show that this is achieved, the cDNAs of Example 2 are analyzed by PCR, with particular attention to cDNAs corresponding to the highly expressed prostate-specific gene, prostate specific antigen (PSA) (see Figs. 2A and 2B).
Fig. 2A shows schematically that PCR amplification with primers PSA-up (SEQ ID No: 4; a unique sequence within PSA) and PCR-A' (SEQ ID No: 25; PCR-A and an Fse I restriction enzyme site) would be expected to detect and amplify first strand PSA cDNAs of the "correct" or desired orientation. By contrast, amplification with PCR-A' and PSA-down (SEQ ID No: 5) would be expected to detect and amplify PSA cDNA of the undesired "incorrect" or "anti-sense" orientation, which has been primed by first oligonucleotide primers using first strand cDNA as template.
Fig. 2B, top panel, depicts PCR products generated by PSA-up and PCR-A'. Lanes 1-3 contain PCR products generated from first strand cDNAs made with 7-fold ratios of primer to mRNA, lanes 4-6, 25-fold, and lanes 7-9, 62-fold. For lanes 1, 4 and 7, the incubation time is twenty minutes; for lanes 2, 5 and 8, 40 minutes; and for lanes 3, 6 and 9, 60 minutes. Samples in all subsequent Examples herein are loaded onto gels in this order. Clearly, significant amounts of PSA-specific first strand cDNA are generated under all reaction conditions. The heterogeneous sizes of the bands confirm the finding in Example 2 that cDNA synthesis results from random priming.
Fig. 2B, bottom panel, depicts PCR products generated by PCR-A' and PSA-down.
Small amounts of cDNA of the undesirable "incorrect" or "antisense" orientation can be detected in lanes 1, 2 and 4. Significantly larger amounts of "incorrect" cDNA are present in cDNA made with a 62-fold excess of primer/mRNA (lanes 7-9), or when extension is allowed to proceed for 40 or, especially, 60 minutes.
Under optimal conditions of cDNA synthesis (e.g., lanes 1 and 4), the ratio of "correct" first strand cDNA to undesirable "incorrect" cDNA is approximately 9:1. Similar experiments analyzing another highly expressed prostate-specific gene, PN44, confirm that under optimal conditions, at least 90% of the cDNA primed by PCR-A is generated from mRNA template rather than from first strand cDNA template.
Furthermore, sequence analysis of a number of PN44 inserts indicates that they have different 3' ends, and sequences of random clones show no pattern of preferred priming, further confirming that cDNA priming with the methods of the invention is random.
Example 4 - SEPARATION OF FIRST STRAND cDNA/mRNA COMPLEXES
The first strand cDNA/mRNA complexes from Example 2 are substantially separated (e.g., purified) from unincorporated first primer oligonucleotides, using a QIAquick™ PCR purification kit (i.e., chromatography on a QIAGEN™ column), followed by preparative electrophoresis on a 4% agarose gel (FMC). Nucleic acid larger than a 200 bp double strand marker is isolated and extracted from the gel with a QIAquick™ Gel Extraction Kit (QIAGEN™). This process encompasses at least one cycle of binding the extracted nucleic acid to a QIAquick™ column and eluting it from the column. If first strand cDNA is not separated by such a stringent, two-step procedure, residual first primer oligonucleotide interferes with subsequent steps in the genesis of double stranded cDNA (e.g., first oligonucleotide primer is incorporated into cDNAs at positions other than the 5' end of first strand cDNA), resulting in cDNA libraries that are significantly less unidirectional than the libraries of the invention.
Example 5 - SYNTHESIS AND PURIFICATION OF EXTENDED FIRST STRAND cDNA
A second oligonucleotide primer, which comprises, 5' to 3', a defined PCR-B sequence followed by GGG, is hybridized to first strand cDNA (preferably in the form of a cDNA/mRNA complex) which has substantially reached the 5' end of its mRNA template and has been modified by the addition of terminal C's by the terminal transferase activity of the reverse transcriptase. In this example, the second oligonucleotide primer is a "SMART™ Oligo" obtained from Clontech. However, any primer with this sequence (SEQ ID No: 26), or any derivative or modification thereof comprising, e.g., naturally or non-naturally occurring bases or sugars, can be used.
Second oligonucleotide primer (2 μM), first strand cDNA synthesis buffer (1x), dNTPs (1 mM), DTT (2 mM) and Superscript II Reverse Transcriptase (200 U) are added to 5 μl of each of the separated, size-selected first strand cDNAs from Example 4 (0.25 μg), to a final volume of 10 μl. The mixture is incubated at 37 °C for 1.5 hours.
The extended first strand cDNAs are substantially separated (e.g., purified) from residual second oligonucleotide primer and other oligomers of less than 200 bp with a QIAquick™ PCR Purification column followed by preparative electrophoresis on a 4% agarose gel. Nucleic acids of 0.2 to 12 kb are excised and extracted from the gel.
Example 6 - ANALYSIS OF EXTENDED FIRST STRAND cDNA BY PCR
It is desirable that the second oligonucleotide primer, as discussed in Example 5, serves as a template for extension synthesis predominantly at the 3' ends of first strand cDNAs which have reached the 5' ends of their mRNA templates. To show that this is, indeed, achieved under the conditions of the invention, the cDNAs of Example 5 are analyzed by PCR (see Figs. 3A and 3B).
Fig. 3A shows schematically that PCR amplification with primers PCR-B' (SEQ ID No: 24; PCR-B and an Asc I restriction enzyme site) and PSA-down would be expected to detect and amplify "correct" or "sense" extended first strand PSA cDNA, in which the reverse complement of PCR-B has been appended to the 3' end of cDNA that has substantially reached the 5' end of its mRNA template. By contrast, amplification with primers PSA-up and PCR-B' would be expected to detect and amplify PSA cDNA of the "incorrect" or "anti-sense" orientation, which has incorporated the reverse complement of PCR-B sequences at sites other than the 5' end of the template mRNA.
Fig. 3B, left panel, depicts PCR products generated with PCR-B' and PSA-down. cDNA samples used in lanes 1-6 are as described in Example 2 and are loaded onto the gel in the order described in Example 3. A single band of about 220 bp is observed for all of the samples. This band is of the length expected for extended first strand PSA cDNA of the desired orientation. Similar experiments with a PN44-specific primer confirm that PN44 extended first strand cDNA also carries the reverse complement of PCR-B at its 3' end, and that the extension is appended substantially only to cDNAs which are copied completely to the 5' end of the mRNA template.
Amplification products with PSA-up and PCR-B' are shown in the right panel of Fig. 3B. Only very short products can be detected, similar to those in the negative control lanes, 7 and 8, indicating that under all conditions examined, the reverse complement of PCR-B is appended predominantly at the 3' ends of first strand cDNA, as desired, rather than at the 3' ends of second strand cDNA.
Similar experiments with a different cDNA library show that for three out of six genes surveyed, the PCR-B primer attaches predominantly to the 3' ends of first strand cDNAs that have been copied substantially completely to the 5' ends of their mRNA templates.
Sequence analysis shows that for approximately 95% of cDNAs tested, three C's are present between the 3' end of first strand cDNA and the reverse complement of the PCR-B sequence. Without wishing to be held to any mechanism, this is consistent with the hypothesis that the terminal transferase activity of the reverse transcriptase adds three C's to the 3' end of substantially completed first strand cDNA, and that the three G's at the 3' end of the second primer oligonucleotide bind to the first strand cDNAs via these three G·C pairs.
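Counting the C's at the junction, as in the sequence analysis above, can be sketched as follows. Assumptions: reads are oriented like the extended first strand (5' to 3'); the reverse complement of PCR-B is taken to be TTCTGTCGTCTTCTCGCAGCCGTA, derived from the second primer quoted earlier minus its 3'-terminal GGG; and the function names are hypothetical. Templated C's at the very 5' end of the mRNA copy, if present, would be counted together with the tail.

```python
from typing import Iterable

# Assumed reverse complement of PCR-B (see lead-in).
PCR_B_RC = "TTCTGTCGTCTTCTCGCAGCCGTA"


def c_run_before_adapter(read: str, adapter: str = PCR_B_RC) -> int:
    """Length of the run of C's immediately 5' of the last adapter occurrence."""
    idx = read.rfind(adapter)
    if idx == -1:
        raise ValueError("adapter sequence not found in read")
    n = 0
    while idx - n - 1 >= 0 and read[idx - n - 1] == "C":
        n += 1
    return n


def fraction_with_run(reads: Iterable[str], run_length: int = 3) -> float:
    """Fraction of reads whose junction carries exactly `run_length` C's."""
    runs = [c_run_before_adapter(r) for r in reads]
    if not runs:
        raise ValueError("no reads supplied")
    return sum(n == run_length for n in runs) / len(runs)
```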
Example 7 - PCR AMPLIFICATION OF SEPARATED, EXTENDED, FIRST STRAND cDNA AND INCORPORATION OF FLANKING RESTRICTION SITES ONTO THE cDNA FRAGMENTS
Double stranded cDNA is made by PCR amplification of the six separated, extended first strand cDNAs (samples 1-6) discussed in Examples 2 and 5. Duplicate samples of the template are mixed with dNTPs (200 μM), 1x PCR reaction buffer (40 mM Tricine-KOH (pH 9.2); 15 mM KOAc; 3.5 mM Mg(OAc)2; 75 μg/ml bovine serum albumin), 0.2 μM 5' end PCR primer (SEQ. ID No: 24; third oligonucleotide primer, which comprises, 5' to 3', an Asc I site and PCR-B sequences), 0.2 μM 3' end primer (SEQ ID No: 25; fourth oligonucleotide primer, which comprises, 5' to 3', an Fse I site and PCR-A sequences), dd H2O and Advantage® cDNA Polymerase Mix (1x) (0.8 mM Tris-HCl (pH 7.5); 1% glycerol; 1.0 mM KCl; 0.5 mM (NH4)2SO4; 2.0 μM EDTA; 0.1 mM β-mercaptoethanol; 1.1 μg/ml TaqStart antibody; 1-5 units of KlenTaq-1 DNA polymerase) to a total volume of 100 μl in 0.2 ml PCR tubes. The stock solutions and conditions for performing PCR are provided by an Advantage® cDNA PCR Kit (#K1905-1) from CLONTECH Laboratories, Inc. Samples are subjected to 8, 15 or 22 cycles of amplification in a PCR machine, with the following cycling conditions: 1 X 94°C (3'); 8, 15 or 22 X [94°C (30"), 70°C (1'), 72°C (4')]; 1 X 72°C (!').
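The reaction set-up above is a series of C1·V1 = C2·V2 dilutions. The sketch below computes per-reaction volumes for the 100 μl reaction. The stock concentrations (10x buffer, 10 mM dNTPs, 10 μM primers, 50x polymerase mix) and the 5 μl template volume are assumptions for illustration. They are not values from the source or from the kit insert.

```python
# Per-reaction pipetting volumes from C1*V1 = C2*V2 (stock values are assumptions).
TOTAL_UL = 100.0

# component: (stock concentration, final concentration), in matching units
COMPONENTS = {
    "10x PCR buffer": (10.0, 1.0),        # x-fold
    "dNTPs": (10_000.0, 200.0),           # uM; assumed 10 mM stock
    "5' PCR primer": (10.0, 0.2),         # uM; SEQ ID No: 24, assumed 10 uM stock
    "3' PCR primer": (10.0, 0.2),         # uM; SEQ ID No: 25, assumed 10 uM stock
    "polymerase mix": (50.0, 1.0),        # x-fold; assumed 50x stock
}

def pipetting_volumes(template_ul: float, total_ul: float = TOTAL_UL) -> dict:
    """Volume (ul) of each component per reaction; water makes up the balance."""
    vols = {name: final / stock * total_ul for name, (stock, final) in COMPONENTS.items()}
    vols["template"] = template_ul
    water = total_ul - sum(vols.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    vols["dd H2O"] = water
    return vols

vols = pipetting_volumes(template_ul=5.0)
```

With these assumed stocks, each primer and the dNTPs take 2 μl, the buffer takes 10 μl and water fills the remainder. Duplicate reactions, as in the text, would simply double every volume in a master mix.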
The optimum number of PCR cycles is the lowest number that generates a sufficient quantity of cDNA inserts to make a large library. In this experiment, samples amplified for eight cycles do not generate enough material for gel purification; samples amplified for fifteen cycles generate enough cDNA to make a large library (over one million clones). The optimal number of cycles can be routinely determined by one of skill in the art. Control reactions show that the amplification reaction is dependent upon the presence of both primers and the template. Amplified cDNA is purified using a QIAquick™ PCR Purification Kit (QIAGEN).
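The rule stated above is to take the smallest cycle number that gives enough product. It can be sketched with an idealized exponential model, N = N0(1 + E)^n. The starting copy number, the per-cycle efficiency E and the required yield below are hypothetical placeholders, not values from the source.

```python
# Idealized exponential PCR model, N(n) = N0 * (1 + E)**n; the plateau phase is ignored.
def copies_after(n_cycles: int, start_copies: float, efficiency: float) -> float:
    """Expected copies after n cycles at a constant per-cycle efficiency (0 to 1)."""
    return start_copies * (1.0 + efficiency) ** n_cycles

def fewest_sufficient_cycles(candidates, start_copies, efficiency, needed_copies):
    """Smallest candidate cycle number whose predicted yield reaches needed_copies, else None."""
    for n in sorted(candidates):
        if copies_after(n, start_copies, efficiency) >= needed_copies:
            return n
    return None

# Hypothetical inputs; in practice the yield must be measured, as the text notes.
best = fewest_sufficient_cycles([8, 15, 22], start_copies=1e7, efficiency=0.9, needed_copies=1e10)
```

These made-up numbers select 15 cycles, but the result depends entirely on the assumed inputs. Real reactions also plateau, which this model does not capture. One common rationale for the smallest sufficient cycle count is that it limits over-amplification.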
Example 8 - CHARACTERIZATION OF DOUBLE STRAND cDNA BY PCR
In Figure 4, the 15-cycle amplified cDNAs described in Example 7 are characterized by PCR analysis using the PCR primer pairs described in Fig. 3A. Lanes 1-6 correspond to the cDNA samples in Example 2.
The left panel of Fig. 4A shows that "sense" PCR products are generated from all six samples when the PSA-down and PCR-B' primer pair is used. The right panel shows that, by contrast, no products larger than primer-dimers are generated after amplification with the primer pair PCR-B' and PSA-up. Similar results are obtained using PN44-specific primers. These findings confirm that after the PCR amplification stage, the double strand cDNAs are still predominantly unidirectional, with Asc I and PCR-B sequences located at the end of the cDNA corresponding to the 5' end of the mRNA, and Fse I and PCR-A sequences at the opposite end.
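The logic of this test can be sketched as a toy in-silico PCR. A primer pair gives a product only if one primer matches one strand and the other primer matches the opposite strand further downstream. All sequences below are hypothetical except PCR-B, which is inferred from claim 23 by dropping the 3' GGG. PCR-B' is treated as a primer with the PCR-B sequence, which is an assumption. GENE_UP and GENE_DOWN are illustrative stand-ins for gene-specific primers such as PSA-up and PSA-down, whose sequences are not given here.

```python
# Toy in-silico PCR: do two primers prime convergently on a duplex?
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def amplifies(top: str, p1: str, p2: str) -> bool:
    """True if p1 and p2 face each other on the duplex whose top strand is `top`.

    One primer must occur in the top strand and the other primer's reverse
    complement must occur downstream of it. Both role assignments are tried,
    so the answer does not depend on which strand is called "top".
    """
    for fwd, rev in ((p1, p2), (p2, p1)):
        i = top.find(fwd)
        if i != -1 and top.find(revcomp(rev), i + len(fwd)) != -1:
            return True
    return False

PCR_B = "TACGGCTGCGAGAAGACGACAGAA"        # claim 23 primer minus its 3' GGG (inferred)
GENE_UP = "GACCTGAAGGTCAAGTTCCA"          # hypothetical sense-strand ("-up") primer
DOWN_SITE = "CAGTCCAGATTGGCAGTACT"        # hypothetical sense-strand target site
GENE_DOWN = revcomp(DOWN_SITE)            # hypothetical antisense ("-down") primer
BODY = GENE_UP + "ACGT" * 10 + DOWN_SITE  # hypothetical 5' region of a transcript

correct = PCR_B + "GGG" + BODY + "AAAA"      # tag at the mRNA 5' end, as desired
reversed_tag = BODY + "CCC" + revcomp(PCR_B)  # tag wrongly attached on the 3' side
```

With the tag at the mRNA 5' end, only the PCR-B/down-type pair amplifies, which matches the left panel of Fig. 4A. A clone carrying the tag on the wrong end would instead amplify with the up-type primer, which is the product the right panel fails to show.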
Example 9 - A SIGNAL TRAP VECTOR
The parental plasmid for this construction, pEHB3, was obtained from Chris A. Kaiser, Department of Biology, M.I.T. It was constructed in Kaiser's laboratory in the following manner: The 4.5 kb EcoRI fragment of pRB58 (Carlson and Botstein, (1982), Cell 28, 145-154) was cloned into the yeast vector pRS316 (Sikorski and Hieter, (1989), Genetics 122, 19-27) to form a precursor of pEHB3. This precursor was converted to pEHB3 by site-directed mutagenesis (Sambrook, et al, Molecular Cloning: A Laboratory Manual (2nd ed.). Cold Spring Harbor, N.Y., 1989, Vols. 1-3, especially Vol. 2, Ch. 15) to introduce BglII and NheI restriction endonuclease sites 5' and 3' (respectively) to the signal cleavage site of the invertase gene (Taussig and Carlson, (1983), Nucleic Acids Research 11, 1943-1954). pEHB3 is a yeast shuttle vector which comprises an E. coli origin of replication and an Amp gene for growth and selection in bacteria; CEN4/ARS1 and URA3 sequences for growth and selection in S. cerevisiae; and, inserted into a polylinker site, a 4.5 kb EcoRI fragment of yeast genomic DNA that contains the invertase gene, SUC2.
To construct an improved signal trap vector, the SUC2 invertase signal sequence of pEHB3 was deleted and replaced by SEQ ID No.: 27 using standard mutagenesis and recombinant methods. This sequence comprises, 5' to 3', stop codons in three different reading frames, an Asc I site, and an Fse I site. In a further step, approximately 2 kb of genomic sequence 3' to the SUC2 gene, contained within a Hind III fragment, was removed from the vector by partial Hind III digestion and religation. The resulting signal trap vector was called pSST22.
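The layout of SEQ ID No.: 27 can be checked mechanically: stops in all three frames, then Asc I, then Fse I. SEQ ID No.: 27 itself is not reproduced here, so the sketch runs on a hypothetical cassette. The recognition sequences GGCGCGCC (Asc I) and GGCCGGCC (Fse I) are the standard ones. The helper names are illustrative, and frames are counted from the first base of the cassette.

```python
# Check a cloning cassette: stops in all three frames, then Asc I, then Fse I.
STOP_CODONS = {"TAA", "TAG", "TGA"}
ASC_I = "GGCGCGCC"  # Asc I recognition site
FSE_I = "GGCCGGCC"  # Fse I recognition site

def frames_with_stop(seq: str) -> set:
    """Reading frames (0, 1, 2) of seq that contain at least one stop codon."""
    return {frame
            for frame in range(3)
            for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS}

def cassette_ok(cassette: str) -> bool:
    """True if stops in all three frames precede an Asc I site that precedes an Fse I site."""
    asc = cassette.find(ASC_I)
    fse = cassette.find(FSE_I)
    if asc == -1 or fse == -1 or fse < asc:
        return False
    return frames_with_stop(cassette[:asc]) == {0, 1, 2}

# Hypothetical cassette: TAACTAACTAA places a TAA stop in every reading frame.
EXAMPLE = "TAACTAACTAA" + ASC_I + "AT" + FSE_I
```

The short repeat TAACTAACTAA is one compact way to cover all three frames. The stops ensure that no reading frame entering from upstream vector sequence can run through into the insert.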
Example 10 - DIRECTIONAL CLONING OF AMPLIFIED DOUBLE STRAND cDNA
Double strand cDNAs, flanked at the 5' and 3' ends by Asc I and Fse I sites, respectively, as described in Examples 7 and 8, are digested with Asc I and Fse I using the manufacturer's recommended conditions, extracted with phenol/CHCl3, and subjected to preparative electrophoresis on a 4% agarose gel. Fragments in the size range of 0.2 to 1 kb are excised and extracted as described above, and are ligated to the cloning vector described in Example 9, which has been prepared for ligation by cleavage at its Asc I and Fse I cloning sites, followed by dephosphorylation using alkaline phosphatase (e.g., Cat. # 713023, Boehringer Mannheim). The ligation mixture is transformed into XL10-Gold competent cells following the manufacturer's instructions (Stratagene). Transformants are pooled from ampicillin (100 μg/ml) plates, and DNA "minipreps" or "megapreps" are prepared. Plasmids isolated from E. coli can, if desired, be transfected into yeast, so that clones containing leader sequences or transmembrane regions can be isolated. Selection methods are described, e.g., in U.S. Patent No. 5,536,637. Modifications of previously disclosed selection methods can, of course, be used. For example, instead of step g) of the method disclosed in U.S. Patent No. 5,536,637 (purifying DNA from the yeast cells), one can amplify the inserted DNA sequences directly from the yeast cells by PCR. Alternatively, ligated DNA can be transfected directly into yeast cells. In one embodiment, clones that grow on the selective medium are isolated and verified by restreaking on the selection medium, and the inserted cDNA that confers the ability to grow on the selective medium is amplified directly from the yeast colony by PCR before it is analyzed further.
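Directional cloning works because the two ends of each insert differ. Asc I (GG^CGCGCC) leaves a 5' CGCG overhang, while Fse I (GGCCGG^CC) leaves a 3' CCGG overhang. An insert can therefore join an Asc I/Fse I-cut vector in only one orientation. The sketch below uses hypothetical function names and toy sequences. It trims a PCR product at the top-strand cut positions and rejects products whose sites are missing, extra or misordered.

```python
# Recognition sequence and top-strand cut offset (the cut falls after this many bases).
SITES = {
    "AscI": ("GGCGCGCC", 2),  # GG^CGCGCC -> 5' CGCG overhang
    "FseI": ("GGCCGGCC", 6),  # GGCCGG^CC -> 3' CCGG overhang
}

def cut_positions(seq: str) -> list:
    """Sorted (top-strand cut position, enzyme) pairs for every site in seq."""
    hits = []
    for enzyme, (site, offset) in SITES.items():
        start = seq.find(site)
        while start != -1:
            hits.append((start + offset, enzyme))
            start = seq.find(site, start + 1)
    return sorted(hits)

def directional_insert(pcr_product: str):
    """Top strand of the Asc I-Fse I insert, or None unless the product has
    exactly one Asc I site followed by exactly one Fse I site."""
    hits = cut_positions(pcr_product)
    if [enzyme for _, enzyme in hits] != ["AscI", "FseI"]:
        return None  # missing, internal or misordered sites
    (asc_cut, _), (fse_cut, _) = hits
    return pcr_product[asc_cut:fse_cut]

# Toy PCR product: Asc I site, insert, Fse I site, with short flanks.
product = "AA" + "GGCGCGCC" + "ATGCATGC" + "GGCCGGCC" + "TT"
insert = directional_insert(product)
```

Both recognition sequences are 8 bp long, so a random sequence contains a given one on average only once per 4^8 = 65,536 bp. Internal sites in 0.2 to 1 kb inserts should therefore be uncommon, though not impossible.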
Details of the above cloning procedures are routine and well-known to those of skill in the art (See, e.g., Sambrook et al, ibid.).
Example 11 - REPRESENTATIVE NATURE OF cDNA LIBRARIES/COLONY HYBRIDIZATION
Double strand cDNA is made as described in the preceding examples, and samples 1-6 of both the 15-cycle and 22-cycle amplifications are cloned into the pSST22 vector via the Asc I and Fse I insertion sites to produce a total of 12 libraries. That each of these libraries is representative is shown by colony hybridization to probes for genes of known expression levels (See Figure 5). Routine, art-recognized colony hybridization methods (see, e.g., Sambrook et al, ibid.) are used. Three genes whose expression levels in normal prostate have been estimated are examined: PN44 (highly abundant), UBA (medium abundance) and C1 (low abundance). Using the PCR primer pairs SEQ. ID Nos: 6 and 7 for PN44, SEQ. ID Nos: 8 and 9 for UBA, and SEQ. ID Nos: 18 and 19 for C1, 32P-labeled probes are made with the Redi Prime Kit (Amersham) and hybridized to 5,000 to 7,000 colonies from each library. Fig. 5 shows that each of the genes is proportionally represented, as expected, in each of the libraries. That the frequency of these genes is representative in each of the twelve libraries suggests that the relative abundance of cDNAs from the three genes does not change significantly within the ranges tested for the three variables examined: primer/template ratio, incubation time of first strand cDNA synthesis, and the number of PCR cycles.
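The EST counts reported in Example 12 show roughly what colony numbers a representative library should give. The sketch assumes, for illustration only, that a gene's frequency among library clones equals its frequency among the 1168 prostate ESTs. It also maps "UBA" to UBA52 as listed in Example 12. Function names are illustrative.

```python
import math

EST_TOTAL = 1168
EST_COUNTS = {"PN44": 11, "UBA52": 3, "C1 inhibitor": 1}  # from Example 12 (Nelson et al.)

def expected_positives(est_count: int, colonies: int, est_total: int = EST_TOTAL) -> float:
    """Expected hybridizing colonies, assuming library frequency equals EST frequency."""
    return colonies * est_count / est_total

def prob_no_positive(est_count: int, colonies: int, est_total: int = EST_TOTAL) -> float:
    """Poisson probability that a screen of `colonies` yields no positive at all."""
    return math.exp(-expected_positives(est_count, colonies, est_total))

for gene, count in EST_COUNTS.items():
    low, high = expected_positives(count, 5000), expected_positives(count, 7000)
    print(f"{gene}: {low:.1f}-{high:.1f} expected positives per 5,000-7,000 colonies")
```

Under this assumption, a 5,000 to 7,000 colony screen would give roughly 47 to 66 PN44-positive colonies, 13 to 18 for UBA52 and only about 4 to 6 for C1 inhibitor. The chance of missing C1 inhibitor entirely in 5,000 colonies is about 1.4%.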
Example 12 - COMPLEXITY OF cDNA LIBRARIES/PCR ANALYSIS
Samples are taken at each step during preparation of the cDNA libraries discussed in the preceding examples, and are analyzed for the presence of ten known genes whose abundance has been determined in a human EST library (these genes and their abundance are described, e.g., in Nelson et al. (1998), Genomics 42, 12-25): highly abundant PSA (SEQ. ID Nos: 4 and 5) and PN44 (SEQ. ID Nos: 6 and 7); medium-abundance UBA52 (SEQ. ID Nos: 8 and 9) and Hevin (SEQ. ID Nos: 10 and 11); and lower-abundance AH receptor (SEQ. ID Nos: 12 and 13), myosin light chain kinase (SEQ. ID Nos: 14 and 15), TALLA 1 (SEQ. ID Nos: 16 and 17), C1 inhibitor (SEQ. ID Nos: 18 and 19), Her 2 (SEQ. ID Nos: 20 and 21) and TMP homologue prostate (SEQ. ID Nos: 22 and 23). According to this study, the two highly abundant genes are present in twelve (PSA) and eleven (PN44) out of 1168 ESTs; the medium-abundance genes are each present in three out of 1168; and the six lower-abundance genes are each present in one out of 1168.
PCR amplification is performed using the PCR primer sets specific for each of the ten genes described above. All but Hevin are detectable in the amplified cDNA inserts (Hevin is not detected in all experiments tried). By this measurement, the amplified cDNAs are complex enough to contain the 5' ends of six out of six genes expressed at the <0.1% level. Interestingly, in some experiments, fewer of these genes are detectable in the first strand cDNA preparation before amplification, possibly because there is too little template for detection by the PCR method employed. Although it is unlikely that PCR amplification increases the complexity of the inserts, this last observation suggests that the complexity is not grossly reduced by amplification. On the contrary, insert amplification may be important to counter the inefficiency inherent in the ligation and transformation steps of library making.
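The "<0.1% level" figure follows from the EST counts given for the ten genes. The short calculation below converts each count to a percentage of the 1168 ESTs and lists the genes that fall under 0.1%. The dictionary simply restates the counts from the text, and the helper name is illustrative.

```python
EST_TOTAL = 1168
EST_COUNTS = {
    "PSA": 12, "PN44": 11,                                   # highly abundant
    "UBA52": 3, "Hevin": 3,                                  # medium abundance
    "AH receptor": 1, "myosin light chain kinase": 1,        # lower abundance
    "TALLA 1": 1, "C1 inhibitor": 1, "Her 2": 1, "TMP homologue prostate": 1,
}

def percent_of_ests(count: int, total: int = EST_TOTAL) -> float:
    """Share of the EST collection, in percent."""
    return 100.0 * count / total

below_0_1 = sorted(g for g, c in EST_COUNTS.items() if percent_of_ests(c) < 0.1)
```

This gives about 1.03% for PSA, 0.94% for PN44, 0.26% for each medium-abundance gene and 0.086% for each of the six lower-abundance genes. The genes below 0.1% are therefore exactly the six lower-abundance genes detected in the amplified inserts.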
Example 13 - ANALYSIS OF LIBRARY CHARACTERISTICS BY SEQUENCING
This example shows that at all the steps of the generation of cDNAs by methods of the invention, the cDNAs exhibit unidirectional characteristics.
E. coli clones from three libraries constructed using the above method have been analyzed by randomly picking colonies and sequencing their inserts. By comparison with known databases, each sequence that is identical or significantly homologous to a known transcribed sequence (including untranslated RNAs such as rRNAs and Alu-like RNAs) is examined to determine whether it is in the sense orientation and, if so, whether it includes the 5' end. The best results are obtained with a cDNA library from rat brain mRNA: 10 out of 10 sequences are in the sense orientation, and 8 out of 10 contain the 5' ends. The other two sequences contain 18S rRNA, a template likely to be sheared or rich in secondary structure. A second library, from prostate tumor mRNA, has 9 out of 9 in the sense orientation, and 3 out of 9 contain the 5' ends; the 6 remaining include one 18S rRNA and three very long genes, so that shearing of the template could be a factor. A third library, also from prostate tumor mRNA, has the poorest results, with 7 out of 9 in the sense orientation and 2 of those 7 containing the 5' ends. Again, two rRNAs are among those that do not include the 5' end. The difference in quality between these libraries may be due to the fact that the primer-to-mRNA molar ratio could not be ascertained for the latter two because of the scarcity of the prostate tumor mRNA.
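The sequencing counts above are small, so it can help to attach uncertainty to them. The sketch below is not an analysis from the source. It pools the three libraries and computes a Wilson score interval for the fraction of clones in the sense orientation. The Wilson interval is a standard binomial confidence interval, chosen here because it stays within 0 to 1 for proportions near 1.

```python
import math

# (library, clones sequenced, sense orientation, containing 5' end), as reported in the text
RESULTS = [
    ("rat brain", 10, 10, 8),
    ("prostate tumor, library 2", 9, 9, 3),
    ("prostate tumor, library 3", 9, 7, 2),
]

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

total = sum(r[1] for r in RESULTS)
sense = sum(r[2] for r in RESULTS)
lo, hi = wilson_interval(sense, total)
print(f"pooled sense: {sense}/{total} = {sense / total:.0%} (95% CI {lo:.0%}-{hi:.0%})")
```

The pooled 26 of 28 sense clones give an interval of roughly 77% to 98%. This is consistent with strong directionality, but with counts this small the exact percentages should be read loosely.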
The above statistics demonstrate that the method generates cDNA libraries which are highly directional and highly enriched for the 5' ends of genes. The data also demonstrate the reproducibility of the method. From the foregoing description, one skilled in the art can easily ascertain the essential characteristics of this invention, and without departing from the spirit and scope thereof, can make changes and modifications of the invention to adapt it to various usages and conditions.
Without further elaboration, it is believed that one skilled in the art can, using the preceding description, utilize the present invention to its fullest extent. The preceding preferred specific embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever.
The entire disclosure of all applications, patents and publications, cited above and in the figures are hereby incorporated by reference.

Claims

We claim:
1. A method to generate first strand cDNAs from an mRNA sample of interest, comprising, a) hybridizing said mRNA with first primer oligonucleotides, each comprising, 5' to 3',
1) a PCR primer sequence, PCR-A, and
2) one of a set of random sequences having a length of 5 to 40 nucleotides, which are effective for binding to mRNA and priming the synthesis of a cDNA copy, b) synthesizing first strand cDNAs by extending said random sequences hybridized to said mRNA with a reverse transcriptase, and c) separating unbound molecules of said first primer from first strand cDNAs, under conditions effective to produce first strand cDNAs which are effective for unidirectional cloning.
2. The method of claim 1, wherein said separated first strand cDNAs which are effective for unidirectional cloning are substantially free of unbound first primer.
3. The method of claim 1, wherein said separated cDNAs are used for unidirectional cloning.
4. The method of claim 1, wherein said PCR-A sequence is between 14 and 30 nucleotides.
5. The method of claim 1, wherein the length of said random sequences is 6 to 20 nucleotides.
6. The method of claim 1, wherein the length of said random sequences is 7 to 9 nucleotides.
7. The method of claim 1, wherein said set of random sequences comprises a mixture of sequences of 7, 8 and 9 nucleotides in length.
8. The method of claim 1, wherein the ratio of first primer oligonucleotides to mRNA molecules is about 7:1 to 25:1.
9. The method of claim 1, wherein said separating is carried out by a separating column and gel electrophoresis.
10. The method of claim 1, wherein said PCR-A sequence comprises a restriction enzyme region, RE-A.
11. The method of claim 10, wherein said RE-A region comprises at least one complete restriction enzyme site.
12. The method of claim 10, wherein said RE-A region comprises at least one partial restriction enzyme site.
13. The method of claim 10, wherein said RE-A region comprises at least one complete restriction enzyme site and at least one partial restriction enzyme site.
14. The method of claim 10, wherein non-restriction site sequences of PCR-A are contiguous and 5' or 3' to said RE-A region.
15. First strand cDNAs generated from an mRNA sample of interest, wherein the nucleotide sequence of the 5' end of each cDNA molecule is, 5' to 3',
1) a PCR primer sequence, PCR-A, and
2) one of a set of two or more random sequences having a length of about 5 to about 40 nucleotides, which are effective for binding to mRNA and priming the synthesis of a cDNA copy, wherein said first strand cDNAs are substantially free of unbound molecules of primer oligonucleotide and are effective for unidirectional cloning.
16. The method of claim 1, further comprising extending the 3' ends of said separated first strand cDNAs by adding, 5' to 3',
1) the sequence CCC, and
2) the reverse complement of a sequence PCR-B, which comprises, optionally, at least a portion of the reverse complement of a restriction enzyme region, RE-B.
17. The method of claim 1, wherein said reverse transcriptase adds terminal C's to the 3' ends of first strand cDNAs, said method further comprising, a) hybridizing separated first strand cDNAs containing 3' terminal C's to a second PCR primer oligonucleotide which comprises, 5' to 3',
1) a second PCR primer sequence, PCR-B, and
2) the sequence GGG, wherein said GGG is effective to hybridize to said 3' terminal C's, and b) incubating with reverse transcriptase, thereby generating extended first strand cDNAs which comprise at their 3' ends CCC followed by the reverse complement of PCR-B.
18. The method of claim 17, wherein said PCR-B sequence comprises a restriction enzyme region, RE-B.
19. The method of claim 18, wherein said RE-B region comprises at least one complete restriction enzyme site.
20. The method of claim 18, wherein said RE-B region comprises at least one partial restriction enzyme site.
21. The method of claim 18, wherein said RE-B region comprises at least one complete restriction enzyme site and at least one partial restriction enzyme site.
22. The method of claim 18, wherein non-restriction site sequences of PCR-B are contiguous and 5' or 3' to said RE-B region.
23. The method of claim 17, wherein said second PCR primer oligonucleotide is 5'-TACGGCTGCGAGAAGACGACAGAAGGG-3'.
24. The method of claim 17, further comprising separating molecules of said second primer oligonucleotide from said extended first strand cDNAs, under conditions effective to produce extended first strand cDNAs which are effective for unidirectional cloning.
25. The method of claim 24, wherein said separated extended first strand cDNAs are substantially free of said second primer.
26. The method of claim 24, wherein said separating is carried out by a separating column and gel electrophoresis.
27. The method of claim 24 further comprising amplifying said separated extended first strand cDNA by PCR.
28. The method of claim 27, further comprising introducing restriction enzyme sites to each end of said separated extended first strand cDNA during the PCR procedure.
29. A method to generate from an mRNA sample of interest a directional cDNA library which is enriched for the 5' ends of expressed genes comprising, a) hybridizing said mRNA with first primer oligonucleotides, each comprising, 5' to 3',
1) a PCR primer sequence, PCR-A, and
2) one of a set of random sequences having a length of 7, 8 or 9 nucleotides, which are effective for binding to mRNA and priming the synthesis of a cDNA copy, b) synthesizing first strand cDNAs by extending said random sequences hybridized to said mRNA with reverse transcriptase, wherein said reverse transcriptase adds terminal C's to the 3' ends of cDNAs, c) separating substantially all unbound molecules of said first primer oligonucleotides from said first strand cDNAs with a separating column and gel electrophoresis, d) hybridizing said separated first strand cDNAs to a second PCR primer oligonucleotide comprising, 5' to 3', a second PCR primer sequence, PCR-B, and the sequence GGG, wherein said GGG hybridizes to said 3' terminal C's, thereby creating complexes, e) incubating said complexes with reverse transcriptase under conditions effective to produce extended first strand cDNAs which comprise, at their 3' ends, CCC followed in the 3' direction by the reverse complement of PCR-B, f) separating substantially all molecules of said second primer oligonucleotide from said extended first strand cDNAs with a separating column and gel electrophoresis, g) amplifying said separated extended first strand cDNAs by PCR, using as primers a third primer oligonucleotide which comprises, 5' to 3', a first restriction enzyme site and said PCR-B sequence, and a fourth primer which comprises, 5' to 3', a second restriction enzyme site and the reverse complement of said PCR-A sequence, h) digesting said amplified cDNAs with restriction enzymes specific for said first and second restriction enzyme sites, and i) selecting one or more of said amplified cDNAs by gel electrophoresis.
30. The method of claim 29, wherein said first restriction site is Asc I and said second restriction site is Fse I.
31. A method for constructing a directional EST library enriched for the 5' ends of expressed genes, comprising, a) generating double stranded cDNA molecules which are enriched for the 5' ends of expressed genes, wherein each of said double stranded cDNAs is bounded by two different restriction enzyme sites, A and B, and wherein the B sites lie at the ends of cDNA sense strands corresponding to the 5' end of said expressed genes, and the A sites lie at the opposite ends of the sense cDNA strands, and b) ligating said double stranded cDNA molecules directionally into a vector.
32. The method of claim 31, further comprising amplifying said double stranded cDNA molecules by PCR before ligating them into said vector.
33. A method for constructing a directional cDNA library enriched for cDNAs coding for signal peptides, comprising, a) generating double stranded cDNA molecules which are enriched for the 5' ends of expressed genes, wherein each of said double stranded cDNAs is bounded by two different restriction enzyme sites, A and B, and wherein the B sites lie preferentially at the ends of cDNA sense strands corresponding to the 5' end of said expressed genes, and the A sites lie preferentially at the opposite ends of the sense cDNA strands, b) ligating said double stranded cDNA molecules into a yeast vector, wherein said vector comprises, 5' to 3',
1) stop codons in each of three reading frames, just upstream of
2) a cDNA insertion region wherein a site for restriction enzyme B lies 5' of one for restriction enzyme A, and
3) a sequence encoding a non-secreted yeast invertase, and c) transfecting said vectors comprising ligated DNA into yeast and selecting for clones which secrete yeast invertase.
34. The method of claim 33, further comprising amplifying said double stranded cDNA molecules by PCR before ligating them into said vector.
35. The method of claim 24, further comprising, a) generating double stranded cDNA molecules from said separated, extended first strand cDNAs, wherein each of said double stranded cDNAs is bounded by two different restriction enzyme sites, A and B, and wherein the B sites lie preferentially at the ends of cDNA sense strands corresponding to the 5' end of said expressed genes, and the A sites lie preferentially at the opposite ends of the sense cDNA strands, b) ligating said double stranded cDNA molecules into a yeast vector, wherein said vector comprises, 5' to 3',
1) stop codons in each of three reading frames, just upstream of
2) a cDNA insertion region wherein a site for restriction enzyme B lies 5' of one for restriction enzyme A, and
3) a sequence encoding a non-secreted yeast invertase, and c) transfecting said vectors comprising ligated DNA into yeast and selecting for clones which secrete yeast invertase.
36. The method of claim 35, wherein said double stranded cDNA molecules are amplified by PCR before they are ligated into said vector.
37. A yeast cloning vector which comprises, 5' to 3',
1) stop codons in each of three reading frames, just upstream of
2) a cDNA insertion region wherein a site for restriction enzyme B lies 5' of a site for restriction enzyme A, and
3) a sequence encoding a non-secreted yeast invertase.
38. The yeast cloning vector of claim 37, wherein said restriction site A is Fse I and said restriction site B is Asc I.
39. A method of expressing a protein, comprising culturing a cell comprising a vector of claim 37, wherein a cDNA sequence encoding said protein is cloned into said cDNA insertion region.
40. A directional cDNA library which is enriched for the 5' ends of expressed genes, produced by the method of claim 29.
41. A directional cDNA library enriched for cDNAs coding for signal peptides, produced by the method of claim 33.
42. A directional cDNA library enriched for cDNAs coding for signal peptides, produced by the method of claim 35.
PCT/US2000/020541 1999-07-29 2000-07-28 5'ENRICHED cDNA LIBRARIES AND A YEAST SIGNAL TRAP WO2001009310A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU63863/00A AU6386300A (en) 1999-07-29 2000-07-28 5'enriched cdna libraries and a yeast signal trap

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US14597499P 1999-07-29 1999-07-29
US60/145,974 1999-07-29
US09/628,178 2000-07-28

Publications (1)

Publication Number Publication Date
WO2001009310A1 true WO2001009310A1 (en) 2001-02-08

Family

ID=22515385

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0607054A2 (en) * 1993-01-14 1994-07-20 Honjo, Tasuku Novel process for constructing a cDNA library and a novel polypeptide and DNA coding the same
US5536637A (en) * 1993-04-07 1996-07-16 Genetics Institute, Inc. Method of screening for cDNA encoding novel secreted mammalian proteins in yeast
US5712116A (en) * 1993-04-07 1998-01-27 Genetics Institute, Inc. Method for isolating cytokines and other secreted proteins
WO1996040904A1 (en) * 1995-06-07 1996-12-19 Zymogenetics, Inc. Secretion leader trap cloning method
WO1997040146A1 (en) * 1996-04-24 1997-10-30 Genetics Institute, Inc. Yeast invertase gene as reporter system for isolating cytokines
WO1999049028A1 (en) * 1998-03-23 1999-09-30 Genentech, Inc. Method of selection for genes encoding secreted and transmembrane proteins

Non-Patent Citations (10)

Title
BARAK I ET AL: "CONSTRUCTION OF A PROMOTER-PROBE SHUTTLE VECTOR FOR ESCHERICHIA-COLI AND BREVIBACTERIA", GENE (AMSTERDAM), vol. 95, no. 1, 1990, pages 133 - 135, XP002154997, ISSN: 0378-1119 *
BAUER ET AL: "IDENTIFICATION OF DIFFERENTIALLY EXPRESSED mRNA SPECIES BY AN IMPROVED DISPLAY TECHNIQUE", NUCLEIC ACIDS RESEARCH,GB,OXFORD UNIVERSITY PRESS, SURREY, vol. 21, no. 18, 11 September 1993 (1993-09-11), pages 4272 - 4280, XP002086977, ISSN: 0305-1048 *
BROSIUS J: "PLASMID VECTORS FOR THE SELECTION OF PROMOTERS", GENE (AMSTERDAM), vol. 27, no. 2, 1984, pages 151 - 160, XP002154996, ISSN: 0378-1119 *
FROHMAN M A ET AL: "RAPID PRODUCTION OF FULL-LENGTH CDNAS FROM RARE TRANSCRIPTS: AMPLIFICATION USING A SINGLE GENE-SPECIFIC OLIGONUCLEOTIDE PRIMER", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA,US,NATIONAL ACADEMY OF SCIENCE. WASHINGTON, vol. 85, 1 December 1988 (1988-12-01), pages 8998 - 9002, XP000604678, ISSN: 0027-8424 *
FROHMAN M A: "RACE RAPID AMPLIFICATION OF COMPLEMENTARY DNA ENDS", INNIS, M. A., ET AL. (ED.). PCR PROTOCOLS: A GUIDE TO METHODS AND, 1990, USA;LONDON, ENGLAND, UK. ILLUS. ISBN 0-12-372181-4(PAPER); ISBN 0-12-372180-6(CLOTH). 1990, pages 28 - 38, XP002154993 *
IMANAKA T ET AL: "CONSTRUCTION OF HIGH INTERMEDIATE AND LOW-COPY-NUMBER PROMOTER-PROBE PLASMIDS FOR BACILLUS-SUBTILIS", GENE (AMSTERDAM), vol. 43, no. 3, 1986, pages 231 - 236, XP002154995, ISSN: 0378-1119 *
JACOBS K A ET AL: "A genetic selection for isolating cDNAs encoding secreted proteins", GENE: AN INTERNATIONAL JOURNAL ON GENES AND GENOMES,GB,ELSEVIER SCIENCE PUBLISHERS, BARKING, vol. 198, no. 1-2, 1 October 1997 (1997-10-01), pages 289 - 296, XP004116069, ISSN: 0378-1119 *
KLEIN R D ET AL: "SELECTION FOR GENES ENCODING SECRETED PROTEINS AND RECEPTORS", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA,US,NATIONAL ACADEMY OF SCIENCE. WASHINGTON, vol. 93, no. 14, 9 July 1996 (1996-07-09), pages 7108 - 7113, XP002061411, ISSN: 0027-8424 *
PROMEGA, PRODUCT CATALOGUE, 1998, US, pages 22 - 23, XP002155070 *
SHEN HAO ET AL: "Construction of a Tn7-lux system for gene expression studies in gram-negative bacteria.", GENE (AMSTERDAM), vol. 122, no. 1, 1992, pages 27 - 34, XP002154994, ISSN: 0378-1119 *

Cited By (16)

Publication number Priority date Publication date Assignee Title
WO2002044399A2 (en) * 2000-11-28 2002-06-06 Rosetta Inpharmatics, Inc. In vitro transcription method for rna amplification
WO2002044399A3 (en) * 2000-11-28 2003-03-13 Rosetta Inpharmatics Inc In vitro transcription method for rna amplification
US7229765B2 (en) 2000-11-28 2007-06-12 Rosetta Inpharmatics Llc Random-primed reverse transcriptase-in vitro transcription method for RNA amplification
EP1275734A1 (en) * 2001-07-11 2003-01-15 Roche Diagnostics GmbH Method for random cDNA synthesis and amplification
EP1275738A1 (en) * 2001-07-11 2003-01-15 Roche Diagnostics GmbH Method for random cDNA synthesis and amplification
US8268987B2 (en) 2005-12-06 2012-09-18 Applied Biosystems, Llc Reverse transcription primers and methods of design
US8809513B2 (en) 2005-12-06 2014-08-19 Applied Biosystems, Llc Reverse transcription primers and methods of design
US9410173B2 (en) 2012-10-24 2016-08-09 Clontech Laboratories, Inc. Template switch-based methods for producing a product nucleic acid
EP2912197A4 (en) * 2012-10-24 2016-07-13 Clontech Lab Inc Template switch-based methods for producing a product nucleic acid
US11001882B2 (en) 2012-10-24 2021-05-11 Takara Bio Usa, Inc. Template switch-based methods for producing a product nucleic acid
US10781443B2 (en) 2013-10-17 2020-09-22 Takara Bio Usa, Inc. Methods for adding adapters to nucleic acids and compositions for practicing the same
US10941397B2 (en) 2013-10-17 2021-03-09 Takara Bio Usa, Inc. Methods for adding adapters to nucleic acids and compositions for practicing the same
US10954510B2 (en) 2013-10-17 2021-03-23 Takara Bio Usa, Inc. Methods for adding adapters to nucleic acids and compositions for practicing the same
US9719136B2 (en) 2013-12-17 2017-08-01 Takara Bio Usa, Inc. Methods for adding adapters to nucleic acids and compositions for practicing the same
US10415087B2 (en) 2013-12-17 2019-09-17 Takara Bio Usa, Inc. Methods for adding adapters to nucleic acids and compositions for practicing the same
US11124828B2 (en) 2013-12-17 2021-09-21 Takara Bio Usa, Inc. Methods for adding adapters to nucleic acids and compositions for practicing the same

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP