WO1998014619A1 - Methods for generating and analyzing transcript markers - Google Patents

Methods for generating and analyzing transcript markers Download PDF

Info

Publication number
WO1998014619A1
WO1998014619A1 PCT/US1997/018344 US9718344W WO9814619A1 WO 1998014619 A1 WO1998014619 A1 WO 1998014619A1 US 9718344 W US9718344 W US 9718344W WO 9814619 A1 WO9814619 A1 WO 9814619A1
Authority
WO
WIPO (PCT)
Prior art keywords
cdna
transcript
markers
cdna library
transcript markers
Prior art date
Application number
PCT/US1997/018344
Other languages
French (fr)
Inventor
Bruce B. Wang
Alicia Chung
Karl J. Guegler
Zhi Yang
Benjamin Graeme Cocks
Susan G. Stuart
Original Assignee
Incyte Pharmaceuticals, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Incyte Pharmaceuticals, Inc. filed Critical Incyte Pharmaceuticals, Inc.
Priority to AU47533/97A priority Critical patent/AU4753397A/en
Publication of WO1998014619A1 publication Critical patent/WO1998014619A1/en

Links

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/66General methods for inserting a gene into a vector to form a recombinant vector using cleavage and ligation; Use of non-functional linkers or adaptors, e.g. linkers containing the sequence for a restriction endonuclease
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1096Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR

Definitions

  • the present invention relates generally to the field of molecular biology and specifically to rapid, high-throughput gene discovery methods that facilitate genome closure and to methods for analyzing gene expression patterns.
  • Elucidation of gene expression represents another level of complexity equally important to the elucidation of genetic structure.
  • the generation of a gene expression pattern can be used directly as a diagnostic profile or as a gene discovery method.
  • Seilhamer et al. (WO 95/20681, filed January 27, 1995) disclose methods for the high-throughput sequence-specific analysis of cDNAs and generation of transcript images.
  • Matsubara et al (WO 95/14772, filed November 11,1994) disclose methods for generating 3' directed cDNA libraries which accurately reflect the abundance ratio of mRNA in a cell. Velculescu et al.
  • 5-azadCyd has been used experimentally as a DNA methylation inhibitor to induce gene expression and cellular differentiation (Juttermann et al. 1994, Proc. Natl. Acad. Sci. USA 91 : 11797-11801).
  • the present invention relates, in part, to rapid, high-throughput gene discovery methods that facilitate genome closure and to methods for analyzing gene expression patterns.
  • the present invention also relates to methods for the rapid, sequence-specific identification of transcripts derived from an mRNA population.
  • the present invention further relates to methods for extending the nucleotide sequences of partial transcripts in a high-throughput manner using polymerase chain reaction technology.
  • the present invention provides rapid, high-throughput methods for generating and analyzing transcript markers from the 5' most end of cDNAs and methods for generating and analyzing two, discontinuous transcript markers from a single cDNA providing the advantage of obtaining more information from a single transcript than is possible by current methods.
  • the two discontinuous markers are derived from both the 5' and 3' ends of a single cDNA and in another embodiment of the present invention, the two discontinuous markers are derived from random areas of the cDNA.
  • a method for generating and analyzing transcript markers from the 5' most end of individual cDNAs of a cDNA library comprising the steps of obtaining a cDNA library comprising individual cDNAs having a first restriction endonuclease site for a restriction endonuclease that digests the cDNA at the 5' most end within an expected distance from its recognition site and a second endonuclease restriction site; subjecting the cDNAs to digestion with the first and the second restriction endonuclease thereby excising transcript markers from the 5' most end of the individual cDNAs; ligating said transcript markers to a vector; transforming said vector containing the transcript markers in a host cell; culturing said host cells; and performing nucleic acid sequence analysis of the transcript markers.
  • This method can be used for gene discovery purposes or transcript imaging purposes. This method can be combined with PCR based technology for the rapid nucleic acid sequence analysis of the transcript markers.
  • a method for generating and analyzing non-contiguous transcript markers derived from the 5' most and 3' most end of individual cDNAs comprising the steps of obtaining a cDNA library comprising individual cDNAs having a first restriction endonuclease site at the 5' most end and at the 3' most end for a restriction endonuclease that digests nucleic acid within an expected distance from the first endonuclease recognition site, and a second endonuclease restriction site; subjecting the cDNAs to digestion with the first endonuclease thereby creating linearized cDNAs containing transcript markers from the 5' most end and the 3' most end of the individual cDNAs; self-ligating said linearized cDNAs thereby joining the transcript markers from the 5' most and 3' most ends to create cDNAs containing non-contiguous transcript markers; transforming said linearized cDNAs in a host cell and culturing said
  • a method for the rapid, sequence- specific identification of cDNAs derived from a human mRNA population comprising the steps of obtaining a cDNA library comprising individual cDNAs wherein said cDNAs contain first restriction endonuclease sites for an endonuclease having a 4 base pair recognition site and wherein said cDNAs are cloned into a first vector lacking the first restriction endonuclease sites; subjecting the cDNA library to digestion with the first restriction endonuclease thereby creating linearized cDNAs containing a portion of the original cDNA; ligating an adapter to said linearized cDNAs wherein the adapter contains a second restriction endonuclease site for an endonuclease that cleaves within an expected range of its recognition site thereby creating cDNAs containing two non-contiguous transcript markers joined by the adapter; digesting the cDNAs with the second restriction endonu
  • the present invention also provides novel methods and vectors used in preparation of cDNA libraries containing transcript markers.
  • the present invention also provides novel, high- throughput methods for obtaining complete nucleotide sequence information on transcript markers.
  • cDNA libraries containing a serial arrangement of multiple transcript markers are constructed and subjected to high-throughput nucleic acid sequence analysis in a multi-well format using polymerase chain reaction (PCR) technology and specialized PCR primers.
  • PCR polymerase chain reaction
  • the cDNA libraries are constructed to bias the cDNA population toward rare cDNAs.
  • a cDNA library is constructed by normalization techniques and in another, the cDNA library is constructed using subtractive hybridization techniques.
  • the cDNA library has been constructed from mRNA treated with demethylating agents, such as, 5-aza-2' deoxycytidine and 5-azacytidine, which induces the transcription of silent genes.
  • the cDNA library has been constructed using oligo dT primers and in another aspect, the cDNA library is constructed using random primers.
  • Figure 1 illustrates a general schematic of cDNA library construction.
  • FIG 2 illustrates Array of Transcript Markers (ATM) Strategy 1 for constructing a cDNA library containing an array of transcript markers derived from the 5' most end of a cDNA (5' transcript markers).
  • ATM Transcript Markers
  • Figure 3 illustrates ATM Strategy II for constructing a cDNA library containing an array of 5' transcript markers.
  • Figures 4A-4B illustrate ATM Strategy III for constructing a cDNA library containing an array of 5' transcript markers.
  • Figures 5A-5B illustrate ATM Strategy IV for constructing a cDNA library containing an array of 5' transcript markers.
  • Figures 6A-6B illustrate ATM Strategy V for constructing a cDNA library containing an array of 5' transcript markers.
  • Figure 7 illustrates a strategy for simultaneously obtaining two non-contiguous transcript markers from both the 5' and 3' end of cDNA.
  • Figure 8 illustrates a strategy for obtaining non-contiguous transcript markers from random areas of a cDNA.
  • Figure 9 illustrates a sequence specific approach to the identification of all transcribed genes from an mRNA population.
  • Figure 10 is a list of 12 known sequences identified by the method illustrated in Figure 2.
  • Figure 11 illustrates the toxicity curve for THP1 cells treated for 3 days with 5-aza-2'- cytidine.
  • Figures 12A-12B illustrate examples of Type IIs restriction endonucleases useful for generating ATM libraries.
  • Bpml, Bsgl and Eco57I are three restriction endonuclease that cleave 16 bp away from their recognition site with a 2 nucleotide 3' overhang.
  • Bpml is illustrated in Figure 12A.
  • N's represent nucleotides found adjacent to the restriction site (e.g., the 5' end of a cDNA).
  • Treatment with T4 DNA polymerase removes the 3 1 extension which leaves 14 bp derived from the adjacent sequence.
  • Figure 12B illustrates that the restriction endonuclease BSMFI cleaves 14 bp away from its recognition site with a 4 nucleotide 5 1 overhang. This end may be filled in by treatment with a DNA polymerase which results in 14 bp derived from adjacent sequence.
  • Figure 13 illustrates a vector useful in the strategy of Figure 8 which has all restriction endonuclease sites having a 4 base pair recognition site removed.
  • Figures 14A-14B illustrate a typical concatenated array of transcript markers.
  • Figure 15 illustrates an abundance profile of THP cells treated with 5-aza-2' deoxycytidine.
  • the black-shaded area represents control clones
  • the gray shaded area represents 5-aza-2' deoxycytidine treated cells
  • the gray line represents the abundance profile of the 5- aza-2' deoxycytidine treated cells.
  • transcript marker derived from a cDNA library refers to an isolated polynucleotide derived from an individual cDNA and being preferably from about 10 base pairs to about 20 base pairs in length derived from an individual cDNA.
  • non-contiguous transcript marker from an individual cDNA refers to two polynucleotides which are not adjacent to one another under naturally occurring conditions, but which are constructed to exist in tandem, with each polypeptide being preferably from about 10 base pairs to about 20 base pairs in length.
  • the term "3' transcript marker” or “transcript marker from the 3' most end of a cDNA” refers to an isolated polynucleotide derived from the 3' most end of an individual cDNA and being preferably from about 10 base pairs to about 20 base pairs in length.
  • the term “5' transcript marker” or “5 1 transcript marker derived from a cDNA library” refers to an isolated 5' most nucleic acid sequence of a cDNA which is preferably about 10 base pairs to about 20 base pairs in length.
  • “5' most” means that a 5' transcript marker may represent the 5' end of the full-length coding region of a cDNA and may include 5' untranslated sequences. Alternatively, a 5' most transcript marker may represent an internal coding region of a individual cDNA. Each 5' transcript marker can reflect the expression of an individual cDNA.
  • adapter refers to a synthetic fragment of nucleic acid which is ligated to a cDNA and which may contain recognition sites for restriction endonucleases.
  • the term "5' adapter” refers to a synthetic fragment of nucleic acid which is ligated onto the 5' end of cDNAs prior to ligation to a vector.
  • the 5' adapter contains a first restriction endonuclease site which digests nucleic acid at a expected distance from its enzymatic recognition site and at least one other restriction endonuclease site.
  • the second restriction endonuclease site is for a restriction endonuclease which digests the cDNA to form blunt ends or 5' or 3' overhangs.
  • digestion of the cDNA with the first and second restriction enzymes excises transcript markers which are at least 20 base pairs in length and contain at least 14 base pairs of cDNA sequence and 6 base pairs of adapter sequence.
  • array(s) of transcript markers or “ATM” refers to the collection or serial arrangement of transcript markers prepared by concatenation of individual transcript markers.
  • an "ATM cDNA library” is a cDNA library containing transcript markers as inserts. The inserts may be individual inserts, multiple inserts or a serial arrangement of multiple inserts which may be in sense or antisense orientation.
  • full-length coding region refers to the cDNA sequence for the entire transcribed mRNA for a particular protein from initiating methionine to the poly A tail.
  • type IIs restriction endonuclease or “type IIs enzyme” refers to that category of restriction endonucleases that digests nucleic acid at an expected distance from the enzymatic recognition site.
  • Preferred examples of type IIs enzymes for use in the present invention are those which digest nucleic acid at least 10 base pairs from the recognition site or which digest nucleic acid less than 10 base pairs from the recognition site but can be filled in enzymatically to give a 10 base pair transcript marker.
  • Type IIs enzymes include, but are not limited to Bpml, Bsgl Eco57I and BsmFI.
  • Examples of type IIs restriction endonucleases is illustrated in Figures 12A-12B. As will be understood by those of skill in the art, it is possible to alter enzymatic digestion conditions to change the nucleic acid digestion position of the type IIs restriction endonuclease.
  • type Ilsg restriction endonuclease or "type Ilsg enzyme” refers to that category of restriction endonucleases that digests nucleic acid within an expected range of its enzymatic recognition site.
  • type Ilsg restriction enzymes include, but are not limited to Bcgl.
  • the term “concatenating” refers to the process of ligating multiple transcript markers prior to ligation in a cloning vector.
  • normalized cDNA library or “normalized library” refers to a cDNA library constructed in such a manner as to reduce the redundancy in high- level abundance cDNAs.
  • the term "subtractive hybridization” when referring to construction of a cDNA library refers to a process wherein a first population of nucleic acid is hybridized with a second labelled population of nucleic acid (driver) and the resultant nucleic acid hybrids removed to completion thereby identifying and isolating a set of nucleic acid sequences unique to the first population of nucleic acid which may be used in the construction of a cDNA library.
  • the term "selected set" of random primers refers to a set of primers designed to anneal to a specific region of mRNA.
  • the selected set of random primers is designed to anneal to the 5 1 most region of mRNA used as starting material for cDNA synthesis.
  • transcript imaging refers to a method of determining the relative abundance of individual transcript markers in a cDNA library.
  • relative abundance refers to the number of times an individual transcript marker appears relative to the total number of transcript markers identified.
  • non-amplified growth of host cells refers to growth conditions which allow for uniform growth of recombinant host cells.
  • high abundance or high-level abundance messages exist at greater than 10,000 copies per cell.
  • mid abundance or “mid-level abundance” species exist at 100 to 400 copies per cell.
  • low abundance or low-level abundance species are found at less than 15 copies per cell. This low-abundance class represents 20 to 50 percent of the unique transcripts in the cell.
  • bubble primer refers to a sequencing primer degenerate at the 3'-end to allow sequencing for all three possible bases following the poly A tail.
  • the present invention relates to rapid, high-throughput gene discovery methods that facilitate genome closure and to methods for analyzing gene expression patterns.
  • the present invention is based, in part, upon the discovery of methods for the generation of transcript markers derived from the 5' end of cDNAs contained within a cDNA library.
  • the present invention is also based, in part, upon the discovery of methods for the generation of two discontinuous transcript markers derived from both the 5' and 3' ends of cDNAs contained within a cDNA library.
  • the transcript markers of the present invention are derived from previously constructed cDNA libraries.
  • the nucleotide information contained within transcript markers can be used to identify novel transcripts or to provide the basis for the design of PCR primers useful in extending and identifying a transcript marker nucleotide sequence contained within a cDNA library.
  • CDNA libraries of the present invention can be constructed by methods which equalize the population of cDNAs or bias the population of cDNAs toward rare cDNAs, e.g. by normalization techniques, subtraction techniques, treatment with demethylating agents and treatment with differentiating agents, for example.
  • the present invention also provides novel methods for achieving genome closure which combine rapid nucleic acid sequence identification of transcript markers from a cDNA with subsequent extension of the sequence of the identified transcript markers using PCR technology. Methods of the present invention may also be used to provide a transcript image of specific tissues or biological samples. II. Construction of cDNA libraries
  • CDNA libraries for use in the methods of the present invention may be prepared by methods described in Maniatis et al. (1982, Molecular Cloning: a Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York) or any means known to those of skill in the art.
  • CDNA libraries may be constructed to bias the population of individual cDNAs toward desired coding regions.
  • a cDNA library constructed with oligo dT primers would be expected to contain nucleic acid sequences from the 3' coding region of a cDNA, as well as 3' untranslated regions. Additionally, a cDNA library constructed with oligo dT primers may also contain part or all of the entire coding region of a particular cDNA.
  • a cDNA library constructed with random primers would be expected to contain nucleic acid sequences from all parts of the coding region of a cDNA, including the 5' most nucleic acid sequence of the full-length coding region.
  • CDNA libraries constructed with selected sets of random primers such as primers which specifically prime first strand cDNA synthesis toward the 5' end, such as with the incorporation of CapFinderTM PCR construction kit (Clontech K1051-1), are desirable for obtaining 5' transcript markers from the 5' most end of a putative transcript or cDNA.
  • preferred methods for the construction of cDNA libraries include the use of random primers or a selected set of random primers designed to prime cDNA synthesis from the 5' end of the mRNA; the use of cell lines treated with demethylating compounds, such as, 5-aza-2'-deoxycytidine, as starting material for the preparation of mRNA; the use of cell lines treated with compounds which induce differentiation, such as retinoic acid; normalization or equalization methods; the use of subtractive hybridization techniques in the construction of cDNA libraries; and any method that induces the transcription of silent genes or allows for the identification of rare cDNAs, such as size fractionation of specific cDNA populations expected to contain rare cDNAs.
  • Preferred methods for the construction of cDNA libraries intended for transcript imaging purposes are those which produce an unbiased population of cDNAs and would include a step for the non-amplified or uniform growth of host cells used in constructing the cDNA library. Such growth conditions are described in Current Protocols in Molecular Biology "Amplification of Cosmid and Plasmid Libraries" Unit 5.10. (1987).
  • the prevalence of high-abundance cDNA clones decreases dramatically, clones with mid-level abundance are relatively unaffected, and clones for rare transcripts are effectively increased in abundance.
  • the abundance levels of individual cDNA clones have been equalized by a kinetic re-annealing hybridization. This approach is designed to reduce the initial 10,000- fold variation in individual cDNA frequencies in order to achieve abundances within one order of magnitude while maintaining the overall sequence complexity of the library.
  • CDNA libraries prepared by normalization techniques are not an accurate reflection of the source tissue's gene-expression profile; however cDNA libraries produced by normalization techniques may provide a source of low abundance transcripts and therefore, would be useful for gene discovery purposes.
  • Subtractive hybridization of nucleic acids is a method to isolate the coding sequences of a gene which are differentially expressed such as during development or in disease states.
  • CDNA libraries prepared by subtractive hybridization techniques are described in Rubenstein et al (1990, Nucleic Acids Research. 18:4833-4842); Travis et al. (1988, Proc. Natl. Acad. Sci. USA, 85:1696-1700).
  • Obtaining cDNA libraries from cells treated with 5-aza-2'-deoxycytidine should enhance the discovery of rare genes and genes expressed in specialized cell types from which it is difficult to isolate and/or prepare DNA.
  • a non-toxic concentration of 5-aza-2'-deoxycytidine can be predetermined through titration/toxicity assays as illustrated in Figure 1 1.
  • 5-aza-2'-deoxycytidine has been shown to induce the transcription of silent genes through demethylation (Hsieh et al supra).
  • the preferred amount of 5-aza-2'-deoxycytidine to use is that amount that induces the transcription of silent genes without being toxic to the cell. This method can be coupled to subtractive methods to enhance further the discovery of novel transcripts or genes.
  • FIG. 2 illustrates ATM Strategy 1 for constructing a cDNA library containing an array of transcript markers derived from the 5' most end of a cDNA (5' transcript markers).
  • a cDNA library is constructed using an adapter containing a Bpml site and a PvuII restriction site.
  • the constructed cDNA library is digested with Bpml and PvuII to isolate 20 bp 5' transcript markers (14 base pairs of cDNA and 6 base pairs from the adapter) from the cDNA library, the isolated markers are treated with T4 DNA polymerase to yield blunt ends and the markers are concatenated to form an array or serial arrangement of multiple markers.
  • the concatenated array of 5' transcript markers is then used to create a library containing an array of 5' transcript markers.
  • individual transcript markers can be subjected to subtractive hybridization methods to bias the population toward rare transcript markers.
  • FIG. 3 illustrates ATM Strategy II for constructing a cDNA library containing an array of 5' transcript markers.
  • a cDNA library is constructed using an adapter containing a Bpml site and a PvuII restriction site.
  • the constructed cDNA library is digested with Bpml and a second degenerate adapter containing a PvuII site is ligated onto the Bpml site.
  • 25 bp transcript markers are isolated.
  • the isolated transcript markers are concatenated to form arrays, ligated into a vector and transformed into host cells to create a library containing an array of 5' transcript markers.
  • ATM Strategy III as shown in Figures 4A-4B illustrates the construction of a cDNA library containing an array of 5' transcript markers.
  • a cDNA library is constructed using an adapter containing a Bpml site and a PvuII restriction site.
  • the constructed cDNA library is digested with Bpml and a second degenerate adapter containing a PvuII site is ligated onto the Bpml site.
  • Sense RNA is transcribed from the template and the mixture is subjected to DNAse treatment.
  • First and second strand cDNA is synthesized from the template.
  • the double stranded cDNA is digested with PvuII which yields 25 bp transcript markers.
  • the transcript markers are concatenated to form arrays, the arrays are cloned into a vector and the vector transformed into host cells to create a library containing an array of 5' transcript markers.
  • ATM strategy IV illustrates ATM Strategy IV for constructing a cDNA library containing an array of 5' transcript markers.
  • a cDNA library is constructed using an adapter containing a Bpml and PvuII restriction site.
  • the constructed cDNA library is digested with Bpml and PvuII to isolate 20 bp 5' transcript markers from the cDNA library, the isolated markers are treated with T4 DNA polymerase to yield blunt ends and the markers are concatenated to form an array or serial arrangement of multiple markers.
  • Adapters are added on to the ends of the transcript marker arrays and the arrays are amplified using PCR technology.
  • the arrays are cloned into a vector to create a library containing arrays of the transcript markers (ATM).
  • FIGS 6A-6B illustrate ATM Strategy V for constructing a cDNA library containing an array of 5' transcript markers.
  • a cDNA library is constructed using an adapter containing a Bpml and PvuII restriction site.
  • the constructed cDNA library is digested with Bpml and PvuII to isolate 20 bp 5' transcript markers from the cDNA library, the isolated markers are treated with T4 DNA polymerase to yield blunt ends and the markers are concatenated to form an array or serial arrangement of multiple markers.
  • the arrays are ligated into a plasmid vector, subjected to PCR to amplify the transcript markers, and cloned into a vector to create a cDNA library containing PCR amplified arrays of 5' transcript markers.
  • Figure 7 illustrates a strategy for simultaneously obtaining two non-contiguous transcript markers from both the 5' and 3' end.
  • first strand cDNA synthesis is performed using a modified random hexamer.
  • the hexamer is designed to provide directionality.
  • Second strand cDNA synthesis is performed by standard means and an adapter containing a Bpml and PvuII site is ligated onto both ends of the cDNA.
  • the cDNA is ligated to a vector modified to delete all Bpml restriction sites.
  • the vector containing the cDNA is transformed into a host cell to create a cDNA library and plasmid DNA is isolated and treated with Bpml, thereby creating individual linearized cDNAs containing both a 5' and 3' transcript marker.
  • the linearized cDNA is treated with T4 polymerase to blunt end, self-ligated and re-transformed to create a library.
  • the plasmid DNA of the library is isolated and digested with PvuII to excise the nucleic acid containing non-contiguous transcript markers from both the 5' and 3' end.
  • the excised transcript markers are concatenated and cloned to create a library containing a serial arrangement of non- contiguous transcript markers from the 5' and 3' end which can be subjected to high-throughput nucleic acid sequence analysis.
  • the vector is constructed to contain 5 1 and 3' Bpml sites at a cloning site thereby providing a means to excise the transcript marker from the cDNA.
  • Figure 8 illustrates a strategy for obtaining non-contiguous transcript markers from random areas of a cDNA. In this strategy, mRNA is converted to cDNA following standard procedures.
  • the cDNA is ligated to a vector that has been constructed to have restriction endonuclease sites for restriction endonucleases having 4 base pair recognition sites removed.
  • the cDNA library is amplified and the plasmids prepared.
  • the cDNA library is digested with a restriction endonuclease having a 4 base pair recognition site and the digested plasmid is purified.
  • a 12 base pair adapter containing a Bcgl site, which digests nucleic acid within a range of its recognition site, is ligated onto the linearized cDNA, and the cDNA is transformed into a host cell.
  • the cDNA library is digested with Bcgl resulting in the release of a 36 base pair fragment from each cDNA which originally contained the 4 base pair restriction endonuclease site.
  • the fragments may be ligated into a vector directly or concatenated and ligated into a vector and subjected to sequence specific analysis.
  • the ATM libraries of the present invention may contain a single transcript marker per individual clone or multiple transcript markers constructed in a serial arrangement. As illustrated in the figures, transcript markers may be concatenated, PCR amplified and then ligated into a vector or concatenated, ligated into a vector and then PCR amplified.
  • transcript markers that are excised from the cDNA library contain a cDNA portion and a synthetic adapter portion.
  • the cDNA portion is at least 14 base pairs in length and the adapter portion is 6 base pairs which are designed to be asymmetric.
  • the adapter portion provides the means for determining the sense orientation of the transcript marker in the vector as well as a means for determining the beginning of each transcript marker excised.
  • Nucleic acid sequencing of the excised transcript markers can be performed to create a nucleic acid data set.
  • the nucleic acid adapter portion of the transcript marker can be subtracted or removed from the transcript marker nucleic acid data set.
  • the ATM markers may exist in sense or antisense orientation in the vector.
  • the presence of an nucleic acid fragment which provides directionality, such as an asymmetric adapter portion, provides the means for discerning the sense from the anti-sense stand and provides the means for determining the beginning of each transcript marker.
  • Figure 14A illustrates sequence data from a single clone containing an array of 6 transcript markers. The 6 bp "spacer" DNA sequences that distinguish one transcript marker from another are underlined. The spacer sequence "CTGGAG” indicates that the immediately adjacent 14 bp to the right is a transcript marker (sense strand). The spacer sequence "CTCCAG” indicates that adjacent 14 bp to the left is a transcript marker (antisense strand).
  • Figure 14B provides a list of the six 14 bp transcript markers (sense strand) without the spacer DNA sequence derived from the array in (14a). Asterisks (*) indicate sequences which are the reverse complement of the sequence actually found in the array.
  • Method for Nucleic Acid Sequencing Nucleic acid sequencing of transcript markers can be performed by any means known to those of skill in the art. Methods for cDNA sequencing employ such enzymes as the Klenow fragment of cDNA polymerase I, Sequenase® (US Biochemical Corp, Cleveland OH)), Taq polymerase (Perkin Elmer, Norwalk CT), thermostable T7 polymerase (Amersham, Chicago IL), or combinations of recombinant polymerases and proofreading exonucleases such as the ELONGASE Amplification System marketed by Gibco BRL (Gaithersburg MD).
  • Methods for cDNA sequencing employ such enzymes as the Klenow fragment of cDNA polymerase I, Sequenase® (US Biochemical Corp, Cleveland OH)), Taq polymerase (Perkin Elmer, Norwalk CT), thermostable T7 polymerase (Amersham, Chicago IL), or combinations of recombinant polymerases and proofreading exonucleases such as the ELONG
  • the process is automated with machines such as the Hamilton Micro Lab 2200 (Hamilton, Reno NV), Peltier Thermal Cycler (PTC200; MJ Research, Watertown MA) and the ABI 377 cDNA sequencers (Perkin Elmer).
  • machines such as the Hamilton Micro Lab 2200 (Hamilton, Reno NV), Peltier Thermal Cycler (PTC200; MJ Research, Watertown MA) and the ABI 377 cDNA sequencers (Perkin Elmer).
  • PCR Methods Numerous PCR methods are known to those of skill in the art that would facilitate isolation, amplification and/or extension of nucleic acid sequences in the 5' or 3' direction.
  • Gobinda et al (1993; PCR Methods Applic 2:318-22) disclose "restriction-site" polymerase chain reaction (PCR) as a direct method which uses universal primers to retrieve unknown sequence adjacent to a known locus.
  • PCR polymerase chain reaction
  • cDNA is amplified in the presence of primer to a linker sequence and a primer specific to the known region.
  • the amplified sequences are subjected to a second round of PCR with the same linker primer and another specific primer internal to the first one.
  • Products of each round of PCR are transcribed with an appropriate RNA polymerase and sequenced using reverse transcriptase.
  • Inverse PCR can be used to amplify or extend sequences using divergent primers based on a known region (Triglia T et al (1988) Nucleic Acids Res 16:8186). Adapters are ligated onto cDNAs which then allow cDNAs to be circularized. The intramolecular ligation products then serve as PCR templates. The method uses several restriction enzymes to generate a suitable fragment in the known region of a gene. The fragment is then circularized by intramolecular ligation and used as a PCR template.
  • Capture PCR (Lagerstrom M et al (1991) PCR Methods Applic 1 : 1 1 1-19) is another method which may be used. The method involves PCR amplification of cDNA fragments adjacent to a known sequence in human and yeast artificial chromosome cDNA. Capture PCR also requires multiple restriction enzyme digestions and ligations to place an engineered double-stranded sequence into an unknown portion of the cDNA molecule before PCR. Another PCR method which may be used to retrieve sequences is that of Parker JD et al (1991 ; Nucleic Acids Res 19:3055-60). Capillary electrophoresis may be used to analyze the size or confirm the nucleotide sequence of sequencing or PCR products. Systems for rapid sequencing are available from Perkin Elmer.
  • Capillary sequencing may employ flowable polymers for electrophoretic separation, four different fluorescent dyes (one for each nucleotide) which are laser activated, and detection of the emitted wavelengths by a charge coupled devise camera. Output/light intensity is converted to electrical signal using appropriate software (eg. GenotyperTM and Sequence NavigatorTM from Perkin Elmer) and the entire process from loading of samples to computer analysis and electronic data display is computer controlled. Capillary electrophoresis is particularly suited to the sequencing of small pieces of cDNA which might be present in limited amounts in a particular sample.
  • the present invention provides novel methods for achieving genome closure by combining rapid nucleic acid sequence identification of transcript markers from a cDNA with subsequent extension of the sequence of the identified transcript markers using PCR technology.
  • Transcript markers constructed by the method illustrated in Figure 8 which have been concatenated and ligated to a vector provide for the sequence specific identification of 15-20 transcript markers per vector.
  • PCR technology is used to extend the sequence of a transcript marker in an outward direction as described in US Patent Application 08/487,112, filed June 7, 1995, specifically incorporated by reference.
  • One primer is synthesized to initiate extension in the antisense direction (XLR) and the other is synthesized to extend sequence in the sense direction (XLF).
  • Primers allow the extension of the known sequence "outward" generating amplified nucleotide sequences containing new, unknown nucleotide sequence for the region of interest.
  • PCR primers which contain the sequence of a known adapter joining the non-contiguous transcript markers and all possible combinations of nucleotides for the 5 nucleotide positions flanking the adapter are used in PCR reactions to extend the sequence of the identified transcript markers using PCR technology.
  • CDNA libraries which have been prepared with oligo dT primers will allow for the identification of the 3' most part of the cDNA and part of the coding region or the complete 5' most end of a cDNA.
  • genome closure can be achieved by the identification of all transcribed genes from an mRNA source.
  • step 1 a cDNA library containing a biotinylated poly A tail and having a Not I restriction site is constructed by standard means and cloned into a vector from which restriction endonuclease sites for 4 base pair restriction endonucleases have been removed. Multiple cDNA libraries derived from a variety of mRNA sources can be pooled to provide one sample.
  • step 2 the cDNA is digested with a restriction endonuclease which has a 4 base pair recognition site and the digested cDNA is captured by streptavidin-beads ( Figure 9 step 3).
  • the cDNA is digested with Not I which removes the biotin and ligated into a vector having a Type IIs restriction endonuclease site for MboII ( Figure 9, step 4).
  • Other Type IIs restriction endonucleases such as Bpml, Bsgl Eco57I and BsmFI, can be used.
  • the cDNA is digested with MboII which cuts 8 base pairs into the cDNA.
  • a known linker of sufficient size for PCR amplification is cloned into the Mbo II restriction site and the cDNA pools are subjected to PCR analysis using the primers described above.
  • PCR-extension products can either be sequenced directly or with a wobble primer, ie, a sequencing primer degenerate at the 3 '-end to allow for all 3 possible bases following the poly A tail, or religated and cloned into a vector and then subjected to nucleic acid sequencing.
  • a wobble primer ie, a sequencing primer degenerate at the 3 '-end to allow for all 3 possible bases following the poly A tail, or religated and cloned into a vector and then subjected to nucleic acid sequencing.
  • Transcript imaging is a method for evaluating changes in gene expression caused by factors such as disease progression, pharmacologic treatment and aging. Transcript imaging is accomplished by sequencing several thousand clones from a particular tissue or cell type and electronically recording the abundance levels for each mRNA species identified. Electronic manipulations can then be done to examine which mRNAs are up- or down-regulated, or unchanged.
  • Transcript imaging can be achieved by using transcript markers produced by the present invention. After nucleic acid sequencing of the transcript markers is accomplished, the abundance levels for each transcript marker are electronically recorded. Electronic manipulations can then be performed to examine which transcript markers are up- or down- regulated, or unchanged.
  • the following example describes the construction of a cDNA library.
  • the first step is to isolate mRNA from a desired biological tissue or cell source.
  • the mRNA is then used in the synthesis of cDNA.
  • RNA Isolation RNA is isolated using guanidinium isothiocyanate and 2-mercaptoethanol lysis, followed by ultracentrifugation over a cesium chloride gradient to obtain total RNA (Chirgwin et al).
  • total RNA can be isolated using acid/phenol extraction (Chowzisky et al) and polyadenylated RNA can be isolated directly using a biotinylated oligo dT primer. An optical density measurement is taken to assess the quantity of total RNA isolated, and an aliquot is run on an electrophoresis gel to assess the quality and integrity of the RNA. The RNA is then stored until needed at -80 °C, which prevents degradation.
  • each sample is treated with DNAse and acid phenol. followed by precipitation and washing.
  • the RNA is again run on an electrophoresis gel to make sure it is free of genomic DNA contamination.
  • Subsequent selection of polyadenylated (poly A) RNA is done with either an oligo(dT)-based affinity column or OligotexTM latex microspheres.
  • first-strand cDNA is initiated using a poly(dT) primer that is complementary to the polyA stretch at the 3' end of most transcripts; and the reverse transcriptase enzyme.
  • the primer used in this reaction contains a restriction enzyme recognition site (NotI) that permits directional insertion into an appropriate cloning vector.
  • Second-strand cDNA synthesis is based on the method developed by Gubler and Hoffman (1983). RNAse H nicks the RNA/cDNA hybrid created in the reverse transcription reaction, creating priming sites for E. coli DNA polymerase to synthesize second-strand cDNA. The gaps in the second strand are ligated together using E. coli DNA ligase.
  • an adaptor is ligated to the double-stranded cDNA.
  • This oligonucleotide which contains an EcoRI-compatible sticky end, allows for directional cloning on the cDNA once digestion is complete with NotI, the restriction enzyme site found at the 3' terminus of the cDNA.
  • the cDNA is then size-fractionated to remove very short cDNAs, which would inhibit the generation of highly complex libraries. At this point, the cDNA is ligated into a plasmid vector system and transformed into bacterial cells for propagation. II. Construction of ATM cDNA libraries
  • Three cDNA libraries were constructed containing an adapter having the Type IIs restriction endonuclease, Bpm I, at the 5' end of each cDNA insert.
  • Two ug each of poly A + RNA from human colon, human prostate, and a single species control RNA (bacterial chloramphenicol transferase) were reverse transcribed using an oligo-dT primer and Superscript reverse transcriptase according to manufacturer's instructions (Gibco Superscript Plasmid System).
  • 3 ug of the adapter containing the Bpm I site were ligated to each cDNA.
  • the adapter containing the Bpm I site was prepared previously in the following manner: two oligonucleotides (5' AATTCAGCTGGAG and 5' phos-CTCCAGCTG) were synthesized and purified by HPLC and polyacrylamide gel electrophoresis (New England Biolabs). Equimolar amounts of the two oligos were combined in annealing buffer (20 mM Tris, pH 7.4, 2 mM MgCU, 50 mM NaCl), boiled, and allowed to cool to ⁇ 30 degrees C in a heating block. Following adapter ligation, the cDNA was digested with Not I, fractionated over a Sepharose CL- 4B column and cloned into a pSPORT vector (LTI, Inc.).
  • ATM libraries were made according to ATM Strategy 1, illustrated in Figure 2.
  • Library size is determined by subtracting the background from the number of individual colonies or transformants that are generated from a single ligation reaction of the transcript markers to pSport vector. Average number of markers per clone as shown in Table I is determined by both insert size and DNA sequence verification.
  • Arrays of transcript markers from the prostate ATM library were PCR amplified for 30 cycles using Ml 3 forward and reverse primers. These arrays were size selected on a 6% acrylamide gel and cloned into a pSPORT-derived vector. Twenty-seven random clones were sequenced from a total of 131 total transcript markers analyzed. There was an average of 4.9 markers per clone with a range of 4-7 markers/clone. Table II illustrates the average marker size per number of clones. TABLE II marker size number (%)
  • the marker size "14/15” refers to an ambiguous situation where there are 29 bp (instead of 28 bp) with markers which are concatenated back to back as shown: spacer
  • a transcript marker (5' ccgagagtcgtcgg was identified from a prostate ATM library which corresponds to the gene apoferritin H (GenBank GI numbers: g31340, g31342, g28434). Based on the published sequence, this marker is present at the 5' end of the mRNA, 14 nucleotides downstream from the start of transcription and 181 nucleotides upstream of the start of the coding sequence. The entire mRNA is predicted to be about 0.9 kb.
  • a gene specific primer (“g31340") that contains 7 nucleotides of the Bpm I adapter and the 14 nucleotides of the transcript marker (5' gctggagccgagagtcgtcgg) was designed.
  • the cDNA inserts from 1 ug of the prostate ATM library were amplified by 30 cycles of PCR using the Ml 3 forward and reverse primers. This PCR reaction was diluted 1 :50 in water and 1 ul was reamplified for 33 cycles using the nested T7 and "g31340" primers. A 0.9 kb band was isolated, gel purified and cloned. DNA sequencing confirmed the identity of the cloned gene as full length apoferritin H. Alternatively, the 0.9 kb PCR product is sequenced directly with appropriate primers.
  • High throughput isolation of cDNA clones from an ATM library is achieved in the following manner.
  • a master pool of insert cDNA is created from an ATM library by PCR amplification using primers found in the vector (e.g., Ml 3 forward and reverse).
  • primers found in the vector e.g., Ml 3 forward and reverse.
  • gene specific primers are synthesized in 96 well arrays (Gibco).
  • Third, aliquots of gene specific primers, PCR reagents, and the master cDNA pool are aliquoted to 96 PCR wells for PCR and subsequently analyzed by gel electrophoresis.
  • an initial screen for successful PCR reactions is accomplished by doing real time flourescent detection of PCR products. With this technique only those reactions which give a significant fluorescent signal above background would then be analyzed by gel electrophoresis.
  • the nucleic acid sequence of transcript markers can be used to design oligonucleotide primers for extending a partial nucleotide sequence to full length or for obtaining 5' sequences from genomic libraries.
  • One primer is synthesized to initiate extension in the antisense direction (XLR) and the other is synthesized to extend sequence in the sense direction (XLF).
  • Primers allow the extension of the known sequence "outward" generating amplified nucleotide sequences containing new, unknown nucleotide sequence for the region of interest (US Patent Application 08/487,112, filed June 7, 1995, specifically incorporated by reference).
  • the initial primers are designed from the cDNA using OLIO ® 4.06 Primer Analysis Software (National Biosciences), or another appropriate program, to be 22-30 nucleotides in length, to have a GC content of 50% or more, and to anneal to the target sequence at temperatures about 68°-72° C. Any stretch of nucleotides which would result in hairpin structures and primer-primer dimerizations is avoided.
  • the original, selected cDNA libraries, or a human genomic library is used to extend the sequence. A genomic library is most useful to obtain 5' upstream regions. If more extension is necessary or desired, additional sets of primers are designed to further extend the known region.
  • PCR is performed using the Peltier Thermal Cycler (PTC200; MJ Research, Watertown MA) and the following parameters:
  • Step 1 94 ° C for 1 min (initial denaturation)
  • Step 2 65° C for 1 min
  • Step 3 68° C for 6 min
  • Step 4 94° C for 15 sec
  • Step 7 Repeat steps 4-6 for 15 additional cycles
  • Step 8 94° C for 15 sec
  • Step 9 65 ° C for 1 min
  • Step 11 Repeat step 8-10 for 12 cycles
  • a 5-10 ⁇ l aliquot of the reaction mixture is analyzed by electrophoresis on a low concentration (about 0.6-0.8%) agarose mini-gel to determine which reactions were successful in extending the sequence. Bands thought to contain the largest products were selected and cut out of the gel. Further purification involves using a commercial gel extraction method such as QIAQuickTM gel extraction (QIAGEN Inc). After recovery of the DNA, Klenow enzyme was used to trim single-stranded, nucleotide overhangs creating blunt ends which facilitate religation and cloning.
  • the products are redissolved in 13 ⁇ l of ligation buffer, l ⁇ l T4-DNA ligase (15 units) and l ⁇ l T4 polynucleotide kinase are added, and the mixture is incubated at room temperature for 2-3 hours or overnight at 16° C.
  • Competent E ⁇ coli cells (in 40 ⁇ l of appropriate media) are transformed with 3 ⁇ l of ligation mixture and cultured in 80 ⁇ l of SOC medium (Sambrook J et al, supra). After incubation for one hour at 37° C, the whole transformation mixture is plated on Luria Bertani (LB)-agar (Sambrook J et al, supra) containing 2xCarb.
  • PCR amplification For PCR amplification, 18 ⁇ l of concentrated PCR reaction mix (3.3x) containing 4 units of rTth DNA polymerase. a vector primer and one or both of the gene specific primers used for the extension reaction are added to each well. Amplification is performed using the following conditions:
  • Step 1 94° C for 60 sec
  • Step 2 94° C for 20 sec
  • Step 3 55° C for 30 sec
  • Step 4 72° C for 90 sec
  • Step 5 Repeat steps 2-4 for an additional 29 cycles
  • 5-aza-2'-deoxycytidine induces transcription of silent genes, presumably by demethylating cytosines in CpG islands which are regulatory regions located upstream of most genes, btaining libraries from cells treated with 5-aza-2'-deoxycytidine will enhance the discovery of rare genes and genes expressed in specialized cell types difficult to isolate and prepare RNA from.
  • THP1 cells at a density of 1.1 million cells per ml were treated for three days with 0.8 micromolar 5-aza-2'-deoxycytidine.
  • the medium used for growth conditions was Iscove's modified DMEM with 10% Fetal Bovine Serum.
  • HNT precursor cells at 80% confluency were treated for three days with 0.35 micromolar 5-aza-2'-deoxycytidine.
  • the medium used for growth conditions was Iscove's modified DMEM.
  • 5-aza-2'-deoxycytidine has been shown to be toxic in cultured cells and animal, initial experiments were conducted to assess the toxicity of 5-aza-2'-deoxycytidine on hNT and THP1 cells to establish conditions where the cells would survive and RNA could be recovered.
  • 5-aza-2'-deoxycytidine is a concentration typically used to induce silent gene transcription.
  • a concentration of 0.8 micro molar was selected for THP1 cells and 0.35 micro molar for hNT cells.
  • the following 3 genes were in the cDNA library constructed from mRNA from THP cells treated with 5 -aza-2' deoxycytidine library and not in the control cDNA library: enBank g29382, human BBC1 mRNA; GenBank g337507, human ribosomal protein S25 mRNA; and gl 84553 human insulinoma pig-analog mRNA.
  • the following genes were in the cDNA library constructed from mRNA from THP cells treated with 5 -aza-2' deoxycytidine library and in the control cDNA library and found upregulated in THP1 : GenBank g 182055 human neutrophil elastase mRNA, 3' end; g36143 human mRNA for ribosomal protein SI 1; g793842 human mRNA for ribosomal protein L29; g28976 human mRNA for azurocidin; and g34891 1 human glycoprotein mRNA.
  • the following 3 genes were in the cDNA library constructed from mRNA from hNT cells treated with 5 -aza-2' deoxycytidine library and not in the control cDNA library: gl90233 human acidic ribosomal phosphoprotein PI ; g436217 human mRNA (KIAA0037) for ORF; and g385936 hingeOXPHOS system complex III.
  • An additional result of 5-aza-2'-deoxycytidine treatment was a decrease in the expression of many genes, particularly more abundant mRNAs.
  • Plasmid pIIEZ-1 was derived from pUC19 vector by removing the 991 base pairs from the restriction endonuclease sites Sspl through Afl3. A 90 base pair synthetic polylinker containing type IIs restriction endonuclease sites were ligated to the remaining pUC19 vector, as shown in Figure 13.
  • the type IIs polylinker was made by annealing the two synthetic oligomers P-II-1 and P- II-2.
  • the oligomer sequences are: P-II-1 5'
  • the double stranded linker has a 5' blunt end which is ligated into the
  • the polylinker contains 4 type IIs restriction endonuclease sites which can be used for generation of 14 bp transcript markers. Standard PCR primer directed site specific mutagenesis is carried out to eliminate all type
  • a method directed toward genome closure involves the systematic identification of all transcribed genes.
  • the method involves the generation of cDNA's with a defined 5'-end. generation of defined priming sites for extension PCR amplification from within the cloned cDNAs, systematic amplification of all cDNAs, and sequencing of all extension products.
  • the cDNA's are digested with a 4 base pair cutter.
  • the cDNAs are ligated into the vector displaying a type IIs restriction endonuclease site at one end.
  • the cDNAs are cloned into a vector, amplified and digested with the type IIs restriction 15 endonuclease. Linkers are cloned into the linearized plasmid/cDNA fragment. 8)Bbs I CATGATCATGG
  • 5'ATCATGGCCATGATCATGAGCA 3' will form a pair with 5'ATCATGGCCATGATCATGTCGT 3 *
  • Any PCR-reaction will cover 2 primer combinations.
  • 256 oligonucleotides will be synthesized (5'ATCATGGCCATGATCATGN NN 3'). Since every PCR-reaction covers 2 combinations, 128 reactions will cover all possible combinations. A range of 200-300 PCR-products/reaction will be expected for a specific tissue. Since this is an inverse PCR approach, the products are sequenced directly with the wobble primer, increasing the resolution by a factor 3. The PCR-products then are directly thermo-cycled with the wobble primers resulting in a higher resolution. One colour is choosen for each nucleotide with no terminator included in the reaction. The resulting products are run on a sequencing gel.
  • MOLECULE TYPE cDNA ( xi ) SEQUENCE DESCRI PT ION : SEQ I D NO : 4 4 : ATCATGGCCA TGATCATGNN NN 22

Abstract

The present invention relates generally to the field of molecular biology and specifically to rapid, high-throughput gene discovery methods that facilitate genome closure and to methods for analyzing gene expression patterns. The present invention provides methods and vectors useful for constructing libraries of transcript markers. The present invention also provides sequence specific methods for extending the nucleotide sequence of partial transcripts in a high-throughput manner using polymerase chain reaction.

Description

METHODS FOR GENERATING AND ANALYZING TRANSCRIPT MARKERS
TECHNICAL FIELD
The present invention relates generally to the field of molecular biology and specifically to rapid, high-throughput gene discovery methods that facilitate genome closure and to methods for analyzing gene expression patterns.
BACKGROUND ART Many genes have been isolated and their structures determined since the introduction of recombinant cDNA technology. It has been predicted that through the efforts of the Human Genome Project, sequencing of the entire human genome (genome closure) will be accomplished sometime between 2001 and 2005 (Boguski, 1995, Molecular Medicine 333:645-647). Strategies for achieving genome closure include methods for constructing 3' directional normalized libraries which equalize cDNA representation (Soares et al. 1994, Proc. Natl. Acad. S . 91 :9928 and Patanjali et al, 1991. Proc. Natl. Acad. Sci. USA 88:1943-1947) and methods for creating cDNA libraries based on subtractive hybridization techniques which provide methods for isolating the coding sequences of genes which are differentially expressed, such as during development or in disease states which are described in Rubenstein et al (1990, Nucleic Acids Research. 18:4833-4842); Travis et al. (1988, Proc. Natl. Acad. Sci. USA, 85:1696-1700).
Elucidation of gene expression represents another level of complexity equally important to the elucidation of genetic structure. The generation of a gene expression pattern can be used directly as a diagnostic profile or as a gene discovery method. Seilhamer et al. (WO 95/20681, filed January 27, 1995) disclose methods for the high-throughput sequence-specific analysis of cDNAs and generation of transcript images. Matsubara et al (WO 95/14772, filed November 11,1994) disclose methods for generating 3' directed cDNA libraries which accurately reflect the abundance ratio of mRNA in a cell. Velculescu et al. (1995, Science 270:484-487) describe a method for the analysis of gene "tags" or transcripts which uses type IIs restriction enzymes and involves the generation and analysis of short, 3' nucleotide sequences which may inherently contain a substantial amount of 3' non-coding information. Kato (1995, Nucleic Acids Research 23:3685-3690) describes a method for the identification of 3' end cDNA fragments which involves the use of type IIs restriction enzymes and PCR methodology. Kato notes that there are several technical limitations to the method including the presence of PCR generated artifacts and the fact that cDNA sequences lacking enzyme recognition sites will not be displayed.
Aoto et al. (1995, Eur. J. Biochem. 234:8-15) describe isolation of a cDNA clone obtained from a cDNA library of mRNA prepared from cells treated with 5-azacytidine. The deoxycytidine analog, 5-aza-2'-deoxycytidine (5-azadCyd), is highly toxic in cultured cells and animals and has been used clinically as an anti-cancer agent.
5-azadCyd has been used experimentally as a DNA methylation inhibitor to induce gene expression and cellular differentiation (Juttermann et al. 1994, Proc. Natl. Acad. Sci. USA 91 : 11797-11801).
In spite of the availability of methods designed to facilitate gene discovery and elucidate gene expression, there remains a need in the art for methods which will expedite the process of gene discovery in a rapid, high-throughput manner, thereby contributing to the process of gene closure. There also remains a need in the art for methods for analyzing gene expression patterns in a rapid, high throughput manner.
DISCLOSURE OF THE INVENTION The present invention relates, in part, to rapid, high-throughput gene discovery methods that facilitate genome closure and to methods for analyzing gene expression patterns. The present invention also relates to methods for the rapid, sequence-specific identification of transcripts derived from an mRNA population. The present invention further relates to methods for extending the nucleotide sequences of partial transcripts in a high-throughput manner using polymerase chain reaction technology.
The present invention provides rapid, high-throughput methods for generating and analyzing transcript markers from the 5' most end of cDNAs and methods for generating and analyzing two, discontinuous transcript markers from a single cDNA providing the advantage of obtaining more information from a single transcript than is possible by current methods. In one aspect of the present invention, the two discontinuous markers are derived from both the 5' and 3' ends of a single cDNA and in another embodiment of the present invention, the two discontinuous markers are derived from random areas of the cDNA.
In one aspect of the present invention, a method is provided for generating and analyzing transcript markers from the 5' most end of individual cDNAs of a cDNA library comprising the steps of obtaining a cDNA library comprising individual cDNAs having a first restriction endonuclease site for a restriction endonuclease that digests the cDNA at the 5' most end within an expected distance from its recognition site and a second endonuclease restriction site; subjecting the cDNAs to digestion with the first and the second restriction endonuclease thereby excising transcript markers from the 5' most end of the individual cDNAs; ligating said transcript markers to a vector; transforming said vector containing the transcript markers in a host cell; culturing said host cells; and performing nucleic acid sequence analysis of the transcript markers. This method can be used for gene discovery purposes or transcript imaging purposes. This method can be combined with PCR based technology for the rapid nucleic acid sequence analysis of the transcript markers.
In another aspect of the present invention, a method is provided for generating and analyzing non-contiguous transcript markers derived from the 5' most and 3' most end of individual cDNAs, comprising the steps of obtaining a cDNA library comprising individual cDNAs having a first restriction endonuclease site at the 5' most end and at the 3' most end for a restriction endonuclease that digests nucleic acid within an expected distance from the first endonuclease recognition site, and a second endonuclease restriction site; subjecting the cDNAs to digestion with the first endonuclease thereby creating linearized cDNAs containing transcript markers from the 5' most end and the 3' most end of the individual cDNAs; self-ligating said linearized cDNAs thereby joining the transcript markers from the 5' most and 3' most ends to create cDNAs containing non-contiguous transcript markers; transforming said linearized cDNAs in a host cell and culturing said host cells; isolating said cDNA from said host cells; digesting the cDNA with the second restriction endonuclease thereby excising the non-contiguous transcript markers; ligating said excised transcript markers to a second vector; transforming said second vector containing the transcript markers in a host cell and culturing said host cells; and performing nucleic acid sequence analysis of the transcript markers.
In another aspect of the present invention, a method is provided for the rapid, sequence- specific identification of cDNAs derived from a human mRNA population, comprising the steps of obtaining a cDNA library comprising individual cDNAs wherein said cDNAs contain first restriction endonuclease sites for an endonuclease having a 4 base pair recognition site and wherein said cDNAs are cloned into a first vector lacking the first restriction endonuclease sites; subjecting the cDNA library to digestion with the first restriction endonuclease thereby creating linearized cDNAs containing a portion of the original cDNA; ligating an adapter to said linearized cDNAs wherein the adapter contains a second restriction endonuclease site for an endonuclease that cleaves within an expected range of its recognition site thereby creating cDNAs containing two non-contiguous transcript markers joined by the adapter; digesting the cDNAs with the second restriction endonuclease thereby excising the transcript markers from said cDNAs; concatenating said excised transcript markers; ligating said concatenated transcript
- markers to a second vector; transforming said second vector containing the transcript markers in a host cell and culturing said host cells; and performing nucleic acid sequence analysis on the transcript markers in said host cells.
The present invention also provides novel methods and vectors used in preparation of cDNA libraries containing transcript markers. The present invention also provides novel, high- throughput methods for obtaining complete nucleotide sequence information on transcript markers. In one embodiment of the present invention, cDNA libraries containing a serial arrangement of multiple transcript markers are constructed and subjected to high-throughput nucleic acid sequence analysis in a multi-well format using polymerase chain reaction (PCR) technology and specialized PCR primers.
In additional aspects of the present invention, the cDNA libraries are constructed to bias the cDNA population toward rare cDNAs. In one aspect of the present invention, a cDNA library is constructed by normalization techniques and in another, the cDNA library is constructed using subtractive hybridization techniques. In yet another aspect of the present invention, the cDNA library has been constructed from mRNA treated with demethylating agents, such as, 5-aza-2' deoxycytidine and 5-azacytidine, which induces the transcription of silent genes. In an additional aspect of the present invention, the cDNA library has been constructed using oligo dT primers and in another aspect, the cDNA library is constructed using random primers.
BRIEF DESCRIPTION OF DRAWINGS Figure 1 illustrates a general schematic of cDNA library construction.
Figure 2 illustrates Array of Transcript Markers (ATM) Strategy 1 for constructing a cDNA library containing an array of transcript markers derived from the 5' most end of a cDNA (5' transcript markers).
Figure 3 illustrates ATM Strategy II for constructing a cDNA library containing an array of 5' transcript markers.
Figures 4A-4B illustrate ATM Strategy III for constructing a cDNA library containing an array of 5' transcript markers.
Figures 5A-5B illustrate ATM Strategy IV for constructing a cDNA library containing an array of 5' transcript markers. Figures 6A-6B illustrate ATM Strategy V for constructing a cDNA library containing an array of 5' transcript markers.
Figure 7 illustrates a strategy for simultaneously obtaining two non-contiguous transcript markers from both the 5' and 3' end of cDNA.
Figure 8 illustrates a strategy for obtaining non-contiguous transcript markers from random areas of a cDNA.
Figure 9 illustrates a sequence specific approach to the identification of all transcribed genes from an mRNA population.
Figure 10 is a list of 12 known sequences identified by the method illustrated in Figure 2.
Figure 11 illustrates the toxicity curve for THP1 cells treated for 3 days with 5-aza-2'- cytidine.
Figures 12A-12B illustrate examples of Type IIs restriction endonucleases useful for generating ATM libraries. Bpml, Bsgl and Eco57I are three restriction endonuclease that cleave 16 bp away from their recognition site with a 2 nucleotide 3' overhang. Bpml is illustrated in Figure 12A. N's represent nucleotides found adjacent to the restriction site (e.g., the 5' end of a cDNA). Treatment with T4 DNA polymerase removes the 31 extension which leaves 14 bp derived from the adjacent sequence. Figure 12B illustrates that the restriction endonuclease BSMFI cleaves 14 bp away from its recognition site with a 4 nucleotide 51 overhang. This end may be filled in by treatment with a DNA polymerase which results in 14 bp derived from adjacent sequence.
Figure 13 illustrates a vector useful in the strategy of Figure 8 which has all restriction endonuclease sites having a 4 base pair recognition site removed. Figures 14A-14B illustrate a typical concatenated array of transcript markers.
Figure 15 illustrates an abundance profile of THP cells treated with 5-aza-2' deoxycytidine. The black-shaded area represents control clones, the gray shaded area represents 5-aza-2' deoxycytidine treated cells and the gray line represents the abundance profile of the 5- aza-2' deoxycytidine treated cells. MODES FOR CARRYING OUT THE INVENTION
Before the present compounds, variants, formulations and methods for making and using such are described, it is to be understood that this invention is not limited to the particular compounds, variants, formulations or methods described, as such variants, formulations and methodologies may, of course, vary. It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and it is not intended to be limiting since the scope of the present invention will be limited only by the appended claims.
It must be noted that as used in the specification and the appended claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise which will be known to those skilled in the art or will become known to them upon reading this specification. L Definitions As used herein, the term "transcript marker derived from a cDNA library" refers to an isolated polynucleotide derived from an individual cDNA and being preferably from about 10 base pairs to about 20 base pairs in length derived from an individual cDNA.
As used herein, the term "non-contiguous transcript marker from an individual cDNA" refers to two polynucleotides which are not adjacent to one another under naturally occurring conditions, but which are constructed to exist in tandem, with each polypeptide being preferably from about 10 base pairs to about 20 base pairs in length.
As used herein, the term "3' transcript marker" or "transcript marker from the 3' most end of a cDNA" refers to an isolated polynucleotide derived from the 3' most end of an individual cDNA and being preferably from about 10 base pairs to about 20 base pairs in length. As used herein the term "5' transcript marker" or "51 transcript marker derived from a cDNA library" refers to an isolated 5' most nucleic acid sequence of a cDNA which is preferably about 10 base pairs to about 20 base pairs in length. As used herein, "5' most" means that a 5' transcript marker may represent the 5' end of the full-length coding region of a cDNA and may include 5' untranslated sequences. Alternatively, a 5' most transcript marker may represent an internal coding region of a individual cDNA. Each 5' transcript marker can reflect the expression of an individual cDNA.
As used herein, the term "adapter" refers to a synthetic fragment of nucleic acid which is ligated to a cDNA and which may contain recognition sites for restriction endonucleases.
As used herein the term "5' adapter" refers to a synthetic fragment of nucleic acid which is ligated onto the 5' end of cDNAs prior to ligation to a vector. In the present invention, the 5' adapter contains a first restriction endonuclease site which digests nucleic acid at a expected distance from its enzymatic recognition site and at least one other restriction endonuclease site. The second restriction endonuclease site is for a restriction endonuclease which digests the cDNA to form blunt ends or 5' or 3' overhangs. In a preferred embodiment of the present invention, digestion of the cDNA with the first and second restriction enzymes excises transcript markers which are at least 20 base pairs in length and contain at least 14 base pairs of cDNA sequence and 6 base pairs of adapter sequence. As used herein the term "array(s) of transcript markers" or "ATM" refers to the collection or serial arrangement of transcript markers prepared by concatenation of individual transcript markers. As used herein, an "ATM cDNA library" is a cDNA library containing transcript markers as inserts. The inserts may be individual inserts, multiple inserts or a serial arrangement of multiple inserts which may be in sense or antisense orientation.
As used herein the term "full-length" coding region refers to the cDNA sequence for the entire transcribed mRNA for a particular protein from initiating methionine to the poly A tail.
As used herein the term "type IIs restriction endonuclease" or "type IIs enzyme " refers to that category of restriction endonucleases that digests nucleic acid at an expected distance from the enzymatic recognition site. Preferred examples of type IIs enzymes for use in the present invention are those which digest nucleic acid at least 10 base pairs from the recognition site or which digest nucleic acid less than 10 base pairs from the recognition site but can be filled in enzymatically to give a 10 base pair transcript marker. Examples of such Type IIs enzymes include, but are not limited to Bpml, Bsgl Eco57I and BsmFI. Examples of type IIs restriction endonucleases is illustrated in Figures 12A-12B. As will be understood by those of skill in the art, it is possible to alter enzymatic digestion conditions to change the nucleic acid digestion position of the type IIs restriction endonuclease.
As used herein the term "type Ilsg restriction endonuclease" or "type Ilsg enzyme" refers to that category of restriction endonucleases that digests nucleic acid within an expected range of its enzymatic recognition site. Examples of type Ilsg restriction enzymes include, but are not limited to Bcgl.
As used herein the term "concatenating" refers to the process of ligating multiple transcript markers prior to ligation in a cloning vector.
As used herein the term "normalized cDNA library" or "normalized library" refers to a cDNA library constructed in such a manner as to reduce the redundancy in high- level abundance cDNAs.
As used herein the term "subtractive hybridization" (when referring to construction of a cDNA library) refers to a process wherein a first population of nucleic acid is hybridized with a second labelled population of nucleic acid (driver) and the resultant nucleic acid hybrids removed to completion thereby identifying and isolating a set of nucleic acid sequences unique to the first population of nucleic acid which may be used in the construction of a cDNA library.
As used herein the term "selected set" of random primers refers to a set of primers designed to anneal to a specific region of mRNA. In a preferred embodiment herein, the selected set of random primers is designed to anneal to the 51 most region of mRNA used as starting material for cDNA synthesis.
As used herein the term "transcript imaging" refers to a method of determining the relative abundance of individual transcript markers in a cDNA library. The term relative abundance refers to the number of times an individual transcript marker appears relative to the total number of transcript markers identified.
As used herein the term "non-amplified growth" of host cells refers to growth conditions which allow for uniform growth of recombinant host cells. As used herein, the term "high abundance" or "high-level abundance" messages exist at greater than 10,000 copies per cell.
As used herein, the term "mid abundance" or "mid-level abundance" species exist at 100 to 400 copies per cell.
As used herein, the term "low abundance" or "low-level abundance" species are found at less than 15 copies per cell. This low-abundance class represents 20 to 50 percent of the unique transcripts in the cell.
As used herein the term "wobble primer" refers to a sequencing primer degenerate at the 3'-end to allow sequencing for all three possible bases following the poly A tail.
The present invention relates to rapid, high-throughput gene discovery methods that facilitate genome closure and to methods for analyzing gene expression patterns. The present invention is based, in part, upon the discovery of methods for the generation of transcript markers derived from the 5' end of cDNAs contained within a cDNA library. The present invention is also based, in part, upon the discovery of methods for the generation of two discontinuous transcript markers derived from both the 5' and 3' ends of cDNAs contained within a cDNA library.
The transcript markers of the present invention are derived from previously constructed cDNA libraries. The nucleotide information contained within transcript markers can be used to identify novel transcripts or to provide the basis for the design of PCR primers useful in extending and identifying a transcript marker nucleotide sequence contained within a cDNA library. CDNA libraries of the present invention can be constructed by methods which equalize the population of cDNAs or bias the population of cDNAs toward rare cDNAs, e.g. by normalization techniques, subtraction techniques, treatment with demethylating agents and treatment with differentiating agents, for example.
The present invention also provides novel methods for achieving genome closure which combine rapid nucleic acid sequence identification of transcript markers from a cDNA with subsequent extension of the sequence of the identified transcript markers using PCR technology. Methods of the present invention may also be used to provide a transcript image of specific tissues or biological samples. II. Construction of cDNA libraries
CDNA libraries for use in the methods of the present invention may be prepared by methods described in Maniatis et al. (1982, Molecular Cloning: a Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York) or any means known to those of skill in the art.
CDNA libraries may be constructed to bias the population of individual cDNAs toward desired coding regions. A cDNA library constructed with oligo dT primers would be expected to contain nucleic acid sequences from the 3' coding region of a cDNA, as well as 3' untranslated regions. Additionally, a cDNA library constructed with oligo dT primers may also contain part or all of the entire coding region of a particular cDNA. A cDNA library constructed with random primers would be expected to contain nucleic acid sequences from all parts of the coding region of a cDNA, including the 5' most nucleic acid sequence of the full-length coding region.
CDNA libraries constructed with selected sets of random primers, such as primers which specifically prime first strand cDNA synthesis toward the 5' end, such as with the incorporation of CapFinder™ PCR construction kit (Clontech K1051-1), are desirable for obtaining 5' transcript markers from the 5' most end of a putative transcript or cDNA.
For gene discovery purposes, preferred methods for the construction of cDNA libraries include the use of random primers or a selected set of random primers designed to prime cDNA synthesis from the 5' end of the mRNA; the use of cell lines treated with demethylating compounds, such as, 5-aza-2'-deoxycytidine, as starting material for the preparation of mRNA; the use of cell lines treated with compounds which induce differentiation, such as retinoic acid; normalization or equalization methods; the use of subtractive hybridization techniques in the construction of cDNA libraries; and any method that induces the transcription of silent genes or allows for the identification of rare cDNAs, such as size fractionation of specific cDNA populations expected to contain rare cDNAs.
Preferred methods for the construction of cDNA libraries intended for transcript imaging purposes are those which produce an unbiased population of cDNAs and would include a step for the non-amplified or uniform growth of host cells used in constructing the cDNA library. Such growth conditions are described in Current Protocols in Molecular Biology "Amplification of Cosmid and Plasmid Libraries" Unit 5.10. (1987). In the normalization process, the prevalence of high-abundance cDNA clones decreases dramatically, clones with mid-level abundance are relatively unaffected, and clones for rare transcripts are effectively increased in abundance. In the Soares et al. (1994, supra) normalization procedure, the abundance levels of individual cDNA clones have been equalized by a kinetic re-annealing hybridization. This approach is designed to reduce the initial 10,000- fold variation in individual cDNA frequencies in order to achieve abundances within one order of magnitude while maintaining the overall sequence complexity of the library.
CDNA libraries prepared by normalization techniques are not an accurate reflection of the source tissue's gene-expression profile; however cDNA libraries produced by normalization techniques may provide a source of low abundance transcripts and therefore, would be useful for gene discovery purposes.
A variety of normalization techniques are known by those of skill in the art for the production of normalized cDNA libraries. Weissmann S.M. (1987 Mol. Bio. Med. 4:133-143) describe a method based on hybridization to genomic DNA wherein the frequency of each hybridized cDNA in the resulting normalized library would be proportional to that of each corresponding gene in the genomic cDNA. Ko. (1990, Nucleic Acid Res. 18:5705-571 1) and Patanjali et al (1991, Proc. Natl. Acad. Sci. 88:1943-1947) describe a kinetic approach to constructing cDNA libraries. Soares (WO 95/08647, filed September 23, 1994 and published March 30, 1995) describe a method to normalize a 3' directional cDNA library.
Subtractive hybridization of nucleic acids is a method to isolate the coding sequences of a gene which are differentially expressed such as during development or in disease states. CDNA libraries prepared by subtractive hybridization techniques are described in Rubenstein et al (1990, Nucleic Acids Research. 18:4833-4842); Travis et al. (1988, Proc. Natl. Acad. Sci. USA, 85:1696-1700).
Obtaining cDNA libraries from cells treated with 5-aza-2'-deoxycytidine should enhance the discovery of rare genes and genes expressed in specialized cell types from which it is difficult to isolate and/or prepare DNA. A non-toxic concentration of 5-aza-2'-deoxycytidine can be predetermined through titration/toxicity assays as illustrated in Figure 1 1. 5-aza-2'-deoxycytidine has been shown to induce the transcription of silent genes through demethylation (Hsieh et al supra). The preferred amount of 5-aza-2'-deoxycytidine to use is that amount that induces the transcription of silent genes without being toxic to the cell. This method can be coupled to subtractive methods to enhance further the discovery of novel transcripts or genes. IL Construction of ATM Libraries
ATM libraries containing transcript markers can be constructed using any of the methods described in Figures 2-8. Figure 2 illustrates ATM Strategy 1 for constructing a cDNA library containing an array of transcript markers derived from the 5' most end of a cDNA (5' transcript markers). In Strategy 1 , a cDNA library is constructed using an adapter containing a Bpml site and a PvuII restriction site. The constructed cDNA library is digested with Bpml and PvuII to isolate 20 bp 5' transcript markers (14 base pairs of cDNA and 6 base pairs from the adapter) from the cDNA library, the isolated markers are treated with T4 DNA polymerase to yield blunt ends and the markers are concatenated to form an array or serial arrangement of multiple markers. The concatenated array of 5' transcript markers is then used to create a library containing an array of 5' transcript markers. In a variation of this method, individual transcript markers can be subjected to subtractive hybridization methods to bias the population toward rare transcript markers.
Another strategy is shown in Figure 3 which illustrates ATM Strategy II for constructing a cDNA library containing an array of 5' transcript markers. In Strategy II, a cDNA library is constructed using an adapter containing a Bpml site and a PvuII restriction site. The constructed cDNA library is digested with Bpml and a second degenerate adapter containing a PvuII site is ligated onto the Bpml site. After PCR amplification of the template and PvuII digestion, 25 bp transcript markers are isolated. The isolated transcript markers are concatenated to form arrays, ligated into a vector and transformed into host cells to create a library containing an array of 5' transcript markers.
ATM Strategy III, as shown in Figures 4A-4B illustrates the construction of a cDNA library containing an array of 5' transcript markers. In Strategy III, a cDNA library is constructed using an adapter containing a Bpml site and a PvuII restriction site. The constructed cDNA library is digested with Bpml and a second degenerate adapter containing a PvuII site is ligated onto the Bpml site. Sense RNA is transcribed from the template and the mixture is subjected to DNAse treatment. First and second strand cDNA is synthesized from the template. The double stranded cDNA is digested with PvuII which yields 25 bp transcript markers. The transcript markers are concatenated to form arrays, the arrays are cloned into a vector and the vector transformed into host cells to create a library containing an array of 5' transcript markers.
ATM strategy IV, as shown in Figures 5A-5B, illustrates ATM Strategy IV for constructing a cDNA library containing an array of 5' transcript markers. In strategy IV, a cDNA library is constructed using an adapter containing a Bpml and PvuII restriction site. The constructed cDNA library is digested with Bpml and PvuII to isolate 20 bp 5' transcript markers from the cDNA library, the isolated markers are treated with T4 DNA polymerase to yield blunt ends and the markers are concatenated to form an array or serial arrangement of multiple markers. Adapters are added on to the ends of the transcript marker arrays and the arrays are amplified using PCR technology. The arrays are cloned into a vector to create a library containing arrays of the transcript markers (ATM).
Another strategy is shown in Figures 6A-6B which illustrate ATM Strategy V for constructing a cDNA library containing an array of 5' transcript markers. In strategy V, a cDNA library is constructed using an adapter containing a Bpml and PvuII restriction site. The constructed cDNA library is digested with Bpml and PvuII to isolate 20 bp 5' transcript markers from the cDNA library, the isolated markers are treated with T4 DNA polymerase to yield blunt ends and the markers are concatenated to form an array or serial arrangement of multiple markers. At this point, the arrays are ligated into a plasmid vector, subjected to PCR to amplify the transcript markers, and cloned into a vector to create a cDNA library containing PCR amplified arrays of 5' transcript markers.
Figure 7 illustrates a strategy for simultaneously obtaining two non-contiguous transcript markers from both the 5' and 3' end. In this strategy, first strand cDNA synthesis is performed using a modified random hexamer. The hexamer is designed to provide directionality. Second strand cDNA synthesis is performed by standard means and an adapter containing a Bpml and PvuII site is ligated onto both ends of the cDNA. The cDNA is ligated to a vector modified to delete all Bpml restriction sites. The vector containing the cDNA is transformed into a host cell to create a cDNA library and plasmid DNA is isolated and treated with Bpml, thereby creating individual linearized cDNAs containing both a 5' and 3' transcript marker. The linearized cDNA is treated with T4 polymerase to blunt end, self-ligated and re-transformed to create a library. The plasmid DNA of the library is isolated and digested with PvuII to excise the nucleic acid containing non-contiguous transcript markers from both the 5' and 3' end. The excised transcript markers are concatenated and cloned to create a library containing a serial arrangement of non- contiguous transcript markers from the 5' and 3' end which can be subjected to high-throughput nucleic acid sequence analysis. In another variation of this method, the vector is constructed to contain 51 and 3' Bpml sites at a cloning site thereby providing a means to excise the transcript marker from the cDNA. Figure 8 illustrates a strategy for obtaining non-contiguous transcript markers from random areas of a cDNA. In this strategy, mRNA is converted to cDNA following standard procedures. The cDNA is ligated to a vector that has been constructed to have restriction endonuclease sites for restriction endonucleases having 4 base pair recognition sites removed. The cDNA library is amplified and the plasmids prepared. The cDNA library is digested with a restriction endonuclease having a 4 base pair recognition site and the digested plasmid is purified. A 12 base pair adapter containing a Bcgl site, which digests nucleic acid within a range of its recognition site, is ligated onto the linearized cDNA, and the cDNA is transformed into a host cell. The cDNA library is digested with Bcgl resulting in the release of a 36 base pair fragment from each cDNA which originally contained the 4 base pair restriction endonuclease site. The fragments may be ligated into a vector directly or concatenated and ligated into a vector and subjected to sequence specific analysis.
The ATM libraries of the present invention may contain a single transcript marker per individual clone or multiple transcript markers constructed in a serial arrangement. As illustrated in the figures, transcript markers may be concatenated, PCR amplified and then ligated into a vector or concatenated, ligated into a vector and then PCR amplified.
In a preferred embodiment of the present invention, transcript markers that are excised from the cDNA library contain a cDNA portion and a synthetic adapter portion. In a preferred embodiment disclosed herein and as shown in Figure 2, the cDNA portion is at least 14 base pairs in length and the adapter portion is 6 base pairs which are designed to be asymmetric. The adapter portion provides the means for determining the sense orientation of the transcript marker in the vector as well as a means for determining the beginning of each transcript marker excised. Nucleic acid sequencing of the excised transcript markers can be performed to create a nucleic acid data set. For nucleic acid sequence analysis purposes, the nucleic acid adapter portion of the transcript marker can be subtracted or removed from the transcript marker nucleic acid data set. The ATM markers may exist in sense or antisense orientation in the vector. The presence of an nucleic acid fragment which provides directionality, such as an asymmetric adapter portion, provides the means for discerning the sense from the anti-sense stand and provides the means for determining the beginning of each transcript marker. Figure 14A illustrates sequence data from a single clone containing an array of 6 transcript markers. The 6 bp "spacer" DNA sequences that distinguish one transcript marker from another are underlined. The spacer sequence "CTGGAG" indicates that the immediately adjacent 14 bp to the right is a transcript marker (sense strand). The spacer sequence "CTCCAG" indicates that adjacent 14 bp to the left is a transcript marker (antisense strand). Figure 14B provides a list of the six 14 bp transcript markers (sense strand) without the spacer DNA sequence derived from the array in (14a). Asterisks (*) indicate sequences which are the reverse complement of the sequence actually found in the array.
IV. Method for Nucleic Acid Sequencing Nucleic acid sequencing of transcript markers can be performed by any means known to those of skill in the art. Methods for cDNA sequencing employ such enzymes as the Klenow fragment of cDNA polymerase I, Sequenase® (US Biochemical Corp, Cleveland OH)), Taq polymerase (Perkin Elmer, Norwalk CT), thermostable T7 polymerase (Amersham, Chicago IL), or combinations of recombinant polymerases and proofreading exonucleases such as the ELONGASE Amplification System marketed by Gibco BRL (Gaithersburg MD). Preferably, the process is automated with machines such as the Hamilton Micro Lab 2200 (Hamilton, Reno NV), Peltier Thermal Cycler (PTC200; MJ Research, Watertown MA) and the ABI 377 cDNA sequencers (Perkin Elmer).
V, PCR Methods Numerous PCR methods are known to those of skill in the art that would facilitate isolation, amplification and/or extension of nucleic acid sequences in the 5' or 3' direction. Gobinda et al (1993; PCR Methods Applic 2:318-22) disclose "restriction-site" polymerase chain reaction (PCR) as a direct method which uses universal primers to retrieve unknown sequence adjacent to a known locus. First, cDNA is amplified in the presence of primer to a linker sequence and a primer specific to the known region. The amplified sequences are subjected to a second round of PCR with the same linker primer and another specific primer internal to the first one. Products of each round of PCR are transcribed with an appropriate RNA polymerase and sequenced using reverse transcriptase.
Inverse PCR can be used to amplify or extend sequences using divergent primers based on a known region (Triglia T et al (1988) Nucleic Acids Res 16:8186). Adapters are ligated onto cDNAs which then allow cDNAs to be circularized. The intramolecular ligation products then serve as PCR templates. The method uses several restriction enzymes to generate a suitable fragment in the known region of a gene. The fragment is then circularized by intramolecular ligation and used as a PCR template.
Capture PCR (Lagerstrom M et al (1991) PCR Methods Applic 1 : 1 1 1-19) is another method which may be used. The method involves PCR amplification of cDNA fragments adjacent to a known sequence in human and yeast artificial chromosome cDNA. Capture PCR also requires multiple restriction enzyme digestions and ligations to place an engineered double-stranded sequence into an unknown portion of the cDNA molecule before PCR. Another PCR method which may be used to retrieve sequences is that of Parker JD et al (1991 ; Nucleic Acids Res 19:3055-60). Capillary electrophoresis may be used to analyze the size or confirm the nucleotide sequence of sequencing or PCR products. Systems for rapid sequencing are available from Perkin Elmer. Beckman Instruments (Fullerton CA), and other companies. Capillary sequencing may employ flowable polymers for electrophoretic separation, four different fluorescent dyes (one for each nucleotide) which are laser activated, and detection of the emitted wavelengths by a charge coupled devise camera. Output/light intensity is converted to electrical signal using appropriate software (eg. Genotyper™ and Sequence Navigator™ from Perkin Elmer) and the entire process from loading of samples to computer analysis and electronic data display is computer controlled. Capillary electrophoresis is particularly suited to the sequencing of small pieces of cDNA which might be present in limited amounts in a particular sample. The reproducible sequencing of up to 350 bp of M13 phage cDNA in 30 min has been reported (Ruiz-Martinez MC et al (1993) Anal Chem 65:2851-2858). VI. Method for determining Genome Closure
The present invention provides novel methods for achieving genome closure by combining rapid nucleic acid sequence identification of transcript markers from a cDNA with subsequent extension of the sequence of the identified transcript markers using PCR technology. Transcript markers constructed by the method illustrated in Figure 8 which have been concatenated and ligated to a vector provide for the sequence specific identification of 15-20 transcript markers per vector. In one embodiment described herein, PCR technology is used to extend the sequence of a transcript marker in an outward direction as described in US Patent Application 08/487,112, filed June 7, 1995, specifically incorporated by reference. One primer is synthesized to initiate extension in the antisense direction (XLR) and the other is synthesized to extend sequence in the sense direction (XLF). Primers allow the extension of the known sequence "outward" generating amplified nucleotide sequences containing new, unknown nucleotide sequence for the region of interest. As shown below, PCR primers which contain the sequence of a known adapter joining the non-contiguous transcript markers and all possible combinations of nucleotides for the 5 nucleotide positions flanking the adapter are used in PCR reactions to extend the sequence of the identified transcript markers using PCR technology.
N N N N N 16 base pair adapter N N N N N 3'
Considering that for each N position, there could be 4 possible nucleotides, a total of 1024 individual primers would be required for each 5' and 3' extension for a total of 1,048,576 combinations which will be specific enough to amplify any DNA.
CDNA libraries which have been prepared with oligo dT primers will allow for the identification of the 3' most part of the cDNA and part of the coding region or the complete 5' most end of a cDNA. CDNA libraries constructed with random primers constructed in combination with techniques that result in a high representation of the 5' ends of cDNA. such as the Cap-Finder™ PCR construction kit (Clontech), will allow for the amplification of the 5' ends of genes.
In another method illustrated in Figure 9, genome closure can be achieved by the identification of all transcribed genes from an mRNA source. As illustrated in Figure 9, step 1, a cDNA library containing a biotinylated poly A tail and having a Not I restriction site is constructed by standard means and cloned into a vector from which restriction endonuclease sites for 4 base pair restriction endonucleases have been removed. Multiple cDNA libraries derived from a variety of mRNA sources can be pooled to provide one sample. In Figure 9, step 2, the cDNA is digested with a restriction endonuclease which has a 4 base pair recognition site and the digested cDNA is captured by streptavidin-beads (Figure 9 step 3). The cDNA is digested with Not I which removes the biotin and ligated into a vector having a Type IIs restriction endonuclease site for MboII (Figure 9, step 4). Other Type IIs restriction endonucleases, such as Bpml, Bsgl Eco57I and BsmFI, can be used. The cDNA is digested with MboII which cuts 8 base pairs into the cDNA. A known linker of sufficient size for PCR amplification is cloned into the Mbo II restriction site and the cDNA pools are subjected to PCR analysis using the primers described above. Generation of the PCR-extension products can either be sequenced directly or with a wobble primer, ie, a sequencing primer degenerate at the 3 '-end to allow for all 3 possible bases following the poly A tail, or religated and cloned into a vector and then subjected to nucleic acid sequencing.
VII. Transcript Imaging
Transcript imaging is a method for evaluating changes in gene expression caused by factors such as disease progression, pharmacologic treatment and aging. Transcript imaging is accomplished by sequencing several thousand clones from a particular tissue or cell type and electronically recording the abundance levels for each mRNA species identified. Electronic manipulations can then be done to examine which mRNAs are up- or down-regulated, or unchanged.
Transcript imaging can be achieved by using transcript markers produced by the present invention. After nucleic acid sequencing of the transcript markers is accomplished, the abundance levels for each transcript marker are electronically recorded. Electronic manipulations can then be performed to examine which transcript markers are up- or down- regulated, or unchanged.
INDUSTRIAL APPLICABILITY L Construction of cDNA Libraries
The following example describes the construction of a cDNA library. The first step is to isolate mRNA from a desired biological tissue or cell source. The mRNA is then used in the synthesis of cDNA.
RNA Isolation RNA is isolated using guanidinium isothiocyanate and 2-mercaptoethanol lysis, followed by ultracentrifugation over a cesium chloride gradient to obtain total RNA (Chirgwin et al).
Alternatively, total RNA can be isolated using acid/phenol extraction (Chowzisky et al) and polyadenylated RNA can be isolated directly using a biotinylated oligo dT primer. An optical density measurement is taken to assess the quantity of total RNA isolated, and an aliquot is run on an electrophoresis gel to assess the quality and integrity of the RNA. The RNA is then stored until needed at -80 °C, which prevents degradation.
In order to obtain cleaner total RNA, each sample is treated with DNAse and acid phenol. followed by precipitation and washing. The RNA is again run on an electrophoresis gel to make sure it is free of genomic DNA contamination. Subsequent selection of polyadenylated (poly A) RNA is done with either an oligo(dT)-based affinity column or Oligotex™ latex microspheres.
The quality of the isolated mRNA is checked confirmed, and the sample is used in cDNA library construction. cDNA Library Construction
Synthesis of first-strand cDNA is initiated using a poly(dT) primer that is complementary to the polyA stretch at the 3' end of most transcripts; and the reverse transcriptase enzyme. The primer used in this reaction contains a restriction enzyme recognition site (NotI) that permits directional insertion into an appropriate cloning vector. Second-strand cDNA synthesis is based on the method developed by Gubler and Hoffman (1983). RNAse H nicks the RNA/cDNA hybrid created in the reverse transcription reaction, creating priming sites for E. coli DNA polymerase to synthesize second-strand cDNA. The gaps in the second strand are ligated together using E. coli DNA ligase. After the ends of the cDNA are blunted with T4 or Pfu DNA polymerase, an adaptor is ligated to the double-stranded cDNA. This oligonucleotide, which contains an EcoRI-compatible sticky end, allows for directional cloning on the cDNA once digestion is complete with NotI, the restriction enzyme site found at the 3' terminus of the cDNA. The cDNA is then size-fractionated to remove very short cDNAs, which would inhibit the generation of highly complex libraries. At this point, the cDNA is ligated into a plasmid vector system and transformed into bacterial cells for propagation. II. Construction of ATM cDNA libraries
Three cDNA libraries were constructed containing an adapter having the Type IIs restriction endonuclease, Bpm I, at the 5' end of each cDNA insert. Two ug each of poly A+ RNA from human colon, human prostate, and a single species control RNA (bacterial chloramphenicol transferase) were reverse transcribed using an oligo-dT primer and Superscript reverse transcriptase according to manufacturer's instructions (Gibco Superscript Plasmid System). Following second strand synthesis, 3 ug of the adapter containing the Bpm I site were ligated to each cDNA. The adapter containing the Bpm I site was prepared previously in the following manner: two oligonucleotides (5' AATTCAGCTGGAG and 5' phos-CTCCAGCTG) were synthesized and purified by HPLC and polyacrylamide gel electrophoresis (New England Biolabs). Equimolar amounts of the two oligos were combined in annealing buffer (20 mM Tris, pH 7.4, 2 mM MgCU, 50 mM NaCl), boiled, and allowed to cool to <30 degrees C in a heating block. Following adapter ligation, the cDNA was digested with Not I, fractionated over a Sepharose CL- 4B column and cloned into a pSPORT vector (LTI, Inc.).
To create an ATM library we followed the procedure as outlined in ATM Strategy 1 , as illustrated in Figure 2, starting with prostate mRNA containing the Bpm I adapter described above. This library was transformed into E. coli strain DHIOB by electroporation and allowed to grow either on LB plates or LB broth each supplemented with carbenicillin. Transformed cells were collected either by scraping (plates) or by centrifugation (media) and plasmid DNA was isolated (Promega or Qiagen systems). Greater than 100 ug of plasmid DNA was then digested with Bpm I and Pvu II. Multiple aliquots of 20 ug of plasmid were digested for 1.5 hours using 14 U of Bpm I (New England Biolabs) in a 150 ul volume at 37 degrees C. Thirty units of Pvu II (New England Biolabs) were then added and digestion proceeded for an additional hour at 37 degrees C. The digested DNA was fractionated on 20% TBE acrylamide. A 20 bp band was excised, the DNA was recovered by electroelution or by incubating the crushed gel slices in tris/EDTA overnight. The transcript markers were treated with T4 DNA polymerase and cloned either directly into pSPORT or concatenated. 2. Analysis of ATM libraries
ATM libraries were made according to ATM Strategy 1, illustrated in Figure 2. Vector background of these libraries was determined by screening randomly selected clones from the respective libraries by restriction enzyme digestion of plasmids with Bpm I and PvuII, thereby releasing transcript marker inserts (n= number of clones screened). Library size is determined by subtracting the background from the number of individual colonies or transformants that are generated from a single ligation reaction of the transcript markers to pSport vector. Average number of markers per clone as shown in Table I is determined by both insert size and DNA sequence verification.
TABLE I library vector background library size avg.#markers/clone
Prostate 1 10%(n=10) 1.1 x 106 3.8 Prostate 2 0% (n=12) 0.95 x 106 4.9
Prostate 3 0% (n= 12) 0.12 x 106 3.8
Arrays of transcript markers from the prostate ATM library were PCR amplified for 30 cycles using Ml 3 forward and reverse primers. These arrays were size selected on a 6% acrylamide gel and cloned into a pSPORT-derived vector. Twenty-seven random clones were sequenced from a total of 131 total transcript markers analyzed. There was an average of 4.9 markers per clone with a range of 4-7 markers/clone. Table II illustrates the average marker size per number of clones. TABLE II marker size number (%)
12 2 (1.5)
13 6 (4.6)
14 102 (77.9)
15 14 (10.7)
16 1 (0.8)
14/15 6 (4.6)
The marker size "14/15" refers to an ambiguous situation where there are 29 bp (instead of 28 bp) with markers which are concatenated back to back as shown: spacer | 29 bp~two markers back to back 1 spacer
CTGGAGNNNNNNNN NNN NNNNNNN NNNNNN CTCCAG Because the frequency of 13 and 16 bp markers is much lower than 14 and 15 bp markers. the 29 bp between the spacer sequences are assumed to be composed of one 14 bp and one 15 bp marker, rather than one 13 bp and one 16 bp marker. Thus, the true total number of 14 bp markers was 105 (102+3; 80.2%) and the true total number of 15 bp markers was 17 (14+3; 13%). III. Isolation of a full length cDNA A transcript marker (5' ccgagagtcgtcgg was identified from a prostate ATM library which corresponds to the gene apoferritin H (GenBank GI numbers: g31340, g31342, g28434). Based on the published sequence, this marker is present at the 5' end of the mRNA, 14 nucleotides downstream from the start of transcription and 181 nucleotides upstream of the start of the coding sequence. The entire mRNA is predicted to be about 0.9 kb. As illustrated below, to isolate the apoferritin H cDNA from the ATM library, a gene specific primer ("g31340") that contains 7 nucleotides of the Bpm I adapter and the 14 nucleotides of the transcript marker (5' gctggagccgagagtcgtcgg) was designed. The cDNA inserts from 1 ug of the prostate ATM library were amplified by 30 cycles of PCR using the Ml 3 forward and reverse primers. This PCR reaction was diluted 1 :50 in water and 1 ul was reamplified for 33 cycles using the nested T7 and "g31340" primers. A 0.9 kb band was isolated, gel purified and cloned. DNA sequencing confirmed the identity of the cloned gene as full length apoferritin H. Alternatively, the 0.9 kb PCR product is sequenced directly with appropriate primers. Ml 3 forward Ml 3 reverse g31340r T7
5' vector
IV. High Throughput Isolation of cDNA clones
High throughput isolation of cDNA clones from an ATM library is achieved in the following manner. First, a master pool of insert cDNA is created from an ATM library by PCR amplification using primers found in the vector (e.g., Ml 3 forward and reverse). Second, gene specific primers are synthesized in 96 well arrays (Gibco). Third, aliquots of gene specific primers, PCR reagents, and the master cDNA pool are aliquoted to 96 PCR wells for PCR and subsequently analyzed by gel electrophoresis. In addition, an initial screen for successful PCR reactions is accomplished by doing real time flourescent detection of PCR products. With this technique only those reactions which give a significant fluorescent signal above background would then be analyzed by gel electrophoresis. V. Extension of Transcript Markers to Full Length
The nucleic acid sequence of transcript markers can be used to design oligonucleotide primers for extending a partial nucleotide sequence to full length or for obtaining 5' sequences from genomic libraries. One primer is synthesized to initiate extension in the antisense direction (XLR) and the other is synthesized to extend sequence in the sense direction (XLF). Primers allow the extension of the known sequence "outward" generating amplified nucleotide sequences containing new, unknown nucleotide sequence for the region of interest (US Patent Application 08/487,112, filed June 7, 1995, specifically incorporated by reference). The initial primers are designed from the cDNA using OLIO® 4.06 Primer Analysis Software (National Biosciences), or another appropriate program, to be 22-30 nucleotides in length, to have a GC content of 50% or more, and to anneal to the target sequence at temperatures about 68°-72° C. Any stretch of nucleotides which would result in hairpin structures and primer-primer dimerizations is avoided. The original, selected cDNA libraries, or a human genomic library, is used to extend the sequence. A genomic library is most useful to obtain 5' upstream regions. If more extension is necessary or desired, additional sets of primers are designed to further extend the known region. By following the instructions for the XL-PCR kit (Perkin Elmer) and thoroughly mixing the enzyme and reaction mix, high fidelity amplification is obtained. Beginning with 40 pmol of each primer and the recommended concentrations of all other components of the kit, PCR is performed using the Peltier Thermal Cycler (PTC200; MJ Research, Watertown MA) and the following parameters:
Step 1 94 ° C for 1 min (initial denaturation)
Step 2 65° C for 1 min Step 3 68° C for 6 min
Step 4 94° C for 15 sec
Step 5 65° C for 1 min
Step 6 68° C for 7 min
Step 7 Repeat steps 4-6 for 15 additional cycles Step 8 94° C for 15 sec
Step 9 65 ° C for 1 min
Step 10 68° C for 7:15 min
Step 11 Repeat step 8-10 for 12 cycles
Step 12 72° C for 8 min Step 13 4° C (and holding)
A 5-10 μl aliquot of the reaction mixture is analyzed by electrophoresis on a low concentration (about 0.6-0.8%) agarose mini-gel to determine which reactions were successful in extending the sequence. Bands thought to contain the largest products were selected and cut out of the gel. Further purification involves using a commercial gel extraction method such as QIAQuick™ gel extraction (QIAGEN Inc). After recovery of the DNA, Klenow enzyme was used to trim single-stranded, nucleotide overhangs creating blunt ends which facilitate religation and cloning.
After ethanol precipitation, the products are redissolved in 13 μl of ligation buffer, lμl T4-DNA ligase (15 units) and lμl T4 polynucleotide kinase are added, and the mixture is incubated at room temperature for 2-3 hours or overnight at 16° C. Competent E\ coli cells (in 40 μl of appropriate media) are transformed with 3 μl of ligation mixture and cultured in 80 μl of SOC medium (Sambrook J et al, supra). After incubation for one hour at 37° C, the whole transformation mixture is plated on Luria Bertani (LB)-agar (Sambrook J et al, supra) containing 2xCarb. The following day, several colonies are randomly picked from each plate and cultured in 150 μl of liquid LB/2xCarb medium placed in an individual well of an appropriate, commercially-available, sterile 96- well microtiter plate. The following day, 5 μl of each overnight culture is transferred into a non-sterile 96-well plate and after dilution 1 : 10 with water.
5 μl of each sample is transferred into a PCR array.
For PCR amplification, 18 μl of concentrated PCR reaction mix (3.3x) containing 4 units of rTth DNA polymerase. a vector primer and one or both of the gene specific primers used for the extension reaction are added to each well. Amplification is performed using the following conditions:
Step 1 94° C for 60 sec
Step 2 94° C for 20 sec
Step 3 55° C for 30 sec Step 4 72° C for 90 sec
Step 5 Repeat steps 2-4 for an additional 29 cycles
Step 6 72° C for 180 sec
Step 7 4° C (and holding)
Aliquots of the PCR reactions are run on agarose gels together with molecular weight markers. The sizes of the PCR products are compared to the original partial cDNAs, and appropriate clones are selected, ligated into plasmid and sequenced.
VI. 5 - aza-2' Deoxycytidine Treatment of Cells
5-aza-2'-deoxycytidine induces transcription of silent genes, presumably by demethylating cytosines in CpG islands which are regulatory regions located upstream of most genes, btaining libraries from cells treated with 5-aza-2'-deoxycytidine will enhance the discovery of rare genes and genes expressed in specialized cell types difficult to isolate and prepare RNA from.
Methods
THP1 cells at a density of 1.1 million cells per ml were treated for three days with 0.8 micromolar 5-aza-2'-deoxycytidine. The medium used for growth conditions was Iscove's modified DMEM with 10% Fetal Bovine Serum.
HNT precursor cells at 80% confluency were treated for three days with 0.35 micromolar 5-aza-2'-deoxycytidine. The medium used for growth conditions was Iscove's modified DMEM.
Because 5-aza-2'-deoxycytidine has been shown to be toxic in cultured cells and animal, initial experiments were conducted to assess the toxicity of 5-aza-2'-deoxycytidine on hNT and THP1 cells to establish conditions where the cells would survive and RNA could be recovered. Around 1 micro molar 5-aza-2'-deoxycytidine is a concentration typically used to induce silent gene transcription. A concentration of 0.8 micro molar was selected for THP1 cells and 0.35 micro molar for hNT cells.
-22 Results
Northern analysis to measure the RNA levels of two identified genes was performed on the cells to verify that 5-aza-2'-deoxycytidine was inducing gene transcription under the conditions used. Significant induction of both identified genes by 5-aza-2'-deoxycytidine was obtained in the hNT cells. Many genes were found to be induced by 5-aza-2'-deoxycytidine in both THP1 cells and hNT cells.
The following 3 genes were in the cDNA library constructed from mRNA from THP cells treated with 5 -aza-2' deoxycytidine library and not in the control cDNA library: enBank g29382, human BBC1 mRNA; GenBank g337507, human ribosomal protein S25 mRNA; and gl 84553 human insulinoma pig-analog mRNA.
The following genes were in the cDNA library constructed from mRNA from THP cells treated with 5 -aza-2' deoxycytidine library and in the control cDNA library and found upregulated in THP1 : GenBank g 182055 human neutrophil elastase mRNA, 3' end; g36143 human mRNA for ribosomal protein SI 1; g793842 human mRNA for ribosomal protein L29; g28976 human mRNA for azurocidin; and g34891 1 human glycoprotein mRNA.
The following 3 genes were in the cDNA library constructed from mRNA from hNT cells treated with 5 -aza-2' deoxycytidine library and not in the control cDNA library: gl90233 human acidic ribosomal phosphoprotein PI ; g436217 human mRNA (KIAA0037) for ORF; and g385936 hingeOXPHOS system complex III. An additional result of 5-aza-2'-deoxycytidine treatment was a decrease in the expression of many genes, particularly more abundant mRNAs. VII. Construction of the Plasmid pIIEZ-1
Plasmid pIIEZ-1 was derived from pUC19 vector by removing the 991 base pairs from the restriction endonuclease sites Sspl through Afl3. A 90 base pair synthetic polylinker containing type IIs restriction endonuclease sites were ligated to the remaining pUC19 vector, as shown in Figure 13.
The type IIs polylinker was made by annealing the two synthetic oligomers P-II-1 and P- II-2. The oligomer sequences are: P-II-1 5'
CATGTGCGGCCGCGGCGCGCCGTCAGCTGCACAGATGCGAGCTCCAGGCATTCATCC TTAAGTACTTCAGTAGACGTCCCTGCAGGTGAATTC P-II-2
5'
GAATTCACCTGCAGGGACGTCTACTGAAGTACTTAAGGATGAATGCCTGGAGCTCGC
ATCTGTGCAGCTGACGGCGCGCCGCGGCCGCA After annealing, the double stranded linker has a 5' blunt end which is ligated into the
Sspl restriction endonuclease site and a 4 bp (5' CATG) extension which is complimentary to the
Afl3 restriction endonuclease site.
As illustrated in Figure 13, the polylinker contains 4 type IIs restriction endonuclease sites which can be used for generation of 14 bp transcript markers. Standard PCR primer directed site specific mutagenesis is carried out to eliminate all type
IIs restriction endonuclease sites and all 4 base pair restriction endonuclease sites.
VIII. Genome Closure
A method directed toward genome closure involves the systematic identification of all transcribed genes. The method involves the generation of cDNA's with a defined 5'-end. generation of defined priming sites for extension PCR amplification from within the cloned cDNAs, systematic amplification of all cDNAs, and sequencing of all extension products.
Generation of cDNA's with a defined 5'-end
As illustrated below, after conversion of mRNA into cDNA with standard methods, the cDNA's are digested with a 4 base pair cutter. The cDNAs are ligated into the vector displaying a type IIs restriction endonuclease site at one end.
1) Generation of cDNAs
*Biotin cDNA TTTTTTTTTTTTTT NotI AAAAAAAAAAAAAA
2) Digestion with Hpa II
*Biotin V- — cDNA V TTTTTTTTTTTTTT NotI V V AAAAAAAAAAAAAA
3) Capture of cDNA's with Streptavidin-beads *Biotin
CGG TTTTTTTTTTTTTT NotI Y
C AAAAAAAAAAAAAA bead
4) Ligation of adapter/linker
adapter/linker
*Biotin TCGACTAGTACGAAGA CGG τττττττττχττττ NotI γ GATCATGCTTCTGC C AAAAAAAAAAAAAA bead
*Biotin
TCGACTAGTACGAAGACGG ΓTTTTTTTTTTTTT NotI Y GATCATGCTTCTGCC AAAAAAAAAAAAAA bead
5) Not I digest, ligation into pUC 18 vector (Not I/Sal I)
- v τττττ NotI
vector-
6) Digestion with Bbs I, cuts 6bp into cDNA Bbs I (digest with Bbs I)
GAAGACGC N NNNNN N
CTTCTGCGNNNN NNNN
Sail vector- -Notl 7) Bbs I (fill in with Klenow)
GAAGACGCN NN NN NNNN N
CTTCTGCGNNNN NNNNNNNNN
Sail vector- -Notl
10
Generation of defined priming sites for extension PCR amplification from within the cloned cDNAs.
The cDNAs are cloned into a vector, amplified and digested with the type IIs restriction 15 endonuclease. Linkers are cloned into the linearized plasmid/cDNA fragment. 8)Bbs I CATGATCATGG
GTACTAGTACCGG
20 9)Ligate on linkers
GAAGACGCNNNNCATGATCATGG GGCCATGATCATGNNNNNNNNN
CTTCTGCGNNNNGTACTAGTACCGG GGTACTAGTACNNNNNNNNN
25
Sail vector- -Notl
30 10)Treat with T4-DNA polymerase in presence of (dA, dT) a) GAAGACGCNNNNCATGATCAT GGCCATGATCATGNNNNNNNNN CTTCTGCGNNNNGTACTAGTACCGG TACTAGTACNNNNNNNNN
Sail - vector NotI
Systematic amplification of all cDNAs
Extension PCR as described in Example V with combinations of defined primer sets (up to 10-fold degeneracy can be achieved, depending on the type IIs enzyme selected in the step 4 adapter). Therefore selective amplification of every cDNA present in the library can be achienved in a systematic manner.
ATCATGGCCATGATCATGN4-> GAAGACGCNNNNCATGATCATGGCCATGATCATGNNNNNNNNNNNNNNNNN
CTTCTGCGNNNNGTACTAGTACCGGTACTAGTACNNNNNNNNNNNNNNNNN I <-4NGTACTAGTACCGGTACTA |
Sail vector NotI
Since NNNN and NNNN are symmetrical combinations for primer pairs are given (e. 5' NNNN 3' = 5' AGCA 3' then
5'ATCATGGCCATGATCATGAGCA 3' will form a pair with 5'ATCATGGCCATGATCATGTCGT 3*
Any PCR-reaction will cover 2 primer combinations.
combination 1 :
5ATCATGGCCATGATCATGAGCA 3' 5'ATCATGGCCATGATCATGTCGT 3' and combination 2:
5'ATCATGGCCATGATCATGAGCA 3' 5'ATCATGGCCATGATCATGTCGT 3'
Example of selective amplification with a 4 fold degenerate primer set resulting in 128 pools of cDNAs
To cover all possible combinations of 4 bases, 256 oligonucleotides will be synthesized (5'ATCATGGCCATGATCATGN NN 3'). Since every PCR-reaction covers 2 combinations, 128 reactions will cover all possible combinations. A range of 200-300 PCR-products/reaction will be expected for a specific tissue. Since this is an inverse PCR approach, the products are sequenced directly with the wobble primer, increasing the resolution by a factor 3. The PCR-products then are directly thermo-cycled with the wobble primers resulting in a higher resolution. One colour is choosen for each nucleotide with no terminator included in the reaction. The resulting products are run on a sequencing gel.
All publications and patents mentioned in the above specification are herein incorporated by reference. Various modifications of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the above-described modes for carrying out the invention which are obvious to those skilled in the field of molecular biology or related fields are intended to be within the scope of the following claims.
SEQUENCE LISTING
(1) GENERAL INFORMATION
(l) APPLICANT: INCYTE PHARMACEUTICALS, INC.
(11) TITLE OF THE INVENTION: METHODS FOR GENERATING AND ANALYZING TRANSCRIPT MARKERS
(m) NUMBER OF SEQUENCES: 50
(IV) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Incyte Pharmaceuticals, Inc.
(B) STREET: 3174 Porter Drive
(C) CITY: Palo Alto
(D) STATE: CA
(E) COUNTRY: U.S.
(F) ZIP: 94304
(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette
(B) COMPUTER: IBM Compatible
(C) OPERATING SYSTEM: DOS
(D) SOFTWARE: FastSEQ Version 1.5
(vi) CURRENT APPLICATION DATA:
(A) PCT APPLICATION NUMBER: To Be Assigned
(B) FILING DATE: Filed Herewith
(vn) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 08/723,646
(B) FILING DATE: 03-OCT-1996
(vm) ATTORNEY/AGENT INFORMATION:
(A) NAME: Billings, Lucy J.
(B) REGISTRATION NUMBER: 36,749
(C) REFERENCE/DOCKET NUMBER: IN-0001 PCT
(lx) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 650-855-0555
(B) TELEFAX: 650-845-4166
(2) INFORMATION FOR SEQ ID NO : 1 :
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 13 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:l: AATTCAGCTG GAG 13
(2) INFORMATION FOR SEQ ID NO : 2 :
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 base pairs
(B) TYPE: nucleic acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(11) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: CTCCAGCTG 9
(2) INFORMATION FOR SEQ ID NO: 3:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 41 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 3 : CTGGAGNNNN NNNNNNNNNN NNNNNNNNNN NNNNNCTCCA G 41
(2) INFORMATION FOR SEQ ID NO : 4 :
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(vn) IMMEDIATE SOURCE:
(A) LIBRARY: GenBank
(B) CLONE: 31340, 31342, 28434
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: CCGAGAGTCG TCGG 14
(2) INFORMATION FOR SEQ ID NO: 5:
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(vn) IMMEDIATE SOURCE:
(A) LIBRARY: GenBank
(B) CLONE: 31340
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 5 : GCTGGAGCCG AGAGTCGTCG G 21
(2) INFORMATION FOR SEQ ID NO : 6 :
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 13 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 6 : AATTCAGCTG GAG 13
(2) INFORMATION FOR SEQ ID NO : 7 :
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 7 : AATTCAGCTG GAGNNNNNNN NNNNNNNNNC AGCTG 35
(2) INFORMATION FOR SEQ ID NO: 8:
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II ) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: GTCGACCTCN NNNNNNNNNN NNNNNGTCGA C 31
(2) INFORMATION FOR SEQ ID NO : 9 :
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 9 : CCCGAGGTCG ACTTAA 16
(2) INFORMATION FOR SEQ ID NO: 10:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA (vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: CTGGAGNNNN NNNNNNNNNN NN 22
(2) INFORMATION FOR SEQ ID NO: 11:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: GACCTCNNNN NNNNNNNNNN 20
(2) INFORMATION FOR SEQ ID NO: 12:
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTh: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi ) SEQUENCE DESCRIPTION: SEQ ID NO: 12: GTTGAATACT CATACTCTTC C 21
(2) INFORMATION FOR SEQ ID NO: 13:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: GCTGGCCTTT TGCTCACATG 20
(2) INFORMATION FOR SEQ ID NO: 14: ( ) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 60 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: GAATTCACCT GCAGGGACGT CTACTGAAGT ACTTAAGGAT GAATGCCTGG AGCTCGCATC 60 (2) INFORMATION FOR SEQ ID NO: 15:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: TGTGCAGCTG ACGGCGCGCC GCGGCCGCA 29
(2) INFORMATION FOR SEQ ID NO: 16:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 120 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
CAACCACCCG GGCCCTCCAG CTGGAGGAAA AAATGCTAGG CTGGAGGGCT GATGTTTTCC 60 CTGGAGCTAG TTCTAGATCG CTGGAGCTGC GCCCGGCCCG GGGACCGGCT ATCCCTCCAG 120
(2) INFORMATION FOR SEQ ID NO: 17:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi ) SEQUENCE DESCRIPTION: SEQ ID NO: 17: GGCCCGGGTG GTTG 14
(2) INFORMATION FOR SEQ ID NO: 18:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear (11) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: GAAAAAATGC TAGG 14
(2) INFORMATION FOR SEQ ID NO: 19:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: GGCTGATGTT TTCC 14
(2) INFORMATION FOR SEQ ID NO: 20:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: CTAGTTCTAG ATCG 14
(2) INFORMATION FOR SEQ ID NO: 21:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: CTGCGCCCGG CCCG 14
(2) INFORMATION FOR SEQ ID NO: 22:
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: GGATAGCCGG TCCC 14
(2) INFORMATION FOR SEQ ID NO:23:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 93 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:
CATGTGCGGC CGCGGCGCGC CGTCAGCTGC ACAGATGCGA GCTCCAGGCA TTCATCCTTA 60 AGTACTTCAG TAGACGTCCC TGCAGGTGAA TTC 93
(2) INFORMATION FOR SEQ ID NO: 24:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 89 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
GAATTCACCT GCAGGGACGT CTACTGAAGT ACTTAAGGAT GAATGCCTGG AGCTCGCATC 60 TGTGCAGCTG ACGGCGCGCC GCGGCCGCA 89
(2) INFORMATION FOR SEQ ID NO:25:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
(2) INFORMATION FOR SEQ ID NO: 26:
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: AAAAAAAAAA AAAA 14
(2) INFORMATION FOR SEQ ID NO: 27: (ι) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ll) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: TCGACTAGTA CGAAGA 16
(2) INFORMATION FOR SEQ ID NO: 28:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: CGTCTTCGTA CTAG 14
(2) INFORMATION FOR SEQ ID NO:29:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: GAAGACGCNN NN 12
(2) INFORMATION FOR SEQ ID NO: 30:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: NNNNGCGTCT TC 12
(2) INFORMATION FOR SEQ ID NO: 31:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear (11) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: CATGATCATG G 11
(2) INFORMATION FOR SEQ ID NO: 32:
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 13 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: GGCCATGATC ATG 13
(2) INFORMATION FOR SEQ ID NO:33:
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: GAAGACGCNN NNCATGATCA TGG 23
(2) INFORMATION FOR SEQ ID NO: 34:
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: GGCCATGATC ATGNNNNGCG TCTTC 25
(2) INFORMATION FOR SEQ ID NO: 35:
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: GGCCATGATC ATGNNNNNNN NN 22 (2) INFORMATION FOR SEQ ID NO: 36:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: NNNNNNNNNC ATGATCATGG 20
(2) INFORMATION FOR SEQ ID NO: 37:
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: GAAGACGCNN NNCATGATCA T 21
(2) INFORMATION FOR SEQ ID NO: 38:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: GGCCATGATC ATGNNNNGCG TCTTC 25
(2) INFORMATION FOR SEQ ID NO: 39:
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II ) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: GGCCATGATC ATGNNNNNNN NN 22
(2) INFORMATION FOR SEQ ID NO: 40:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(11) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: NNNNNNNNNC ATGATCAT 18
(2) INFORMATION FOR SEQ ID NO : 41 :
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 41 : ATCATGGCCA TGATCATGNN NN 22
(2) INFORMATION FOR SEQ ID NO: 42:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 51 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: GAAGACGCNN NNCATGATCA TGGCCATGAT CATGNNNNNN NNNNNNNNNN N 51
(2) INFORMATION FOR SEQ ID NO : 43 :
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 51 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 3 :
NNNNNNNNNN NNNNNNNCAT GATCATGGCC ATGATCATGN NNNGCGTCTT C 51
(2) INFORMATION FOR SEQ ID NO: 44:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA ( xi ) SEQUENCE DESCRI PT ION : SEQ I D NO : 4 4 : ATCATGGCCA TGATCATGNN NN 22
( 2 ) INFORMATION FOR SEQ I D NO : 45 :
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: ATCATGGCCA TGATCATGAG CA 22
(2) INFORMATION FOR SEQ ID NO : 46 :
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 6 : ATCATGGCCA TGATCATGTC GT 22
(2) INFORMATION FOR SEQ ID NO: 47:
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: ATCATGGCCA TGATCATGAG CA 22
(2) INFORMATION FOR SEQ ID NO : 48 :
(l) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(n) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 48 : ATCATGGCCA TGATCATGTC GT 22
(2) INFORMATION FOR SEQ ID NO : 49 : (I) SEQUENCE CHARACTERISTICS:
(A) LENGTh: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 49 : ATCATGGCCA TGATCATGAG CA 22
(2) INFORMATION FOR SEQ ID NO: 50:
(I) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(II) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50: ATCATGGCCA TGATCATGTC GT 22

Claims

Claims:
1. A method for generating and analyzing transcript markers from the 5' most end of individual cDNAs of a cDNA library comprising the steps of: a) obtaining a cDNA library comprising individual cDNAs having a first restriction endonuclease site for a restriction endonuclease that digests the cDNA at the 5' most end within an expected distance from its recognition site and a second endonuclease restriction site; b) subjecting the cDNAs to digestion with the first and the second restriction endonuclease thereby excising transcript markers from the 5' most end of the individual cDNAs; c) ligating said transcript markers to a vector; d) transforming said vector containing the transcript markers in a host cell and culturing said host cells; and e) performing nucleic acid sequence analysis of the transcript markers.
2. The method of Claim 1 further comprising the step of creating blunt ends on the
5' transcript markers after step b.
3. The method of Claim 1 wherein said first restriction enzyme is a Type IIS restriction enzyme.
4. The method of Claim 3 wherein the Type IIS restriction enzyme is selected from the group consisting of Bpml, Bsgl, Eco57I and BsmFI.
5. The method of Claim 1 wherein said cDNA library has been constructed by normalization techniques.
6. The method of Claim 1 wherein said cDNA library has been constructed using random primers.
7. The method of Claim 1 wherein said cDNA library has been constructed using olio dT primers.
8. The method of Claim 1 wherein the cDNA library has been constructed from mRNA treated with a demethylating agent.
9. The method of Claim 8 wherein the demethylating agent is 5-aza-2' deoxycytidine.
10. The method of Claim 1 wherein digesting the cDNA library with the second restriction enzyme creates a blunt end.
1 1. The method of Claim 1 optionally comprising the step of concatenating said 5' transcript markers after step b) thereby forming a serial arrangement of multiple 5' transcript markers.
12. The method of Claim 11 optionally comprising the step of amplifying the serial arrangement of multiple 5' transcript markers by polymerase chain reaction prior to ligating to a vector.
13. A method for determining a representation of gene expression in a cDNA library comprising the steps of: a) obtaining a cDNA library comprising individual cDNAs having a first restriction endonuclease site for a restriction endonuclease that digests the cDNA at the 5' most end within an expected distance from its recognition site and a second endonuclease restriction site and wherein the cDNA library has been grown under non-amplified growth conditions; b) generating and isolating 5' transcript markers from the cDNA library wherein each isolated 5' transcript marker identifies an individual cDNA; c) ligating the isolated 5' transcript markers to a vector and transforming the vector in a host cell; d) culturing said host cells; e) performing nucleic acid sequencing on the 5' transcript markers in said transformed host cells to create a nucleic acid data set; and f) subjecting the data set to analysis to determine the relative abundance of individual 5' transcript markers thereby determining a representation of gene expression in the cDNA library.
14. The method of Claim 13 comprising concatenating the isolated 5' transcript markers after step b) thereby creating a serial arrangement of multiple 5' transcript markers.
15. The method of Claim 13 wherein said cDNA library has been constructed with olio dT primers.
16. The method of Claim 13 wherein after step b) the isolated 5' transcript markers are blunt ended.
17. A method for rapid nucleic acid sequence analysis of cDNA comprising the steps of: a) obtaining a cDNA library comprising individual cDNAs having a first restriction endonuclease site for a restriction endonuclease that digests the cDNA at the 5' most end within an expected distance from its recognition site and a second endonuclease restriction site; b) subjecting the cDNA library to digestion with the first restriction endonuclease and the second restriction endonuclease to create 5' transcript markers which identify individual cDNAs; c) isolating said 5' transcript markers; d) concatenating said 5' transcript markers to create a serial arrangement of multiple transcript markers; e) ligating said serial arrangement of multiple transcript markers to a vector and transforming said host cell with the vector; f) performing nucleic acid sequencing analysis on said serial arrangement of 5' transcript markers and identifying transcript markers; g) preparing a first and a second PCR primer, wherein the first PCR primer is designed from a transcript marker identified in step f) and the second PCR primer is designed from a section of nucleic acid common to all cDNAs in the cDNA library; h) subjecting the cDNA library to a polymerase chain reaction using the primers of step g) thereby identifying the cDNA designated by a transcript marker; and i) performing nucleic acid sequencing on the identified DNA.
18. The method of Claim 17 wherein wherein the polymerase chain reaction of step h) is performed in 96 well plates.
19. The method of Claim 17 optionally comprising the step of amplifying the serial arrangement of multiple 5' transcript markers by polymerase chain reaction after step e).
20. The method of Claim 17 further comprising the step of creating blunt ends on the 5' transcript markers after step c.
21. The method of Claim 17 wherein said first restriction enzyme is a Type IIS restriction enzyme.
22. The method of Claim 17 wherein the Type IIS restriction enzyme is selected from the group consisting of Bpml, Bsgl, Eco57I and BsmFI
23. The method of Claim 17 wherein said cDNA library has been constructed by normalization techniques.
24. The method of Claim 17 wherein said cDNA library has been constructed using random primers.
25. The method of Claim 17 wherein said cDNA library has been constructed using olio dT primers.
26. The method of Claim 17 wherein the cDNA library has been constructed from mRNA treated with a demethylating agent.
27. The method of Claim 26 wherein the demethylating agent is 5- aza-2'- deoycytidine.
28. The method of Claim 17 wherein digesting the cDNA library with the second restriction enzyme creates a blunt end.
29. A method for the rapid, sequence-specific identification of cDNAs derived from a human mRNA population, comprising the steps of: a) obtaining a cDNA library comprising individual cDNAs wherein said cDNAs contain first restriction endonuclease sites for an endonuclease having a 4 base pair recognition site and wherein said cDNAs are cloned into a first vector lacking the first restriction endonuclease sites; b) subjecting the cDNA library to digestion with the first restriction endonuclease thereby creating linearized cDNAs containing a portion of the original cDNA; c) ligating an adapter to said linearized cDNAs wherein the adapter contains a second restriction endonuclease site for an endonuclease that cleaves within an expected range of its recognition site thereby creating cDNAs containing two non-contiguous transcript markers joined by the adapter; d) digesting the cDNAs of step c) with the second restriction endonuclease thereby excising the transcript markers from said cDNAs; e) concatenating said excised transcript markers; f) ligating said concatenated transcript markers to a second vector; g) transforming said second vector containing the transcript markers in a host cell and culturing said host cells; and h) performing nucleic acid sequence analysis on the transcript markers in said host cells.
30. The method of Claim 29 optionally comprising concatenating the transcript markers after step e) to create a serial arrangement of multiple transcript markers.
31. The method of Claim 29 wherein the cDNA library has been constructed from mRNA treated with a demethylating agent.
32. The method of Claim 29 wherein the demethylating agent is 5-aza-2' deoxycytidine.
33. The method of Claim 29 wherein the cDNA library has been constructed by normalization techniques.
34. The method of Claim 29 wherein the second endonuclease is Bcgl.
35. The method of Claim 29 wherein the nucleic acid sequence analysis comprises amplifying transcript markers in an outward manner using 2 specific primers and sequencing directly using a wobble primer.
36. The vector as shown in Figure 13.
37. A method for generating and analyzing non-contiguous transcript markers derived from the 5' most and 3' most end of individual cDNAs, comprising the steps of: a) obtaining a cDNA library comprising individual cDNAs having a first restriction endonuclease site at the 5' most end and at the 3' most end for a restriction endonuclease that digests nucleic acid within an expected distance from the first endonuclease recognition site, and a second endonuclease restriction site; b) subjecting the cDNAs to digestion with the first endonuclease thereby creating linearized cDNAs containing transcript markers from the 5' most end and the 3' most end of the individual cDNAs; c) self-ligating said linearized cDNAs thereby joining the transcript markers from the 5' most and 3' most ends to create cDNAs containing non-contiguous transcript markers; d) transforming said linearized cDNAs in a host cell and culturing said host cells; e) isolating said cDNA from said host cells; f) digesting the cDNA with the second restriction endonuclease thereby excising the non-contiguous transcript markers; g) ligating said excised transcript markers to a second vector; f) transforming said second vector containing the transcript markers in a host cell and culturing said host cells; and h) performing nucleic acid sequence analysis of the transcript markers.
38. The method of Claim 37 further comprising creating blunt ends on the 5' transcript markers after step b.
39. The method of Claim 37 wherein said first restriction enzyme is a Type IIS restriction enzyme.
40. The method of Claim 37 wherein the Type IIS restriction enzyme is selected from the group consisting of Bpml, Bsgl, Eco57I and BsmFI.
41. The method of Claim 37 wherein said cDNA library has been constructed by normalization techniques.
42. The method of Claim 37 wherein said cDNA library has been constructed using random primers.
43. The method of Claim 37 wherein said cDNA library has been constructed using olio dT primers.
44. The method of Claim 37 wherein the cDNA library has been constructed from mRNA treated with a demethylating agent.
45. The method of Claim 44 wherein the demethylating agent is 5-aza-2' deoxycytidine.
46. The method of Claim 37 wherein digesting the cDNA library with the second restriction enzyme creates a blunt end.
47. The method of Claim 37 comprising concatenating said 5' transcript markers after step b) thereby forming a serial arrangement of multiple 5' transcript markers.
48. The method of Claim 47 comprising amplifying the serial arrangement of multiple 5' transcript markers by polymerase chain reaction prior to ligating to a vector.
PCT/US1997/018344 1996-10-03 1997-10-03 Methods for generating and analyzing transcript markers WO1998014619A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU47533/97A AU4753397A (en) 1996-10-03 1997-10-03 Methods for generating and analyzing transcript markers

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US72364696A 1996-10-03 1996-10-03
US08/723,646 1996-10-03

Publications (1)

Publication Number Publication Date
WO1998014619A1 true WO1998014619A1 (en) 1998-04-09

Family

ID=24907109

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/018344 WO1998014619A1 (en) 1996-10-03 1997-10-03 Methods for generating and analyzing transcript markers

Country Status (2)

Country Link
AU (1) AU4753397A (en)
WO (1) WO1998014619A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998031838A1 (en) * 1997-01-15 1998-07-23 Chugai Pharmaceutical Co., Ltd. Method for analyzing quantitative expression of genes
US5981190A (en) * 1997-01-08 1999-11-09 Ontogeny, Inc. Analysis of gene expression, methods and reagents therefor
EP0981535A1 (en) * 1997-05-12 2000-03-01 Life Technologies, Inc. Methods for production and purification of nucleic acid molecules
WO2000053806A1 (en) * 1999-03-05 2000-09-14 Chugai Pharmaceutical Co. Ltd. Method of identifying gene transcription patterns
WO2001000816A1 (en) * 1999-03-18 2001-01-04 Complete Genomics As Methods of cloning and producing fragment chains with readable information content
US6399334B1 (en) 1997-09-24 2002-06-04 Invitrogen Corporation Normalized nucleic acid libraries and methods of production thereof
WO2004024953A1 (en) * 2002-09-12 2004-03-25 Kureha Chemical Industry Company, Limited Method for preparation of cdna tags for identifying expressed genes and method for analysis of gene expression
US7135464B2 (en) 2002-06-05 2006-11-14 Supergen, Inc. Method of administering decitabine

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4994370A (en) * 1989-01-03 1991-02-19 The United States Of America As Represented By The Department Of Health And Human Services DNA amplification technique
WO1995008647A1 (en) * 1993-09-24 1995-03-30 The Trustees Of Columbia University In The City Of New York METHOD FOR CONSTRUCTION OF NORMALIZED cDNA LIBRARIES
WO1995020681A1 (en) * 1994-01-27 1995-08-03 Incyte Pharmaceuticals, Inc. Comparative gene transcript analysis
US5508169A (en) * 1990-04-06 1996-04-16 Queen's University At Kingston Indexing linkers

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4994370A (en) * 1989-01-03 1991-02-19 The United States Of America As Represented By The Department Of Health And Human Services DNA amplification technique
US5508169A (en) * 1990-04-06 1996-04-16 Queen's University At Kingston Indexing linkers
WO1995008647A1 (en) * 1993-09-24 1995-03-30 The Trustees Of Columbia University In The City Of New York METHOD FOR CONSTRUCTION OF NORMALIZED cDNA LIBRARIES
WO1995020681A1 (en) * 1994-01-27 1995-08-03 Incyte Pharmaceuticals, Inc. Comparative gene transcript analysis

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JÜTTERMANN ET AL.: "TOXICITY OF 5-AZA-2'-DEOXYCYTIDINE TO MAMMALIAN CELLS IS MEDIATED PRIMARILY BY COVALENT TRAPPING OF DNA METHYLTRANSFERASE RATHER THAN DNA DEMETHYLATION", PNAS, vol. 91, 1994, pages 11797 - 11801, XP002053722 *
KATO: "DESCRIPTION OF THE ENTIRE mRNA POPULATION BY A 3'END cDNA FRAGMENT GENERATED BY CLASS IIS RESTRICTION ENZYMES", NUCLEIC ACID RESEARCH, vol. 23, no. 18, 1995, pages 3685 - 3690, XP002053720 *
VELCULESCU ET AL.: "SERIAL ANALYSIS OF GENE EXPRESSION", SCIENCE, vol. 270, 1995, pages 484 - 487, XP002053721 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5981190A (en) * 1997-01-08 1999-11-09 Ontogeny, Inc. Analysis of gene expression, methods and reagents therefor
US6461814B1 (en) 1997-01-15 2002-10-08 Dominic G. Spinella Method of identifying gene transcription patterns
US5968784A (en) * 1997-01-15 1999-10-19 Chugai Pharmaceutical Co., Ltd. Method for analyzing quantitative expression of genes
WO1998031838A1 (en) * 1997-01-15 1998-07-23 Chugai Pharmaceutical Co., Ltd. Method for analyzing quantitative expression of genes
EP0981535A1 (en) * 1997-05-12 2000-03-01 Life Technologies, Inc. Methods for production and purification of nucleic acid molecules
EP0981535A4 (en) * 1997-05-12 2000-11-29 Life Technologies Inc Methods for production and purification of nucleic acid molecules
US6399334B1 (en) 1997-09-24 2002-06-04 Invitrogen Corporation Normalized nucleic acid libraries and methods of production thereof
WO2000053806A1 (en) * 1999-03-05 2000-09-14 Chugai Pharmaceutical Co. Ltd. Method of identifying gene transcription patterns
WO2001000816A1 (en) * 1999-03-18 2001-01-04 Complete Genomics As Methods of cloning and producing fragment chains with readable information content
US7371851B1 (en) 1999-03-18 2008-05-13 Complete Genomics As Methods of cloning and producing fragment chains with readable information content
US7135464B2 (en) 2002-06-05 2006-11-14 Supergen, Inc. Method of administering decitabine
US7144873B2 (en) 2002-06-05 2006-12-05 Supergen, Inc. Kit for delivering decitabine in vivo
WO2004024953A1 (en) * 2002-09-12 2004-03-25 Kureha Chemical Industry Company, Limited Method for preparation of cdna tags for identifying expressed genes and method for analysis of gene expression

Also Published As

Publication number Publication date
AU4753397A (en) 1998-04-24

Similar Documents

Publication Publication Date Title
US5637685A (en) Normalized cDNA libraries
EP0592626B1 (en) METHODS TO CLONE mRNA
US5262311A (en) Methods to clone polyA mRNA
Pastorian et al. Optimization of cDNA representational difference analysis for the identification of differentially expressed mRNAs
US5827658A (en) Isolation of amplified genes via cDNA subtractive hybridization
EP0981609B1 (en) A method to clone mrnas and display of differentially expressed transcripts (dodet)
WO2003064691A2 (en) Methods and means for amplifying nucleic acid
WO1997024455A2 (en) METHODS AND COMPOSITIONS FOR FULL-LENGTH cDNA CLONING
WO2000075356A1 (en) Rna polymerase chain reaction
US6461814B1 (en) Method of identifying gene transcription patterns
CA2332339A1 (en) Improved method for simultaneous identification of differentially expressed mrnas and measurement of relative concentrations
EP1212449A1 (en) Method for amplifying signal-flanking sequences from unknown genomic dna
WO1997007244A1 (en) ISOLATION OF AMPLIFIED GENES VIA cDNA SUBTRACTIVE HYBRIDIZATION
US5891637A (en) Construction of full length cDNA libraries
WO1998014619A1 (en) Methods for generating and analyzing transcript markers
US20030099962A1 (en) Methods to isolate gene coding and flanking DNA
US20060228714A1 (en) Nucleic acid representations utilizing type IIB restriction endonuclease cleavage products
WO1988007585A1 (en) Equal-abundance transcript composition and method
CA2333254A1 (en) Method for simultaneous identification of differentially expressed mrnas and measurement of relative concentrations
JP2002537821A (en) A method for large-scale cDNA cloning and sequencing by repeating subtraction
US20040002104A1 (en) Constant length signatures for parallel sequencing of polynucleotides
WO1999043844A1 (en) Reciprocal subtraction differential display
JP2006508677A (en) Oligonucleotide-induced analysis of gene expression
WO1999040208A1 (en) In vivo construction of dna libraries
JP4403069B2 (en) Methods for using the 5 &#39;end of mRNA for cloning and analysis

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AT AU BR CA CH CN DE DK ES FI GB IL JP KR MX NO NZ RU SE SG US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
CFP Corrected version of a pamphlet front page

Free format text: ADD INID NUMBER (63) "RELATED BY CONTINUATION (CON) OR CONTINUATION-IN-PART (CIP) TO EARLIER APPLICATION" WHICH WAS INADVERTENTLY OMITTED FROM THE FRONT PAGE

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 1998516982

Format of ref document f/p: F

122 Ep: pct application non-entry in european phase