WO2001012856A2 - Analysis of sequence tags with hairpin primers - Google Patents

Analysis of sequence tags with hairpin primers

Publication number
WO2001012856A2
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
fragments
hairpin
sample
Prior art date
Application number
PCT/US2000/022246
Other languages
French (fr)
Other versions
WO2001012856A3 (en)
Inventor
Paul M. Lizardi
Darin R. Latimer
Original Assignee
Yale University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/544,713 (US6261782B1)
Application filed by Yale University
Priority to AU67708/00A (AU6770800A)
Publication of WO2001012856A2
Publication of WO2001012856A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/6853: Nucleic acid amplification reactions using modified primers or templates
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/705: Receptors; Cell surface antigens; Cell surface determinants
    • C07K14/70578: NGF-receptor/TNF-receptor superfamily, e.g. CD27, CD30, CD40, CD95
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809: Methods for determination or identification of nucleic acids involving differential detection
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6834: Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837: Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K: PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00: Medicinal preparations containing peptides
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10: TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10T: TECHNICAL SUBJECTS COVERED BY FORMER US CLASSIFICATION
    • Y10T436/00: Chemistry: analytical and immunological testing
    • Y10T436/14: Heterocyclic carbon compound [i.e., O, S, N, Se, Te, as only ring hetero atom]
    • Y10T436/142222: Hetero-O [e.g., ascorbic acid, etc.]
    • Y10T436/143333: Saccharide [e.g., DNA, etc.]

Definitions

  • the disclosed invention is generally in the field of nucleic acid characterization and analysis, and specifically in the area of analysis and comparison of gene expression patterns and genomes.
  • RNA subsets allow the expression of mRNA subsets to be determined in a parallel manner.
  • sequence of the analyzed mRNA has to be known in order to synthesize a labeled cDNA that forms a hybrid with the selected mRNA; such hybrids resist RNA degradation by a single-strand-specific nuclease and can be detected by gel electrophoresis.
  • differential plaque-filter hybridization allows the identification of specific differences in the expression of cloned cDNAs
  • One of these is the selective amplification of differentially expressed mRNAs via biotin- and restriction-mediated enrichment (SABRE; Lavery et al., Proc. Natl. Acad. Sci. U.S.A. 94:6831-6836 (1997)). cDNAs derived from a tester population are hybridized against the cDNAs of a driver (control) population. After a purification step specific for tester-cDNA-containing hybrids, tester-tester homohybrids are specifically amplified using an added linker, thus allowing the isolation of previously unknown genes.
  • SABRE: selective amplification via biotin- and restriction-mediated enrichment
  • RDA: representational difference analysis
  • the linkers of the tester and driver cDNA are digested and a new linker is ligated to the ends of the tester cDNA.
  • the tester and driver cDNAs are then mixed in a 1:100 ratio with an excess of driver cDNA in order to promote hybridization between single-stranded cDNAs common in both tester and driver cDNA pools.
  • a PCR exponentially amplifies only those homoduplexes generated by the tester cDNA, via the priming sites on both ends of the double-stranded cDNA (O'Neill and Sinclair, Nucleic Acids Res. 25:2681-2682 (1997); Wada et al., Kidney Int. 51:1629-1638 (1997); Edman et al., J 323:113-118 (1997)).
  • the gene-expression pattern of a cell or organism determines its basic biological characteristics. In order to accelerate the discovery and characterization of mRNA-encoding sequences, the idea emerged to sequence fragments of cDNA randomly, directly from a variety of tissues (Adams et al., Science 252:1651-1656 (1991); Adams et al., Nature 377:3-16 (1995)). These expressed sequence tags (ESTs) allow the identification of coding regions in genome-derived sequences. Publicly available EST databases allow the comparative analysis of gene expression by computer. Differentially expressed genes can be identified by comparing the databases of expressed sequence tags of a given organ or cell type with sequence information from a different origin (Lee et al., Proc. Natl. Acad. Sci. U.S.A.
  • Serial analysis of gene expression (SAGE) is a sequence-based approach to the identification of differentially expressed genes through comparative analyses (Velculescu et al., Science 270:484-487 (1995)). It allows the simultaneous analysis of sequences that derive from different cell populations or tissues. Three steps form the molecular basis for SAGE: (1) generation of a sequence tag (10-14 bp) to identify expressed transcripts; (2) ligation of sequence tags to obtain concatemers that can be cloned and sequenced; and (3) comparison of the sequence data to determine differences in expression of genes that have been identified by the tags. This procedure is performed for every mRNA population to be analyzed.
  • SAGE sequencing experiments yield diminishing returns for rare mRNAs, whose unique tags will begin to accumulate in the database only after many weeks of sequencing effort.
  • DNA microarrays are systematically gridded at high density. Such microarrays are generated by using cDNAs (for example, ESTs), PCR products or cloned DNA, which are linked to the surface of nylon filters, glass slides or silicon chips (Schena et al., Science 270:467-470 (1995)).
  • DNA arrays can also be assembled from synthetic oligonucleotides, either by directly applying the synthesized oligonucleotides to the matrix or by a more sophisticated method that combines photolithography and solid-phase chemical synthesis (Fodor et al., Nature 364:555-556 (1993)).
  • labeled cDNAs or oligonucleotides are hybridized to the DNA- or oligomer-carrying arrays.
  • two probes can be applied simultaneously to the array and compared at different wavelengths.
  • the method involves amplifying nucleic acid fragments of interest using a primer that can form a hairpin structure; sequence-based coupling of the amplified fragments to detector probes; and detection of the coupled fragments.
  • the amplified fragments are coupled by hybridization and covalent coupling, preferably by ligation, to a detector probe.
  • the probe is preferably immobilized in an array or on sortable beads.
  • a hairpin structure formed at the end of the amplified fragments facilitates coupling of the fragments to the probes.
  • the method allows detection of the fragments where detection provides some sequence information about the fragments.
  • the method allows a complex sample of nucleic acid to be cataloged quickly and easily in a reproducible and sequence-specific manner.
  • the method can also be used to detect amplified fragments having a known sequence.
  • Figures 1A-1E are a listing of examples of hairpin primers and the hairpin structure that forms from the resulting hairpin ligator incorporated at the end of an amplified fragment.
  • Nucleotides in one of the strands of the stem of the hairpin structure are represented by H.
  • Nucleotides in the primer sequence of the hairpin primer are represented by p and P.
  • Nucleotides in the part of the primer sequence involved in one of the strands of the stem of the hairpin structure are represented by P.
  • Nucleotides in the fragment are represented by f and F.
  • Nucleotides in the part of the fragment sequence involved in one of the strands of the stem of the hairpin structure are represented by F.
  • Other nucleotides in the hairpin primer are represented by n.
  • hairpin ligator for hairpin primer 10, which represents an example of a hairpin primer used with adaptor-indexers
  • nucleotides in the primer sequence corresponding to sticky end sequences are boldface, nucleotides corresponding to adaptor-indexer sequences are underlined, and the recognition sequence of the restriction endonuclease (FokI in this example) is listed as CCTAC.
  • Figures 2A-2B are a diagram of nucleic acid molecules used and formed during an example of the disclosed method using generic sequences. Ligation of the top strand of the amplified fragment is illustrated. Nucleotides in one of the strands of the stem of the hairpin structure are represented by H. Nucleotides in the primer sequence of the hairpin primer are represented by p and P. Nucleotides in the part of the primer sequence involved in one of the strands of the stem of the hairpin structure are represented by P. Nucleotides in the fragment are represented by c, f, and F. Nucleotides in the part of the fragment sequence involved in one of the strands of the stem of the hairpin structure are represented by F.
  • Nucleotides in the fragment complementary to the primer sequence of the hairpin primer are represented by c. Nucleotides in the detector probe are represented by I. Nucleotides in the fragment complementary to the detector probe are represented by f (boldface). Other nucleotides in the hairpin primer (that is, nucleotides that are neither part of the stem nor part of the primer sequence) are represented by n.
  • Figures 3A-3C are a diagram of nucleic acid molecules used and formed during an example of the disclosed method using specific sequences. Ligation of the top strand of the amplified fragment is illustrated. Nucleotides in the fragment complementary to the detector probe are boldface.
  • hairpin primer SEQ ID NO:2
  • nucleic acid fragment SEQ ID NO:3
  • the hairpin primer hybridized to the bottom strand of the nucleic acid fragment
  • amplified nucleic acid fragment SEQ ID NO:4
  • hairpin structure formed in the top strand of the amplified nucleic acid fragment and the amplified nucleic acid strand ligated to a detector probe (SEQ ID NO:32).
  • the molecules and structures of Figures 3A-3C can be directly compared with those of Figures 2A-2B to identify sequences in Figures 3A-3C having particular significance.
  • Figures 4A-4B are a diagram of examples of an amplified fragment (SEQ ID NO:4), the hairpin structures that can be formed from the hairpin ligators in the fragment strands, and the detector probes to which the hairpin ligators can be ligated.
  • the diagram illustrates the relationship of an amplified fragment to the formation of 5' hairpin structures and 3' hairpin structures and the relationship of the polarity of a hairpin structure and the polarity of the detector probe to which it can be ligated.
  • Figure 5 is a diagram of an example of the disclosed method where hairpin primers are used to prime amplification of both strands of a nucleic acid molecule.
  • Figures 6A-6C are a diagram of nucleic acid molecules used and formed during an example of the disclosed method using adaptor-indexers. Ligation of the top strand of the amplified fragment is illustrated. The restriction enzyme recognition sequence is underlined and the sticky end sequence is in bold. The fragment (SEQ ID NO:5) is shown at the top of the diagram.
  • Depicted in order from top to bottom are the nucleic acid molecule after cleavage with FokI; the nucleic acid fragment (left) and an example of a compatible adaptor-indexer (SEQ ID NO:6; right); the adaptor-indexer ligated to the nucleic acid fragment (SEQ ID NO:7); the hairpin primer (SEQ ID NO:8) hybridized to the top strand of the adaptor/fragment (nucleotides 13-47 of SEQ ID NO:7); the fragment after amplification (SEQ ID NO:9); the hairpin structure formed by the bottom strand of the amplified fragment; the hairpin structure mixed with the probe array (showing the relevant detector probe); and the fragment ligated to the probe array (SEQ ID NO:31).
  • the fragment sequence determined in this example is GGATGNNNTTAGCATACC (SEQ ID NO:1).
  • the disclosed method allows a complex sample of nucleic acid to be quickly and easily cataloged in a reproducible and sequence-specific manner.
  • a catalog can be compared with other, similarly prepared catalogs of other nucleic acid samples to allow convenient detection of differences between the samples.
  • the catalogs, which incorporate information about the nucleic acid samples, can serve as fingerprints of the nucleic acid samples which can be used both for detection of related nucleic acid samples and comparison of nucleic acid samples.
  • the presence or identity of specific organisms can be detected by producing a catalog of nucleic acid of the test organism and comparing the resulting catalog with reference catalogs prepared from known organisms.
  • Changes and differences in gene expression patterns can also be detected by preparing catalogs of mRNA from different cell samples and comparing the catalogs.
  • the catalog of sequences can also be used to produce a set of probes or primers that is specific for the source of a nucleic acid sample.
  • Comparison of nucleic acid catalogs produced with the disclosed method is facilitated by the highly ordered nature of the sequence information produced and cataloged in the method.
  • Use of immobilization, sorting, and/or array detection in the method allows automation of the method, the cataloging of the information, and comparisons to other catalogs.
  • the method results in the equivalent of a large number of sequence-specific bins that can be filled, empty, or filled to different levels, with the pattern of filled and empty bins, and/or the amount of signal in a bin, providing information about the nucleic acid sample that has been cataloged.
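The bin analogy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a catalog is modeled as a mapping from a hypothetical (index sample primer sequence, probe sequence) pair to a signal level, and the function name `compare_catalogs` and the `min_diff` threshold are invented here, not taken from the disclosure.

```python
from typing import Dict, Tuple

# Hypothetical bin key: (primer sequence of the index sample, detector probe sequence).
Bin = Tuple[str, str]
Catalog = Dict[Bin, float]  # bin -> signal level; a missing key means an empty bin


def compare_catalogs(tester: Catalog, control: Catalog,
                     min_diff: float = 0.0) -> Dict[Bin, float]:
    """Return the bins whose signal differs between two catalogs.

    The value is tester signal minus control signal, so a positive value
    means the bin is fuller in the tester sample.
    """
    diffs: Dict[Bin, float] = {}
    for key in set(tester) | set(control):
        delta = tester.get(key, 0.0) - control.get(key, 0.0)
        if abs(delta) > min_diff:
            diffs[key] = delta
    return diffs


# Made-up signal values, purely for illustration.
tester = {("ACGT", "AAAAAA"): 5.0, ("ACGT", "CCCCCC"): 1.0}
control = {("ACGT", "AAAAAA"): 5.0, ("ACGT", "GGGGGG"): 2.0}
print(compare_catalogs(tester, control))
```

Because every bin has a fixed, sequence-derived key, comparing two samples reduces to comparing values under the same keys, which is the property that makes the catalogs easy to automate.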
  • the disclosed method also allows specific and sensitive detection of nucleic acid fragments of interest.
  • the use of sequence-based covalent coupling in the detection increases the reliability of detection over detection methods based only on probe hybridization.
  • the disclosed method is also more efficient and less time consuming than conventional nucleic acid sequencing techniques.
  • the nucleic acid sample is preferably divided into aliquots (referred to as index samples) before amplification.
  • the nucleic acid sample is divided into as many aliquots as the number of primer sequences used.
  • Preferred nucleic acid samples for use in the disclosed method are samples to which adaptor-indexers have been coupled.
  • the nucleic acid sample is preferably not divided into index samples.
  • Each index sample is then mixed with a different hairpin primer, each of which has a different primer sequence.
  • a second primer is also mixed with each index sample. It is preferred that the second primer not be a hairpin primer.
  • the index samples are then amplified.
  • the index samples are treated to allow formation of hairpin structures at the fragment ends containing hairpin primer sequences. This is preferably accomplished by digesting one of the strands of the amplified fragments.
  • the index samples are reacted with and coupled to detector probes. It is preferred that the probes include every possible sequence of a given length (for example, every possible six base sequence).
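As a small illustration of what "every possible sequence of a given length" means in practice, the sketch below enumerates a complete probe set. The function name `all_probe_sequences` is an assumption made for this example; the text does not prescribe how a probe set is generated.

```python
from itertools import product
from typing import List


def all_probe_sequences(length: int = 6, alphabet: str = "ACGT") -> List[str]:
    """Enumerate every possible nucleotide sequence of the given length."""
    return ["".join(p) for p in product(alphabet, repeat=length)]


probes = all_probe_sequences(6)
print(len(probes))  # 4**6 = 4096 six-base probes
```

The set grows as 4 to the power of the probe length, so each extra base quadruples the number of array positions needed.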
  • the ends of the detector probes and the hairpin ends are coupled only if the probe hybridizes adjacent to the end of the hairpin ligator.
  • each index sample is reacted with a different probe array. Coupling can be accomplished using any suitable technique, including ligation and chemical reactions. Ligation is preferred. When coupling is by ligation, there should be a 5'-phosphate capable of participating in ligation on the appropriate strand.
  • Each processed DNA fragment from the sample will result in a signal based on coupling of an amplified fragment to a probe.
  • a complex nucleic acid sample will produce a unique pattern of signals. It is this pattern that allows unique cataloging of nucleic acid samples and sensitive and powerful comparisons of the patterns of signals produced from different nucleic acid samples.
  • the detector probe to which a DNA fragment is coupled identifies the sequence of the DNA fragment to which the primer hybridized and the adjacent sequence of the DNA fragment to which the detector probe hybridized.
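The relationship between the coupled probe and the fragment sequence can be sketched as follows. This is a simplified model under stated assumptions, not the disclosed procedure: the strand is written 5' to 3', it carries a 3' hairpin structure, `stem_start` (a hypothetical parameter) is the index of the first nucleotide of the upstream stem strand, and hybridization is treated as perfect Watson-Crick matching only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def coupled_probe(strand: str, stem_start: int, probe_len: int = 6) -> str:
    """Return the probe (5' to 3') that can hybridize immediately next to the stem.

    In this model the single-stranded nucleotides just upstream of the stem
    are the only place where a probe lies adjacent to the hairpin end, so the
    probe is the reverse complement of those nucleotides.
    """
    if stem_start < probe_len:
        raise ValueError("not enough single-stranded sequence next to the stem")
    adjacent = strand[stem_start - probe_len:stem_start]
    return reverse_complement(adjacent)


# Illustrative strand: 10 nt of fragment, a 6 nt stem strand, a 4 nt loop,
# and the second stem strand (GGATCC pairs with itself in reverse complement).
strand = "TTGACCATGC" + "GGATCC" + "TTTT" + "GGATCC"
print(coupled_probe(strand, stem_start=10))  # GCATGG
```

Reading the result in the other direction gives the point of the passage: knowing that probe GCATGG coupled tells you that the fragment reads CCATGC next to the hairpin.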
  • Coupling of amplified fragments to probes can be detected directly or indirectly.
  • Either the probe or the amplified fragment can be detected.
  • Association of an amplified fragment with a given probe is indicative of coupling of the probe and the amplified fragment. Detection of such associations can be facilitated through immobilization of the probes or hairpin primers, and through the use of capture tags, sorting tags and detectable labels in association with the probes, hairpin primers, and/or amplified fragments. Any combination of immobilization and association with capture tags, sorting tags, and labels can be used.
  • the probes are immobilized in arrays and the amplified fragments are associated with a detectable label.
  • detection of a signal at a particular location in a particular array of detector probes can provide information about nucleic acid fragments indexed from the nucleic acid sample.
  • the array, and location in the array, where a DNA fragment generates a signal identify the sequence of the DNA fragment.
  • the same effect can be accomplished by otherwise capturing, sorting, or detecting particular probes (via capture tags, sorting tags, and labels). That is, so long as the probe and the DNA fragment coupled to it can be identified, a pattern can be determined.
  • a preferred form of the disclosed method uses nucleic acid fragments to which adaptor-indexers have been covalently coupled for amplification using hairpin primers.
  • the manner in which the adaptor-indexers are coupled to nucleic acid fragments results in indexing of different fragments and preservation of sequence information about the fragments.
  • Adaptor-indexers are coupled to nucleic acid fragments using the following basic steps.
  • a nucleic acid sample is cleaved with one or more nucleic acid cleaving reagents (preferably restriction endonucleases) that results in a set of DNA fragments having sticky ends with a variety of sequences.
  • the sample may also be divided into aliquots (referred to as index samples); preferably as many aliquots as there are sticky end sequences.
  • the nucleic acid sample is preferably divided into index samples before digestion. Where a single nucleic acid cleaving reagent is used, the nucleic acid sample is preferably divided into index samples following digestion. Each index sample is then mixed with a different adaptor-indexer, each of which has a sticky end compatible with one of the possible sticky ends on the DNA fragments in that index sample. The adaptor-indexers are then covalently coupled to compatible DNA fragments.
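The pairing of index samples with adaptor-indexers can be sketched numerically. The sketch assumes four-nucleotide sticky ends, as produced by an enzyme such as FokI that leaves a four-base 5' overhang, and it assumes that a "compatible" adaptor overhang is the exact reverse complement of the fragment overhang. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sticky_ends(length: int = 4) -> List[str]:
    """Enumerate every possible sticky end sequence of the given length."""
    return ["".join(p) for p in product("ACGT", repeat=length)]


def compatible_overhang(overhang: str) -> str:
    """Return the adaptor overhang (5' to 3') that anneals to a fragment overhang."""
    return overhang.translate(COMPLEMENT)[::-1]


# One index sample, and one adaptor-indexer, per possible sticky end.
adaptor_for: Dict[str, str] = {end: compatible_overhang(end) for end in sticky_ends(4)}
print(len(adaptor_for))      # 256 index samples for four-base sticky ends
print(adaptor_for["AACG"])   # CGTT
```

Under these assumptions a four-base overhang calls for 4**4 = 256 aliquots, which is the sense in which there are as many index samples as sticky end sequences.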
  • Each index sample can then be cleaved with one or more other nucleic acid cleaving reagents (referred to as second nucleic acid cleaving reagents), preferably a restriction enzyme having a four base recognition sequence.
  • a second adaptor can then be covalently coupled to the DNA fragments in the index samples.
  • the DNA fragments are then amplified using hairpin primers as described above. For this form of the method, it is preferred that the primer sequences of the hairpin primers are complementary to sequences in the adaptor-indexers.
  • Any nucleic acid sample can be used with the disclosed method.
  • suitable nucleic acid samples include genomic samples, mRNA samples, cDNA samples, nucleic acid libraries (including cDNA and genomic libraries), whole cell samples, environmental samples, culture samples, tissue samples, bodily fluids, and biopsy samples. Numerous other sources of nucleic acid samples are known or can be developed and any can be used with the disclosed method.
  • Preferred nucleic acid samples for use with the disclosed method are nucleic acid samples of significant complexity such as genomic samples, cDNA samples, and mRNA samples.
  • Nucleic acid fragments are segments of larger nucleic molecules. Nucleic acid fragments, as used in the disclosed method, generally refer to nucleic acid molecules that have been amplified or that have been cleaved. A nucleic acid sample that has been amplified is referred to as an amplified sample. A nucleic acid sample that has been cleaved using a nucleic acid cleaving reagent is referred to as a digested sample.
  • An index sample is a nucleic acid sample that has been divided into different aliquots for further processing.
  • Index samples are preferably aliquots of a nucleic acid sample to which different hairpin primers will be added.
  • different nucleic acid fragments are processed in the different index samples based on the primer sequences of the hairpin primers.
  • It is preferred that nucleic acid samples be divided into as many index samples as the number of hairpin primers used for amplification.
  • a control nucleic acid sample is a nucleic acid sample to which another nucleic acid sample (which can be referred to as a tester nucleic acid sample) is to be compared.
  • a control index sample is an index sample to which another index sample (which can be referred to as a tester index sample) is to be compared.
  • Secondary index samples are aliquots of index samples.
  • index samples can be divided into a plurality of secondary index samples.
  • Secondary index samples are to be cleaved with a nucleic acid cleaving reagent, preferably a restriction enzyme.
  • Restricted index samples and non-restricted index samples are aliquots of index samples. Restricted index samples are to be cleaved with a nucleic acid cleaving reagent while non-restricted index samples are not.
  • Restricted secondary index samples and non-restricted secondary index samples are aliquots of secondary index samples. Restricted secondary index samples are to be cleaved with a nucleic acid cleaving reagent while non-restricted secondary index samples are not.
  • Secondary index samples, restricted index samples, non-restricted index samples, restricted secondary index samples, and non-restricted secondary index samples are referred to collectively herein as derivative index samples. Each is derived from an index sample and, in some cases, from another derivative index sample.

Hairpin Primers

  • a hairpin primer is a nucleic acid molecule that contains a primer sequence and that can form a stem-loop or hairpin structure.
  • hairpin structures and stem-loop structures are referred to herein as hairpin structures.
  • the base paired portion of a hairpin structure is referred to as the stem of the hairpin structure.
  • Hairpin primers are used in the disclosed method as specialized amplification primers that, following amplification, can form a hairpin structure at the end of amplified nucleic acid fragments.
  • the hairpin is designed to allow sequence-specific covalent coupling of a detector probe to the end of the hairpin based on the adjacent sequence of the amplified fragment.
  • the primer sequence of a hairpin primer is at the 3' end of the hairpin primer.
  • the stem of a hairpin primer can involve all or part of the primer sequence. Although it is preferred, the stem need not extend to the 3' end of the primer sequence.
  • the stem can also extend into the sequence of the amplified fragment. It is preferred that the stem of a hairpin primer involves all of the primer sequence without extending into the sequence of the amplified fragment.
  • It is preferred that the primer sequence of the hairpin primers be complementary to sequences in the adaptor-indexer.
  • the stem of a hairpin primer can involve all or part of the sticky end sequence (or recognition sequence) for which the adaptor-indexer is designed. Although it is preferred, the stem need not extend to the 3' end of the sticky end sequence (or recognition sequence). The stem can also extend into the sequence of the amplified fragment beyond the sticky end sequence (or recognition sequence). It is preferred that the stem of a hairpin primer involves all of the sticky end sequence (or recognition sequence) without extending further into the sequence of the amplified fragment.
  • hairpin structures of hairpin primers and their relationships to amplified nucleic acids are illustrated in Figures 1A-1B.
  • Hairpin primers 1 and 4-9 are examples of hairpin primers where the stem extends to the end of the primer sequence.
  • Hairpin primer 2 is an example of a hairpin primer where the stem does not extend to the end of the primer sequence.
  • Hairpin primer 3 is an example of a hairpin primer where the stem extends into the sequence of the amplified fragment.
  • Hairpin primer 9 is an example of a hairpin primer where the stem involves all of the primer sequence.
  • Hairpin primers 1-8 are examples of hairpin primers where the stem does not involve all of the primer sequence.
  • Hairpin primers 1-5 are examples of hairpin primers where the stem is 10 base pairs long.
  • Hairpin primer 6 is an example of a hairpin primer where the stem is 12 base pairs long.
  • Hairpin primer 7 is an example of a hairpin primer where the stem is 8 base pairs long.
  • Hairpin primer 8 is an example of a hairpin primer where the stem is 3 base pairs long.
  • Hairpin primer 9 is an example of a hairpin primer where the stem is 16 base pairs long.
  • Amplification using hairpin primers results in amplified nucleic acid fragments having hairpin primer sequences at one or both ends of the fragments.
  • hairpin primer sequences in amplified fragments are referred to as hairpin ligators.
  • the hairpin ligators can form hairpin structures. A hairpin structure with a 3' end is referred to as a 3' hairpin structure and a hairpin structure with a 5' end is referred to as a 5' hairpin structure (hairpin ligators containing these structures are referred to as 3' hairpin ligators and 5' hairpin ligators, respectively).
  • the stem of a hairpin structure can have any length that allows formation of the hairpin structure and which is of sufficient stability to allow covalent coupling of a detector probe.
  • Preferably, the stem of the hairpin structure of a hairpin ligator is from 3 to 16 base pairs long, and more preferably from 6 to 10 base pairs long.
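The stem length ranges above can be checked against a candidate sequence with a short sketch. It makes several simplifying assumptions that are not in the text: the stem runs to both ends of the sequence supplied, only perfect Watson-Crick pairs count, stability is not modeled, and a minimum loop of three nucleotides (`min_loop`, a hypothetical parameter) is required.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminal_stem_length(seq: str, min_loop: int = 3) -> int:
    """Count the base pairs formed when the two ends of seq fold back on each other."""
    pairs = 0
    i, j = 0, len(seq) - 1
    # Pair position i with position j only if at least min_loop unpaired
    # nucleotides would remain between them.
    while j - i - 1 >= min_loop and PAIRS.get(seq[i]) == seq[j]:
        pairs += 1
        i += 1
        j -= 1
    return pairs


def stem_is_preferred(stem_len: int) -> bool:
    """True if the stem length falls in the 3 to 16 base pair range given in the text."""
    return 3 <= stem_len <= 16


hairpin = "GCGTAC" + "TTTT" + "GTACGC"  # illustrative stem-loop-stem sequence
length = terminal_stem_length(hairpin)
print(length, stem_is_preferred(length))  # 6 True
```

A six base pair stem like this one also falls inside the narrower 6 to 10 base pair range that the text describes as more preferred.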
  • the sequence of the stem portion of a hairpin primer should not include the recognition sequence of any nucleic acid cleaving reagent to be used in a subsequent step in the method.
  • inclusion of restriction sites in hairpin primers is useful in some embodiments of the disclosed method.
  • hybridization of the fragments to detector probes can be aided by shortening the fragment length prior to hybridization. This can be accomplished, for example, by digesting the fragment with a restriction endonuclease or other nucleic acid cleaving reagent.
  • the recognition site for the nucleic acid cleaving reagent is included in the sequence of the hairpin primer.
  • the nucleic acid cleaving reagent used has a cleavage site offset from the recognition site.
  • An example of such a nucleic acid cleaving reagent is the restriction enzyme EcoP15I.
  • Hairpin primers can contain labile nucleotides, preferably in the loop, that allow the hairpin structure to be broken.
  • uracil rather than thymine can be used in hairpin primers (phosphoramidite chemicals available from Glenn Research).
  • UDG: uracil-DNA glycosylase
  • It is preferred that hairpin primers not have additional sequences that are self-complementary, other than the self-complementary stem portion. It is considered that this condition is met if there are no complementary regions greater than six nucleotides long without a mismatch or gap. While the hairpin primers (and amplified nucleic acid fragments) can be detected using sequence-based detection systems, the hairpin primers (or amplified nucleic acid fragments) can also contain a label to facilitate detection. Numerous labels are known and can be used for this purpose. Hairpin primers can also contain or be associated with capture tags to facilitate immobilization or capture of fragments in which hairpin primers have been incorporated.
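The six-nucleotide criterion for unwanted self-complementarity can be expressed as a simple check. This is a rough sketch under stated assumptions: it looks only for perfect matches (no mismatches or gaps, as in the criterion), it is meant to be run on the part of a primer outside the designed stem, and it counts overlapping occurrences, which a stricter model might exclude. The function names are invented for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_self_complement(seq: str) -> int:
    """Length of the longest stretch of seq whose reverse complement also occurs in seq."""
    best = 0
    for k in range(1, len(seq) + 1):
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        if any(reverse_complement(m) in kmers for m in kmers):
            best = k
        else:
            # If no stretch of length k qualifies, no longer stretch can either.
            break
    return best


def meets_criterion(seq: str, max_len: int = 6) -> bool:
    """True if seq has no perfect self-complementary region longer than max_len."""
    return longest_self_complement(seq) <= max_len


print(longest_self_complement("ACGTTTT"))    # 4
print(meets_criterion("ACGTTTT"))            # True
print(meets_criterion("GATTACACCTGTAATC"))   # False: GATTACA pairs with TGTAATC
```

The early exit relies on the fact that any perfectly complementary pair of stretches of length k contains such a pair of length k - 1.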
  • The capture tag can be one member of a binding pair such as biotin and streptavidin. Capture tags are discussed more fully elsewhere herein.
  • Hairpin primers can also contain or be associated with sorting tags to facilitate sorting or separation of fragments in which hairpin primers have been incorporated.
  • The sorting tag can be a detectable label such as a fluorescent moiety or a manipulable moiety such as a magnetic bead. Sorting tags are discussed more fully elsewhere herein. Hairpin primers can also be immobilized on a substrate.
  • Hairpin primers can also include a few phosphorothioate linkages or other non-hydrolyzable bonds at the 5' end to protect the strand of the amplified fragment containing the hairpin primer from exonuclease digestion. This allows one of the strands of the amplified fragments to be degraded. Hairpin primers can also include one or more photocleavable nucleotides to facilitate release of probe sequences and amplified fragments coupled to the probe. Photocleavable nucleotides and their use are described in WO 00/04036.
  • Hairpin primers need not be composed of naturally occurring nucleotides. Modified nucleotides, unnatural bases and nucleotide and oligonucleotide analogs can be used. All that is required is that the primer have the general structure described herein and be capable of the interactions and reactions required in the disclosed method.

Detector Probes
  • Detector probes are molecules, preferably oligonucleotides, that can hybridize to nucleic acids in a sequence-specific manner.
  • Detector probes are used to capture nucleic acid fragments amplified using the disclosed hairpin primers based on complementary sequences present in the amplified nucleic acid fragments.
  • Detector probes are preferably used in sets having a variety of probe sequences, preferably a set of probes having every possible combination (or hybridizable to every combination) of nucleotide sequence the length of the probe.
  • Detector probes are preferably used in sets where each probe has the same length. Preferred lengths for the probe portion of detector probes are five, six, seven, and eight nucleotides.
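As a rough illustration of what "every possible combination" implies for the preferred probe lengths, the following minimal sketch enumerates complete probe sets. The function name `all_probe_sequences` is hypothetical and used only for this example; the set size is simply 4 raised to the probe length.

```python
from itertools import product

BASES = "ACGT"

def all_probe_sequences(length):
    """Return every possible probe sequence of the given length (4**length in total)."""
    return ["".join(bases) for bases in product(BASES, repeat=length)]

# Set sizes for the preferred probe lengths named in the text.
for k in (5, 6, 7, 8):
    print(k, len(all_probe_sequences(k)))
```

A complete set therefore grows from 1,024 probes at five nucleotides to 65,536 probes at eight nucleotides.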
  • Detector probes preferably include a probe portion (for hybridization to sample fragments) and linker portions through which the probe portion is coupled to a substrate, capture tag, sorting tag, or label. These linker portions can have any suitable structure and will generally be chosen based on the method of immobilization or synthesis of the detector probes.
  • The linker portion can be made up of or include nucleotides.
  • The linker portions can have any suitable length and preferably are of sufficient length to allow the probe portion to hybridize effectively. For convenience and unless otherwise indicated, reference to the length of detector probes refers to the length of the probe portion of the probes.
  • Immobilized detector probes are detector probes immobilized on a support. Detector probes can be, and preferably are, immobilized on a substrate.
  • Detector probes can also contain or be associated with capture tags to facilitate immobilization or capture of the probes and amplified fragments to which they have been coupled. Detector probes can also contain or be associated with sorting tags to facilitate sorting or separation of the probes and amplified fragments to which they have been coupled. Detector probes can also contain or be associated with labels to facilitate detection of the probes and amplified fragments to which they have been coupled.
  • Detector probes can also include one or more photocleavable nucleotides to facilitate release of probe sequences and amplified fragments coupled to the probe. Photocleavable nucleotides and their use are described in WO 00/04036. Detector probes need not be composed of naturally occurring nucleotides. Modified nucleotides, unnatural bases and nucleotide and oligonucleotide analogs can be used. All that is required is that the probe have the general structure described herein and be capable of the interactions and reactions required in the disclosed method.
  • Different detector probes can be used together as a set.
  • The set can be used as a mixture of all or subsets of the probes, probes used separately in separate reactions, or immobilized in an array.
  • Probes used separately or as mixtures can be physically separable through, for example, the use of capture tags, sorting tags, or immobilization on beads.
  • A probe array (also referred to herein as an array) includes a plurality of probes immobilized at identified or predetermined locations on the array.
  • A plurality of probes refers to multiple probes each having a different sequence.
  • Each predetermined location on the array has one type of probe (that is, all the probes at that location have the same sequence). Each location will have multiple copies of the probe.
  • The spatial separation of probes of different sequence in the array allows separate detection and identification of amplified fragments that become coupled to the probes via hybridization of the probes to nucleic acid fragments in a nucleic acid sample. If an amplified fragment is detected at a given location in a probe array, it indicates that the sequence adjacent to the site in the nucleic acid fragment where the fragment hybridized is complementary to the probe immobilized at that location in the array.
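The location-to-sequence inference described above can be sketched as a simple lookup. This is only an illustration: the grid layout, the column count, and the names `build_array` and `decode_signal` are assumptions made for the example and are not taken from the disclosure.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def build_array(probe_length, columns):
    """Assign one probe sequence to each (row, column) location (hypothetical layout)."""
    layout = {}
    for index, bases in enumerate(product("ACGT", repeat=probe_length)):
        layout[divmod(index, columns)] = "".join(bases)
    return layout

def decode_signal(layout, location):
    """Given a location with signal, return the probe there and the fragment
    sequence that is complementary to it."""
    probe = layout[location]
    return probe, reverse_complement(probe)

layout = build_array(probe_length=6, columns=64)
probe, fragment_sequence = decode_signal(layout, (0, 1))
print(probe, fragment_sequence)
```

Because each location holds a single probe sequence, a signal at that location identifies the complementary sequence in the hybridized fragment without any further sequencing step.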
  • Adaptor-indexers can also be immobilized in arrays. Different modes of the disclosed method can be performed with different components immobilized, labeled, or tagged.
  • Arrays of adaptor-indexers can be made and used as described below and elsewhere herein for the detector probes.
  • The detector probes in a probe array will all be of the same polarity. That is, each probe will have a free 5' end or each probe will have a free 3' end.
  • The polarity of a probe determines to which form of hairpin structure the probe can be coupled.
  • A probe array with probes having 5' ends is referred to as a 5' probe array.
  • A probe array with probes having 3' ends is referred to as a 3' probe array.
  • A probe array can also have probes of both polarities. If so, it is preferred that probes of different polarities be immobilized at identified or predetermined locations on the probe array.
  • Solid-state substrates for use in probe arrays can include any solid material to which oligonucleotides can be coupled, directly or indirectly. This includes materials such as acrylamide, cellulose, nitrocellulose, glass, silicon, polystyrene, polyethylene vinyl acetate, polypropylene, polymethacrylate, polyethylene, polyethylene oxide, glass, polysilicates, polycarbonates, teflon, fluorocarbons, nylon, silicon rubber, polyanhydrides, polyglycolic acid, polylactic acid, polyorthoesters, polypropylfumerate, collagen, glycosaminoglycans, and polyamino acids.
  • Solid-state substrates can have any useful form including thin films or membranes, beads, bottles, dishes, fibers, woven fibers, shaped polymers, particles and microparticles.
  • A preferred form for a solid-state substrate is a microtiter dish. The most preferred form of microtiter dish is the standard 96-well type.
  • Detector probes can be coupled to substrates using established coupling methods. For example, suitable attachment methods are described by Pease et al., Proc. Natl. Acad. Sci. USA 91(11):5022-5026 (1994), and Khrapko et al., Mol Biol (Mosk) (USSR) 25:718-730 (1991).
  • A method for immobilization of 3'-amine oligonucleotides on casein-coated slides is described by Stimpson et al., Proc. Natl. Acad. Sci. USA 92:6379-6383 (1995).
  • A preferred method of attaching oligonucleotides to solid-state substrates is described by Guo et al., Nucleic Acids Res. 22:5456-5465 (1994).
  • The probes in arrays can also be designed to have similar hybrid stability. This would make hybridization of fragments to detector probes more efficient and reduce the incidence of mismatch hybridization.
  • The hybrid stability of probes can be calculated using known formulas and principles of thermodynamics (see, for example, SantaLucia et al., Biochemistry 35:3555-3562 (1996); Freier et al., Proc. Natl. Acad. Sci. USA 83:9373-9377 (1986); Breslauer et al., Proc. Natl. Acad. Sci. USA 83:3746-3750 (1986)).
  • The hybrid stability of the probes can be made more similar (a process that can be referred to as smoothing the hybrid stabilities) by, for example, chemically modifying the probes (Nguyen et al., Nucleic Acids Res. 25(15):3059-3065 (1997); Hoheisel, Nucleic Acids Res. 24(3):430-432 (1996)).
  • Hybrid stability can also be smoothed by carrying out the hybridization under specialized conditions (Nguyen et al., Nucleic Acids Res. 27(6):1492-1498 (1999); Wood et al., Proc. Natl. Acad. Sci. USA 82(6):1585-1588 (1985)).
  • Another means of smoothing the hybrid stability of the probes is to vary the length of the probes. This would allow adjustment of the hybrid stability of each probe so that all of the probes had similar hybrid stabilities (to the extent possible). Since the addition or deletion of a single nucleotide from a probe will change the hybrid stability of the probe by a fixed increment, it is understood that the hybrid stabilities of the probes in a probe array will not be equal. For this reason, similarity of hybrid stability as used herein refers to any increase in the similarity of the hybrid stabilities of the probes (or, put another way, any reduction in the differences in hybrid stabilities of the probes). This is useful since any such increased similarity in hybrid stability can improve the efficiency and fidelity of hybridization and coupling of the detector probes.
  • The efficiency of hybridization and coupling of detector probes to sample fragments can also be improved by grouping detector probes of similar hybrid stability in sections or segments of a probe array that can be subjected to different hybridization conditions. In this way, the hybridization conditions can be optimized for particular classes of probes.
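The grouping of probes by similar hybrid stability can be sketched as follows. Note the substitution: instead of the nearest-neighbor thermodynamic calculations cited above, this example uses the much cruder Wallace rule (2 degrees C per A or T, 4 degrees C per G or C) as a stand-in stability estimate. The function names, the bin width, and the example probes are assumptions chosen purely for illustration.

```python
from collections import defaultdict

def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    at = sum(seq.count(base) for base in "AT")
    gc = sum(seq.count(base) for base in "GC")
    return 2 * at + 4 * gc

def group_by_stability(probes, bin_width=4):
    """Bucket probes so that each bucket covers less than bin_width degrees,
    returning the buckets in order of increasing estimated stability."""
    groups = defaultdict(list)
    for probe in probes:
        groups[wallace_tm(probe) // bin_width].append(probe)
    return [groups[key] for key in sorted(groups)]

probes = ["AAAAAA", "ATATAT", "ACGTAC", "GCGCGC", "GGGCCC", "AAGCTT"]
for section in group_by_stability(probes):
    print([(p, wallace_tm(p)) for p in section])
```

Each resulting bucket could correspond to one section of an array that is hybridized under its own conditions; a real design would replace `wallace_tm` with a proper thermodynamic model.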
  • A second primer is a nucleic acid molecule that contains a primer sequence.
  • The primer sequence of a second primer is at the 3' end.
  • A second primer differs from a hairpin primer in that a second primer is not designed to form a hairpin structure.
  • Second primers are used to amplify the opposite strand of nucleic acid fragments when the amplification technique requires a second primer (and when a second hairpin primer is not used to amplify the opposite strand). Where fragments containing second adaptors are amplified, it is preferred that the primer sequence of the second primers (or the second hairpin primers, if used) be complementary to sequences in the second adaptor.
  • Second primers can also contain detector sequences 5' of the primer sequences. Such detector sequences can be used to facilitate detection of nucleic acid fragments amplified in the disclosed method. Detector sequences can have any arbitrary sequence, preferably sequences that do not interfere with operation of the method. For example, it is preferred that detector sequences be chosen that are not significantly complementary to sequences in the second primer or sequences in hairpin primers or other second primers. Detector sequences are preferably the same. Also preferred are sets of second primers where the detector sequences within a set are the same but which differ between sets.
  • Second primers can also contain or be associated with capture tags to facilitate immobilization or capture of fragments in which second primers have been incorporated. Capture tags are discussed more fully elsewhere herein. Second primers can also contain or be associated with sorting tags to facilitate sorting or separation of fragments in which second primers have been incorporated. Sorting tags are discussed more fully elsewhere herein. Second primers can also contain or be associated with labels to facilitate detection of fragments in which second primers have been incorporated. Second primers can also be immobilized on a substrate.
  • Second primers can also include one or more photocleavable nucleotides to facilitate release of second primer sequences for detection. Photocleavable nucleotides and their use are described in WO 00/04036.
  • Second primers need not be composed of naturally occurring nucleotides. Modified nucleotides, unnatural bases and nucleotide and oligonucleotide analogs can be used. All that is required is that the second primer have the general structure described herein and be capable of the interactions and reactions required in the disclosed method.
Labels
  • To aid in detection and quantitation of fragments coupled to detector probes, labels can be incorporated into, coupled to, or associated with hairpin primers, second primers, detector probes, and/or the fragments.
  • A label is any molecule that can be associated with nucleic acid fragments, directly or indirectly, and which results in a measurable, detectable signal, either directly or indirectly.
  • A label is associated with a component when it is coupled or bound, either covalently or non-covalently, to the component.
  • A label is coupled to a component when it is covalently coupled to the component.
  • Many suitable labels for incorporation into, coupling to, or association with nucleic acid are known. Examples of labels suitable for use in the disclosed method are radioactive isotopes, fluorescent molecules, phosphorescent molecules, bioluminescent molecules, enzymes, antibodies, and ligands.
  • Examples of suitable fluorescent labels include fluorescein (FITC), 5,6-carboxymethyl fluorescein, Texas red, nitrobenz-2-oxa-1,3-diazol-4-yl (NBD), coumarin, dansyl chloride, rhodamine, 4'-6-diamidino-2-phenylindole (DAPI), and the cyanine dyes Cy3, Cy3.5, Cy5, Cy5.5 and Cy7.
  • Preferred fluorescent labels are fluorescein (5-carboxyfluorescein-N-hydroxysuccinimide ester) and rhodamine (5,6-tetramethyl rhodamine).
  • Preferred fluorescent labels for simultaneous detection are FITC and the cyanine dyes Cy3, Cy3.5, Cy5, Cy5.5 and Cy7.
  • The absorption and emission maxima, respectively, for these fluors are: FITC (490 nm; 520 nm), Cy3 (554 nm; 568 nm), Cy3.5 (581 nm; 588 nm), Cy5 (652 nm; 672 nm), Cy5.5 (682 nm; 703 nm) and Cy7 (755 nm; 778 nm), thus allowing their simultaneous detection.
  • The fluorescent labels can be obtained from a variety of commercial sources, including Molecular Probes, Eugene, OR and Research Organics, Cleveland, Ohio.
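To make the spectral separation behind simultaneous detection concrete, the following small sketch tabulates the absorption and emission maxima listed above and reports the gap between neighbouring emission maxima. The wavelengths are those given in the text; the helper name `emission_gaps` is hypothetical, and the sketch says nothing about filter bandwidths or spectral overlap, which matter in practice.

```python
# Absorption and emission maxima (nm) as listed in the text.
FLUORS = {
    "FITC": (490, 520),
    "Cy3": (554, 568),
    "Cy3.5": (581, 588),
    "Cy5": (652, 672),
    "Cy5.5": (682, 703),
    "Cy7": (755, 778),
}

def emission_gaps(fluors):
    """Return (label_a, label_b, gap_nm) for fluors with neighbouring emission maxima."""
    ordered = sorted(fluors.items(), key=lambda item: item[1][1])
    return [
        (name_a, name_b, maxima_b[1] - maxima_a[1])
        for (name_a, maxima_a), (name_b, maxima_b) in zip(ordered, ordered[1:])
    ]

for name_a, name_b, gap in emission_gaps(FLUORS):
    print(f"{name_a} to {name_b}: {gap} nm")
```

With these values the closest pair of emission maxima (Cy3 and Cy3.5) is 20 nm apart.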
  • Labeled nucleotides are a preferred form of label since they can be directly incorporated into nucleic acids during synthesis.
  • Labels that can be incorporated into DNA or RNA include nucleotide analogs such as BrdUrd (Hoy and Schimke, Mutation Research 290:217-230 (1993)), BrUTP (Wansink et al., J. Cell Biology 122:283-293 (1993)) and nucleotides modified with biotin (Langer et al., Proc. Natl. Acad. Sci. USA 78:6633 (1981)) or with suitable haptens such as digoxygenin (Kerkhof, Anal. Biochem.
  • Suitable fluorescence-labeled nucleotides are Fluorescein-isothiocyanate-dUTP, Cyanine-3-dUTP and Cyanine-5-dUTP (Yu et al., Nucleic Acids Res., 22:3226-3232 (1994)).
  • A preferred nucleotide analog detection label for DNA is BrdUrd (BUDR triphosphate, Sigma), and a preferred nucleotide analog detection label for RNA is Biotin-16-uridine-5'-triphosphate (Biotin-16-dUTP, Boehringer Mannheim). Fluorescein, Cy3, and Cy5 can be linked to dUTP for direct labeling. Cy3.5 and Cy7 are available as avidin or anti-digoxygenin conjugates for secondary detection of biotin- or digoxygenin-labeled probes.
  • Biotin can be detected using streptavidin-alkaline phosphatase conjugate (Tropix, Inc.), which is bound to the biotin and subsequently detected by chemiluminescence of suitable substrates (for example, chemiluminescent substrate CSPD: disodium 3-(4-methoxyspiro-[1,2-dioxetane-3,2'-(5'-chloro)tricyclo[3.3.1.1^3,7]decane]-4-yl) phenyl phosphate; Tropix, Inc.).
  • Other labels include molecular or metal barcodes, mass labels, and labels detectable by nuclear magnetic resonance, electron paramagnetic resonance, surface enhanced Raman scattering, surface plasmon resonance, fluorescence, phosphorescence, chemiluminescence, resonance Raman, microwave, or a combination.
  • Mass labels are compounds or moieties that have, or which give the labeled component, a distinctive mass signature in mass spectroscopy. Mass labels are useful when mass spectroscopy is used for detection. Preferred mass labels are peptide nucleic acids and carbohydrates. Combinations of labels can also be useful. For example, color-encoded microbeads having, for example, 265 unique combinations of labels, are useful for distinguishing numerous components.
  • 256 different detector probes can be uniquely labeled and detected, allowing multiplexing and automation of the disclosed method.
  • Useful labels are described in de Haas et al., "Platinum porphyrins as phosphorescent label for time-resolved microscopy," J. Histochem. Cytochem. 45(9):1279-92 (1997); Karger and Gesteland, "Digital chemiluminescence imaging of DNA sequencing blots using a charge-coupled device camera," Nucleic Acids Res. 20(24):6657-65 (1992); Keyes et al., "Overall and internal dynamics of DNA as monitored by five-atom-tethered spin labels," Biophys. J. 72(1):282-90 (1997); Kirschstein et al., "Detection of the DeltaF508 mutation in the CFTR gene by means of time-resolved fluorescence methods,"
  • Metal barcodes, a form of molecular barcode, are 30-300 nm diameter by
  • The system can have up to 12 zones encoded, in up to 7 different metals, where the metals have different reflectivity and thus appear lighter or darker in an optical microscope depending on the metal; this leads to practically unlimited identification codes.
  • The metal bars can be coated with glass or other material, and probes attached to the glass using methods commonly known in the art; assay readout is by fluorescence from the target, and the identity of the probe is from the light-dark pattern of the barcode. Methods for detecting and measuring signals generated by labels are known.
  • Radioactive isotopes can be detected by scintillation counting or direct visualization; fluorescent molecules can be detected with fluorescent spectrophotometers; phosphorescent molecules can be detected with a spectrophotometer or directly visualized with a camera; enzymes can be detected by detection or visualization of the product of a reaction catalyzed by the enzyme; antibodies can be detected by detecting a secondary detection label coupled to the antibody.
  • Detection molecules are molecules which interact with amplified nucleic acid and to which one or more detection labels are coupled.
  • Labels can be distinguished temporally via different fluorescent, phosphorescent, or chemiluminescent emission lifetimes. Multiplexed time-dependent detection is described in Squire et al., J. Microscopy 197(2):136-149 (2000), and WO 00/08443.
  • Quantitative measurement of the amount or intensity of a label can be used. For example, quantitation can be used to determine if a given label, and thus the labeled component, is present at a threshold level or amount.
  • A threshold level or amount is any desired level or amount of signal and can be chosen to suit the needs of the particular form of the method being performed.
  • Nucleic acid cleaving reagents are compounds, complexes, and enzymes that cause, mediate, or catalyze cleavage of nucleic acid molecules.
  • Preferred nucleic acid cleaving reagents are those that cleave nucleic acid molecules in a sequence-specific manner.
  • Restriction enzymes (also referred to as restriction endonucleases) are the preferred form of nucleic acid cleaving reagents.
  • Other nucleic acid cleaving reagents include the universal restriction endonucleases of Szybalski (Szybalski, Gene 40(2-3):169-73 (1985); Podhajska and Szybalski, Gene 40(2-3):175-82 (1985) [published erratum appears in Gene 43(3):325 (1985)]), the advanced DNA cleavage systems developed by Breaker et al. (Carmi et al., Proc Natl Acad Sci U S A 95(5):2233-2237 (1998)), and the use of zinc fingers to direct site recognition of restriction enzymes such as the hybrid restriction enzymes described by Kim et al., Proc. Natl. Acad. Sci. USA 93(3):1156-1160 (1996), and Smith et al., Nucleic Acids Res. 27(2):674-681 (1999).
  • Many nucleic acid cleaving reagents are known and can be used with the disclosed method. Relevant to the disclosed method, nucleic acid cleaving reagents generally have a recognition sequence and a cleavage site. Many nucleic acid cleaving reagents, especially restriction enzymes, also generate sticky ends at the cleavage site.
  • A recognition sequence is the nucleotide sequence which, if present in a nucleic acid molecule, will direct cleavage of the nucleic acid molecule by a cognate nucleic acid cleaving reagent.
  • The cleavage site of a nucleic acid cleaving reagent is the site, usually in relation to the recognition sequence, where the nucleic acid cleaving reagent cleaves a nucleic acid molecule.
  • Sticky ends (also referred to as cohesive ends, protruding ends, and 5' or 3' overhangs) are single-stranded nucleic acid segments at the end of a double-stranded nucleic acid segment.
  • The nucleic acid cleaving reagents used will have certain properties and/or certain relationships to other restriction enzymes used in the method.
  • Nucleic acid cleaving reagents that generate sticky ends having a plurality of different sequences are preferred, with nucleic acid cleaving reagents having a cleavage site offset from the recognition sequence being most preferred.
  • Other embodiments of the disclosed method require the use of different nucleic acid cleaving reagents that have different recognition sequences and/or generate different sticky ends than other nucleic acid cleaving reagents used on the same index sample at other stages in the method.
  • The nucleic acid cleaving reagents used in each of the digests have a recognition sequence different from that of the nucleic acid cleaving reagents used in the other digests.
  • The known properties of nucleic acid cleaving reagents can be used to select or design appropriate nucleic acid cleaving reagents.
  • When a nucleic acid cleaving reagent cleaves DNA at a site different or offset from the recognition sequence, a variety of sticky ends having different sequences can be generated. This is because recognition sequences in nucleic acids can occur next to any sequence and therefore the site of cleavage can have any sequence.
  • Restriction enzymes such as Type IIS restriction enzymes can be said to generate sticky ends having a plurality of different sequences.
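The reason an offset cleavage site yields sticky ends of many different sequences can be shown with a short sketch. The recognition sequence and cut offsets below are modeled on the well-known Type IIS enzyme FokI (GGATG, cutting 9 and 13 nucleotides downstream on the two strands); that choice, the function name `offset_sticky_ends`, and the test sequence are assumptions made for illustration and are not taken from the text. The sketch scans only the strand as written and ignores sites on the complementary strand.

```python
def offset_sticky_ends(top_strand, site="GGATG", top_cut=9, bottom_cut=13):
    """Return the overhang sequence produced at each occurrence of `site`.

    Cut positions are counted from the end of the recognition sequence, so the
    overhang is the stretch of the top strand lying between the two cuts.
    """
    ends = []
    start = top_strand.find(site)
    while start != -1:
        site_end = start + len(site)
        overhang = top_strand[site_end + top_cut: site_end + bottom_cut]
        # Skip sites too close to the end of the molecule to give a full overhang.
        if len(overhang) == bottom_cut - top_cut:
            ends.append(overhang)
        start = top_strand.find(site, start + 1)
    return ends

# Two identical recognition sites followed by different flanking sequence.
dna = "TT" + "GGATG" + "A" * 9 + "CGTA" + "TTTT" + "GGATG" + "C" * 9 + "TTAG" + "GG"
print(offset_sticky_ends(dna))
```

Both sites have the same recognition sequence, yet the overhangs differ because they are drawn from whatever sequence happens to lie downstream.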
  • The terms digest, digestion, digested, and digesting refer generally to a cleavage reaction or the act of cleaving and are not intended to be limited to cleavage by a protein enzyme or by any particular mechanism.
  • The term restricted is intended to refer to any nucleic acid cleavage, not just cleavage by a restriction enzyme.
  • The term sequence-specific requires only some sequence specificity, not absolute sequence specificity. That is, nucleic acid cleaving reagents having a completely or partially defined recognition sequence are preferred. Thus, nucleic acid cleaving reagents having some degeneracy in their recognition sequence are still considered sequence-specific.
  • A second nucleic acid cleaving reagent is a nucleic acid cleaving reagent used to digest a secondary index sample.
  • A third nucleic acid cleaving reagent is a nucleic acid cleaving reagent used to digest a restricted index sample or a restricted secondary index sample.
  • Second and third nucleic acid cleaving reagents are preferably Type II restriction endonucleases that cleave in the recognition sequence.
  • A second restriction enzyme is a restriction enzyme used to digest a secondary index sample.
  • A third restriction enzyme is an enzyme used to digest a restricted index sample or a restricted secondary index sample.
  • Second and third restriction enzymes are preferably Type II restriction endonucleases that cleave in the recognition sequence.
  • Type IIS enzymes can be used as universal restriction endonucleases, as described by Szybalski (Szybalski, Gene 40(2-3):169-73 (1985); Podhajska and Szybalski, Gene 40(2-3):175-82 (1985) [published erratum appears in Gene 43(3):325 (1985)]).
  • Single-stranded or double-stranded DNA can be cleaved at any arbitrary (but specific) site utilizing the structure described in combination with a Type IIS enzyme. More advanced DNA cleavage systems have been evolved by Breaker et al. (Carmi et al., Proc
  • Adaptor-indexers are double-stranded nucleic acids containing a single-stranded portion and a double-stranded portion. The single-stranded portion is at one end of the adaptor-indexer and constitutes a sticky end.
  • The sticky end is referred to as the sticky end portion of the adaptor-indexer. It is preferable that the protruding single strand (sticky end) have two, three, four, or five nucleotides.
  • The double-stranded portion of adaptor-indexers may have any convenient sequence or length. In general, the sequence and length of the double-stranded portion is selected to be adapted to subsequent steps in the method. For example, sequences in the adaptor-indexer may be used for primer or probe hybridization. A main purpose of adaptor-indexers is to provide sequence for hybridization by a hairpin primer for amplification.
  • Adaptor-indexers can also include a detector portion which is designed to facilitate detection of the adaptor-indexer.
  • The detection portion can be, for example, a sequence that is a hybridization target or it can be a label or tag.
  • The sequence of the double-stranded portion of an adaptor-indexer should not include the recognition sequence of any restriction enzyme to be used in a subsequent step in the method. It is preferred that adaptor-indexers not have any sequences that are self-complementary. It is considered that this condition is met if there are no complementary regions greater than six nucleotides long without a mismatch or gap.
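The self-complementarity condition stated above (no perfectly complementary regions longer than six nucleotides) lends itself to a simple screen. The following is a minimal sketch under stated assumptions: the function name `has_self_complementarity` is hypothetical, each strand is checked on its own, and palindromic stretches that overlap themselves are flagged as well, which is stricter than a physical hairpin would require.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def has_self_complementarity(seq, max_allowed=6):
    """True if some stretch longer than max_allowed nucleotides can pair
    perfectly (no mismatch or gap) with another stretch of the same strand."""
    window = max_allowed + 1
    kmers = {seq[i:i + window] for i in range(len(seq) - window + 1)}
    # Any perfect complementary region longer than max_allowed contains a
    # window-length region whose reverse complement is also present.
    return any(reverse_complement(kmer) in kmers for kmer in kmers)

print(has_self_complementarity("GATTACATTTTTGTAATC"))  # 7-nt arms can pair
print(has_self_complementarity("GATTACTTTTGTAATC"))    # only 6-nt arms
```

The first sequence carries seven-nucleotide complementary arms and fails the condition; the second, with six-nucleotide arms, passes it.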
  • A set of adaptor-indexers for use in the disclosed method should include different adaptor-indexers where the single-stranded portions each have a different nucleotide sequence selected from combinations and permutations of the nucleotides A, C, G, and T. Where multiple nucleic acid cleaving reagents are used in the first digest, the single-stranded portion of each adaptor-indexer can have a different nucleotide sequence compatible with a sticky end sequence generated by one of the nucleic acid cleaving reagents.
  • Although the adaptor-indexers in one set have different sequences, it is preferred that they be of the same length to facilitate use of the set to index fragments produced by cleavage by one nucleic acid cleaving reagent. It is preferable that the members of a set of adaptor-indexers contain a double-stranded portion which is identical for each member of the set.
  • A preferred set of indexing linker strands comprises: (a) at least two single-stranded first oligonucleotides each having a common identical sequence, and a unique sequence of a length selected from 2, 3, 4 and 5 nucleotides selected from permutations and combinations of A, G, C and T nucleotides, at one end selected from a 3' end and a 5' end; and (b) a single-stranded second oligonucleotide whose sequence is complementary to the common sequence of the first oligonucleotides such that, when hybridized with any one of the first oligonucleotides, a double-stranded adaptor-indexer would result which includes an end having a sticky end with a unique sequence.
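The structure of such a set can be sketched in a few lines: every first oligonucleotide shares a common sequence and carries a unique end, and a single second oligonucleotide complements the common part. The common sequence below is a made-up placeholder, the unique end is placed at the 3' end for simplicity, and the function name `indexing_linker_set` is hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def indexing_linker_set(common, overhang_length):
    """Build the first oligonucleotides (common sequence plus a unique 3' end)
    and the shared second oligonucleotide complementary to the common sequence."""
    first_oligos = [
        common + "".join(end) for end in product("ACGT", repeat=overhang_length)
    ]
    second_oligo = reverse_complement(common)
    return first_oligos, second_oligo

# COMMON is an arbitrary placeholder sequence, not one from the disclosure.
COMMON = "GTCACTGAGC"
first, second = indexing_linker_set(COMMON, overhang_length=4)
print(len(first), first[0], second)
```

Annealing the second oligonucleotide to any first oligonucleotide leaves only the unique end single-stranded, so a four-nucleotide sticky end gives a set of 256 adaptor-indexers with an identical double-stranded portion.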
  • Adaptor-indexers can also contain or be associated with capture tags to facilitate immobilization or capture of fragments to which adaptor-indexers have been coupled.
  • The capture tag can be one member of a binding pair such as biotin and streptavidin. Capture tags are discussed more fully elsewhere herein.
  • Adaptor-indexers can also contain or be associated with sorting tags to facilitate sorting or separation of fragments to which adaptor-indexers have been coupled.
  • The sorting tag can be a detectable label such as a fluorescent moiety or a manipulable moiety such as a magnetic bead. Sorting tags are discussed more fully elsewhere herein.
  • Adaptor-indexers can also contain or be associated with labels to facilitate detection of fragments to which adaptor-indexers have been coupled. Adaptor-indexers can also be immobilized on a substrate.
  • Adaptor-indexers can also include a protruding end at the end opposite the sticky end. Such an end can be used as, for example, a hybridization target for a label to be associated with the adaptor-indexer (and thus can be considered the detection portion of the adaptor-indexer). Adaptor-indexers can also include one or more photocleavable nucleotides to facilitate release of adaptor-indexer sequences for detection. Photocleavable nucleotides and their use are described in WO 00/04036.
  • Adaptor-indexers need not be composed of naturally occurring nucleotides. Modified nucleotides, unnatural bases and nucleotide and oligonucleotide analogs can be used. All that is required is that the adaptor-indexer have the general structure described herein and be capable of the interactions and reactions required in the disclosed method.

Second Adaptors
  • Second adaptors are double-stranded nucleic acids containing a single-stranded portion and a double-stranded portion.
  • The single-stranded portion is at one end of the second adaptor and constitutes a sticky end. It is preferable that the protruding single strand (sticky end) have two, three, four, or five nucleotides.
  • The double-stranded portion of second adaptors may have any convenient sequence or length. In general, the sequence and length of the double-stranded portion is selected to be adapted to subsequent steps in the method.
  • The second adaptors can provide sequence for primer hybridization of a second primer or second hairpin primer.
  • The sequence composition and length for the double-stranded portion of second adaptors will generally be those that are useful for primer hybridization.
  • The sequence of the double-stranded portion of a second adaptor should not include the recognition sequence of any nucleic acid cleaving reagent to be used in a subsequent step in the method. It is preferred that second adaptors not have any sequences that are self-complementary. It is considered that this condition is met if there are no complementary regions greater than six nucleotides long without a mismatch or gap.
  • A set of second adaptors for use in the disclosed method can include different second adaptors where the single-stranded portions each have a different nucleotide sequence compatible with a sticky end sequence generated by one of the second restriction enzymes. It is preferable that the members of a set of second adaptors contain a double-stranded portion which is identical for each member of the set.
  • Second adaptors can also contain or be associated with capture tags to facilitate immobilization or capture of fragments to which second adaptors have been coupled. Second adaptors can also contain or be associated with sorting tags to facilitate sorting or separation of fragments to which second adaptors have been coupled. Second adaptors can also contain or be associated with labels to facilitate detection of fragments to which second adaptors have been coupled. Second adaptors can also be immobilized on a substrate.

Capture Tags
  • A capture tag is any compound that can be used to separate compounds or complexes having the capture tag from those that do not.
  • Preferably, a capture tag is a compound, such as a ligand or hapten, that binds to or interacts with another compound, such as a ligand-binding molecule or an antibody. It is also preferred that the interaction between the capture tag and the capturing component be a specific interaction, such as between a hapten and an antibody or a ligand and a ligand-binding molecule.
  • Preferred capture tags, described in the context of nucleic acid probes, are described by Syvänen et al., Nucleic Acids Res., 14:5037 (1986).
  • Preferred capture tags include biotin, which can be incorporated into nucleic acids.
  • Capture tags incorporated into adaptor-indexers or second adaptors can allow sample fragments (to which the adaptors have been coupled) to be captured by, adhered to, or coupled to a substrate.
  • Capture tags incorporated into hairpin primers or second primers can allow sample fragments (into which the primers have been incorporated) to be captured by, adhered to, or coupled to a substrate.
  • Such capture allows simplified washing and handling of the fragments.
  • Capturing sample fragments on a substrate may be accomplished in several ways.
  • Capture docks are adhered or coupled to the substrate.
  • Capture docks are compounds or moieties that mediate adherence of a sample fragment by binding to, or interacting with, a capture tag on the fragment.
  • Capture docks immobilized on a substrate allow capture of the fragment on the substrate. Such capture provides a convenient means of washing away reaction components that might interfere with subsequent steps.
  • Substrates for use in the disclosed method can include any solid material to which components of the assay can be adhered or coupled.
  • Substrates include, but are not limited to, materials such as acrylamide, cellulose, nitrocellulose, glass, polystyrene, polyethylene vinyl acetate, polypropylene, polymethacrylate, polyethylene, polyethylene oxide, polysilicates, polycarbonates, Teflon, fluorocarbons, nylon, silicon rubber, polyanhydrides, polyglycolic acid, polylactic acid, polyorthoesters, polypropylfumarate, collagen, glycosaminoglycans, and polyamino acids.
  • Substrates can have any useful form including thin films or membranes, beads, bottles, dishes, fibers, woven fibers, shaped polymers, particles and microparticles.
  • Preferred forms of substrates are plates and beads.
  • The most preferred form of beads is magnetic beads.
  • In one form, the capture dock is an oligonucleotide. Methods for immobilizing and coupling oligonucleotides to substrates are well established. For example, suitable attachment methods are described by Pease et al., Proc. Natl. Acad. Sci. USA 91(11):5022-5026 (1994), and Khrapko et al., Mol Biol (Mosk) (USSR) 25:718-730 (1991).
  • In another form, the capture dock is an anti-hybrid antibody. Methods for immobilizing antibodies to substrates are well established.
  • Immobilization can be accomplished by attachment, for example, to aminated surfaces, carboxylated surfaces or hydroxylated surfaces using standard immobilization chemistries.
  • Examples of attachment agents are cyanogen bromide, succinimide, aldehydes, tosyl chloride, avidin-biotin, photocrosslinkable agents, epoxides and maleimides.
  • A preferred attachment agent is glutaraldehyde.
  • Antibodies can be attached to a substrate by chemically cross-linking a free amino group on the antibody to reactive side groups present within the substrate.
  • Antibodies may be chemically cross-linked to a substrate that contains free amino or carboxyl groups using glutaraldehyde or carbodiimides as cross-linker agents.
  • Aqueous solutions containing free antibodies are incubated with the solid-state substrate in the presence of glutaraldehyde or carbodiimide.
  • The reactants can be incubated with 2% glutaraldehyde by volume in a buffered solution such as 0.1 M sodium cacodylate at pH 7.4.
  • Sorting tags are any compounds that can be used to sort or separate compounds or complexes having the sorting tag from those that do not. In general, any capture tag can be a sorting tag. Sorting tags also include compounds and moieties that can be detected and which can mediate the sorting of tagged components. Such forms of sorting tags are generally not also capture tags. For example, a fluorescent moiety can allow sorting of components tagged with the moiety from those that are not (or those with a different tag). However, such a fluorescent moiety does not necessarily have a suitable capture dock with which it can interact and be captured. Preferably, a sorting tag is a label, such as a fluorescent label, that can mediate sorting.
Method
  • The disclosed method involves the following basic steps.
  • A nucleic acid sample is subjected to amplification using primers where at least one of the primers is a hairpin primer.
  • Nucleic acids in the sample are amplified to result in amplified nucleic acid fragments having hairpin primer sequences at one or both ends.
  • Hairpin primer sequences in amplified fragments are referred to as hairpin ligators.
  • The amplified fragments are treated to allow the hairpin ligators to form stem-loop or hairpin structures at the end of the amplified fragments.
  • The amplified fragments are then contacted with a plurality of detector probes and the amplified fragments are covalently coupled to probes via the hairpin ligator. Coupled fragments can then be detected. Since the sequence of the amplified fragment adjacent to the hairpin structure of the hairpin ligator determines the sequence of the detector probe to which the hairpin ligator is coupled, this adjacent sequence in the amplified fragment is identified by noting to which probe a given fragment is coupled. This identification is preferably accomplished by having probes of known sequence immobilized at known locations in the probe array.
  • A catalog of nucleic acid sequences in a nucleic acid sample can be created by using multiple hairpin primers, each with a different primer sequence, to amplify the nucleic acid sample. Multiple different nucleic acid fragments will be amplified with different sequences adjacent to the hairpin structure of the hairpin ligator. The pattern of fragments on the probe array provides a catalog of the fragments that can then be compared with other nucleic acid samples.
  • the nucleic acid sample is preferably divided into aliquots (referred to as index samples) before amplification.
  • Each index sample is then mixed with a different hairpin primer, each of which has a primer sequence.
  • The hairpin primers then mediate amplification of different nucleic acid sequences (based on the primer sequence).
  • Each index sample can be amplified with one or more second primers (in conjunction with a hairpin primer).
  • The hairpin primer amplifies one strand and the second primer amplifies the opposite strand.
  • All index samples are preferably amplified with the same second primer(s).
  • The index samples can be further divided into secondary index samples, with each amplified with a different second primer or set of second primers. Amplified fragments in each index sample (or secondary index sample) would then have primer sequences at each end. The sequences of these primers can be used as primer binding sites for further amplification of the fragments, preferably once the fragments are coupled to detector probes.
  • Differential coupling of the strands can be accomplished by the simple expedient of using a probe array with detector probes all of the same polarity, that is, detector probes all with 5' ends (in a 5' probe array) or detector probes all with 3' ends (in a 3' probe array). Only the fragment strand with compatible polarity can be coupled to the detector probe.
  • A hairpin structure with a 3' end is referred to as a 3' hairpin structure and a hairpin structure with a 5' end is referred to as a 5' hairpin structure.
  • Hairpin ligators containing these structures are referred to as 3' hairpin ligators and 5' hairpin ligators, respectively.
  • Selective strand coupling can also be accomplished, for example, by digesting one of the strands with an exonuclease (detector probes of the correct polarity must still be used). Such digestion is also preferred since it reduces the chance for interference by the opposite strand during coupling to the detector probes.
  • Both ends of the amplified fragments will have hairpin ligators (see Figure 5, bottom).
  • Both strands will form both a 5' hairpin structure and a 3' hairpin structure and both strands can be coupled to detector probes.
  • By subjecting both strands of such fragments to both a 5' probe array and a 3' probe array, both ends of both strands of each fragment can be detected and cataloged. This provides a maximum of information about the nucleic acid sample.
  • Each sample (or each index sample or derivative index sample) can be reacted with and coupled to an array of detector probes.
  • Preferred arrays include every possible sequence of a given length (for example, every possible six base sequence), although arrays containing fewer combinations can also be used. Such arrays are referred to herein as probe arrays.
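A probe array containing every possible sequence of a given length can be enumerated directly. The sketch below is illustrative only: the function names and the lexicographic address order are assumptions made for the example, not a layout specified in the text.

```python
from itertools import product


def all_probes(length: int = 6) -> list[str]:
    """List every possible DNA sequence of the given length."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def probe_addresses(length: int = 6) -> dict[str, int]:
    """Map each probe sequence to a hypothetical array address (its list index)."""
    return {probe: address for address, probe in enumerate(all_probes(length))}


hexamers = all_probes(6)
print(len(hexamers))  # 4**6 = 4096 six base probes
```

With four bases, an array of probes of length n holds 4**n members, which is why a complete hexamer array contains 4,096 probes.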
  • The ends of the detector probes and the hairpin ligator are coupled together only if the detector probe hybridizes adjacent to the end of the hairpin ligator.
  • A hairpin ligator is coupled to a detector probe on the array only when a sequence complementary to the detector probe is present immediately adjacent to the end of the stem sequence in an amplified fragment.
  • Each amplified fragment from the sample will result in a signal at a particular location in a particular array of detector probes.
  • The probe array in which the signal for a given fragment is detected is determined by the primer sequence of the hairpin primer. Where multiple hairpin primers (having different primer sequences) are used, each different primer sequence is preferably processed in a separate index sample and a separate probe array is preferably used for each index sample or derivative index sample.
  • The location in the probe array in which the signal for a given fragment is detected is determined by the sequence in the fragment immediately adjacent to the end of the stem sequence in the fragment, since the detector probe must hybridize to this sequence in order to be coupled to the hairpin ligator of the fragment.
  • A complex nucleic acid sample will produce a unique pattern of signals on the probe arrays. It is this pattern that allows unique cataloging of nucleic acid samples and sensitive and powerful comparisons of the patterns of signals produced from different nucleic acid samples.
  • Hairpin primers provide a means for generating different subsets of fragments from a complex sample. Such a defined subset of molecules may be further resolved by additional amplification and indexing, or by any of the established techniques such as cloning, PCR amplification, or gel electrophoresis. Individual members of the class may be distinguished by identifying characteristics such as length, sequence, or restriction endonuclease maps.
  • The primer sequences of the hairpin ligators provide a means of indexing a large number of nucleic acid fragments. Detector probes of different sequence can be immobilized at different locations on the probe array.
  • The sequence of the detector probes on the probe array and the sequence of nucleic acid fragments in the index samples determine where on the probe array hairpin ligators (and thus, fragments) become coupled.
  • The presence of hairpin ligators at different locations in the probe arrays thus forms a pattern of signals that provides a signature or fingerprint of a nucleic acid sample based on the presence or absence of specific nucleic acid sequences in the sample. For this reason, cataloging of this pattern of signals (that is, the pattern of the presence of hairpin ligators) is an embodiment of the disclosed method that is of particular interest.
  • Catalogs can be made up of, or be referred to as, for example, a pattern of hairpin ligators on probe arrays, a pattern of the presence of hairpin ligators on probe arrays, a catalog of nucleic acid fragments in a sample, or a catalog of nucleic acid sequences in a sample.
  • The information in the catalog is preferably in the form of positional information (that is, location in the probe array) or, more preferably, in the form of sequences.
  • Preferred sequence information for catalogs includes sequences of probe array probes to which a hairpin ligator was coupled and sequences of nucleic acid fragments present in the sample (derived from the locations in the probe array where hairpin ligators were coupled).
  • Such catalogs of nucleic acid samples can be compared to a similar catalog derived from any other sample to detect similarities and differences in the samples (which is indicative of similarities and differences in the nucleic acids in the samples).
  • a catalog of a first nucleic acid sample can be compared to a catalog of a sample from the same type of organism as the first nucleic acid sample, a sample from the same type of tissue as the first nucleic acid sample, a sample from the same organism as the first nucleic acid sample, a sample obtained from the same source but at a different time than the first nucleic acid sample, a sample from a different organism than the first nucleic acid sample, a sample from a different type of tissue than the first nucleic acid sample, or a sample from a different type of organism than the first nucleic acid sample.
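Comparing two catalogs amounts to comparing two sets of signals. The following is a minimal sketch under stated assumptions: the `Signal` record, the array identifiers, and the example data are all hypothetical, and each catalog is reduced to presence or absence of a signal at a given probe in a given array.

```python
from typing import NamedTuple


class Signal(NamedTuple):
    array_id: str  # which probe array (one per index sample) gave the signal
    probe: str     # sequence of the detector probe at the signal location


def compare_catalogs(first: set[Signal], second: set[Signal]) -> dict[str, set[Signal]]:
    """Split two catalogs into shared signals and signals unique to each sample."""
    return {
        "shared": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


# Hypothetical catalogs, e.g. the same source sampled at two different times.
before = {Signal("array_1", "CAAATG"), Signal("array_1", "GGATCC"), Signal("array_2", "TTTACG")}
after = {Signal("array_1", "CAAATG"), Signal("array_2", "ACGTAC")}
diff = compare_catalogs(before, after)
```

Signals in `only_first` or `only_second` point to sequences present in one sample but not the other, which is the kind of difference the comparison is meant to reveal.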
  • The same type of tissue is tissue of the same type, such as liver tissue, muscle tissue, or skin (which may be from the same or a different organism or type of organism).
  • The same organism refers to the same individual, animal, or cell.
  • Two samples taken from a patient are from the same organism.
  • The same source is similar but broader, referring to samples from, for example, the same organism, the same tissue from the same organism, the same cDNA, or the same cDNA library. Samples from the same source that are to be compared are preferably collected at different times (thus allowing for potential changes over time to be detected). This is especially useful when the effect of a treatment or change in condition is to be assessed.
  • A different organism refers to a different individual organism, such as a different patient or a different individual animal.
  • Different organism includes a different organism of the same type or organisms of different types.
  • A different type of organism refers to organisms of different types, such as a dog and a cat, a human and a mouse, or E. coli and Salmonella.
  • A different type of tissue refers to tissues of different types, such as liver and kidney, or skin and brain.
  • Detecting the presence of hairpin ligators on a probe array can be accomplished by detection of labels incorporated into, or coupled to, the hairpin ligators. Alternatively, the hairpin ligators can be detected based on detection of their sequence. Any of the numerous sequence-specific detection techniques can be used for this purpose, including, for example, hybridization of labeled probes.
  • The loop sequence of the hairpin primer is a preferred site for binding of a detector tag by complementary hybridization.
  • The loop portion of the hairpin primer should be long enough to permit effective binding of a complementary nucleic acid.
  • Design of hybridization probes and hybridization conditions are well known. Preferred probe lengths for this purpose are 12 to 20 bases.
  • The nucleic acid tag may additionally bind to the bases in one side of the stem.
  • The presence of hairpin ligators can also be detected by generating a signal mediated by the hairpin ligator, its associated fragment, or the second primer sequence at the other end of the fragment.
  • Use of the second primer sequence as a primer for primer extension, described below, is a preferred example of this.
  • The coupling event links the strand to the detector probe via the 5' end of the hairpin ligator, which contains, for example, a 5'-phosphate capable of participating in ligation. After coupling, there remains a free 3'-terminus at the other end, which may be used for a labeling reaction.
  • Where the strand has a 3' hairpin structure at this other end (as in the bottom strand in Figure 5), the strand can be labeled by primer extension.
  • Labeling is preferably performed using primer extension by the Klenow fragment of DNA polymerase I, in the presence of fluorescent dNTPs.
  • The signal to be detected for the nucleic acid fragments can be increased by nucleic acid amplification. It is preferred that the nucleic acid fragments (including hairpin ligators) that have been coupled to the detector probes either be amplified or mediate amplification of another nucleic acid.
  • The fragments can be amplified using any suitable method. Preferred amplification methods are those that work efficiently for the generation of surface-localizable signals.
  • A preferred method is branch DNA amplification (Urdea,
  • Amplification primers can be based, for example, on the sequence of the hairpin primers and second primers. It is preferred that amplification primers be based on hairpin primer sequences that appear in the loop of the hairpin structure.
  • The primer sequences and stem sequences of the hairpin primers can be different, as discussed elsewhere herein.
  • Amplification of the fragment is facilitated by the presence of hairpin primer sequence at the end of the fragment (and by the presence of second primer sequence at the other end).
  • The primer sequences can be used for amplification primer sequences.
  • The primer sequences can also be used to circularize the adaptor/fragments for subsequent amplification by rolling circle replication. Rolling circle amplification is described in U.S. Patent No. 5,854,033 and PCT application WO 97/19193.
  • Hybridization of amplified fragments to detector probes can be aided by shortening the fragment length prior to hybridization.
  • This can be accomplished, for example, by digesting the fragment with a restriction endonuclease.
  • The recognition site for the restriction endonuclease is included in the sequence of the hairpin primer.
  • The restriction enzyme used has a cleavage site offset from the recognition site.
  • The following example illustrates use of the non-palindromic Type III enzyme EcoP15I (New England Biolabs) to shorten the length of amplified fragments prior to hybridization. EcoP15I recognizes and cleaves the following site (SEQ ID NO: 10):
  • The bottom strand can then form the hairpin structure (nucleotides 1-53 of SEQ ID NO: 12)
  • The strands of amplified nucleic acid fragments can be separated prior to hybridization to the detector probes. Such strand separation can improve the efficiency of both formation of the hairpin structure and hybridization of the amplified fragment to the detector probe.
  • This separation can be accomplished using any suitable technique.
  • Strand separation is preferably accomplished by strand-specific digestion. This can be accomplished, for example, by digesting one of the strands with a nuclease such as T7 gene 6 exonuclease. By incorporating a few phosphorothioate linkages at the 5' end of the hairpin primer, the strand containing the hairpin primer will be protected from exonuclease digestion while the other strand is digested.
  • Alternatively, the other (non-hairpin) primer can be made with 5' end phosphorothioate linkages. This will protect the opposite strand from digestion. Strand separation can also be accomplished by including a capture tag on the hairpin primer or the second primer. Capture tags and their use are described above.
  • A preferred capture tag is a biotin incorporated into a primer by using a biotin-T phosphoramidite (Glen Research No. 10-1038-95). This modified nucleotide does not interfere with primer function, and becomes incorporated into all newly synthesized DNA strands during PCR amplification.
  • In one form, the biotin-T is present as part of the hairpin primer.
  • In another form, the biotin-T is present as part of the second primer.
  • The preferred location of the biotin-T in the hairpin primer is any thymine base present in the loop sequence.
  • Capture of the biotinylated strand may be performed by methods well known in the art, such as the use of streptavidin-magnetic particles (Dynal, Inc.). This capture tag can then be used to immobilize one strand of the amplified fragments while the other strands are washed away. Either the immobilized or washed strand can be carried forward in the method.
  • The concentrations of the various nucleic acid fragments in the index samples are normalized. Normalization can be performed either before or after any amplification step that may be used.
  • A preferred technique for fragment normalization involves immobilizing one strand of the nucleic acid fragments, denaturing the nucleic acid fragments, renaturing the nucleic acid fragments for a time greater than the c0t1/2 for abundant nucleic acid fragments and less than the c0t1/2 for rare nucleic acid fragments, and collecting the un-renatured nucleic acid fragments.
  • The sequence information that can be obtained with the disclosed method can be illustrated using a specific example of a nucleic acid fragment. Assume a nucleic acid sample containing a nucleic acid fragment with the sequence (SEQ ID NOs: 13 and 14)
  • The sequence of the detector probe is identified by the location in the probe array where the hairpin ligator is detected.
  • The sequence of the adjacent primer sequence is identified by the probe array in which the label of the hairpin ligator is detected (since a different set of probe arrays is used for each index sample).
  • Detection of label in the CAAATG hexamer position of the TCTAGTCCGAAATCCAAGCT (nucleotides 9-28 of SEQ ID NO: 17) probe array (TCTAGTCCGAAATCCAAGCT corresponds to the primer sequence in the hairpin primer in this example) indicates the presence of a nucleic acid fragment in the nucleic acid sample having the sequence CAAATGTCTAGTCCGAAATCCAAGCT (nucleotides 3-28 of SEQ ID NO: 14).
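The readout in this example can be reproduced by joining the hexamer identified by the signal location to the primer sequence that identifies the probe array. The function name below is hypothetical, and the order of concatenation (probe sequence first) simply follows the example above; it is a sketch, not a general decoding rule.

```python
def fragment_tag(probe: str, primer_sequence: str) -> str:
    """Join the detector probe sequence read from the signal location to the
    primer sequence associated with the probe array in which it was detected."""
    return probe.upper() + primer_sequence.upper()


tag = fragment_tag("CAAATG", "TCTAGTCCGAAATCCAAGCT")
print(tag)       # CAAATGTCTAGTCCGAAATCCAAGCT
print(len(tag))  # 26 nucleotides: 6 from the probe plus 20 from the primer sequence
```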
  • Hairpin primers may also be utilized to multiplex a one-color readout of control and tester fragments of a gene from the same address of a slide array.
  • One way to do this is to use labile and stable forms of hairpin primers as described in the following illustration. 1. Generate PCR products from cDNA using adaptor ligation. Use different hairpin primers for the tester and control, a uracil in the synthetic adapters for the testers and a thymine in the synthetic adapters for the controls. A fluorescent label may be incorporated into the hairpin using standard fluorescently labeled nucleotides.
  • x is the hexamer probe
  • N is the hairpin
  • n is the amplified fragment
  • I indicates base pairing
  • * indicates a fluorescently labeled nucleotide.
  • nnnnnnnnNNNNNNNNNNNNNNNM where x is the hexamer probe, N is the hairpin, M is an additional base or bases, n is the binary sequence tag,
  • The released fragment to be analyzed will be: . . . nnnnnnnnnnnNNNNNNNNNNN (control) . . . nnnnnnnnnnnNNNNNNNNNNNM (tester)
  • A preferred form of the disclosed method involves amplification of nucleic acid fragments to which adaptor-indexers have been coupled.
  • An example of this form of the method is illustrated in Figures 6A-C.
  • Coupling of adaptor-indexers to nucleic acid fragments involves the following basic steps.
  • A nucleic acid sample, embodied in double-stranded DNA, is digested with one or more restriction endonucleases such that a set of DNA fragments having sticky ends with a variety of sequences is generated.
  • Preferred for this purpose is the use of a single Type IIS restriction endonuclease having an offset cleavage site.
  • Where multiple restriction endonucleases are used, the nucleic acid sample is preferably divided into index samples before digestion. Where a single restriction endonuclease is used, the nucleic acid sample is preferably divided into index samples following digestion. Each index sample is then mixed with a different adaptor-indexer, each of which has a sticky end compatible with one of the possible sticky ends on the DNA fragments in that index sample. The adaptor-indexers are then coupled onto compatible DNA fragments.
  • Each index sample can then be digested with one or more other restriction enzymes (referred to as second restriction enzymes), preferably restriction enzymes having four base recognition sequences. All index samples are preferably digested with the same restriction enzyme(s). Alternatively, the index samples can be further divided into secondary index samples, with each digested with a different second restriction enzyme or set of restriction enzymes.
  • A second adaptor can then be coupled to the DNA fragments in the index samples (or secondary index samples). Preferably, the same second adaptor is used for each index sample. Different second adaptors are preferably used with secondary index samples derived from the same index sample. In this case, it is preferred that the same set of second adaptors be used with each set of secondary index samples.
  • DNA fragments in each index sample (or secondary index sample) now have adaptors coupled to each end. The DNA fragments can then be amplified using hairpin primers. Sequences in the adaptors can be used as primer binding sites for this amplification.
  • The index samples can be divided into further aliquots. These are referred to as restricted index samples and non-restricted index samples (or restricted secondary index samples and non-restricted secondary index samples, if there are secondary index samples).
  • The index samples (or secondary index samples) can be divided into one or more restricted index samples and one non-restricted index sample.
  • The restricted index samples (or restricted secondary index samples), but not the non-restricted index sample (or non-restricted secondary index sample), are then each digested with a different restriction endonuclease (referred to as third restriction enzymes).
  • The third restriction enzymes are preferably different from any of the restriction enzymes or second restriction enzymes with which the sample has been digested.
  • The third restriction enzymes will cleave some DNA fragments in the restricted index samples (or restricted secondary index samples), thus making those fragments incompetent for amplification.
  • The signals generated by the restricted index samples and non-restricted index sample (or restricted and non-restricted secondary index samples) can differ, and fragments containing the recognition sequence of one of the third restriction enzymes can be identified.
  • Secondary index samples, restricted index samples, non-restricted index samples, restricted secondary index samples, and non-restricted secondary index samples are referred to collectively herein as derivative index samples. Each is derived from an index sample and, in some cases, from another derivative index sample. In general, only those derivative index samples last generated are carried forward in the method. For example, if secondary index samples are created, the original index samples from which they were derived are no longer carried forward in the method (the secondary index samples are). Similarly, if restricted and non-restricted secondary index samples are created, then neither the original index samples nor the secondary index samples from which the restricted and non-restricted secondary index samples were derived are carried forward in the method.
  • each processed DNA fragment that is, each DNA fragment to which an adaptor-indexer was coupled
  • The probe array in which the signal for a given fragment is detected is determined by the sequence of the original sticky end sequence (or recognition sequence).
  • Each different sticky end or recognition sequence is processed in a separate index sample; a separate probe array is used for each index sample or derivative index sample.
  • The location in the probe array in which the signal for a given fragment is detected is determined by the sequence in the DNA fragment adjacent to the stem of the hairpin structure, which is preferably the sequence adjacent to the sticky end sequence (or recognition sequence), since the detector probe must hybridize to this sequence in order to be coupled to the hairpin ligator on the fragment.
  • Hybridization based on the sequence adjacent to the sticky end sequence (or recognition sequence) is accomplished by designing the hairpin primer to result in formation of a hairpin structure with a stem that includes, and terminates at, the sticky end sequence (see example below).
  • A complex nucleic acid sample will produce a unique pattern of signals on the probe arrays. It is this pattern that allows unique cataloging of nucleic acid samples and sensitive and powerful comparisons of the patterns of signals produced from different nucleic acid samples.
  • The probe array, and location in the probe array, where a DNA fragment generates a signal identify the sequence of the sticky end of the DNA fragment and the sequence adjacent to the sticky end (or the recognition sequence of the restriction enzyme and the sequence adjacent to the recognition sequence). This is a ten base sequence when a four base sticky end and six base detector probes are used.
  • The fixed relationship between the recognition sequence and the cleavage site of a Type IIS restriction enzyme, when used, and the identity of the recognition sequence provide additional sequence information about the DNA fragment.
  • This form ofthe disclosed method is performed using one or more restriction enzymes that collectively produce a plurality of different sticky end sequences.
  • the sticky end sequences generated by the restriction enzyme are not limited by the recognition sequence ofthe restriction enzyme.
  • the sticky ends generated are preferably 2, 3, 4 or 5 nucleotides long.
  • Preferred restriction enzymes for use in the disclosed method are Type IIS restriction endonucleases, which are enzymes that cleave DNA at locations outside of (or offset from) the recognition site and which generate sticky ends. Examples of Type IIS restriction endonucleases are FokI, BbvI, HgaI, BspMI and SfaNI. Restriction endonucleases for use in this embodiment of the disclosed method produce sticky ends encompassing permutations and combinations of the four nucleotides, A, C, G, and T.
  • a restriction endonuclease such as FokI
  • FokI, which releases fragments with four base, 5'-protruding sticky ends, will generate fragments having 4^4 or 256 possible protruding tetranucleotide ends.
  • Cleavage of a cDNA sample having an average of 12,000 different cDNAs with the restriction endonuclease FokI will produce a mixture of fragments with four base, 5'-protruding ends.
  • FokI cuts twice in every 4^5 base pairs giving an average fragment size of 512 base pairs.
  • each cDNA will produce approximately four fragments.
  • There are 4^4 = 256 possible tetranucleotide sequences and therefore 256 possible identities for each sticky end.
  • there will be 48,000/256 = 188 fragments with a given sticky end sequence.
  • Each of these fragments is sorted by hybridization to different detector probes based on the sequence adjacent to the sticky end sequence in each fragment.
  • a hexamer probe array has 4,096 different six nucleotide probes. Thus, only 188 of the 4,096 hexamers in the probe array will couple to a hairpin ligator, on average.
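The arithmetic of the FokI example above can be checked with a short sketch. The variable names are illustrative only; all numeric inputs are the values stated in the text, and the exact quotient (187.5) is what the text rounds to 188.

```python
# Worked arithmetic for the FokI example; inputs are the values given in the text.
cdnas = 12_000             # distinct cDNAs in the sample
fragments_per_cdna = 4     # approximate number of FokI fragments per cDNA
sticky_end_length = 4      # FokI leaves four-base 5' overhangs
probe_length = 6           # hexamer detector probes

total_fragments = cdnas * fragments_per_cdna            # 48,000
sticky_end_classes = 4 ** sticky_end_length              # 256
fragments_per_class = total_fragments / sticky_end_classes
probes_per_array = 4 ** probe_length                     # 4,096

# Fraction of hexamer positions expected to carry a signal in one array.
occupancy = fragments_per_class / probes_per_array

print(total_fragments, sticky_end_classes, fragments_per_class, probes_per_array)
print(round(occupancy, 3))
```

Under these assumptions fewer than 5% of the hexamer positions in any one probe array are expected to be occupied, which is why the fragments sort into largely separate bins.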
  • Each of these fragments is sorted by hybridization to different detector probes based on the sequence adjacent to the sticky end sequence in each fragment.
  • each restricted secondary index sample will be cleaved (since these restriction enzymes will cut about once every 256 base pairs).
  • there will be approximately 3,200 fragments (intact, with both an adaptor-indexer and a second adaptor) in each of the 20 * 2 * 5 = 200 restricted secondary index samples (there will be approximately 6,400 fragments in the non-restricted secondary index sample).
  • Each of these fragments is sorted by hybridization to different detector probes based on the sequence adjacent to the sticky end sequence in each fragment.
  • a hexamer probe array has 4,096 different six nucleotide probes.
  • the length of the recognition sequence, the length of the sticky end generated, and the length of the detector probes used in the probe arrays together determine the number of data bins into which the nucleic acid fragments are sorted.
  • the sorting of fragments can be matched to the complexity of the sample being analyzed.
  • a comprehensive panel of adaptor-indexers provides a means for attaching specific functional modifications to selected subsets of a complex mixture of nucleic acid fragments and identifying the molecules so modified. Such a defined subset of molecules may be further resolved by additional cleavage and indexing, or by any of the established techniques such as cloning, PCR amplification, or gel electrophoresis. Individual members of the class may be distinguished by identifying characteristics such as length, sequence, or restriction endonuclease maps. The sequence of the sticky ends of the adaptor-indexers provides a means of indexing a large number of nucleic acid fragments.
  • Detector probes of different sequence can be immobilized at different locations on the probe array.
  • the sequence of the detector probes on the probe array and the sequence of nucleic acid fragments in the index samples determine where on the probe array amplified fragments become coupled.
  • the presence of fragments at different locations in the probe arrays thus forms a pattern of signals that provides a signature or fingerprint of a nucleic acid sample based on the presence or absence of specific nucleic acid sequences in the sample. For this reason, cataloging of this pattern of signals (that is, the pattern of the presence of fragments or hairpin ligators) is an embodiment of the disclosed method that is of particular interest.
  • Catalogs can be made up of, or be referred to as, for example, a pattern of fragments on probe arrays, a pattern of the presence of fragments on probe arrays, a pattern of hairpin ligators on probe arrays, a pattern of the presence of hairpin ligators on probe arrays, a catalog of nucleic acid fragments in a sample, or a catalog of nucleic acid sequences in a sample.
  • the information in the catalog is preferably in the form of positional information (that is, location in the probe array) or, more preferably, in the form of sequences.
  • Preferred sequence information for catalogs includes sequences of detector probes to which a fragment was coupled and sequences of nucleic acid fragments present in the sample (derived from the locations in the probe array where fragments were coupled).
  • sequence information can be illustrated with the following structures: DNA fragment: ..NNNNXXXX..NNNNRRRRROOOOOOOOOOOSSSSNNNN.. Sequence information:
  • each character represents a nucleotide.
  • N represents any nucleotide (having no special identity or relationship to the method).
  • R represents a nucleotide in the recognition sequence of the Type IIS restriction enzyme.
  • O represents a nucleotide in the offset between the recognition site and the cleavage site of the Type IIS restriction enzyme.
  • S represents a nucleotide in the sticky end resulting from cleavage with the Type IIS restriction enzyme.
  • X represents a nucleotide in the recognition/cleavage site of the second restriction enzyme.
  • I represents a nucleotide complementary to the detector probe.
  • the sequence information can be obtained.
  • the Type IIS restriction enzyme has a five base recognition sequence, a nine base offset to the cleavage site, and creates a four base sticky end.
  • the detector probes contain hexamer sequences. Each array location where a signal is generated in this example thus represents a specific sequence: nnnnn---nnnnnnnnnn (where n represents an identified nucleotide and each - represents an unidentified nucleotide). This is referred to as a determined sequence.
  • the portion of the nucleic acid fragments for which the sequence is determined corresponds to the sticky end sequence, the sequence adjacent to the sticky end sequence to which the detector probe hybridized, and the recognition sequence of the restriction enzyme (S, I, and R, respectively).
  • This sequence information can also be represented by the structure A-B-C-D where A is the recognition sequence of the restriction enzyme, B is the gap of unknown sequence, C is the sequence to which the detector probe hybridized, and D is the sticky end sequence.
  • the gap represents the nucleotides between the recognition sequence and the sequence to which the detector probe hybridized. C is always adjacent to the sticky end sequence D.
  • A is RRRRR
  • B is OOO
  • C is IIIIII and D is SSSS.
  • sequence information that can be obtained with the disclosed method can be further illustrated using a specific example of a nucleic acid fragment. Assume a nucleic acid sample containing a nucleic acid fragment with the sequence (SEQ ID NO: 18)
  • the nucleic acid is hybridized to an appropriate detector probe (a hexamer in this example), and the detector probe and hairpin ligator are coupled the following structure is obtained (SEQ ID NO:22) support ATAG
  • the sequence of the detector probe is identified by the location in the probe array where the fragment is detected.
  • the sequence of the adjacent sticky end is identified by the probe array in which the fragment is detected (since a different probe array is used for each sticky end sequence).
  • the sequence of the recognition sequence is identified by the relationship of the cleavage site to the recognition sequence.
  • A is CCTAC
  • B is NNN
  • C is ACTTCG
  • D is ATAC.
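The A-B-C-D structure of this example can be expressed as a search pattern in which the gap B is a run of unidentified nucleotides. The following is a minimal sketch; the function name and the candidate sequence are hypothetical and were invented only to exercise the pattern, while the values of A, B, C, and D are those listed above.

```python
import re

def determined_pattern(recognition, gap_length, probe_complement, sticky_end):
    """Build a regex for an A-B-C-D determined sequence.

    A (recognition), C (probe_complement) and D (sticky_end) are identified
    nucleotides; B is a gap of gap_length unidentified nucleotides.
    """
    return re.compile(
        re.escape(recognition)
        + "[ACGT]{%d}" % gap_length
        + re.escape(probe_complement)
        + re.escape(sticky_end)
    )

# A = CCTAC, B = NNN, C = ACTTCG, D = ATAC (values from the example above).
pattern = determined_pattern("CCTAC", 3, "ACTTCG", "ATAC")

# Hypothetical candidate sequence, not taken from the source.
candidate = "GGTTCCTACGATACTTCGATACGG"
match = pattern.search(candidate)
print(match.group() if match else None)
```

A search of this kind is one way a determined sequence might be compared against database entries; it requires exact agreement at every identified position.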
  • sequence information obtainable can be illustrated with the following structures: DNA fragment : . . NNXXXXNN . . NNRRRRNN . . . . Sequence : XXXX RRRR III IIISSSSSS
  • each character represents a nucleotide.
  • N represents any nucleotide (having no special identity or relationship to the method).
  • S represents a nucleotide in the recognition sequence (including sticky end) of the first restriction enzyme.
  • X represents a nucleotide in the recognition/cleavage site of the second restriction enzyme.
  • R represents a nucleotide in the recognition sequence of the third restriction enzyme.
  • I represents a nucleotide complementary to the detector probe. The sequence and distance between the recognition sites of the second and third restriction enzymes and between the recognition site of the second restriction enzyme and the probe complement are not determined in the basic method.
  • the sequence information can be obtained.
  • the detector probes contain hexamer sequences.
  • Each array location where a signal is generated in this example thus represents a specific sequence: nnnn...nnnn...nnnnnnnnnnnn (where n represents an identified nucleotide and each ... represents an unidentified gap sequence). This is referred to as a determined sequence.
  • the portion of the nucleic acid fragments for which the sequence is determined corresponds to the recognition sequence of the first restriction enzyme, the sequence adjacent to the recognition sequence to which the detector probe hybridized, the recognition sequence of the second restriction enzyme, and the recognition sequence of the third restriction enzyme (S, I, X, and R, respectively).
  • This sequence information can also be represented by the structure
  • B is a gap of unknown sequence
  • C is the sequence to which the detector probe hybridized
  • D is the recognition sequence of the first restriction enzyme
  • E is the recognition sequence of the second restriction enzyme
  • F is the recognition sequence of the third restriction enzyme.
  • the gaps represent nucleotides between the recognition sequences of the second and third restriction enzymes and between the recognition sequence of the third restriction enzyme and the sequence to which the detector probe hybridized.
  • C is always adjacent to the recognition sequence D.
  • C is IIIIII and D is SSSSSS
  • E is XXXX
  • F is RRRR.
  • sequence information that can be obtained with the disclosed method can be further illustrated using a specific example of a nucleic acid fragment. Assume a nucleic acid sample containing a nucleic acid fragment with the sequence (SEQ ID NOs:24, 25, and 26; restriction enzyme recognition sequences in boldface)
  • the hairpin primer After addition of the second adaptor and amplification using the corresponding hairpin primer (GGATCTGGTATAGGCTGTAATACCAGATCC; SEQ ID NO:28), the following nucleic acid is obtained (SEQ ID NO:33 and SEQ ID NO:29; sequence from the adaptor-indexer is underlined, the hairpin primer is italicized). Note that the hairpin primer hybridizes to both the sticky end sequence and the remaining recognition sequence (that is, the C not in the sticky end).
  • An aliquot (that is, a restricted index sample) of the sample can be digested with AluI (recognition site AGCT) prior to amplification. By cutting the fragment, amplification is prevented. This lack of amplification in the restricted index sample indicates the presence of the sequence TCGA in the fragment.
  • a hairpin structure is formed in the bottom strand (in this example), the fragment is hybridized to an appropriate detector probe (a hexamer in this example), and the detector probe and hairpin ligator are coupled the following structure is obtained (SEQ ID NO:30; sequence from the adaptor-indexer is underlined, the hairpin primer is italicized, restriction enzyme recognition sequences in boldface) support AGG
  • the sequence of the detector probe is identified by the location in the probe array where the hairpin ligator is detected.
  • the sequence of the adjacent recognition sequence (including the sticky end) is identified by the probe array in which the hairpin ligator is detected (since a different set of probe arrays is used for each index sample).
  • the sequence of the recognition sequence of the second restriction enzyme is identified by the probe array in which the hairpin ligator is detected (since a different set of probe arrays is used for each secondary index sample).
  • the presence of an internal sequence is determined by seeing if the signal is absent from the probe array for the restricted secondary index sample that was digested with the third restriction enzyme (a different probe array is used for each restricted and non-restricted secondary index sample). If the signal is absent, it indicates the recognition site is present in the fragment.
  • detection of hairpin ligator in the AGCTAT hexamer position of the TCGA third recognition site probe array in the GTAC second recognition site set of probe arrays in the CCTAGG sticky end set of probe arrays indicates the presence of a nucleic acid fragment in the nucleic acid sample having the sequence
  • the primer sequences in the hairpin primers are partly degenerate. In this way, multiple different nucleic acid fragments will be amplified in each index sample.
  • partially degenerate primer sequences it is preferred that the 3' end of the primer sequence of all of the hairpin primers used in a given index sample be the same. It is also preferred that the corresponding 3' end sequences of hairpin primers used in different index samples be different.
  • the fragments amplified in each index sample will have related primer complement sequences while the sets of fragments amplified in the different index samples will be different.
  • Such relationships provide a maximum of both sequence information for the fragments and catalog complexity.
  • sets of hairpin primers with partially degenerate primer sequences can be illustrated with the following example.
  • Sets of hairpin primers where the primer sequences in each set have, from 5' to 3', 8 specific bases and 12 degenerate bases can prime amplification from all sites in a nucleic acid sample having a sequence complementary to the 8 specified bases.
  • the sequence of the specified bases in each of the sets can be different.
  • Each different sequence, and thus each different set, of hairpin primers will prime amplification from a different set of sites in a nucleic acid sample. In a sufficiently complex nucleic acid sample, all of these sequences will be represented in the set of amplified fragments.
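The relationship between the specified bases of a partially degenerate primer set and the sites it can prime from can be sketched as follows. This is a simplified illustration that assumes fully degenerate positions impose no constraint on the site, so that only the 8 specified bases matter; the specified sequence and the template are hypothetical and are not taken from the disclosure.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def priming_sites(template, specified):
    """Return 0-based positions in template that are complementary to the
    specified bases of the primer (read on the given template strand)."""
    target = reverse_complement(specified)
    last = len(template) - len(target)
    return [i for i in range(last + 1) if template.startswith(target, i)]

# With 12 fully degenerate positions, one set contains 4^12 primer sequences.
set_size = 4 ** 12

specified = "GATTACAG"                      # hypothetical 8 specified bases
template = "AACTGTAATCGGCTGTAATCTT"         # hypothetical template strand
sites = priming_sites(template, specified)
print(set_size, sites)
```

Changing the specified bases selects a different set of sites, which is the basis for using a different primer set in each index sample.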
  • Mass spectrometry techniques can be utilized for detection in the disclosed method. These techniques include matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectroscopy. Such techniques allow automation and rapid throughput of multiple samples and assays.
  • MALDI-TOF matrix-assisted laser desorption/ionization time-of-flight
  • Mass spectrometry detection works better with smaller molecules so it is useful to cut some components of the method prior to, or as part of mass spectrometry detection.
  • a number of methods are contemplated where an oligonucleotide molecule to be detected is cut to a shorter length prior to detection by mass spectrometry.
  • the disclosed method would proceed as normal and, in the preferred embodiment, the surface that has the detector probes attached would be compatible with the source region of a matrix-assisted laser desorption ionization, time-of-flight mass spectrometer (MALDI-TOF-MS). The resultant fragment would look something like
  • P are the detector probe, coupled to the fragment
  • X are complementary bases of the hairpin primer and amplified fragment
  • the performance of mass spectrometry techniques degrades for DNA samples.
  • Chemical, biological, physical (thermal), and other cleaving reagents can be used to generate smaller, more optimal, sub-fragments to be analyzed in the mass spectrometer.
  • the degree of fragmentation is somewhat tunable in instruments like the Q-TOF systems (Micromass, US head office at Suite 407N, 100 Cummings Center, Beverly, MA 01915-6101, USA) where one can look at the parent ion, then increase the fragmentation to see the decomposition fragments and thus the sequence; such a technique is contemplated to determine the full sized sub-fragment, and infer the sequence of the sub-fragment through these known tools.
  • the detectable fragment can be top strand, bottom strand, or both strands depending upon the scheme.
  • the label may be a cleavable mass tag or the strand need not be labeled.
  • cleaving reagents for this purpose.
  • one technique is that of Szybalski (described elsewhere herein) where FokI is used to cut at a fixed distance from an arbitrary, specific, recognition site.
  • This technique can be extended to other restriction enzymes of Type IIS or Type III.
  • McrBC (New England Biolabs)
  • the cut site is not well defined (approximately 30 bases) which may be used to advantage to generate the parent as well as the fragmentation set.
  • Metal containing porphyrins attached to oligonucleotides have been shown to cut DNA very near the porphyrin when exposed to light (texaphyrins, US5607924).
  • Another cleavage technology is that of Dervan (Cartwright et al, Cleavage of chromatin with methidiumpropyl-EDTA . iron(II).
  • a mass label such as peptide nucleic acid (PNA) molecules (Hanvey et al., Science 258:1481-1485 (1992)) of different sequence and molecular weight can be used as labels that bind specifically to sequence in hairpin primers or second primers.
  • PNA peptide nucleic acid
  • Laser desorption of the samples is used to generate MALDI-TOF mass spectra of the PNA labels, which are released into the spectrometer and resolved by mass.
  • the intensity of each PNA label reveals the relative amount of different components.
  • the PNA spectra generate scalar values that are indirect indicators of the relative abundance of the labeled component at specific locations in an array. Probability Detection
  • Sequencing by hybridization is known to produce mismatch errors (Lipshutz, Likelihood DNA sequencing by hybridization. J Biomol Struct Dyn, 11(3):637-53 (1993)).
  • Database searching for sequence information currently is regular expression based and requires matched "letters" between the database entry and the search sequence.
  • the disclosed method allows replacement of regular expression matching (match versus no-match per base) with a probability function to determine a confidence in the assignment of the identity of a sequence tag (that is, the fragments produced in the disclosed method).
  • the disclosed method uses covalent coupling to improve the specificity of the hybridization near the coupling site. Despite this improvement, there will remain a finite probability of a mismatch, particularly for nucleotides more removed from the coupling site.
  • the error rate depends on at least two mismatch properties: base pairing (for example, A with G) and distance from the coupling site.
  • weight matrices are used, following Dayhoff (Dayhoff et al., A model of evolutionary changes in proteins, in Atlas of Protein Sequence and Structure, M.O. Dayhoff, Editor. 1978, National Biomedical Research Foundation: Washington DC) and Venezia (Venezia and O'Hara, Rapid motif compliance scoring with match weight sets. Comput Appl Biosci, 9(1):65-9 (1993)) protein techniques.
  • the coefficients in these matrices will be determined experimentally for the disclosed method.
  • matrices with illustrative coefficients representing position 1 and 2, where the columns represent the upper strand nucleotide and the rows represent the lower strand nucleotide. The actual coefficients can be determined empirically.
  • Position 1:  A [.02, .90, .03, .05]   T [.90, .02, .03, .05]
    Position 2:  A [.01, .97, .01, .01]   T [.97, .01, .01, .01]
  • This procedure can be extended to an arbitrary number of bases in a similar manner.
  • the score can be computed for all possible mismatches and rank ordered to reveal the most probable identity.
  • a cut-off score can be used to reduce the number of possible identities from the matrix estimation. For example using the example matrices above, sequences with a threshold score above 0.50 would yield only one sequence, that being a sequence which matches the probe.
  • This method of estimating sequences and their respective probability scores from the universe of mismatch events for a given probe can be extended from 1 to n, where n is the number of free bases available for hybridization.
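The scoring and rank ordering described above can be sketched for a two-base probe. Several points in this sketch are assumptions rather than statements of the disclosure: the column order (A, T, C, G) is inferred from the A and T rows, the C and G rows are hypothetical values filled in by symmetry, and the score of a candidate is taken to be the product of the per-position coefficients. Only the A and T rows come from the illustrative matrices in the text.

```python
from itertools import product

BASES = "ATCG"  # assumed column order; the source does not state it

# Rows: lower-strand (probe) nucleotide. Columns: upper-strand nucleotide.
# A and T rows are the illustrative coefficients from the text;
# C and G rows are hypothetical, filled in by symmetry.
POSITION_1 = {
    "A": [.02, .90, .03, .05],
    "T": [.90, .02, .03, .05],
    "C": [.03, .05, .02, .90],
    "G": [.03, .05, .90, .02],
}
POSITION_2 = {
    "A": [.01, .97, .01, .01],
    "T": [.97, .01, .01, .01],
    "C": [.01, .01, .01, .97],
    "G": [.01, .01, .97, .01],
}
MATRICES = [POSITION_1, POSITION_2]

def score(probe, target):
    """Product of per-position coefficients (an assumed scoring rule)."""
    result = 1.0
    for matrix, lower, upper in zip(MATRICES, probe, target):
        result *= matrix[lower][BASES.index(upper)]
    return result

def rank_targets(probe, cutoff=0.0):
    """Score every possible upper-strand sequence and rank by score."""
    scored = []
    for combo in product(BASES, repeat=len(probe)):
        target = "".join(combo)
        value = score(probe, target)
        if value > cutoff:
            scored.append((target, value))
    return sorted(scored, key=lambda item: item[1], reverse=True)

ranked = rank_targets("AT")
above_cutoff = rank_targets("AT", cutoff=0.50)
print(above_cutoff)
```

With these coefficients only the fully matched sequence scores above the 0.50 cut-off, which agrees with the statement in the text that the threshold yields a single sequence matching the probe.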
  • one can compute a confidence value for uniqueness if one assumes a random distribution of bases. For example, if one has a candidate of 15 bases in length, in an organism which has an estimated 10^8 base genome, one expects the 15 base fragment to be unique because 10^8/4^15 ≈ 0.1 is much less than 1. The genome would have to be 10 times larger before one would expect an occurrence of two instances of the particular 15 base fragment.
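The uniqueness estimate above follows from dividing the genome size by the number of possible sequences of the tag length. A minimal sketch, assuming a random distribution of bases as the text does (the function name is illustrative):

```python
def expected_occurrences(genome_size, tag_length):
    """Expected number of occurrences of one specific tag in a random genome."""
    return genome_size / 4 ** tag_length

e_small = expected_occurrences(10**8, 15)   # the genome size used in the text
e_large = expected_occurrences(10**9, 15)   # a genome 10 times larger

print(round(e_small, 3), round(e_large, 3))
```

The first value is roughly 0.1 and the second roughly 1, consistent with the statement that the genome would have to be about 10 times larger before a chance second occurrence becomes likely.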
  • the distributions, in known genomes, are known not to be completely random and the initial assumption of a random distribution can be improved as information is gathered. This new information can be used to assign and use confidence values.
  • a fictitious gene family ABCD whose members are ABCD1, ABCD2 and ABCD3.
  • the three members were discovered following some event such as heat shock, and they are thus putatively assigned to belong to the heat shock family of genes and happen to have significant stretches of conserved sequence among the family of genes.
  • the organism to be a plant, where ABCD1 was isolated from the plant root, ABCD2 was isolated from the plant leaf, and ABCD3 was isolated from the plant flower.
  • the estimation matrix may look like
  • the source of the sample i.e. root, leaf or flower
  • the matrix must contain all elements of the family, here to allow for a still to be found gene in this family, the rows and columns do not add to 1; all the other members are assigned a sum of 0.05, the values to be updated as the amount of information known about the organism increases.
  • the estimation matrix would be constructed from the known organism data in the database.
  • the catalog also contains the probabilities, and/or entries derived from the probabilities, for each probe/target combination, as discussed above. For purpose of illustration, let us assume that the probability of having probe sequence A paired with target sequence AA is 0.80, and the probability of having probe sequence A paired with sequence BB is 0.10, probe sequence B paired with target sequence AA is 0.05, and the probability of having probe sequence B paired with sequence BB is 0.75, or

    estimation    AA     BB
        A        .80    .10
        B        .05    .75

    It is a simple matter of application of linear algebra to determine the signals corresponding to each target.
  • Comparison of the pattern for the control and tester, for the sequence corresponding to AA exhibits an increase in the relative amount of AA from 0.24 to 0.64 for control to tester respectively. All other entries in the pattern are calculated in the same fashion.
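The linear algebra step can be sketched for the two-probe, two-target estimation matrix given above. The sketch assumes that the signal at each probe is the sum of the target amounts weighted by the pairing probabilities; the observed signals used here are hypothetical and are not the control or tester values of the example.

```python
def solve_2x2(matrix, signals):
    """Solve matrix @ targets = signals for a 2x2 system by Cramer's rule."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("estimation matrix is singular")
    s1, s2 = signals
    return ((s1 * d - b * s2) / det, (a * s2 - c * s1) / det)

# Rows: probes A and B. Columns: targets AA and BB (values from the text).
ESTIMATION = ((0.80, 0.10),
              (0.05, 0.75))

# Hypothetical observed signals at probes A and B.
observed = (0.42, 0.175)

amount_aa, amount_bb = solve_2x2(ESTIMATION, observed)
print(round(amount_aa, 3), round(amount_bb, 3))
```

Applying the same calculation to the control and to the tester patterns gives target amounts that can be compared entry by entry.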

Abstract

Disclosed is a method for the comprehensive analysis of nucleic acid samples and a detector composition for use in the method. The method involves amplifying nucleic acid fragments of interest using a primer that can form a hairpin structure; sequence-based coupling of the amplified fragments to detector probes; and detection of the coupled fragments. The amplified fragments are coupled by hybridization and coupling, preferably by ligation, to detector probes. A hairpin structure formed at the end of the amplified fragments facilitates coupling of the fragments to the probes. The method allows detection of the fragments where detection provides some sequence information for the fragments. The method allows a complex sample of nucleic acid to be quickly and easily cataloged in a reproducible and sequence-specific manner. The method can also be used to detect amplified fragments having a known sequence.

Description

ANALYSIS OF SEQUENCE TAGS WITH HAIRPIN PRIMERS
BACKGROUND OF THE INVENTION
The disclosed invention is generally in the field of nucleic acid characterization and analysis, and specifically in the area of analysis and comparison of gene expression patterns and genomes.
The study of differences in gene-expression patterns is one of the most promising approaches for understanding mechanisms of differentiation and development. In addition, the identification of disease-related target molecules opens new avenues for rational pharmaceutical intervention. Currently, there are two main approaches to the analysis of molecular expression patterns: (1) the generation of mRNA-expression maps and (2) examination of the 'proteome', in which the expression profile of proteins is analyzed by techniques such as two-dimensional gel electrophoresis or mass spectrometry (matrix-assisted-desorption-ionization-time-of-flight (MALDI-TOF)) and by the ability to sequence sub-picomole amounts of protein. Classical approaches to transcript imaging, such as northern blotting or plaque hybridization, are time-consuming and material-intensive ways to analyze mRNA-expression patterns. For these reasons, other methods for high-throughput screening in industrial and clinical research have been developed. A breakthrough in the analysis of gene expression was the development of the northern-blot technique in 1977 (Alwine et al., Proc. Natl. Acad. Sci. U.S.A. 74:5350-5354 (1977)). With this technique, labeled cDNA or RNA probes are hybridized to RNA blots to study the expression patterns of mRNA transcripts. Alternatively, RNase-protection assays can detect the expression of specific RNAs. These assays allow the expression of mRNA subsets to be determined in a parallel manner. For RNase-protection assays, the sequence of the analyzed mRNA has to be known in order to synthesize a labeled cDNA that forms a hybrid with the selected mRNA; such hybrids resist RNA degradation by a single-strand-specific nuclease and can be detected by gel electrophoresis. As a third approach, differential plaque-filter hybridization allows the identification of specific differences in the expression of cloned cDNAs (Maniatis et al., Cell 15:687-701 (1978)). Although all of these techniques are excellent tools for studying differences in gene expression, the limiting factor of these classical methods is that expression patterns can be analyzed only for known genes.
The analysis of gene-expression patterns made a significant advance with the development of subtractive cDNA libraries, which are generated by hybridizing an mRNA pool of one origin to an mRNA pool of a different origin. Transcripts that do not find a complementary strand in the hybridization step are then used for the construction of a cDNA library (Hedrick et al., Nature 308:149-153 (1984)). A variety of refinements to this method have been developed to identify specific mRNAs (Swaroop et al., Nucleic Acids Res. 25:1954 (1991); Diatchenko et al., Proc. Natl. Acad. Sci. U.S.A. 93:6025-6030 (1996)). One of these is the selective amplification of differentially expressed mRNAs via biotin- and restriction-mediated enrichment (SABRE; Lavery et al., Proc. Natl. Acad. Sci. U.S.A. 94:6831-6836 (1997)). In SABRE, cDNAs derived from a tester population are hybridized against the cDNAs of a driver (control) population. After a purification step specific for tester-cDNA-containing hybrids, tester-tester homohybrids are specifically amplified using an added linker, thus allowing the isolation of previously unknown genes.
The technique of differential display of eukaryotic mRNA was the first one-tube method to analyze and compare transcribed genes systematically in a bi-directional fashion; subtractive and differential hybridization techniques have only been adapted for the unidirectional identification of differentially expressed genes (Liang and Pardee, Science 257:967-971 (1992)). Refinements have been proposed to strengthen reproducibility, efficiency, and performance of differential display (Bauer et al., Nucleic Acids Res. 11:4272-4280 (1993); Liang and Pardee, Curr. Opin. Immunol. 7:274-280 (1995); Ito and Sakaki, Methods Mol. Biol. 85:37-44 (1997); Praschar and Weissman, Proc. Natl. Acad. Sci. U.S.A. 93:659-663 (1996); Shimkets et al., Nat. Biotechnol. 17:798-803 (1999)). Although these approaches are more reproducible and precise than traditional PCR-based differential display, they still require the use of gel electrophoresis, and often imply the exclusion of certain DNA fragments from analysis.
Originally developed to identify differences between two complex genomes, representational difference analysis (RDA) was adapted to analyze differential gene expression by taking advantage of both subtractive hybridization and PCR (Lisitsyn et al., Science 259:946-951 (1993); Hubank and Schatz, Nucleic Acids Res. 22:5640-5648 (1994)). In the first step, mRNA derived from two different populations, the tester and the driver (control), is reverse transcribed; the tester cDNA represents the cDNA population in which differential gene expression is expected to occur. Following digestion with a frequently cutting restriction endonuclease, linkers are ligated to both ends of the cDNA. A PCR step then generates the initial representation of the different gene pools. The linkers of the tester and driver cDNA are digested and a new linker is ligated to the ends of the tester cDNA. The tester and driver cDNAs are then mixed in a 1:100 ratio with an excess of driver cDNA in order to promote hybridization between single-stranded cDNAs common in both tester and driver cDNA pools. Following hybridization of the cDNAs, a PCR exponentially amplifies only those homoduplexes generated by the tester cDNA, via the priming sites on both ends of the double-stranded cDNA (O'Neill and Sinclair, Nucleic Acids Res. 25:2681-2682 (1997); Wada et al., Kidney Int. 51:1629-1638 (1997); Edman et al., J 323:113-118 (1997)).
The gene-expression pattern of a cell or organism determines its basic biological characteristics. In order to accelerate the discovery and characterization of mRNA-encoding sequences, the idea emerged to sequence fragments of cDNA randomly, directly from a variety of tissues (Adams et al., Science 252:1651-1656 (1991); Adams et al., Nature 377:3-16 (1995)). These expressed sequence tags (ESTs) allow the identification of coding regions in genome-derived sequences. Publicly available EST databases allow the comparative analysis of gene expression by computer. Differentially expressed genes can be identified by comparing the databases of expressed sequence tags of a given organ or cell type with sequence information from a different origin (Lee et al., Proc. Natl. Acad. Sci. U.S.A. 92:8303-8307 (1995); Vasmatzis et al., Proc. Natl. Acad. Sci. U.S.A. 95:300-304 (1998)). A drawback to sequencing of ESTs is the requirement for large-scale sequencing facilities.
Serial analysis of gene expression (SAGE) is a sequence-based approach to the identification of differentially expressed genes through comparative analyses (Velculescu et al., Science 270:484-487 (1995)). It allows the simultaneous analysis of sequences that derive from different cell populations or tissues. Three steps form the molecular basis for SAGE: (1) generation of a sequence tag (10-14 bp) to identify expressed transcripts; (2) ligation of sequence tags to obtain concatemers that can be cloned and sequenced; and (3) comparison of the sequence data to determine differences in expression of genes that have been identified by the tags. This procedure is performed for every mRNA population to be analyzed. A major drawback of SAGE is the fact that corresponding genes can be identified only for those tags that are deposited in gene banks, thus making the efficiency of SAGE dependent on the extent of available databases. In addition, a major sequencing effort is required to complete a SAGE data set capable of providing 95% coverage of any given mRNA population, simply because most of the sequencing work yields repetitive reads on those tags that are present in high frequency in cellular mRNA. In other words, SAGE sequencing experiments yield diminishing returns for rare mRNAs, whose unique tags will begin to accumulate in the database only after many weeks of sequencing effort.
A different approach to the study of gene-expression profiles and genome composition is the use of DNA microarrays. Current DNA microarrays are systematically gridded at high density. Such microarrays are generated by using cDNAs (for example, ESTs), PCR products or cloned DNA, which are linked to the surface of nylon filters, glass slides or silicon chips (Schena et al., Science 270:467-470 (1995)). DNA arrays can also be assembled from synthetic oligonucleotides, either by directly applying the synthesized oligonucleotides to the matrix or by a more sophisticated method that combines photolithography and solid-phase chemical synthesis (Fodor et al., Nature 364:555-556 (1993)). To determine differences in gene expression, labeled cDNAs or oligonucleotides are hybridized to the DNA- or oligomer-carrying arrays. When using different fluorophores for labeling cDNAs or oligonucleotides, two probes can be applied simultaneously to the array and compared at different wavelengths. The expression of 10,000 genes and more can be analyzed on a single chip (Chee et al., Science 274:610-614 (1996)). However, depending on the sensitivity of both cDNA and oligonucleotide arrays, the intensity of hybridization signals can leave the linear range when either weakly or abundantly expressed genes are analyzed. Thus, individual optimization steps are required to ensure the accurate detection of differentially expressed genes. While such microarray methods may be used to address a number of interesting biological questions, they are not suitable for the discovery of new genes. There is a need for a method that combines the power and convenience of array hybridization technology with the capability for gene discovery inherent in differential display or SAGE.
Such a method would be most attractive if it could enable comprehensive gene expression analysis without the use of gel electrophoresis, and without the need for a redundant DNA sequencing effort. Therefore, it is an object of the present invention to provide a method for the comprehensive analysis of nucleic acid sequence tags.
It is another object of the present invention to provide a detector composition that allows indexing of nucleic acid sequence tags.
It is another object of the present invention to provide a method for sequence-based detection of nucleic acid fragments of interest.
BRIEF SUMMARY OF THE INVENTION
Disclosed is a method for the comprehensive analysis of nucleic acid samples and a detector composition for use in the method. The method involves amplifying nucleic acid fragments of interest using a primer that can form a hairpin structure; sequence-based coupling of the amplified fragments to detector probes; and detection of the coupled fragments. The amplified fragments are coupled by hybridization and covalent coupling, preferably by ligation, to a detector probe. The probe is preferably immobilized in an array or on sortable beads. A hairpin structure formed at the end of the amplified fragments facilitates coupling of the fragments to the probes. The method allows detection of the fragments where detection provides some sequence information about the fragments. The method allows a complex sample of nucleic acid to be cataloged quickly and easily in a reproducible and sequence-specific manner. The method can also be used to detect amplified fragments having a known sequence.
BRIEF DESCRIPTION OF THE DRAWINGS
Figures 1A-1E are a listing of examples of hairpin primers and the hairpin structure that forms from the resulting hairpin ligator incorporated at the end of an amplified fragment. Nucleotides in one of the strands of the stem of the hairpin structure are represented by H. Nucleotides in the primer sequence of the hairpin primer are represented by p and P. Nucleotides in the part of the primer sequence involved in one of the strands of the stem of the hairpin structure are represented by P. Nucleotides in the fragment are represented by f and F. Nucleotides in the part of the fragment sequence involved in one of the strands of the stem of the hairpin structure are represented by F. Other nucleotides in the hairpin primer (that is, nucleotides that are neither part of the stem nor part of the primer sequence) are represented by n. In the hairpin ligator for hairpin primer 10, which represents an example of a hairpin primer used with adaptor-indexers, nucleotides in the primer sequence corresponding to sticky end sequences are boldface, nucleotides corresponding to adaptor-indexer sequences are underlined, and the recognition sequence of the restriction endonuclease (FokI in this example) is listed as CCTAC.
Figures 2A-2B are a diagram of nucleic acid molecules used and formed during an example of the disclosed method using generic sequences. Ligation of the top strand of the amplified fragment is illustrated. Nucleotides in one of the strands of the stem of the hairpin structure are represented by H. Nucleotides in the primer sequence of the hairpin primer are represented by p and P. Nucleotides in the part of the primer sequence involved in one of the strands of the stem of the hairpin structure are represented by P. Nucleotides in the fragment are represented by c, f, and F. Nucleotides in the part of the fragment sequence involved in one of the strands of the stem of the hairpin structure are represented by F. Nucleotides in the fragment complementary to the primer sequence of the hairpin primer are represented by c. Nucleotides in the detector probe are represented by I. Nucleotides in the fragment complementary to the detector probe are represented by f (boldface). Other nucleotides in the hairpin primer (that is, nucleotides that are neither part of the stem nor part of the primer sequence) are represented by n.
Figures 3A-3C are a diagram of nucleic acid molecules used and formed during an example of the disclosed method using specific sequences. Ligation of the top strand of the amplified fragment is illustrated. Nucleotides in the fragment complementary to the detector probe are boldface. Depicted from top to bottom are the hairpin primer (SEQ ID NO:2), the nucleic acid fragment (SEQ ID NO:3), the hairpin primer hybridized to the bottom strand of the nucleic acid fragment, the amplified nucleic acid fragment (SEQ ID NO:4), the hairpin structure formed in the top strand of the amplified nucleic acid fragment, and the amplified nucleic acid strand ligated to a detector probe (SEQ ID NO:32). The molecules and structures of Figures 3A-3C can be directly compared with those of Figure 2 to identify sequences in Figures 3A-3C having particular significance.
Figures 4A-4B are a diagram of examples of an amplified fragment (SEQ ID NO:4), the hairpin structures that can be formed from the hairpin ligators in the fragment strands, and the detector probes to which the hairpin ligators can be ligated. The diagram illustrates the relationship of an amplified fragment to the formation of 5' hairpin structures and 3' hairpin structures and the relationship of the polarity of a hairpin structure and the polarity of the detector probe to which it can be ligated.
Figure 5 is a diagram of an example of the disclosed method where hairpin primers are used to prime amplification of both strands of a nucleic acid molecule. Each strand of the resulting amplified fragment has a hairpin ligator at each end and a hairpin structure of opposite polarity can form at each end of both strands.
Figures 6A-6C are a diagram of nucleic acid molecules used and formed during an example of the disclosed method using adaptor-indexers. Ligation of the top strand of the amplified fragment is illustrated. The restriction enzyme recognition sequence is underlined and the sticky end sequence is in bold. The fragment (SEQ ID NO:5) is shown at the top of the diagram. Depicted in order from top to bottom are the nucleic acid molecule after cleavage with FokI; the nucleic acid fragment (left) and an example of a compatible adaptor-indexer (SEQ ID NO:6; right); the adaptor-indexer ligated to the nucleic acid fragment (SEQ ID NO:7); the hairpin primer (SEQ ID NO:8) hybridized to the top strand of the adaptor/fragment (nucleotides 13-47 of SEQ ID NO:7); the fragment after amplification (SEQ ID NO:9); the hairpin structure formed by the bottom strand of the amplified fragment; the hairpin structure mixed with the probe array (showing the relevant detector probe); and the fragment ligated to the probe array (SEQ ID NO:31). The fragment sequence determined in this example is GGATGNNNTTAGCATACC (SEQ ID NO:1).
DETAILED DESCRIPTION OF THE INVENTION
The disclosed method allows a complex sample of nucleic acid to be quickly and easily cataloged in a reproducible and sequence-specific manner. Such a catalog can be compared with other, similarly prepared catalogs of other nucleic acid samples to allow convenient detection of differences between the samples. The catalogs, which incorporate information about the nucleic acid samples, can serve as fingerprints of the nucleic acid samples which can be used both for detection of related nucleic acid samples and comparison of nucleic acid samples. For example, the presence or identity of specific organisms can be detected by producing a catalog of nucleic acid of the test organism and comparing the resulting catalog with reference catalogs prepared from known organisms. Changes and differences in gene expression patterns can also be detected by preparing catalogs of mRNA from different cell samples and comparing the catalogs. The catalog of sequences can also be used to produce a set of probes or primers that is specific for the source of a nucleic acid sample.
Comparison of nucleic acid catalogs produced with the disclosed method is facilitated by the highly ordered nature of the sequence information produced and cataloged in the method. Use of immobilization, sorting, and/or array detection in the method allows automation of the method, the cataloging of the information, and comparisons to other catalogs. The method results in the equivalent of a large number of sequence-specific bins that can be filled, empty, or filled to different levels, with the pattern of filled and empty bins, and/or the amount of signal in a bin, providing information about the nucleic acid sample that has been cataloged.
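By way of illustration only, the comparison of two catalogs as patterns of filled and empty bins might be sketched as follows. The representation of a catalog as a mapping from (index sample, probe sequence) to a signal value, the function name compare_catalogs, the two-fold threshold, and the signal floor are all assumptions made for this sketch and are not part of the disclosed method.

```python
# Hypothetical representation: a catalog maps (index_sample, probe_sequence)
# to a measured signal. A key that is absent corresponds to an empty bin.
def compare_catalogs(tester, control, min_ratio=2.0, floor=1.0):
    """Return the bins whose signal differs by at least min_ratio."""
    differences = {}
    for bin_key in sorted(set(tester) | set(control)):
        t = tester.get(bin_key, 0.0)
        c = control.get(bin_key, 0.0)
        # The floor (an assumed background level) avoids division by zero
        # when a bin is empty in one of the catalogs.
        ratio = max(t, floor) / max(c, floor)
        if ratio >= min_ratio or ratio <= 1.0 / min_ratio:
            differences[bin_key] = (t, c)
    return differences


# Made-up signal values, for demonstration only.
tester = {(1, "ACGTAC"): 120.0, (1, "TTGACA"): 15.0, (2, "GGATCC"): 40.0}
control = {(1, "ACGTAC"): 110.0, (2, "GGATCC"): 5.0, (2, "CCATGG"): 30.0}
print(compare_catalogs(tester, control))
```

In this sketch a bin filled in only one catalog is reported together with bins filled to clearly different levels, which corresponds to the two kinds of information (pattern of filled bins and amount of signal in a bin) described above.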
The disclosed method also allows specific and sensitive detection of nucleic acid fragments of interest. The use of sequence-based covalent coupling in the detection increases the reliability of detection over detection methods based only on probe hybridization. The disclosed method is also more efficient and less time consuming than conventional nucleic acid sequencing techniques.
One embodiment of the disclosed method involves the following basic steps. Where multiple different primer sequences are used, the nucleic acid sample is preferably divided into aliquots (referred to as index samples) before amplification. Preferably, the nucleic acid sample is divided into as many aliquots as the number of primer sequences used. Preferred nucleic acid samples for use in the disclosed method are samples to which adaptor-indexers have been coupled. Where a single primer sequence is used, the nucleic acid sample is preferably not divided into index samples. Each index sample is then mixed with a different hairpin primer, each of which has a different primer sequence. For PCR amplification, a second primer is also mixed with each index sample. It is preferred that the second primer not be a hairpin primer. The index samples are then amplified.
Next, the index samples are treated to allow formation of hairpin structures at the fragment ends containing hairpin primer sequences. This is preferably accomplished by digesting one of the strands of the amplified fragments. Finally, the index samples are reacted with and coupled to detector probes. It is preferred that the probes include every possible sequence of a given length (for example, every possible six base sequence). The ends of the detector probes and the hairpin ends are coupled only if the probe hybridizes adjacent to the end of the hairpin ligator. Preferably each index sample is reacted with a different probe array. Coupling can be accomplished using any suitable technique, including ligation and chemical reactions. Ligation is preferred. When coupling is by ligation, there should be a 5'-phosphate capable of participating in ligation on the appropriate strand.
Each processed DNA fragment from the sample will result in a signal based on coupling of an amplified fragment to a probe. A complex nucleic acid sample will produce a unique pattern of signals. It is this pattern that allows unique cataloging of nucleic acid samples and sensitive and powerful comparisons of the patterns of signals produced from different nucleic acid samples. The detector probe to which a DNA fragment is coupled identifies the sequence of the DNA fragment to which the primer hybridized and the adjacent sequence of the DNA fragment to which the detector probe hybridized.
Coupling of amplified fragments to probes can be detected directly or indirectly. For example, either the probe or the amplified fragment can be detected. Association of an amplified fragment with a given probe is indicative of coupling of the probe and the amplified fragment. Detection of such associations can be facilitated through immobilization of the probes or hairpin primers, and through the use of capture tags, sorting tags and detectable labels in association with the probes, hairpin primers, and/or amplified fragments. Any combination of immobilization and association with capture tags, sorting tags, and labels can be used. Preferably, the probes are immobilized in arrays and the amplified fragments are associated with a detectable label. Thus, detection of a signal at a particular location in a particular array of detector probes can provide information about nucleic acid fragments indexed from the nucleic acid sample. Where the probes are immobilized in arrays, the array, and the location in the array, where a DNA fragment generates a signal identify the sequence of the DNA fragment. The same effect can be accomplished by otherwise capturing, sorting, or detecting particular probes (via capture tags, sorting tags, and labels). That is, so long as the probe and the DNA fragment coupled to it can be identified, a pattern can be determined.
A preferred form of the disclosed method uses nucleic acid fragments to which adaptor-indexers have been covalently coupled for amplification using hairpin primers. The manner in which the adaptor-indexers are coupled to nucleic acid fragments results in indexing of different fragments and preservation of sequence information about the fragments. Adaptor-indexers are coupled to nucleic acid fragments using the following basic steps. A nucleic acid sample is cleaved with one or more nucleic acid cleaving reagents (preferably restriction endonucleases) that result in a set of DNA fragments having sticky ends with a variety of sequences. The sample may also be divided into aliquots (referred to as index samples); preferably as many aliquots as there are sticky end sequences. Where multiple nucleic acid cleaving reagents are used, the nucleic acid sample is preferably divided into index samples before digestion. Where a single nucleic acid cleaving reagent is used, the nucleic acid sample is preferably divided into index samples following digestion. Each index sample is then mixed with a different adaptor-indexer, each of which has a sticky end compatible with one of the possible sticky ends on the DNA fragments in that index sample. The adaptor-indexers are then covalently coupled to compatible DNA fragments.
Each index sample can then be cleaved with one or more other nucleic acid cleaving reagents (referred to as second nucleic acid cleaving reagents), preferably a restriction enzyme having a four base recognition sequence. A second adaptor can then be covalently coupled to the DNA fragments in the index samples. The DNA fragments are then amplified using hairpin primers as described above. For this form of the method, it is preferred that the primer sequences of the hairpin primers are complementary to sequences in the adaptor-indexers.
Materials
Nucleic Acid Samples
Any nucleic acid sample can be used with the disclosed method. Examples of suitable nucleic acid samples include genomic samples, mRNA samples, cDNA samples, nucleic acid libraries (including cDNA and genomic libraries), whole cell samples, environmental samples, culture samples, tissue samples, bodily fluids, and biopsy samples. Numerous other sources of nucleic acid samples are known or can be developed and any can be used with the disclosed method. Preferred nucleic acid samples for use with the disclosed method are nucleic acid samples of significant complexity such as genomic samples, cDNA samples, and mRNA samples.
Nucleic acid fragments are segments of larger nucleic acid molecules. Nucleic acid fragments, as used in the disclosed method, generally refer to nucleic acid molecules that have been amplified or that have been cleaved. A nucleic acid sample that has been amplified is referred to as an amplified sample. A nucleic acid sample that has been cleaved using a nucleic acid cleaving reagent is referred to as a digested sample.
An index sample is a nucleic acid sample that has been divided into different aliquots for further processing. In the context of the disclosed method, index samples are preferably aliquots of a nucleic acid sample to which different hairpin primers will be added. In the disclosed method, different nucleic acid fragments are processed in the different index samples based on the primer sequences of the hairpin primers. Thus, it is preferred that nucleic acid samples be divided into as many index samples as the number of hairpin primers used for amplification.
A control nucleic acid sample is a nucleic acid sample to which another nucleic acid sample (which can be referred to as a tester nucleic acid sample) is to be compared. A control index sample is an index sample to which another index sample (which can be referred to as a tester index sample) is to be compared.
Secondary index samples are aliquots of index samples. Thus, index samples can be divided into a plurality of secondary index samples. Secondary index samples are to be cleaved with a nucleic acid cleaving reagent, preferably a restriction enzyme. Restricted index samples and non-restricted index samples are aliquots of index samples. Restricted index samples are to be cleaved with a nucleic acid cleaving reagent while non-restricted index samples are not. Restricted secondary index samples and non-restricted secondary index samples are aliquots of secondary index samples. Restricted secondary index samples are to be cleaved with a nucleic acid cleaving reagent while non-restricted secondary index samples are not. Secondary index samples, restricted index samples, non-restricted index samples, restricted secondary index samples, and non-restricted secondary index samples are referred to collectively herein as derivative index samples. Each is derived from an index sample and, in some cases, from another derivative index sample.
Hairpin Primers
A hairpin primer is a nucleic acid molecule that contains a primer sequence and that can form a stem-loop or hairpin structure. For convenience, and unless otherwise indicated, both hairpin structures and stem-loop structures are referred to herein as hairpin structures. The base paired portion of a hairpin structure is referred to as the stem of the hairpin structure. Hairpin primers are used in the disclosed method as specialized amplification primers that, following amplification, can form a hairpin structure at the end of amplified nucleic acid fragments. The hairpin is designed to allow sequence-specific covalent coupling of a detector probe to the end of the hairpin based on the adjacent sequence of the amplified fragment. The primer sequence of a hairpin primer is at the 3' end of the hairpin primer. The stem of a hairpin primer can involve all or part of the primer sequence. Although it is preferred, the stem need not extend to the 3' end of the primer sequence. The stem can also extend into the sequence of the amplified fragment. It is preferred that the stem of a hairpin primer involves all of the primer sequence without extending into the sequence of the amplified fragment.
Where fragments containing adaptor-indexers are amplified, it is preferred that the primer sequence of the hairpin primers be complementary to sequences in the adaptor-indexer. The stem of a hairpin primer can involve all or part of the sticky end sequence (or recognition sequence) for which the adaptor-indexer is designed. Although it is preferred, the stem need not extend to the 3' end of the sticky end sequence (or recognition sequence). The stem can also extend into the sequence of the amplified fragment beyond the sticky end sequence (or recognition sequence). It is preferred that the stem of a hairpin primer involves all of the sticky end sequence (or recognition sequence) without extending further into the sequence of the amplified fragment.
Some examples of hairpin structures of hairpin primers and their relationships to amplified nucleic acids are illustrated in Figures 1A-1B. Hairpin primers 1 and 4-9 are examples of hairpin primers where the stem extends to the end of the primer sequence. Hairpin primer 2 is an example of a hairpin primer where the stem does not extend to the end of the primer sequence. Hairpin primer 3 is an example of a hairpin primer where the stem extends into the sequence of the amplified fragment. Hairpin primer 9 is an example of a hairpin primer where the stem involves all of the primer sequence. Hairpin primers 1-8 are examples of hairpin primers where the stem does not involve all of the primer sequence. Hairpin primers 1-5 are examples of hairpin primers where the stem is 10 base pairs long. Hairpin primer 6 is an example of a hairpin primer where the stem is 12 base pairs long. Hairpin primer 7 is an example of a hairpin primer where the stem is 8 base pairs long. Hairpin primer 8 is an example of a hairpin primer where the stem is 3 base pairs long. Hairpin primer 9 is an example of a hairpin primer where the stem is 16 base pairs long.
Amplification using hairpin primers results in amplified nucleic acid fragments having hairpin primer sequences at one or both ends of the fragments. These hairpin primer sequences in amplified fragments are referred to as hairpin ligators. The hairpin ligators can form hairpin structures. A hairpin structure with a 3' end is referred to as a 3' hairpin structure and a hairpin structure with a 5' end is referred to as a 5' hairpin structure (hairpin ligators containing these structures are referred to as 3' hairpin ligators and 5' hairpin ligators, respectively).
The stem of a hairpin structure can have any length that allows formation of the hairpin structure and which is of sufficient stability to allow covalent coupling of a detector probe. Preferably, the stem of the hairpin structure of a hairpin ligator is from 3 to 16 base pairs long, and more preferably from 6 to 10 base pairs long.
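By way of illustration only, the arrangement of a hairpin primer in which the stem extends to the 3' end of the primer sequence might be sketched as follows. The function name build_hairpin_primer, the four-thymine loop, and the example primer sequence are hypothetical and chosen only for this sketch; the sketch also keeps the stem within the primer sequence, which is one of several arrangements described above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_hairpin_primer(primer_seq, stem_len, loop="TTTT"):
    """Assemble 5'-[stem arm]-[loop]-[primer sequence]-3'.

    The 5' stem arm is the reverse complement of the last stem_len
    nucleotides of the primer sequence, so the stem extends to the 3' end
    of the primer sequence. The loop sequence is an arbitrary placeholder.
    """
    if not 3 <= stem_len <= 16:
        raise ValueError("stem length outside the 3 to 16 base pair range")
    if stem_len > len(primer_seq):
        raise ValueError("this sketch keeps the stem within the primer sequence")
    stem_arm = reverse_complement(primer_seq[-stem_len:])
    return stem_arm + loop + primer_seq


# Hypothetical 10-nucleotide primer sequence with an 8 base pair stem.
hairpin_primer = build_hairpin_primer("ACGGTCATGC", stem_len=8)
print(hairpin_primer)
```

Setting stem_len equal to the length of the primer sequence gives the preferred case in which the stem involves all of the primer sequence without extending into the amplified fragment.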
Generally, the sequence of the stem portion of a hairpin primer should not include the recognition sequence of any nucleic acid cleaving reagent to be used in a subsequent step in the method. However, inclusion of restriction sites in hairpin primers is useful in some embodiments of the disclosed method. For example, hybridization of the fragments to detector probes can be aided by shortening the fragment length prior to hybridization. This can be accomplished, for example, by digesting the fragment with a restriction endonuclease or other nucleic acid cleaving reagent. Preferably, the recognition site for the nucleic acid cleaving reagent is included in the sequence of the hairpin primer. For this purpose, it is preferred that the nucleic acid cleaving reagent used has a cleavage site offset from the recognition site. An example of such a nucleic acid cleaving reagent is the restriction enzyme EcoP15I.
Hairpin primers can contain labile nucleotides, preferably in the loop, that allow the hairpin structure to be broken. For example, uracil rather than thymine can be used in hairpin primers (phosphoramidite chemicals available from Glen Research). When used in conjunction with uracil-DNA glycosylase (UDG; available from New England Biolabs), uracil can be used to introduce specific strand breaks.
It is preferred that hairpin primers not have additional sequences that are self-complementary, other than the self-complementary stem portion. It is considered that this condition is met if there are no complementary regions greater than six nucleotides long without a mismatch or gap. While the hairpin primers (and amplified nucleic acid fragments) can be detected using sequence-based detection systems, the hairpin primers (or amplified nucleic acid fragments) can also contain a label to facilitate detection. Numerous labels are known and can be used for this purpose. Hairpin primers can also contain or be associated with capture tags to facilitate immobilization or capture of fragments in which hairpin primers have been incorporated. In general, the capture tag can be one member of a binding pair such as biotin and streptavidin. Capture tags are discussed more fully elsewhere herein. Hairpin primers can also contain or be associated with sorting tags to facilitate sorting or separation of fragments in which hairpin primers have been incorporated. In general, the sorting tag can be a detectable label such as a fluorescent moiety or a manipulable moiety such as a magnetic bead. Sorting tags are discussed more fully elsewhere herein. Hairpin primers can also be immobilized on a substrate. Hairpin primers can also include a few phosphorothioate linkages or other non-hydrolyzable bonds at the 5' end to protect the strand of the amplified fragment containing the hairpin primer from exonuclease digestion. This allows one of the strands of the amplified fragments to be degraded. Hairpin primers can also include one or more photocleavable nucleotides to facilitate release of probe sequences and amplified fragments coupled to the probe. Photocleavable nucleotides and their use are described in WO 00/04036.
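By way of illustration only, the self-complementarity condition stated at the beginning of the preceding paragraph (no complementary regions greater than six nucleotides long without a mismatch or gap) might be screened as follows. The function names are hypothetical, and the sketch makes simplifying assumptions: it considers only perfect, gap-free, non-overlapping matches within a single molecule, and it is meant to be applied to the portion of the hairpin primer outside the designed 5' stem arm so that the intended stem is not itself reported.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_self_complementary_run(seq):
    """Length of the longest segment whose reverse complement occurs at a
    non-overlapping downstream position (perfect match, no gaps)."""
    n = len(seq)
    for k in range(n // 2, 0, -1):          # try the longest length first
        for i in range(n - 2 * k + 1):
            partner = reverse_complement(seq[i:i + k])
            if seq.find(partner, i + k) != -1:
                return k
    return 0


def meets_condition(seq, max_run=6):
    """True if no complementary region is longer than max_run nucleotides."""
    return longest_self_complementary_run(seq) <= max_run


# Made-up sequences: the first contains a 7-nucleotide complementary pair.
print(meets_condition("AAAAAAACCCCTTTTTTT"))
print(meets_condition("AAAACCCC"))
```

A fuller screen would also consider complementarity between different primer molecules; that is outside the scope of this sketch.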
Hairpin primers need not be composed of naturally occurring nucleotides. Modified nucleotides, unnatural bases and nucleotide and oligonucleotide analogs can be used. All that is required is that the primer have the general structure described herein and be capable of the interactions and reactions required in the disclosed method.
Detector Probes
Detector probes are molecules, preferably oligonucleotides, that can hybridize to nucleic acids in a sequence-specific manner. In the disclosed method, detector probes are used to capture nucleic acid fragments amplified using the disclosed hairpin primers based on complementary sequences present in the amplified nucleic acid fragments. Detector probes are preferably used in sets having a variety of probe sequences, preferably a set of probes having every possible combination (or hybridizable to every combination) of nucleotide sequence the length of the probe. Detector probes are preferably used in sets where each probe has the same length. Preferred lengths for the probe portion of detector probes are five, six, seven, and eight nucleotides. Detector probes preferably include a probe portion (for hybridization to sample fragments) and linker portions through which the probe portion is coupled to a substrate, capture tag, sorting tag, or label. These linker portions can have any suitable structure and will generally be chosen based on the method of immobilization or synthesis of the detector probes. The linker portion can be made up of or include nucleotides. The linker portions can have any suitable length and preferably are of sufficient length to allow the probe portion to hybridize effectively. For convenience and unless otherwise indicated, reference to the length of detector probes refers to the length of the probe portion of the probes. Immobilized detector probes are detector probes immobilized on a support. Detector probes can be, and preferably are, immobilized on a substrate.
Detector probes can also contain or be associated with capture tags to facilitate immobilization or capture of the probes and amplified fragments to which they have been coupled. Detector probes can also contain or be associated with sorting tags to facilitate sorting or separation of the probes and amplified fragments to which they have been coupled. Detector probes can also contain or be associated with labels to facilitate detection of the probes and amplified fragments to which they have been coupled.
Detector probes can also include one or more photocleavable nucleotides to facilitate release of probe sequences and amplified fragments coupled to the probe. Photocleavable nucleotides and their use are described in WO 00/04036. Detector probes need not be composed of naturally occurring nucleotides. Modified nucleotides, unnatural bases and nucleotide and oligonucleotide analogs can be used. All that is required is that the probe have the general structure described herein and be capable of the interactions and reactions required in the disclosed method.
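By way of illustration only, a set of detector probe sequences having every possible combination of nucleotides for a given probe length might be enumerated as follows. The function name all_probes is hypothetical, and the sketch assumes the four standard DNA nucleotides; it shows how the size of a complete set grows with the preferred probe lengths of five to eight nucleotides.

```python
from itertools import product


def all_probes(length, alphabet="ACGT"):
    """Return every possible probe sequence of the given length."""
    return ["".join(bases) for bases in product(alphabet, repeat=length)]


# A complete set of probes of length n contains 4**n sequences.
set_sizes = {n: len(all_probes(n)) for n in (5, 6, 7, 8)}
print(set_sizes)
```

The rapid growth in set size is one practical consideration when choosing among the preferred probe lengths.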
Probe Arrays
Different detector probes can be used together as a set. The set can be used as a mixture of all or subsets of the probes, probes used separately in separate reactions, or immobilized in an array. Probes used separately or as mixtures can be physically separable through, for example, the use of capture tags, sorting tags, or immobilization on beads. A probe array (also referred to herein as an array) includes a plurality of probes immobilized at identified or predetermined locations on the array. In this context, a plurality of probes refers to multiple probes each having a different sequence. Each predetermined location on the array has one type of probe (that is, all the probes at that location have the same sequence). Each location will have multiple copies of the probe. The spatial separation of probes of different sequence in the array allows separate detection and identification of amplified fragments that become coupled to the probes via hybridization of the probes to nucleic acid fragments in a nucleic acid sample. If an amplified fragment is detected at a given location in a probe array, it indicates that the sequence adjacent to the site in the nucleic acid fragment where the fragment hybridized is complementary to the probe immobilized at that location in the array.
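By way of illustration only, the way a signal at a given array location identifies fragment sequence might be sketched as follows. The row and column layout, the three-nucleotide probe length (shorter than the preferred lengths, to keep the example small), and the function names build_array and fragment_sequence_at are assumptions made for this sketch.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_array(probe_len=3, columns=8):
    """Assign one probe sequence to each (row, column) location.

    The layout is hypothetical: probes are placed in alphabetical order,
    left to right and top to bottom.
    """
    probes = ["".join(p) for p in product("ACGT", repeat=probe_len)]
    return {divmod(i, columns): probe for i, probe in enumerate(probes)}


def fragment_sequence_at(array, location):
    """Return the fragment sequence implied by a signal at a location.

    The fragment sequence is complementary to the probe immobilized there,
    so it is the reverse complement when both are written 5' to 3'.
    """
    return array[location].translate(COMPLEMENT)[::-1]


array = build_array()
print(array[(0, 1)], fragment_sequence_at(array, (0, 1)))
```

Because each location holds a single probe sequence, the lookup from location to fragment sequence is unambiguous in this sketch.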
Adaptor-indexers can also be immobilized in arrays. Different modes of the disclosed method can be performed with different components immobilized, labeled, or tagged. Arrays of adaptor-indexers can be made and used as described below and elsewhere herein for the detector probes. Preferably, the detector probes in a probe array will all be of the same polarity. That is, each probe will have a free 5' end or each probe will have a free 3' end. The polarity of a probe determines to which form of hairpin structure the probe can be coupled. A probe array with probes having 5' ends is referred to as a 5' probe array. A probe array with probes having 3' ends is referred to as a 3' probe array. A probe array can also have probes of both polarities. If so, it is preferred that probes of different polarities be immobilized at identified or predetermined locations on the probe array.
Solid-state substrates for use in probe arrays can include any solid material to which oligonucleotides can be coupled, directly or indirectly. This includes materials such as acrylamide, cellulose, nitrocellulose, glass, silicon, polystyrene, polyethylene vinyl acetate, polypropylene, polymethacrylate, polyethylene, polyethylene oxide, glass, polysilicates, polycarbonates, teflon, fluorocarbons, nylon, silicon rubber, polyanhydrides, polyglycolic acid, polylactic acid, polyorthoesters, polypropylfumerate, collagen, glycosaminoglycans, and polyamino acids. Solid-state substrates can have any useful form including thin films or membranes, beads, bottles, dishes, fibers, woven fibers, shaped polymers, particles and microparticles. A preferred form for a solid-state substrate is a microtiter dish. The most preferred form of microtiter dish is the standard 96-well type.
Methods for immobilization of oligonucleotides to solid-state substrates are well established. Detector probes can be coupled to substrates using established coupling methods. For example, suitable attachment methods are described by Pease et al., Proc. Natl. Acad. Sci. USA 91(11):5022-5026 (1994), and Khrapko et al., Mol Biol (Mosk) (USSR) 25:718-730 (1991). A method for immobilization of 3'-amine oligonucleotides on casein-coated slides is described by Stimpson et al., Proc. Natl. Acad. Sci. USA 92:6379-6383 (1995). A preferred method of attaching oligonucleotides to solid-state substrates is described by Guo et al., Nucleic Acids Res. 22:5456-5465 (1994).
Methods for producing arrays of oligonucleotides on solid-state substrates are also known. Examples of such techniques are described in U.S. Patent No. 5,871,928 to Fodor et al., U.S. Patent No. 5,654,413 to Brenner, U.S. Patent No. 5,429,807, and U.S. Patent No. 5,599,695 to Pease et al. Although preferred, it is not required that a given probe array be a single unit or structure. The set of probes may be distributed over any number of solid supports. For example, at one extreme, each probe may be immobilized in a separate reaction tube or container.
The probes in arrays can also be designed to have similar hybrid stability. This would make hybridization of fragments to detector probes more efficient and reduce the incidence of mismatch hybridization. The hybrid stability of probes can be calculated using known formulas and principles of thermodynamics (see, for example, SantaLucia et al., Biochemistry 35:3555-3562 (1996); Freier et al., Proc. Natl. Acad. Sci. USA 83:9373-9377 (1986); Breslauer et al., Proc. Natl. Acad. Sci. USA 83:3746-3750 (1986)). The hybrid stability of the probes can be made more similar (a process that can be referred to as smoothing the hybrid stabilities) by, for example, chemically modifying the probes (Nguyen et al., Nucleic Acids Res. 25(15):3059-3065 (1997); Hoheisel, Nucleic Acids Res. 24(3):430-432 (1996)). Hybrid stability can also be smoothed by carrying out the hybridization under specialized conditions (Nguyen et al., Nucleic Acids Res. 27(6):1492-1498 (1999); Wood et al., Proc. Natl. Acad. Sci. USA 82(6):1585-1588 (1985)). Another means of smoothing hybrid stability of the probes is to vary the length of the probes. This would allow adjustment of the hybrid stability of each probe so that all of the probes had similar hybrid stabilities (to the extent possible). Since the addition or deletion of a single nucleotide from a probe will change the hybrid stability of the probe by a fixed increment, it is understood that the hybrid stabilities of the probes in a probe array will not be equal. For this reason, similarity of hybrid stability as used herein refers to any increase in the similarity of the hybrid stabilities of the probes (or, put another way, any reduction in the differences in hybrid stabilities of the probes). This is useful since any such increased similarity in hybrid stability can improve the efficiency and fidelity of hybridization and coupling of the detector probes.
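By way of illustration only, smoothing of hybrid stability by varying probe length might be sketched as follows. The sketch does not use the thermodynamic calculations cited above; it substitutes the simple Wallace rule (approximately 2 degrees C for each A or T and 4 degrees C for each G or C), which is only a rough estimate for short oligonucleotides. The function names, the target value, and the assumption that one end of the probe is fixed while the other end is extended are hypothetical.

```python
def wallace_tm(seq):
    """Rough melting temperature estimate in degrees C (Wallace rule)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def smooth_by_length(template, target_tm, lengths=(5, 6, 7, 8)):
    """Choose the probe length whose estimate is closest to target_tm.

    The probe is taken from the start of the template, so one end is held
    fixed and only the other end is extended or shortened.
    """
    candidates = [template[:n] for n in lengths]
    return min(candidates, key=lambda probe: abs(wallace_tm(probe) - target_tm))


# Made-up templates: an A/T-rich site gets a longer probe than a G/C-rich site.
print(smooth_by_length("ATATATAT", target_tm=20))
print(smooth_by_length("GCGCGCGC", target_tm=20))
```

As the paragraph above notes, each added nucleotide changes the estimate by a fixed increment, so the selected probes approach the target without necessarily reaching it.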
The efficiency of hybridization and coupling of detector probes to sample fragments can also be improved by grouping detector probes of similar hybrid stability in sections or segments of a probe array that can be subjected to different hybridization conditions. In this way, the hybridization conditions can be optimized for particular classes of probes.
Second Primers
A second primer is a nucleic acid molecule that contains a primer sequence. The primer sequence of a second primer is at the 3' end. A second primer differs from a hairpin primer in that a second primer is not designed to form a hairpin structure. Second primers are used to amplify the opposite strand of nucleic acid fragments when the amplification technique requires a second primer (and when a second hairpin primer is not used to amplify the opposite strand). Where fragments containing second adaptors are amplified, it is preferred that the primer sequence of the second primers (or the second hairpin primers, if used) be complementary to sequences in the second adaptor.
Second primers can also contain detector sequences 5' of the primer sequences. Such detector sequences can be used to facilitate detection of nucleic acid fragments amplified in the disclosed method. Detector sequences can have any arbitrary sequence, preferably sequences that do not interfere with operation of the method. For example, it is preferred that detector sequences be chosen that are not significantly complementary to sequences in the second primer or sequences in hairpin primers or other second primers. Detector sequences are preferably the same. Also preferred are sets of second primers where the detector sequences within a set are the same but which differ between sets.
Second primers can also contain or be associated with capture tags to facilitate immobilization or capture of fragments in which second primers have been incorporated. Capture tags are discussed more fully elsewhere herein. Second primers can also contain or be associated with sorting tags to facilitate sorting or separation of fragments in which second primers have been incorporated. Sorting tags are discussed more fully elsewhere herein. Second primers can also contain or be associated with labels to facilitate detection of fragments in which second primers have been incorporated. Second primers can also be immobilized on a substrate.
Second primers can also include one or more photocleavable nucleotides to facilitate release of second primer sequences for detection. Photocleavable nucleotides and their use are described in WO 00/04036.
Second primers need not be composed of naturally occurring nucleotides. Modified nucleotides, unnatural bases and nucleotide and oligonucleotide analogs can be used. All that is required is that the second primer have the general structure described herein and be capable of the interactions and reactions required in the disclosed method.
Labels
To aid in detection and quantitation of fragments coupled to detector probes, labels can be incorporated into, coupled to, or associated with hairpin primers, second primers, detector probes, and/or the fragments. A label is any molecule that can be associated with nucleic acid fragments, directly or indirectly, and which results in a measurable, detectable signal, either directly or indirectly. A label is associated with a component when it is coupled or bound, either covalently or non-covalently, to the component. A label is coupled to a component when it is covalently coupled to the component. Many suitable labels for incorporation into, coupling to, or association with nucleic acid are known. Examples of labels suitable for use in the disclosed method are radioactive isotopes, fluorescent molecules, phosphorescent molecules, bioluminescent molecules, enzymes, antibodies, and ligands.
Examples of suitable fluorescent labels include fluorescein (FITC), 5,6-carboxymethyl fluorescein, Texas red, nitrobenz-2-oxa-1,3-diazol-4-yl (NBD), coumarin, dansyl chloride, rhodamine, 4'-6-diamidino-2-phenylindole (DAPI), and the cyanine dyes Cy3, Cy3.5, Cy5, Cy5.5 and Cy7. Preferred fluorescent labels are fluorescein (5-carboxyfluorescein-N-hydroxysuccinimide ester) and rhodamine (5,6-tetramethyl rhodamine). Preferred fluorescent labels for simultaneous detection are FITC and the cyanine dyes Cy3, Cy3.5, Cy5, Cy5.5 and Cy7. The absorption and emission maxima, respectively, for these fluors are: FITC (490 nm; 520 nm), Cy3 (554 nm; 568 nm), Cy3.5 (581 nm; 588 nm), Cy5 (652 nm; 672 nm), Cy5.5 (682 nm; 703 nm) and Cy7 (755 nm; 778 nm), thus allowing their simultaneous detection. The fluorescent labels can be obtained from a variety of commercial sources, including Molecular Probes, Eugene, OR and Research Organics, Cleveland, Ohio.
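The spectral separation that permits simultaneous detection can be tabulated directly from the maxima listed above. The following is a minimal sketch; the helper names are hypothetical, and the comparison of neighboring emission maxima is only an illustrative heuristic, not a statement about any particular filter set or instrument.

```python
# (absorption maximum, emission maximum) in nm, as listed in the text above.
FLUORS = {
    "FITC": (490, 520),
    "Cy3": (554, 568),
    "Cy3.5": (581, 588),
    "Cy5": (652, 672),
    "Cy5.5": (682, 703),
    "Cy7": (755, 778),
}


def emission_gaps(fluors):
    """Return (name_a, name_b, gap_nm) for neighbors sorted by emission maximum."""
    ordered = sorted(fluors.items(), key=lambda item: item[1][1])
    return [
        (name_a, name_b, spec_b[1] - spec_a[1])
        for (name_a, spec_a), (name_b, spec_b) in zip(ordered, ordered[1:])
    ]


def smallest_gap(fluors):
    """Return the pair of neighboring fluors whose emission maxima are closest."""
    return min(emission_gaps(fluors), key=lambda gap: gap[2])


if __name__ == "__main__":
    for name_a, name_b, gap in emission_gaps(FLUORS):
        print(f"{name_a} -> {name_b}: {gap} nm")
    print("closest pair:", smallest_gap(FLUORS))
```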
Labeled nucleotides are a preferred form of label since they can be directly incorporated into nucleic acids during synthesis. Examples of labels that can be incorporated into DNA or RNA include nucleotide analogs such as BrdUrd (Hoy and Schimke, Mutation Research 290:217-230 (1993)), BrUTP (Wansink et al., J. Cell Biology 122:283-293 (1993)) and nucleotides modified with biotin (Langer et al., Proc. Natl. Acad. Sci. USA 78:6633 (1981)) or with suitable haptens such as digoxygenin (Kerkhof, Anal. Biochem. 205:359-364 (1992)). Suitable fluorescence-labeled nucleotides are Fluorescein-isothiocyanate-dUTP, Cyanine-3-dUTP and Cyanine-5-dUTP (Yu et al., Nucleic Acids Res. 22:3226-3232 (1994)). A preferred nucleotide analog detection label for DNA is BrdUrd (BUDR triphosphate, Sigma), and a preferred nucleotide analog detection label for RNA is Biotin-16-uridine-5'-triphosphate (Biotin-16-dUTP, Boehringer Mannheim). Fluorescein, Cy3, and Cy5 can be linked to dUTP for direct labeling. Cy3.5 and Cy7 are available as avidin or anti-digoxygenin conjugates for secondary detection of biotin- or digoxygenin-labeled probes.
Labels that are incorporated into nucleic acid, such as biotin, can be subsequently detected using sensitive methods well-known in the art. For example, biotin can be detected using streptavidin-alkaline phosphatase conjugate (Tropix, Inc.), which is bound to the biotin and subsequently detected by chemiluminescence of suitable substrates (for example, chemiluminescent substrate CSPD: disodium 3-(4-methoxyspiro-[1,2-dioxetane-3,2'-(5'-chloro)tricyclo[3.3.1.1^3,7]decane]-4-yl)phenyl phosphate; Tropix, Inc.).
Other labels include molecular or metal barcodes, mass labels, and labels detectable by nuclear magnetic resonance, electron paramagnetic resonance, surface enhanced Raman scattering, surface plasmon resonance, fluorescence, phosphorescence, chemiluminescence, resonance Raman, microwave, or a combination. Mass labels are compounds or moieties that have, or which give the labeled component, a distinctive mass signature in mass spectroscopy. Mass labels are useful when mass spectroscopy is used for detection. Preferred mass labels are peptide nucleic acids and carbohydrates. Combinations of labels can also be useful. For example, color-encoded microbeads having, for example, 256 unique combinations of labels, are useful for distinguishing numerous components. For example, 256 different detector probes can be uniquely labeled and detected allowing multiplexing and automation of the disclosed method. Useful labels are described in de Haas et al., "Platinum porphyrins as phosphorescent label for time-resolved microscopy," J. Histochem. Cytochem. 45(9):1279-92 (1997); Karger and Gesteland, "Digital chemiluminescence imaging of DNA sequencing blots using a charge-coupled device camera," Nucleic Acids Res. 20(24):6657-65 (1992); Keyes et al., "Overall and internal dynamics of DNA as monitored by five-atom-tethered spin labels," Biophys. J. 72(1):282-90 (1997); Kirschstein et al., "Detection of the DeltaF508 mutation in the CFTR gene by means of time-resolved fluorescence methods," Bioelectrochem. Bioenerg. 48(2):415-21 (1999); Kricka, "Selected strategies for improving sensitivity and reliability of immunoassays," Clin. Chem. 40(3):347-57 (1994); Kricka, "Chemiluminescent and bioluminescent techniques," Clin. Chem. 37(9):1472-81 (1991); Kumke et al., "Temperature and quenching studies of fluorescence polarization detection of DNA hybridization," Anal. Chem. 69(3):500-6 (1997); McCreery, "Digoxigenin labeling," Mol. Biotechnol. 7(2):121-4 (1997); Mansfield et al., "Nucleic acid detection using non-radioactive labeling methods," Mol. Cell Probes 9(3):145-56 (1995); Nurmi et al., "A new label technology for the detection of specific polymerase chain reaction products in a closed tube," Nucleic Acids Res. 28(8):28 (2000); Oetting et al., "Multiplexed short tandem repeat polymorphisms of the Weber 8A set of markers using tailed primers and infrared fluorescence detection," Electrophoresis 19(18):3079-83 (1998); Roda et al., "Chemiluminescent imaging of enzyme-labeled probes using an optical microscope-videocamera luminograph," Anal. Biochem. 257(1):53-62 (1998); Siddiqi et al., "Evaluation of electrochemiluminescence- and bioluminescence-based assays for quantitating specific DNA," J. Clin. Lab. Anal. 10(6):423-31 (1996); Stevenson et al., "Synchronous luminescence: a new detection technique for multiple fluorescent probes used for DNA sequencing," Biotechniques 16(6):1104-11 (1994); Vo-Dinh et al., "Surface-enhanced Raman gene probes," Anal. Chem. 66(20):3379-83 (1994); Volkers et al., "Microwave label detection technique for DNA in situ hybridization," Eur. J. Morphol. 29(1):59-62 (1991).
Metal barcodes, a form of molecular barcode, are 30-300 nm diameter by 400-4000 nm multilayer multi-metal rods. These rods are constructed by electrodeposition into an alumina mold, then the alumina is removed leaving these small multilayer objects behind. The system can have up to 12 zones encoded, in up to 7 different metals, where the metals have different reflectivity and thus appear lighter or darker in an optical microscope depending on the metal; this leads to practically unlimited identification codes. The metal bars can be coated with glass or other material, and probes attached to the glass using methods commonly known in the art; assay readout is by fluorescence from the target, and the identity of the probe is from the light-dark pattern of the barcode.
Methods for detecting and measuring signals generated by labels are known. For example, radioactive isotopes can be detected by scintillation counting or direct visualization; fluorescent molecules can be detected with fluorescent spectrophotometers; phosphorescent molecules can be detected with a spectrophotometer or directly visualized with a camera; enzymes can be detected by detection or visualization of the product of a reaction catalyzed by the enzyme; antibodies can be detected by detecting a secondary detection label coupled to the antibody. Such methods can be used directly in the disclosed method of amplification and detection. As used herein, detection molecules are molecules which interact with amplified nucleic acid and to which one or more detection labels are coupled. In another form of detection, labels can be distinguished temporally via different fluorescent, phosphorescent, or chemiluminescent emission lifetimes. Multiplexed time-dependent detection is described in Squire et al., J. Microscopy 197(2):136-149 (2000), and WO 00/08443.
Quantitative measurement of the amount or intensity of a label can be used. For example, quantitation can be used to determine if a given label, and thus the labeled component, is present at a threshold level or amount. A threshold level or amount is any desired level or amount of signal and can be chosen to suit the needs of the particular form of the method being performed.
Nucleic Acid Cleaving Reagents
Some forms of the disclosed method make use of nucleic acid cleaving reagents. Nucleic acid cleaving reagents are compounds, complexes, and enzymes that cause, mediate, or catalyze cleavage of nucleic acid molecules. Preferred nucleic acid cleaving reagents are those that cleave nucleic acid molecules in a sequence-specific manner. Restriction enzymes (also referred to as restriction endonucleases) are the preferred form of nucleic acid cleaving reagents. Other nucleic acid cleaving reagents include the universal restriction endonucleases of Szybalski (Szybalski, Gene 40(2-3):169-73 (1985); Podhajska and Szybalski, Gene 40(2-3):175-82 (1985) [published erratum appears in Gene 43(3):325 (1985)]), the advanced DNA cleavage systems developed by Breaker et al. (Carmi et al., Proc. Natl. Acad. Sci. USA 95(5):2233-2237 (1998)), and the use of zinc fingers to direct site recognition of restriction enzymes such as the hybrid restriction enzymes described by Kim et al., Proc. Natl. Acad. Sci. USA 93(3):1156-1160 (1996), and Smith et al., Nucleic Acids Res. 27(2):674-681 (1999).
Many nucleic acid cleaving reagents are known and can be used with the disclosed method. Relevant to the disclosed method, nucleic acid cleaving reagents generally have a recognition sequence and a cleavage site. Many nucleic acid cleaving reagents, especially restriction enzymes, also generate sticky ends at the cleavage site. A recognition sequence is the nucleotide sequence which, if present in a nucleic acid molecule, will direct cleavage of the nucleic acid molecule by a cognate nucleic acid cleaving reagent. The cleavage site of a nucleic acid cleaving reagent is the site, usually in relation to the recognition sequence, where the nucleic acid cleaving reagent cleaves a nucleic acid molecule. Sticky ends (also referred to as cohesive ends, protruding ends, and 5' or 3' overhangs) are single-stranded nucleic acid segments at the end of a double-stranded nucleic acid segment.
For specific embodiments of the method, the nucleic acid cleaving reagents used will have certain properties and/or certain relationships to other restriction enzymes used in the method. For example, in some preferred embodiments of the disclosed method, nucleic acid cleaving reagents that generate sticky ends having a plurality of different sequences are preferred, with nucleic acid cleaving reagents having a cleavage site offset from the recognition sequence being most preferred. Other embodiments of the disclosed method require the use of different nucleic acid cleaving reagents that have different recognition sequences and/or generate different sticky ends than other nucleic acid cleaving reagents used on the same index sample at other stages in the method. For example, where three digests (that is, cleavage reactions) are used in the method, it is preferred that the nucleic acid cleaving reagents used in each of the digests have a recognition sequence different from that of the nucleic acid cleaving reagents used in the other digests. In such cases, the known properties of nucleic acid cleaving reagents can be used to select or design appropriate nucleic acid cleaving reagents.
Where a nucleic acid cleaving reagent cleaves DNA at a site different or offset from the recognition sequence, a variety of sticky ends having different sequences can be generated. This is because recognition sequences in nucleic acids can occur next to any sequence and therefore the site of cleavage can have any sequence. For example, FokI cleaves 9 (upper strand) and 13 (lower strand) nucleotides downstream from the recognition site of GGATG. The four base sticky end will have whatever sequence happens to be 10 to 13 nucleotides away from the recognition site. Given enough cleavage sites, a total of 256 different sticky end sequences (that is, every possible four base sequence) can result from a FokI digestion. As a result, restriction enzymes such as Type IIS restriction enzymes can be said to generate sticky ends having a plurality of different sequences.
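The offset cleavage described above can be illustrated with a short sketch that locates GGATG sites in an upper-strand sequence and reports the four bases lying 10 to 13 nucleotides downstream of each site. This is a simplified model under stated assumptions: only sites written on the given strand are considered (sites on the complementary strand are ignored), the overhang is reported as upper-strand sequence, and the function and constant names are hypothetical.

```python
from itertools import product

SITE = "GGATG"        # FokI recognition sequence, as given in the text
TOP_OFFSET = 9        # upper-strand cut, nucleotides downstream of the site
BOTTOM_OFFSET = 13    # lower-strand cut, nucleotides downstream of the site


def foki_sticky_ends(seq: str):
    """Return (site_start, four_base_end) for each GGATG site on the given
    strand that has enough downstream sequence to be cut."""
    seq = seq.upper()
    ends = []
    start = seq.find(SITE)
    while start != -1:
        site_end = start + len(SITE)
        overhang = seq[site_end + TOP_OFFSET:site_end + BOTTOM_OFFSET]
        if len(overhang) == BOTTOM_OFFSET - TOP_OFFSET:
            ends.append((start, overhang))
        start = seq.find(SITE, start + 1)
    return ends


# Every possible four base sticky end: 4**4 = 256 sequences.
ALL_FOUR_BASE_ENDS = ["".join(bases) for bases in product("ACGT", repeat=4)]

if __name__ == "__main__":
    example = "TT" + SITE + "AAAAAAAAA" + "CTGA" + "TT"  # hypothetical sequence
    print(foki_sticky_ends(example))
    print(len(ALL_FOUR_BASE_ENDS))
```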
As used herein, unless otherwise indicated, the terms digest, digestion, digested, and digesting refer generally to a cleavage reaction or the act of cleaving and are not intended to be limited to cleavage by a protein enzyme or by any particular mechanism. Similarly, the term restricted is intended to refer to any nucleic acid cleavage, not just cleavage by a restriction enzyme. In the context of nucleic acid cleaving reagents, sequence-specific requires only some sequence specificity, not absolute sequence specificity. That is, nucleic acid cleaving reagents having a completely or partially defined recognition sequence are preferred. Thus, nucleic acid cleaving reagents having some degeneracy in their recognition sequence are still considered sequence-specific.
A second nucleic acid cleaving reagent is a nucleic acid cleaving reagent used to digest a secondary index sample. A third nucleic acid cleaving reagent is a nucleic acid cleaving reagent used to digest a restricted index sample or a restricted secondary index sample. Second and third nucleic acid cleaving reagents are preferably Type II restriction endonucleases that cleave in the recognition sequence. A second restriction enzyme is a restriction enzyme used to digest a secondary index sample. A third restriction enzyme is an enzyme used to digest a restricted index sample or a restricted secondary index sample. Second and third restriction enzymes are preferably Type II restriction endonucleases that cleave in the recognition sequence.
In addition to the use of restriction enzymes in a standard mode, one can make use of the Type IIS enzymes as universal restriction endonucleases as described by Szybalski (Szybalski, Gene 40(2-3):169-73 (1985); Podhajska and Szybalski, Gene 40(2-3):175-82 (1985) [published erratum appears in Gene 43(3):325 (1985)]). In the Szybalski technique single-stranded or double-stranded DNA can be cleaved at any arbitrary (but specific) site utilizing the structure described in combination with a Type IIS enzyme. More advanced DNA cleavage systems have been evolved by Breaker et al. (Carmi et al., Proc. Natl. Acad. Sci. USA 95(5):2233-2237 (1998)). In these systems Breaker has shown that DNA can recognize a particular sequence in a target DNA and can cleave the target DNA, whether the target is single-stranded or double-stranded. With Breaker's system for evolution of DNA for a particular action, it is clear that given reasonable time and effort a suitable DNA for a recognition and particular cleavage result is practical.
Adaptor-indexers
Adaptor-indexers are double-stranded nucleic acids containing a single-stranded portion and a double-stranded portion. The single-stranded portion is at one end of the adaptor-indexer and constitutes a sticky end. The sticky end is referred to as the sticky end portion of the adaptor-indexer. It is preferable that the protruding single strand (sticky end) have two, three, four, or five nucleotides. The double-stranded portion of adaptor-indexers may have any convenient sequence or length. In general, the sequence and length of the double-stranded portion is selected to be adapted to subsequent steps in the method. For example, sequences in the adaptor-indexer may be used for primer or probe hybridization. A main purpose of adaptor-indexers is to provide sequence for hybridization by a hairpin primer for amplification. Thus, preferred sequence composition and length for the double-stranded portion of adaptor-indexers will generally be those that are useful for hairpin primer hybridization. Adaptor-indexers can also include a detector portion which is designed to facilitate detection of the adaptor-indexer. The detection portion can be, for example, a sequence that is a hybridization target or it can be a label or tag.
Generally, the sequence of the double-stranded portion of an adaptor-indexer should not include the recognition sequence of any restriction enzyme to be used in a subsequent step in the method. It is preferred that adaptor-indexers not have any sequences that are self-complementary. It is considered that this condition is met if there are no complementary regions greater than six nucleotides long without a mismatch or gap.
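One way to screen a candidate strand against the self-complementarity condition above can be sketched as follows. The sketch adopts one possible reading of the condition, namely that no stretch of seven or more nucleotides may have its perfect reverse complement elsewhere in the same strand; the function names and test sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_self_complementarity(seq: str, max_allowed: int = 6) -> bool:
    """True if any window longer than max_allowed nucleotides has a perfect
    (no mismatch, no gap) reverse complement somewhere in the same strand."""
    seq = seq.upper()
    window = max_allowed + 1
    for i in range(len(seq) - window + 1):
        if reverse_complement(seq[i:i + window]) in seq:
            return True
    return False


if __name__ == "__main__":
    # Seven-base complementary arms: fails the condition.
    print(has_self_complementarity("GACTGACAAAAGTCAGTC"))
    # Six-base complementary arms: still acceptable under this reading.
    print(has_self_complementarity("GACTGAAAAATCAGTC"))
```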
A set of adaptor-indexers for use in the disclosed method should include different adaptor-indexers where the single-stranded portions each have a different nucleotide sequence selected from combinations and permutations of the nucleotides A, C, G, and T. Where multiple nucleic acid cleaving reagents are used in the first digest, the single-stranded portion of each adaptor-indexer can have a different nucleotide sequence compatible with a sticky end sequence generated by one of the nucleic acid cleaving reagents. While the sticky ends of adaptor-indexers in one set have different sequences, it is preferred that they be of the same length to facilitate use of the set to index fragments produced by cleavage by one nucleic acid cleaving reagent. It is preferable that the members of a set of adaptor-indexers contain a double-stranded portion which is identical for each member of the set.
A preferred set of indexing linker strands comprises: (a) at least two single-stranded first oligonucleotides each having a common identical sequence, and a unique sequence of a length selected from 2, 3, 4 and 5 nucleotides selected from permutations and combinations of A, G, C and T nucleotides, at one end selected from a 3' end and a 5' end; and (b) a single-stranded second oligonucleotide whose sequence is complementary to the common sequence of the first oligonucleotides such that, when hybridized with any one of the first oligonucleotides, a double-stranded adaptor-indexer would result which includes an end having a sticky end with a unique sequence.
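The construction of such a set can be sketched by enumerating every sticky end of a chosen length and joining each to a shared common sequence. The common sequence shown, the function name, and the returned data layout are hypothetical and serve only to illustrate the enumeration.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def adaptor_indexer_set(common: str, overhang_len: int = 4, overhang_end: str = "5'"):
    """Build first oligonucleotides (one per sticky end) and the shared second
    oligonucleotide complementary to the common sequence.

    Returns (first_oligos, second_oligo), where first_oligos maps each sticky
    end sequence to the full first oligonucleotide, written 5' to 3'."""
    if overhang_len not in (2, 3, 4, 5):
        raise ValueError("sticky end length must be 2, 3, 4 or 5")
    if overhang_end not in ("5'", "3'"):
        raise ValueError("overhang_end must be \"5'\" or \"3'\"")
    first_oligos = {}
    for bases in product("ACGT", repeat=overhang_len):
        sticky = "".join(bases)
        if overhang_end == "5'":
            first_oligos[sticky] = sticky + common
        else:
            first_oligos[sticky] = common + sticky
    return first_oligos, reverse_complement(common)


if __name__ == "__main__":
    firsts, second = adaptor_indexer_set("GATCCTGACGTA")  # hypothetical common sequence
    print(len(firsts), second)
```

With a four-nucleotide sticky end the set has 256 members, matching the number of four base sticky ends that an offset-cleaving reagent can generate.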
Adaptor-indexers can also contain or be associated with capture tags to facilitate immobilization or capture of fragments to which adaptor-indexers have been coupled. In general, the capture tag can be one member of a binding pair such as biotin and streptavidin. Capture tags are discussed more fully elsewhere herein. Adaptor-indexers can also contain or be associated with sorting tags to facilitate sorting or separation of fragments to which adaptor-indexers have been coupled. In general, the sorting tag can be a detectable label such as a fluorescent moiety or a manipulable moiety such as a magnetic bead. Sorting tags are discussed more fully elsewhere herein. Adaptor-indexers can also contain or be associated with labels to facilitate detection of fragments to which adaptor-indexers have been coupled. Adaptor-indexers can also be immobilized on a substrate.
Adaptor-indexers can also include a protruding end at the end opposite the sticky end. Such an end can be used as, for example, a hybridization target for a label to be associated with the adaptor-indexer (and thus can be considered the detection portion of the adaptor-indexer). Adaptor-indexers can also include one or more photocleavable nucleotides to facilitate release of adaptor-indexer sequences for detection. Photocleavable nucleotides and their use are described in WO 00/04036.
Adaptor-indexers need not be composed of naturally occurring nucleotides. Modified nucleotides, unnatural bases and nucleotide and oligonucleotide analogs can be used. All that is required is that the adaptor-indexer have the general structure described herein and be capable of the interactions and reactions required in the disclosed method.
Second Adaptors
Second adaptors are double-stranded nucleic acids containing a single-stranded portion and a double-stranded portion. The single-stranded portion is at one end of the second adaptor and constitutes a sticky end. It is preferable that the protruding single strand (sticky end) have two, three, four, or five nucleotides. The double-stranded portion of second adaptors may have any convenient sequence or length. In general, the sequence and length of the double-stranded portion is selected to be adapted to subsequent steps in the method. For example, the second adaptors can provide sequence for primer hybridization of a second primer or second hairpin primer. Thus, preferred sequence composition and length for the double-stranded portion of second adaptors will generally be those that are useful for primer hybridization. Generally, the sequence of the double-stranded portion of a second adaptor should not include the recognition sequence of any nucleic acid cleaving reagent to be used in a subsequent step in the method. It is preferred that second adaptors not have any sequences that are self-complementary. It is considered that this condition is met if there are no complementary regions greater than six nucleotides long without a mismatch or gap.
A set of second adaptors for use in the disclosed method can include different second adaptors where the single-stranded portions each have a different nucleotide sequence compatible with a sticky end sequence generated by one of the second restriction enzymes. It is preferable that the members of a set of second adaptors contain a double-stranded portion which is identical for each member of the set.
Second adaptors can also contain or be associated with capture tags to facilitate immobilization or capture of fragments to which second adaptors have been coupled. Second adaptors can also contain or be associated with sorting tags to facilitate sorting or separation of fragments to which second adaptors have been coupled. Second adaptors can also contain or be associated with labels to facilitate detection of fragments to which second adaptors have been coupled. Second adaptors can also be immobilized on a substrate.
Capture Tags
A capture tag is any compound that can be used to separate compounds or complexes having the capture tag from those that do not. Preferably, a capture tag is a compound, such as a ligand or hapten, that binds to or interacts with another compound, such as a ligand-binding molecule or an antibody. It is also preferred that such interaction between the capture tag and the capturing component be a specific interaction, such as between a hapten and an antibody or a ligand and a ligand-binding molecule.
Preferred capture tags, described in the context of nucleic acid probes, are described by Syvänen et al., Nucleic Acids Res. 14:5037 (1986). Preferred capture tags include biotin, which can be incorporated into nucleic acids. In the disclosed method, capture tags incorporated into adaptor-indexers or second adaptors can allow sample fragments (to which the adaptors have been coupled) to be captured by, adhered to, or coupled to a substrate. Similarly, capture tags incorporated into hairpin primers or second primers can allow sample fragments (into which the primers have been incorporated) to be captured by, adhered to, or coupled to a substrate. Such capture allows simplified washing and handling of the fragments, and allows automation of all or part of the method.
Capturing sample fragments on a substrate may be accomplished in several ways. In one embodiment, capture docks are adhered or coupled to the substrate. Capture docks are compounds or moieties that mediate adherence of a sample fragment by binding to, or interacting with, a capture tag on the fragment. Capture docks immobilized on a substrate allow capture of the fragment on the substrate. Such capture provides a convenient means of washing away reaction components that might interfere with subsequent steps.
Substrates for use in the disclosed method can include any solid material to which components of the assay can be adhered or coupled. Examples of substrates include, but are not limited to, materials such as acrylamide, cellulose, nitrocellulose, glass, polystyrene, polyethylene vinyl acetate, polypropylene, polymethacrylate, polyethylene, polyethylene oxide, polysilicates, polycarbonates, teflon, fluorocarbons, nylon, silicon rubber, polyanhydrides, polyglycolic acid, polylactic acid, polyorthoesters, polypropylfumerate, collagen, glycosaminoglycans, and polyamino acids. Substrates can have any useful form including thin films or membranes, beads, bottles, dishes, fibers, woven fibers, shaped polymers, particles and microparticles. Preferred forms of substrates are plates and beads. The most preferred form of beads are magnetic beads. In one embodiment, the capture dock is an oligonucleotide. Methods for immobilizing and coupling oligonucleotides to substrates are well established. For example, suitable attachment methods are described by Pease et al., Proc. Natl. Acad. Sci. USA 91(11):5022-5026 (1994), and Khrapko et al., Mol. Biol. (Mosk) (USSR) 25:718-730 (1991). A method for immobilization of 3'-amine oligonucleotides on casein-coated slides is described by Stimpson et al., Proc. Natl. Acad. Sci. USA 92:6379-6383 (1995). A preferred method of attaching oligonucleotides to solid-state substrates is described by Guo et al., Nucleic Acids Res. 22:5456-5465 (1994).
In another embodiment, the capture dock is an anti-hybrid antibody. Methods for immobilizing antibodies to substrates are well established. Immobilization can be accomplished by attachment, for example, to aminated surfaces, carboxylated surfaces or hydroxylated surfaces using standard immobilization chemistries. Examples of attachment agents are cyanogen bromide, succinimide, aldehydes, tosyl chloride, avidin-biotin, photocrosslinkable agents, epoxides and maleimides. A preferred attachment agent is glutaraldehyde. These and other attachment agents, as well as methods for their use in attachment, are described in Protein immobilization: fundamentals and applications, Richard F. Taylor, ed. (M. Dekker, New York, 1991), Johnstone and Thorpe, Immunochemistry In Practice (Blackwell Scientific Publications, Oxford, England, 1987) pages 209-216 and 241-242, and Immobilized Affinity Ligands, Craig T. Hermanson et al., eds. (Academic Press, New York, 1992). Antibodies can be attached to a substrate by chemically cross-linking a free amino group on the antibody to reactive side groups present within the substrate. For example, antibodies may be chemically cross-linked to a substrate that contains free amino or carboxyl groups using glutaraldehyde or carbodiimides as cross-linker agents. In this method, aqueous solutions containing free antibodies are incubated with the solid-state substrate in the presence of glutaraldehyde or carbodiimide. For crosslinking with glutaraldehyde the reactants can be incubated with 2% glutaraldehyde by volume in a buffered solution such as 0.1 M sodium cacodylate at pH 7.4. Other standard immobilization chemistries are known by those of skill in the art.
Sorting Tags
A sorting tag is any compound that can be used to sort or separate compounds or complexes having the sorting tag from those that do not. In general, all capture tags can be a sorting tag. Sorting tags also include compounds and moieties that can be detected and which can mediate the sorting of tagged components. Such forms of sorting tags are generally not also capture tags. For example, a fluorescent moiety can allow sorting of components tagged with the moiety from those that are not (or those with a different tag). However, such a fluorescent moiety does not necessarily have a suitable capture dock with which it can interact and be captured. Preferably, a sorting tag is a label, such as a fluorescent label, that can mediate sorting.
Method
The disclosed method involves the following basic steps. A nucleic acid sample is subjected to amplification using primers where at least one ofthe primers is a haiφin primer. Nucleic acids in the sample are amplified to result in amplified nucleic acid fragment having haiφin primer sequences at one or both ends. These haiφin primer sequences in amplified fragments are refeπed to as haiφin ligators. The amplified fragments are treated to allow the haiφin ligators to form stem- loop or haiφin structures at the end of the amplified fragments. The amplified fragments are then contacted with a plurality of detector probes and the amplified fragments are covalently coupled to probes via the haiφin ligator. Coupled fragments can then be detected. Since the sequence ofthe amplified fragment adjacent to the haiφin structure ofthe haiφin ligator determines the sequence ofthe detector probe to which the haiφin ligator is coupled, this adjacent sequence in the amplified fragment is identified by noting to which probe a given fragment is coupled. This identification is preferably accomplished by having probes of known sequence immobilized at known locations in the probe array.
In one embodiment of the disclosed method, a catalog of nucleic acid sequences in a nucleic acid sample can be created by using multiple hairpin primers, each with a different primer sequence, to amplify the nucleic acid sample. Multiple different nucleic acid fragments will be amplified with different sequences adjacent to the hairpin structure of the hairpin ligator. The pattern of fragments on the probe array provides a catalog of the fragments that can then be compared with other nucleic acid samples.
Where multiple hairpin primers are used, the nucleic acid sample is preferably divided into aliquots (referred to as index samples) before amplification. Each index sample is then mixed with a different hairpin primer, each of which has a primer sequence. The hairpin primers then mediate amplification of different nucleic acid sequences (based on the sequence of the primer sequence).
Each index sample can be amplified with one or more second primers (in conjunction with a hairpin primer). The hairpin primer amplifies one strand and the second primer amplifies the opposite strand. All index samples are preferably amplified with the same second primer(s). Alternatively, the index samples can be further divided into secondary index samples with each amplified with a different second primer or set of second primers. Amplified fragments in each index sample (or secondary index sample) would then have primer sequences at each end. The sequences of these primers can be used as primer binding sites for further amplification of the fragments, preferably once the fragments are coupled to detector probes.
Different strands of the amplified fragments can be subjected to covalent coupling on a probe array. Since one of the strands will produce a hairpin structure with a 3' end and the other strand will produce a hairpin structure with a 5' end (see Figures 4A-4B), differential coupling of the strands can be accomplished by the simple expedient of using a probe array with detector probes all of the same polarity, that is, detector probes all with 5' ends (in a 5' probe array) or detector probes all with 3' ends (in a 3' probe array). Only the fragment strand with compatible polarity can be coupled to the detector probe. A hairpin structure with a 3' end is referred to as a 3' hairpin structure and a hairpin structure with a 5' end is referred to as a 5' hairpin structure (hairpin ligators containing these structures are referred to as 3' hairpin ligators and 5' hairpin ligators, respectively). Selective strand coupling can also be accomplished, for example, by digesting one of the strands with an exonuclease (detector probes of the correct polarity must still be used). Such digestion is also preferred since it reduces the chance for interference by the opposite strand during coupling to the detector probes. Where a nucleic acid sample is amplified using multiple hairpin primers having different primer sequences, both ends of the amplified fragments will have hairpin ligators (see Figure 5, bottom). Thus, both strands will form both a 5' hairpin structure and a 3' hairpin structure and both strands can be coupled to detector probes. By subjecting both strands of such fragments to both a 5' probe array and a 3' probe array, both ends of both strands of each fragment can be detected and cataloged. This provides a maximum of information about the nucleic acid sample.
Each sample (or each index sample or derivative index sample) can be reacted with and coupled to an array of detector probes. Preferred arrays include every possible sequence of a given length (for example, every possible six base sequence), although arrays containing fewer combinations can also be used. Such arrays are referred to herein as probe arrays. The ends of the detector probes and the hairpin ligator are coupled together only if the detector probe hybridizes adjacent to the end of the hairpin ligator. Thus, a hairpin ligator is coupled to a detector probe on the array only when a sequence complementary to the detector probe is present immediately adjacent to the end of the stem sequence in an amplified fragment. Examples of the relationship and interaction of various components of the disclosed method are illustrated in Figures 2A-B and 3A-C. Each amplified fragment from the sample will result in a signal at a particular location in a particular array of detector probes. The probe array in which the signal for a given fragment is detected is determined by the primer sequence of the hairpin primer. Where multiple hairpin primers (having different primer sequences) are used, each different primer sequence is preferably processed in a separate index sample and a separate probe array is preferably used for each index sample or derivative index sample. The location in the probe array in which the signal for a given fragment is detected is determined by the sequence in the fragment immediately adjacent to the end of the stem sequence in the fragment since the detector probe must hybridize to this sequence in order to be coupled to the hairpin ligator of the fragment. A complex nucleic acid sample will produce a unique pattern of signals on the probe arrays. It is this pattern that allows unique cataloging of nucleic acid samples and sensitive and powerful comparisons of the patterns of signals produced from different nucleic acid samples.
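As a minimal sketch of what an array containing "every possible sequence of a given length" looks like, the following Python enumerates all hexamer probes and assigns each one an address. The row-major, base-4 addressing scheme and the function names are assumptions made for illustration; the text only requires that probes of known sequence sit at known locations.

```python
from itertools import product

BASES = "ACGT"  # fixed base order used for addressing (an assumption)

def build_probe_array(probe_len=6):
    """Return every possible probe of the given length, in a fixed order."""
    return ["".join(p) for p in product(BASES, repeat=probe_len)]

def probe_address(probe):
    """Map a probe sequence to its index in the list built above.

    The index is the probe read as a base-4 number, which matches the
    order produced by itertools.product.
    """
    index = 0
    for base in probe:
        index = index * 4 + BASES.index(base)
    return index

probes = build_probe_array(6)
print(len(probes))               # number of hexamer probes in a complete array
print(probe_address("CAAATG"))   # hypothetical address of one hexamer
```

With this layout, a signal observed at a given address can be translated back to a probe sequence simply by indexing into `probes`.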
The use of different sets of hairpin primers provides a means for generating different subsets of fragments from a complex sample. Such a defined subset of molecules may be further resolved by additional amplification and indexing, or by any of the established techniques such as cloning, PCR amplification, or gel electrophoresis. Individual members of the class may be distinguished by identifying characteristics such as length, sequence, or restriction endonuclease maps. The sequence of the primer sequences of the hairpin ligators provides a means of indexing a large number of nucleic acid fragments. Detector probes of different sequence can be immobilized at different locations on the probe array. In this way, the sequence of the detector probes on the probe array and the sequence of nucleic acid fragments in the index samples determine where on the probe array hairpin ligators (and thus, fragments) become coupled. The presence of hairpin ligators at different locations in the probe arrays thus forms a pattern of signals that provides a signature or fingerprint of a nucleic acid sample based on the presence or absence of specific nucleic acid sequences in the sample. For this reason, cataloging of this pattern of signals (that is, the pattern of the presence of hairpin ligators) is an embodiment of the disclosed method that is of particular interest. Catalogs can be made up of, or be referred to as, for example, a pattern of hairpin ligators on probe arrays, a pattern of the presence of hairpin ligators on probe arrays, a catalog of nucleic acid fragments in a sample, or a catalog of nucleic acid sequences in a sample. The information in the catalog is preferably in the form of positional information (that is, location in the probe array) or, more preferably, in the form of sequences.
Preferred sequence information for catalogs includes sequences of probe array probes to which a hairpin ligator was coupled and sequences of nucleic acid fragments present in the sample (derived from the locations in the probe array where hairpin ligators were coupled). Such catalogs of nucleic acid samples can be compared to a similar catalog derived from any other sample to detect similarities and differences in the samples (which is indicative of similarities and differences in the nucleic acids in the samples). For example, a catalog of a first nucleic acid sample can be compared to a catalog of a sample from the same type of organism as the first nucleic acid sample, a sample from the same type of tissue as the first nucleic acid sample, a sample from the same organism as the first nucleic acid sample, a sample obtained from the same source but at a different time than the first nucleic acid sample, a sample from a different organism than the first nucleic acid sample, a sample from a different type of tissue than the first nucleic acid sample, or a sample from a different type of organism than the first nucleic acid sample.
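One way to picture a catalog comparison is to treat each catalog as a set of positions at which a signal was detected. The sketch below is only an illustration under that assumption: the representation of an entry as a (probe array identifier, probe sequence) pair, the function name, and the example entries are all hypothetical and are not data from the disclosure.

```python
def compare_catalogs(catalog_a, catalog_b):
    """Split two catalogs into shared entries and entries unique to each.

    Each catalog is a set of (probe_array_id, probe_sequence) tuples,
    one tuple per location where a hairpin ligator was detected.
    """
    shared = catalog_a & catalog_b
    only_a = catalog_a - catalog_b
    only_b = catalog_b - catalog_a
    return shared, only_a, only_b

# Hypothetical catalogs for two samples (made-up entries).
sample_1 = {("array_1", "CAAATG"), ("array_1", "GACCTG"), ("array_2", "TTGACA")}
sample_2 = {("array_1", "CAAATG"), ("array_2", "TTGACA"), ("array_2", "ACGTAC")}

shared, only_1, only_2 = compare_catalogs(sample_1, sample_2)
print(sorted(shared))
print(sorted(only_1))
print(sorted(only_2))
```

Entries unique to one catalog point to sequences present in one sample but not detected in the other.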
The same type of tissue is tissue of the same type such as liver tissue, muscle tissue, or skin (which may be from the same or a different organism or type of organism). The same organism refers to the same individual, animal, or cell. For example, two samples taken from a patient are from the same organism. The same source is similar but broader, referring to samples from, for example, the same organism, the same tissue from the same organism, the same cDNA, or the same cDNA library. Samples from the same source that are to be compared are preferably collected at different times (thus allowing for potential changes over time to be detected). This is especially useful when the effect of a treatment or change in condition is to be assessed. A different organism refers to a different individual organism, such as a different patient or a different individual animal. Different organism includes a different organism of the same type or organisms of different types. A different type of organism refers to organisms of different types such as a dog and cat, a human and a mouse, or E. coli and Salmonella. A different type of tissue refers to tissues of different types such as liver and kidney, or skin and brain.
Detecting the presence of hairpin ligators on a probe array can be accomplished by detection of labels incorporated into, or coupled to, the hairpin ligators. Alternatively, the hairpin ligators can be detected based on detection of their sequence. Any of the numerous sequence-specific detection techniques can be used for this purpose, including, for example, hybridization of labeled probes. The loop sequence of the hairpin primer, for example, is a preferred site for binding of a detector tag by complementary hybridization. In this embodiment, the loop portion of the hairpin primer should be long enough to permit effective binding of a complementary nucleic acid. Design of hybridization probes and hybridization conditions are well known.
Preferred probe lengths for this purpose are 12 to 20 bases. The nucleic acid tag may additionally bind to the bases in one side of the stem. The presence of hairpin ligators can also be detected by generating a signal mediated by the hairpin ligator, its associated fragment, or the second primer sequence at the other end of the fragment. Use of the second primer sequence as a primer for primer extension, described below, is a preferred example of this.
When coupling of a hairpin ligator to a detector probe involves the use of a strand having a 5' hairpin structure (top strand in Figures 4A-B), the coupling event links the strand to the detector probe via the 5' end of the hairpin ligator, which contains, for example, a 5'-phosphate capable of participating in ligation. After coupling, there remains a free 3'-terminus at the other end, which may be used for a labeling reaction. Where the strand has a 3' hairpin structure at this other end (as in the bottom strand in Figure 5), the strand can be labeled by primer extension. Labeling is preferably performed using primer extension by the Klenow fragment of DNA polymerase I, in the presence of fluorescent dNTPs. The signal to be detected for the nucleic acid fragments can be increased by nucleic acid amplification. It is preferred either that the nucleic acid fragments (including hairpin ligators) that have been coupled to the detector probes be amplified or mediate amplification of another nucleic acid. The fragments can be amplified using any suitable method. Preferred amplification methods are those that work efficiently for the generation of surface-localizable signals. A preferred method is branch DNA amplification (Urdea, Biotechnology 12:926 (1994); Horn et al., Nucleic Acids Res. 25(23):4835-4841 (1997)). A second preferred method is rolling circle amplification (PCT application WO 97/19193; Lizardi et al., Nature Genetics 19(3):225-232 (1998)). Other methods include polymerase chain reaction (PCR), ligase chain reaction (LCR), self-sustained sequence replication (3SR), nucleic acid sequence based amplification (NASBA), strand displacement amplification (SDA), and amplification with Qβ replicase (Birkenmeyer and Mushahwar, J. Virological Methods, 35:117-126 (1991); Landegren, Trends Genetics, 9:199-202 (1993)). Amplification primers can be based, for example, on the sequence of the hairpin primers and second primers. It is preferred that amplification primers be based on hairpin primer sequences that appear in the loop of the hairpin structure. In this way, all of the fragments can be amplified using the same primer if the hairpin primers are designed to have the same loop sequence. In this case, the primer sequences and stem sequences of the hairpin primers can be different as discussed elsewhere herein. Amplification of the fragment is facilitated by the presence of hairpin primer sequence at the end of the fragment (and by the presence of second primer sequence at the other end). For example, the primer sequences can be used for amplification primer sequences. The primer sequences can also be used to circularize the adaptor/fragments for subsequent amplification by rolling circle replication. Rolling circle amplification is described in U.S. Patent No. 5,854,033 and PCT application WO 97/19193.
In one embodiment, hybridization of amplified fragments to detector probes can be aided by shortening the fragment length prior to hybridization. This can be accomplished, for example, by digesting the fragment with a restriction endonuclease. Preferably, the recognition site for the restriction endonuclease is included in the sequence of the hairpin primer. For this purpose, it is preferred that the restriction enzyme used has a cleavage site offset from the recognition site. The following example illustrates use of the non-palindromic Type III enzyme EcoP15I (New England Biolabs) to shorten the length of amplified fragments prior to hybridization. EcoP15I recognizes and cleaves the following site (SEQ ID NO:10):
5'-CAGCAGNNNNNNNNNNNNNNNNNNNNNNNNN^-3'
Figure imgf000039_0001
where the carets (^) mark the cut sites in each strand. Amplification using a hairpin primer having the sequence (SEQ ID NO:11)
5'-TCTAGTCCAATCCAAGCTACATCAGCAGATGCGGACTAGA-3'
results in the following double stranded fragment (SEQ ID NO:12; the recognition site is boldface, the stem sequences are underlined)
5'-..NNNNNNNNNNNNGACCTGTCTAGTCCGCATCTGCTGATGTAGCTTGGATTGGACTAGA-3'
3'-..NNNNNNNNNNNNCTGGACAGATCAGGCGTAGACGACTACATCGAACCTAACCTGATCT-5'
Digestion with EcoP15I will result in the cleaved fragment
5'-NNNNNNNNNGACCTGTCTAGTCCGCATCTGCTGATGTAGCTTGGATTGGACTAGA-3'
3'-  NNNNNNNCTGGACAGATCAGGCGTAGACGACTACATCGAACCTAACCTGATCT-5'
The bottom strand can then form the hairpin structure (nucleotides 1-53 of SEQ ID NO:12)
                        AATCCAAGCTA
             5'-TCTAGTCC           C
3'-NNNNNNNCTGGACAGATCAGG           A
                        CGTAGACGACT
Coupling of this shortened fragment to a detector probe results in the structure (SEQ ID NO:21)
  support
         \              AATCCAAGCTA
          GACCTGTCTAGTCC           C
3'-NNNNNNNCTGGACAGATCAGG           A
                        CGTAGACGACT
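The offset cleavage used in this example can be checked with a short calculation. The sketch below assumes the commonly cited EcoP15I geometry of cutting 25 nucleotides past the CAGCAG site on the strand carrying the site and 27 nucleotides past it on the complementary strand; the function name is hypothetical, only sites on the given strand are searched, and the twelve trailing N's are placeholders for unknown fragment sequence.

```python
SITE = "CAGCAG"      # EcoP15I recognition sequence
SAME_OFFSET = 25     # nucleotides past the site on the strand carrying CAGCAG
COMP_OFFSET = 27     # nucleotides past the site, measured on the complementary strand

def ecop15i_cut_positions(strand):
    """Return (same_strand_cut, complementary_cut) for each site in `strand`.

    `strand` is read 5'->3'. Each cut is given as the number of nucleotides,
    counted from the 5' end of `strand`, that lie before the cut.
    Sites too close to the 3' end to be cut within the strand are skipped.
    """
    cuts = []
    start = strand.find(SITE)
    while start != -1:
        site_end = start + len(SITE)
        if site_end + COMP_OFFSET <= len(strand):
            cuts.append((site_end + SAME_OFFSET, site_end + COMP_OFFSET))
        start = strand.find(SITE, start + 1)
    return cuts

# Bottom strand of the example, read 5'->3': hairpin primer, then fragment sequence.
hairpin_primer = "TCTAGTCCAATCCAAGCTACATCAGCAGATGCGGACTAGA"
bottom_strand = hairpin_primer + "CAGGTC" + "N" * 12

print(ecop15i_cut_positions(bottom_strand))
```

Under these assumptions the bottom strand is cut after nucleotide 53, which agrees with the 53-nucleotide hairpin-forming strand in the example, and the two-nucleotide stagger accounts for the two extra N's on the top strand of the cleaved fragment.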
In another embodiment, the strands of amplified nucleic acid fragments can be separated prior to hybridization to the detector probes. Such strand separation can improve the efficiency of both formation of the hairpin structure and hybridization of the amplified fragment to the detector probe. This separation can be accomplished using any suitable technique. Strand separation is preferably accomplished by strand-specific digestion. This can be accomplished, for example, by digesting one of the strands with a nuclease such as T7 gene 6 exonuclease. By incorporating a few phosphorothioate linkages at the 5' end of the hairpin primer, the strand containing the hairpin primer will be protected from exonuclease digestion while the other strand is digested. Alternatively, the other (non-hairpin) primer can be made with 5' end phosphorothioate linkages. This will protect the opposite strand from digestion. Strand separation can also be accomplished by including a capture tag on the hairpin primer or the second primer. Capture tags and their use are described above. A preferred capture tag is a biotin incorporated into a primer by using a biotin-T phosphoramidite (Glen Research No. 10-1038-95). This modified nucleotide does not interfere with primer function, and becomes incorporated into all newly-synthesized DNA strands during PCR amplification. If the strand to be captured is a strand with a 5' hairpin structure (top strand in Figure 4A), the biotin-T is present as part of the hairpin primer. On the other hand, if the strand to be captured is a strand with a 3' hairpin structure (bottom strand in Figure 4A), the biotin-T is present as part of the second primer. The preferred location of the biotin-T in the hairpin primer is any thymine base present in the loop sequence. Capture of the biotinylated strand may be performed by methods well known in the art, such as the use of streptavidin-magnetic particles (Dynal, Inc.).
This capture tag can then be used to immobilize one strand of the amplified fragments while the other strands are washed away. Either the immobilized or washed strand can be carried forward in the method.
In another embodiment, the concentration of the various nucleic acid fragments in the index samples is normalized. Normalization can be performed either before or after any amplification step that may be used. A preferred technique for fragment normalization involves immobilizing one strand of the nucleic acid fragments, denaturing the nucleic acid fragments, renaturing the nucleic acid fragments for a time greater than the C0t1/2 for abundant nucleic acid fragments and less than the C0t1/2 for rare nucleic acid fragments, and collecting the un-renatured nucleic acid fragments.
The sequence information that can be obtained with the disclosed method can be illustrated using a specific example of a nucleic acid fragment. Assume a nucleic acid sample containing a nucleic acid fragment with the sequence (SEQ ID NOs:13 and 14)
..CGCACGGGCTATAGCTGATATAG..GGCAAATGTCTAGTCCGAAATCCAAGCTATG..
..GCGTGCCCGATCTCGACTATATC..CCGTTTACAGATCAGGCTTTAGGTTCGATAC..
If the sample is amplified with a hairpin primer having the sequence TCTAGTCCGAATGTAGCTTGGATTTCGGACTAGA (SEQ ID NO:15; where the primer sequence is in boldface and stem sequences are underlined) and a second primer having the sequence ACGGGCTATAGCTGATATAG, the following amplified fragment will result (SEQ ID NOs:16 and 17):
ACGGGCTATAGCTGATATAG .. GGCAAATGTCTAGTCCGAAATCCAAGCTACATTCGGACTAGA
TGCCCGATCTCGACTATATC .. CCGTTTACAGATCAGGCTTTAGGTTCGATGTAAGCCTGATCT
When a hairpin structure is formed in the lower strand, the following nucleic acid is obtained (SEQ ID NOs:16 and 17).
                                          ATGTAG
                                TCTAGTCCGA      C
TGCCCGATCTCGACTATATC .. CCGTTTACAGATCAGGCT      T
                                          TTAGGT
When this nucleic acid is hybridized to an appropriate detector probe (a hexamer in this example) and the detector probe and hairpin ligator are coupled, the following structure is obtained (SEQ ID NO:16 and SEQ ID NO:20).
                  support                 ATGTAG
                          CAAATGTCTAGTCCGA      C
TGCCCGATCTCGACTATATC .. CCGTTTACAGATCAGGCT      T
                                          TTAGGT
The sequence of the detector probe is identified by the location in the probe array where the hairpin ligator is detected. The sequence of the adjacent primer sequence is identified by the probe array in which the label of the hairpin ligator is detected (since a different set of probe arrays is used for each index sample). Thus, in this example, detection of label in the CAAATG hexamer position of the TCTAGTCCGAAATCCAAGCT (nucleotides 9-28 of SEQ ID NO:17) probe array (TCTAGTCCGAAATCCAAGCT (nucleotides 9-28 of SEQ ID NO:17) corresponds to the primer sequence in the hairpin primer sequence in this example) indicates the presence of a nucleic acid fragment in the nucleic acid sample having the sequence
CAAATGTCTAGTCCGAAATCCAAGCT (nucleotides 3-28 of SEQ ID NO:14).
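The relationship between a fragment and the hexamer position at which it is detected can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the calculation simply takes the bases that immediately follow the hairpin primer sequence on the lower strand and reports their reverse complement as the probe that would hybridize next to the stem.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def detector_probe_for(top_strand, hairpin_primer, probe_len=6):
    """Return the probe (5'->3') expected to couple to the lower-strand hairpin.

    The lower strand, read 5'->3', begins with the hairpin primer; the probe
    is complementary to the bases that immediately follow the primer.
    """
    lower = reverse_complement(top_strand)
    if not lower.startswith(hairpin_primer):
        raise ValueError("lower strand does not begin with the hairpin primer")
    adjacent = lower[len(hairpin_primer):len(hairpin_primer) + probe_len]
    return reverse_complement(adjacent)

# Right-hand portion of the top strand of the amplified fragment in the example.
top = "GGCAAATGTCTAGTCCGAAATCCAAGCTACATTCGGACTAGA"
primer = "TCTAGTCCGAATGTAGCTTGGATTTCGGACTAGA"

print(detector_probe_for(top, primer))
```

For the example fragment this returns the same hexamer, CAAATG, that the text identifies as the detection position.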
Hairpin primers may also be utilized to multiplex a one-color readout of control and tester fragments of a gene from the same address of a slide array. One way to do this is to use labile and stable forms of hairpin primers as described in the following illustration.
1. Generate PCR products from cDNA using adaptor ligation. Use different hairpin primers for the tester and control, a uracil in the synthetic adapters for the testers and a thymine in the synthetic adapters for the controls. A fluorescence label may be incorporated into the hairpin using standard fluorescent labeled nucleotides.
2. Hybridize and ligate to probe array.
xxxxxxNNNNNNNNNNNNNNN
|||||||||||||||||     dT   (Stable Hairpin, Control)
nnnnnnnnnnNNNNNNNNN*NN

xxxxxxNNNNNNNNNNNNNNN
|||||||||||||||||     U    (Labile Hairpin, Tester)
nnnnnnnnnnNNNNNNNNN*NN
where x is the hexamer probe, N is the hairpin, n is the amplified fragment, | indicates base pairing, * indicates a fluorescently labeled nucleotide.
3. Read the fluorescence signal at a hexamer probe location. This corresponds to the control plus tester fluorescence.
4. Treat the probe array with uracil-DNA glycosylase. This will cleave hairpins containing uracil at the uracil and leave the thymine uncleaved.
5. Wash the slide with alkali to remove the cleaved fragments.
6. Read the fluorescence signal from the hexamer probe location. This signal corresponds only to the control sample.
7. The tester/control ratio is calculated from the signals of steps 3 and 6. Ratio = (signal3-signal6)/signal6.
Another mode for the use of a uracil containing hairpin is as follows.
1. Generate PCR products from cDNA using adaptor ligation. Use different hairpin primers for the tester and control, a uracil in the synthetic adapters for the testers and a thymine in the synthetic adapters for the controls.
xxxxxxNNNNNNNNNNNNNNN
|||||||||||||||||     U    Labile Hairpin (control)
...nnnnnnnnnnNNNNNNNNNNN

xxxxxxNNNNNNNNNNNNNNN
|||||||||||||||||     U    Labile Hairpin (tester)
...nnnnnnnnnnNNNNNNNNNNNM
where x is the hexamer probe, N is the hairpin, M is an additional base or bases, n is the binary sequence tag, | indicates base pairing.
2. Hybridize and ligate to probe array.
3. Wash with alkali to remove non-ligated tag-hairpins.
4. Cleave with uracil-DNA glycosylase.
The released fragment to be analyzed will be:
...nnnnnnnnnnNNNNNNNNNNN (control)
...nnnnnnnnnnNNNNNNNNNNNM (tester)
5. Detect the cleaved tags, resolving the two different masses, using MALDI-TOF. Use of a tandem mass spectrometer to fragment the cleaved tags will determine some or all of the tag sequence, and improve the signal to noise.
A preferred form of the disclosed method involves amplification of nucleic acid fragments to which adaptor-indexers have been coupled. An example of this form of the method is illustrated in Figures 6A-C. Coupling of adaptor-indexers to nucleic acid fragments involves the following basic steps. A nucleic acid sample, embodied in double stranded DNA, is digested with one or more restriction endonucleases such that a set of DNA fragments having sticky ends with a variety of sequences is generated. Preferred for this purpose is the use of a single Type IIS restriction endonuclease having an offset cleavage site. Since such Type IIS restriction endonucleases cleave at a site different from the recognition sequence, this results in a set of DNA fragments having sticky ends with a variety of sequences. A similar effect can be obtained by digesting the nucleic acid sample with a mixture of restriction endonucleases which cleave at their recognition site. For a four base sticky end, there are 256 possible sequences. The general formula is N = 4^X where X is the length of the sticky end and N is the number of possible sequences. In a sufficiently complex nucleic acid sample, all of these sequences will be represented in the ends of the set of DNA fragments. The nucleic acid sample is also divided into aliquots (referred to as index samples); preferably as many aliquots as there are sticky end sequences (for example, N = 4^X aliquots). Where multiple restriction endonucleases are used, the nucleic acid sample is preferably divided into index samples before digestion. Where a single restriction endonuclease is used, the nucleic acid sample is preferably divided into index samples following digestion. Each index sample is then mixed with a different adaptor-indexer, each of which has a sticky end compatible with one of the possible sticky ends on the DNA fragments in that index sample. The adaptor-indexers are then coupled onto compatible DNA fragments.
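To make the count of index samples concrete, the sketch below enumerates the possible sticky ends of a given length and pairs each with a compatible adaptor-indexer overhang. It assumes that both overhangs are written 5' to 3' and are of the same polarity, so that the compatible overhang is the reverse complement; the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def sticky_ends(length):
    """Enumerate every possible sticky-end sequence of the given length."""
    return ["".join(p) for p in product("ACGT", repeat=length)]

def compatible_overhang(sticky_end):
    """Return the adaptor overhang (5'->3') that can base pair with the sticky end."""
    return sticky_end.translate(COMPLEMENT)[::-1]

# One index sample, and one adaptor-indexer, per possible four-base sticky end.
adaptor_table = {end: compatible_overhang(end) for end in sticky_ends(4)}

print(len(adaptor_table))       # number of index samples for X = 4
print(adaptor_table["AACG"])    # overhang assigned to one example sticky end
```

Note that palindromic sticky ends such as GATC are their own compatible overhang under this convention.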
Each index sample can then be digested with one or more other restriction enzymes (referred to as second restriction enzymes), preferably restriction enzymes having four-base recognition sequences. All index samples are preferably digested with the same restriction enzyme(s). Alternatively, the index samples can be further divided into secondary index samples with each digested with a different second restriction enzyme or set of restriction enzymes. A second adaptor can then be coupled to the DNA fragments in the index samples (or secondary index samples). Preferably, the same second adaptor is used for each index sample. Different second adaptors are preferably used with secondary index samples derived from the same index sample. In this case, it is preferred that the same set of second adaptors be used with each set of secondary index samples. DNA fragments in each index sample (or secondary index sample) now have adaptors coupled to each end. The DNA fragments can then be amplified using hairpin primers. Sequences in the adaptors can be used as primer binding sites for this amplification.
Optionally, prior to amplification, the index samples (or secondary index samples) can be divided into further aliquots. These are referred to as restricted index samples and non-restricted index samples (or restricted secondary index samples and non-restricted secondary index samples, if there are secondary index samples). Generally, the index samples (or secondary index samples) can be divided into one or more restricted index samples and one non-restricted index sample. The restricted index samples (or restricted secondary index samples), but not the non-restricted index sample (or non-restricted secondary index sample), are then each digested with a different restriction endonuclease (referred to as third restriction enzymes). The third restriction enzymes are preferably different from any of the restriction enzymes or second restriction enzymes with which the sample has been digested. In some cases, the third restriction enzymes will cleave some DNA fragments in the restricted index samples (or restricted secondary index samples), thus making the fragment incompetent for amplification. In this way, the signals generated by the restricted index samples and non-restricted index sample (or restricted and non-restricted secondary index samples) can differ, and fragments containing the recognition sequence of one of the third restriction enzymes can be identified.
Secondary index samples, restricted index samples, non-restricted index samples, restricted secondary index samples, and non-restricted secondary index samples are referred to collectively herein as derivative index samples. Each is derived from an index sample and, in some cases, from another derivative index sample. In general, only those derivative index samples last generated are carried forward in the method. For example, if secondary index samples are created, the original index samples from which they were derived are no longer carried forward in the method (the secondary index samples are). Similarly, if restricted and non-restricted secondary index samples are created, then neither the original index samples nor the secondary index samples from which the restricted and non-restricted secondary index samples were derived are carried forward in the method. However, additional information may be gained by carrying forward all or some of the index samples and derivative index samples. Each processed DNA fragment (that is, each DNA fragment to which an adaptor-indexer was coupled) from the sample will result in a signal at a particular location in a particular array of detector probes. In preferred embodiments, the probe array in which the signal for a given fragment is detected is determined by the sequence of the original sticky end sequence (or recognition sequence). Each different sticky end or recognition sequence is processed in a separate index sample; a separate probe array is used for each index sample or derivative index sample. The location in the probe array in which the signal for a given fragment is detected is determined by the sequence in the DNA fragment adjacent to the stem of the hairpin structure, which is preferably the sequence adjacent to the sticky end sequence (or recognition sequence), since the detector probe must hybridize to this sequence in order to be coupled to the hairpin ligator on the fragment.
Hybridization based on the sequence adjacent to the sticky end sequence (or recognition sequence) is accomplished by designing the hairpin primer to result in formation of a hairpin structure with a stem that includes, and terminates at, the sticky end sequence (see example below). A complex nucleic acid sample will produce a unique pattern of signals on the probe arrays. It is this pattern that allows unique cataloging of nucleic acid samples and sensitive and powerful comparisons of the patterns of signals produced from different nucleic acid samples.
The probe array, and location in the probe array, where a DNA fragment generates a signal identifies the sequence of the sticky end of the DNA fragment and of the sequence adjacent to the sticky end (or the recognition sequence of the restriction enzyme and of the sequence adjacent to the recognition sequence). This is a ten base sequence when a four base sticky end and six base detector probes are used. The fixed relationship between the recognition sequence and the cleavage site of a Type IIS restriction enzyme, when used, and the identity of the recognition sequence, provide additional sequence information about the DNA fragment.
This form of the disclosed method is performed using one or more restriction enzymes that collectively produce a plurality of different sticky end sequences. Preferably, the sticky end sequences generated by the restriction enzyme are not limited by the recognition sequence of the restriction enzyme.
The sticky ends generated are preferably 2, 3, 4 or 5 nucleotides long. Preferred restriction enzymes for use in the disclosed method are Type IIS restriction endonucleases, which are enzymes that cleave DNA at locations outside of (or offset from) the recognition site and which generate sticky ends. Examples of Type IIS restriction endonucleases are FokI, BbvI, HgaI, BspMI and SfaNI. Restriction endonucleases for use in this embodiment of the disclosed method produce sticky ends encompassing permutations and combinations of the four nucleotides, A, C, G, and T. The larger the number of protruding bases, the greater the number of possible permutations and combinations of terminal nucleotide sequences, and the more specific the indexing is likely to be. For example, a restriction endonuclease such as FokI, which releases fragments with four base, 5'-protruding sticky ends, will generate fragments having 4^4 or 256 possible protruding tetranucleotide ends. Cleavage of a cDNA sample having an average of 12,000 different cDNAs with the restriction endonuclease FokI will produce a mixture of fragments with four base, 5'-protruding ends. On average, FokI cuts twice in every 4^5 base pairs giving an average fragment size of 512 base pairs. If the average length of cDNA is 1,700 base pairs, each cDNA will produce approximately four fragments. The entire sample will contain approximately 4 * 12,000 = 48,000 fragments. There are 4^4 = 256 possible tetranucleotide sequences and therefore 256 possible identities for each sticky end. On average, there will be 48,000/256 = 188 fragments with a given sticky end sequence. Each of these fragments is sorted by hybridization to different detector probes based on the sequence adjacent to the sticky end sequence in each fragment. A hexamer probe array has 4,096 different six nucleotide probes. Thus, only 188 of the 4,096 hexamers in the probe array will couple to a hairpin ligator, on average.
With 256 probe arrays each having 4,096 different hexamer probes, there are 256 * 4,096 = 1,048,576 "bins" in which to distribute 48,000 fragments. This leaves ample opportunity to identify different patterns when using different cDNA samples.
Cleavage of human genomic DNA (which has a haploid number of 3 X 10^9 base pairs) with the restriction endonuclease Bsp24I will release a large and complex mixture of fragments with five base, 3'-protruding ends. On average, Bsp24I cuts twice in every 4^6 base pairs giving an average fragment size of 2048 base pairs, and resulting in 3 X 10^9/2048 = approximately 1.5 X 10^6 fragments. There are 4^5 = 1024 possible pentanucleotide sequences and therefore 1024 possible identities for each sticky end. On average, there will be 1.5 X 10^6/1024 = 1,465 fragments with a given sticky end sequence. Each of these fragments is sorted by hybridization to different detector probes based on the sequence adjacent to the sticky end sequence in each fragment. A heptamer probe array has 16,384 different seven nucleotide probes. Thus, only 1,465 of the 16,384 heptamers in the probe array will couple to a hairpin ligator, on average. With 1024 probe arrays each having 16,384 different heptamer probes, there are 1024 * 16,384 = 1.6 X 10^7 "bins" in which to distribute 1.5 X 10^6 fragments.
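The arithmetic in the FokI and Bsp24I examples above can be sketched in a few lines of Python. This is only an illustration of the counting argument, assuming (as the text does) a uniform distribution of sticky end sequences; the function name `indexing_statistics` and its parameter names are hypothetical and are not part of the disclosed method.

```python
def indexing_statistics(total_fragments, sticky_end_length, probe_length):
    """Counting argument from the text (hypothetical helper, for illustration).

    total_fragments   -- estimated number of fragments in the sample
    sticky_end_length -- nucleotides in the sticky end (one probe array per sticky end)
    probe_length      -- nucleotides in each detector probe
    """
    sticky_ends = 4 ** sticky_end_length      # possible sticky end sequences
    probes_per_array = 4 ** probe_length      # different probes on each array
    return {
        "sticky_ends": sticky_ends,
        "fragments_per_sticky_end": total_fragments / sticky_ends,
        "probes_per_array": probes_per_array,
        "bins": sticky_ends * probes_per_array,
    }


# cDNA sample cut with FokI: about 4 fragments from each of 12,000 cDNAs,
# four-base sticky ends, hexamer detector probes.
foki = indexing_statistics(4 * 12_000, sticky_end_length=4, probe_length=6)

# Human genomic DNA cut with Bsp24I: about 1.5 X 10^6 fragments,
# five-base sticky ends, heptamer detector probes.
bsp24i = indexing_statistics(1.5e6, sticky_end_length=5, probe_length=7)
```

With these inputs the sketch gives 187.5 fragments per sticky end and 1,048,576 bins for the FokI case, and 16,777,216 bins for the Bsp24I case, in line with the rounded figures quoted above.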
Cleavage of a cDNA sample with twenty different restriction endonucleases having six-base recognition sequences will produce a mixture of fragments with sticky ends. On average, restriction endonucleases having six-base recognition sequences cut once every 4^6 = 4096 base pairs. If the sample contains approximately 12,000 cDNA molecules with an average length of 1,500 base pairs, cleavage with one of the restriction enzymes will result in about 3200 cuts (and thus 6400 DNA fragments with sticky ends). Further cleavage of the sample (second digest) with two different restriction endonucleases having four-base recognition sequences will result in additional cuts once every 4^4 = 256 base pairs. Since the second digest will, in many cases, result in cuts on each fragment, this will result in (for each of the 20 * 2 = 40 secondary index samples) approximately 6,400 fragments, each approximately 256 base pairs long.
If five different restriction endonucleases having four-base recognition sequences are used for the third digest, approximately half of the fragments in each restricted secondary index sample will be cleaved (since these restriction enzymes will cut about once every 256 base pairs). Thus, there will be approximately 3,200 fragments (intact, with both an adaptor-indexer and a second adaptor) in each of the 20 * 2 * 5 = 200 restricted secondary index samples (there will be approximately 6,400 fragments in the non-restricted secondary index sample). Each of these fragments is sorted by hybridization to different detector probes based on the sequence adjacent to the sticky end sequence in each fragment. A hexamer probe array has 4,096 different six nucleotide probes. Thus, only 3,200 of the 4,096 hexamers in the probe array will couple to a hairpin ligator, on average. With 200 probe arrays each having 4,096 different hexamer probes, there are 200 * 4,096 = 819,200 "bins" in which to distribute the 3,200 * 200 = 640,000 total fragments (a heptamer array would provide 200 * 16,384 = 3,276,800 "bins").
As these examples illustrate, the length of the recognition sequence, the length of the sticky end generated, and the length of the detector probes used in the probe arrays together determine the number of data bins into which the nucleic acid fragments are sorted. By using sticky ends and array probes of sufficient length, the sorting of fragments can be matched to the complexity of the sample being analyzed.
The use of a comprehensive panel of adaptor-indexers provides a means for attaching specific functional modifications to selected subsets of a complex mixture of nucleic acid fragments and identifying the molecules so modified. Such a defined subset of molecules may be further resolved by additional cleavage and indexing, or by any of the established techniques such as cloning, PCR amplification, or gel electrophoresis. Individual members of the class may be distinguished by identifying characteristics such as length, sequence, or restriction endonuclease maps. The sequence of the sticky ends of the adaptor-indexers provides a means of indexing a large number of nucleic acid fragments.
Detector probes of different sequence can be immobilized at different locations on the probe array. In this way, the sequence of the detector probes on the probe array and the sequence of nucleic acid fragments in the index samples determine where on the probe array amplified fragments become coupled. The presence of fragments at different locations in the probe arrays thus forms a pattern of signals that provides a signature or fingerprint of a nucleic acid sample based on the presence or absence of specific nucleic acid sequences in the sample. For this reason, cataloging of this pattern of signals (that is, the pattern of the presence of fragments or hairpin ligators) is an embodiment of the disclosed method that is of particular interest. Catalogs can be made up of, or be referred to as, for example, a pattern of fragments on probe arrays, a pattern of the presence of fragments on probe arrays, a pattern of hairpin ligators on probe arrays, a pattern of the presence of hairpin ligators on probe arrays, a catalog of nucleic acid fragments in a sample, or a catalog of nucleic acid sequences in a sample. The information in the catalog is preferably in the form of positional information (that is, location in the probe array) or, more preferably, in the form of sequences. Preferred sequence information for catalogs includes sequences of detector probes to which a fragment was coupled and sequences of nucleic acid fragments present in the sample (derived from the locations in the probe array where fragments were coupled).
When a single Type IIS restriction enzyme is used in the first digest, the sequence information obtainable can be illustrated with the following structures: DNA fragment: ..NNNNXXXX..NNNNRRRRROOOOOOOOOSSSSNNNN.. Sequence information:
In these structures, each character represents a nucleotide. N represents any nucleotide (having no special identity or relationship to the method). R represents a nucleotide in the recognition sequence of the Type IIS restriction enzyme. O represents a nucleotide in the offset between the recognition site and the cleavage site of the Type IIS restriction enzyme. S represents a nucleotide in the sticky end resulting from cleavage with the Type IIS restriction enzyme. X represents a nucleotide in the recognition/cleavage site of the second restriction enzyme. I represents a nucleotide complementary to the detector probe.
From the DNA fragment ...NNNNXXXX...NNNNRRRRROOOOOOOOOSSSSNNNN..., the sequence information can be obtained. In this example, the Type IIS restriction enzyme has a five base recognition sequence, a nine base offset to the cleavage site, and creates a four base sticky end. The detector probes contain hexamer sequences. Each array location where a signal is generated in this example thus represents a specific sequence: nnnnn---nnnnnnnnnn (where n represents an identified nucleotide and each - represents an unidentified nucleotide). This is referred to as a determined sequence. The portion of the nucleic acid fragments for which the sequence is determined corresponds to the sticky end sequence, the sequence adjacent to the sticky end sequence to which the detector probe hybridized, and the recognition sequence of the restriction enzyme (S, I, and R, respectively).
This sequence information can also be represented by the structure A-B-C-D where A is the recognition sequence of the restriction enzyme, B is the gap of unknown sequence, C is the sequence to which the detector probe hybridized, and D is the sticky end sequence. The gap represents the nucleotides between the recognition sequence and the sequence to which the detector probe hybridized. C is always adjacent to the sticky end sequence D. In the example above, A is RRRRR, B is OOO, C is IIIIII, and D is SSSS.
The sequence information that can be obtained with the disclosed method can be further illustrated using a specific example of a nucleic acid fragment. Assume a nucleic acid sample containing a nucleic acid fragment with the sequence (SEQ ID NO: 18)
..CGGTGGATGACTTGAAGCTATGCTTAGG..
..GCCACCTACTGAACTTCGATACGAATCC..
If the sample is digested with FokI (a Type IIS restriction enzyme with a recognition sequence of GGATG and a cleavage site offset by 9 and 13 nucleotides), the fragment will be cleaved to generate the following fragments (the FokI recognition sequence is shown in bold)
..CGGTGGATGACTTGAAGC     TATGCTTAGG..
..GCCACCTACTGAACTTCGATAC     GAATCC..
When the corresponding adaptor-indexer is coupled to the fragment and the coupled fragment is amplified using a corresponding hairpin primer, the following nucleic acid is obtained (SEQ ID NO: 19; sequence from the adaptor-indexer is underlined, the hairpin primer is italicized)
..CGGTGGATGACTTGAAGCTATGCGGTATTACAGCCTATATACCGCATA
..GCCACCTACTGAACTTCGATACGCCATAATGTCGGATATATGGCGTAT
When the hairpin structure is formed (in the bottom strand in this example), the nucleic acid is hybridized to an appropriate detector probe (a hexamer in this example), and the detector probe and hairpin ligator are coupled, the following structure is obtained (SEQ ID NO:22)
support ATAG
"TGAAGC TA TGCGGTA T G
. . GCCACCTACTGAACTTCGATACGCCATA C ATGT
The sequence of the detector probe is identified by the location in the probe array where the fragment is detected. The sequence of the adjacent sticky end is identified by the probe array in which the fragment is detected (since a different probe array is used for each sticky end sequence). Finally, the recognition sequence is identified by the relationship of the cleavage site to the recognition sequence. Thus, in this example, detection of label in the TGAAGC hexamer position of the ATAC sticky end probe array indicates the presence of a nucleic acid fragment in the nucleic acid sample having the sequence CCTACNNNACTTCGATAC (3' to 5'; SEQ ID NO:23).
Relating this sequence to the generalized structure A-B-C-D, A is CCTAC, B is NNN, C is ACTTCG, and D is ATAC.
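The derivation of the determined sequence in this FokI example can be traced with a short Python sketch. It is offered only as an illustration under simplifying assumptions: it looks for the first GGATG site on the top strand only, uses the 9 and 13 nucleotide offsets stated above, and assumes a hexamer detector probe. The names `foki_determined_sequence`, `complement`, and the dictionary keys are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement (no reversal), i.e. the bottom strand read 3' to 5'."""
    return seq.translate(COMPLEMENT)


def foki_determined_sequence(top, site="GGATG", top_offset=9,
                             bottom_offset=13, probe_length=6):
    """Return the A-B-C-D parts for the first FokI site found on the top strand.

    Simplified sketch: ignores sites on the opposite strand and multiple sites.
    """
    start = top.find(site)
    if start < 0:
        return None
    site_end = start + len(site)
    top_cut = site_end + top_offset        # cut position on the top strand
    bottom_cut = site_end + bottom_offset  # cut position on the bottom strand
    if bottom_cut > len(top):
        return None
    probe = top[top_cut - probe_length:top_cut]        # detector probe sequence
    sticky_end = complement(top[top_cut:bottom_cut])   # bottom-strand overhang
    gap = top_cut - probe_length - site_end            # undetermined nucleotides
    return {
        "A": complement(site),    # recognition sequence (bottom strand)
        "B": "N" * gap,           # gap of unknown sequence
        "C": complement(probe),   # sequence hybridized to the detector probe
        "D": sticky_end,          # sticky end sequence
        "probe": probe,
    }


parts = foki_determined_sequence("CGGTGGATGACTTGAAGCTATGCTTAGG")
determined = parts["A"] + parts["B"] + parts["C"] + parts["D"]
```

For the fragment used in the text, the sketch gives the detector probe TGAAGC, the sticky end ATAC, and the determined sequence CCTACNNNACTTCGATAC, read 3' to 5' on the bottom strand.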
When multiple restriction enzymes are used for the first digestion, the sequence information obtainable can be illustrated with the following structures: DNA fragment : . . NNXXXXNN . . NNRRRRNN . . . . Sequence : XXXX RRRR III IIISSSSSS
In these structures, each character represents a nucleotide. N represents any nucleotide (having no special identity or relationship to the method). S represents a nucleotide in the recognition sequence (including sticky end) of the first restriction enzyme. X represents a nucleotide in the recognition/cleavage site of the second restriction enzyme. R represents a nucleotide in the recognition sequence of the third restriction enzyme. I represents a nucleotide complementary to the detector probe. The sequence and distance between the recognition sites of the second and third restriction enzymes and between the recognition site of the second restriction enzyme and the probe complement are not determined in the basic method.
From the DNA fragment SSSSSSNN..., the sequence information can be obtained. In this example, the detector probes contain hexamer sequences. Each array location where a signal is generated in this example thus represents a specific sequence: nnnn...nnnn...nnnnnnnnnnnn (where n represents an identified nucleotide and each ... represents an unidentified gap sequence). This is referred to as a determined sequence. The portion of the nucleic acid fragments for which the sequence is determined corresponds to the recognition sequence of the first restriction enzyme, the sequence adjacent to the recognition sequence to which the detector probe hybridized, the recognition sequence of the second restriction enzyme, and the recognition sequence of the third restriction enzyme (S, I, X, and R, respectively). This sequence information can also be represented by the structure
E-B-F-B-C-D where B is a gap of unknown sequence, C is the sequence to which the detector probe hybridized, D is the recognition sequence of the first restriction enzyme, E is the recognition sequence of the second restriction enzyme, and F is the recognition sequence of the third restriction enzyme. The gaps represent nucleotides between the recognition sequences of the second and third restriction enzymes and between the recognition sequence of the third restriction enzyme and the sequence to which the detector probe hybridized. C is always adjacent to the recognition sequence D. In the example above, C is IIIIII, D is SSSSSS, E is XXXX, and F is RRRR.
The sequence information that can be obtained with the disclosed method can be further illustrated using a specific example of a nucleic acid fragment. Assume a nucleic acid sample containing a nucleic acid fragment with the sequence (SEQ ID NOs:24, 25, and 26; restriction enzyme recognition sequences in boldface)
.. CGCATGGG .. ATAGCTTG .. CAAGCTATGGATCCGA.. .. GCGTACCC .. TATCGAAC .. GTTCGATACCTAGGCT .. If the sample is first digested with BamHI — a restriction enzyme with a recognition sequence of GGATCC generating a four-base sticky end — the fragment will be cleaved to generate the following fragments:
.. CGCATGGG .. ATAGCTTG .. CAAGCTATG GATCCA..
.. GCGTACCC .. TATCGAAC .. GTTCGATACCTAG GT ..
When the corresponding adaptor-indexer is coupled to the fragment and the fragment is digested with NlaI (recognition sequence CATG), the result is (SEQ ID NO:27):
.. CGCATG GG ..ATAGCTTG .. CAAGCTATGGATCTGGTATTACAGCCTA
.. GC GTACCC .. TATCGAAC .. GTTCGATACCTAGACCATAATGTCGGAT
After addition of the second adaptor and amplification using the corresponding hairpin primer (GGATCTGGTATAGGCTGTAATACCAGATCC; SEQ ID NO:28), the following nucleic acid is obtained (SEQ ID NO: 33 and SEQ ID NO:29; sequence from the adaptor-indexer is underlined, the hairpin primer is italicized). Note that the hairpin primer hybridizes to both the sticky end sequence and the remaining recognition sequence (that is, the C not in the sticky end).
GCCATGGATCTCTCACATGGG .. ATAGCTTG .. CGGTACCTAGAGAGTGTACCC .. TATCGAAC .. .. CAAGCTATGGATCTGGTATTACAGCCTATACCAGATCC
.. GTTCGATACCTAGACCATAATGTCGGATATGGTCTAGG
An aliquot (that is, a restricted index sample) of the sample can be digested with AluI (recognition site AGCT) prior to amplification. By cutting the fragment, amplification is prevented. This lack of amplification in the restricted index sample indicates the presence of the sequence TCGA in the fragment. When a hairpin structure is formed in the bottom strand (in this example), the fragment is hybridized to an appropriate detector probe (a hexamer in this example), and the detector probe and hairpin ligator are coupled, the following structure is obtained (SEQ ID NO:30; sequence from the adaptor-indexer is underlined, the hairpin primer is italicized, restriction enzyme recognition sequences in boldface)
support AGG
^ GCTATGGATCTGGTAT C CGGTACCTAGAGAGTGTACCC .. TATCGAAC .. GTTCGATACCTAGACCAXA T ATG
The sequence of the detector probe is identified by the location in the probe array where the hairpin ligator is detected. The sequence of the adjacent recognition sequence (including the sticky end) is identified by the probe array in which the hairpin ligator is detected (since a different set of probe arrays is used for each index sample). The recognition sequence of the second restriction enzyme is identified by the probe array in which the hairpin ligator is detected (since a different set of probe arrays is used for each secondary index sample). Finally, the presence of an internal sequence (the recognition sequence of the third restriction enzyme) is determined by seeing if the signal is absent from the probe array for the restricted secondary index sample that was digested with the third restriction enzyme (a different probe array is used for each restricted and non-restricted secondary index sample). If the signal is absent, it indicates the recognition site is present in the fragment. Thus, in this example, detection of hairpin ligator in the AGCTAT hexamer position of the TCGA third recognition site probe array in the GTAC second recognition site set of probe arrays in the CCTAGG sticky end set of probe arrays indicates the presence of a nucleic acid fragment in the nucleic acid sample having the sequence
GTAC... TCGA... TCGATACCTAGG (SEQ ID NO:31). Relating this sequence to the generalized structure E-B-F-B-C-D, C is TCGATA, D is CCTAGG, E is GTAC, and F is TCGA.
In another embodiment, the primer sequences in the hairpin primers are partly degenerate. In this way, multiple different nucleic acid fragments will be amplified in each index sample. Where partially degenerate primer sequences are used, it is preferred that the 3' end of the primer sequence of all of the hairpin primers used in a given index sample be the same. It is also preferred that the corresponding 3' end sequences of hairpin primers used in different index samples be different. In this way, the fragments amplified in each index sample will have related primer complement sequences while the sets of fragments amplified in the different index samples will be different. Such relationships provide a maximum of both sequence information for the fragments and catalog complexity.
The use of sets of hairpin primers with partially degenerate primer sequences can be illustrated with the following example. Sets of hairpin primers where the primer sequences in each set have, from 5' to 3', 8 specific bases and 12 degenerate bases can prime amplification from all sites in a nucleic acid sample having a sequence complementary to the 8 specified bases. The sequence of the specified bases in each of the sets can be different. Each different sequence, and thus each different set, of hairpin primers will prime amplification from a different set of sites in a nucleic acid sample. In a sufficiently complex nucleic acid sample, all of these sequences will be represented in the set of amplified fragments. By dividing the nucleic acid sample into aliquots (referred to as index samples) prior to amplification, multiple sets of fragments can be generated and analyzed, with each set preserving the primer sequence information.
Mass Spectroscopy Detection
Mass spectrometry techniques can be utilized for detection in the disclosed method. These techniques include matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectroscopy. Such techniques allow automation and rapid throughput of multiple samples and assays.
Mass spectrometry detection works better with smaller molecules, so it is useful to cut some components of the method prior to, or as part of, mass spectrometry detection. A number of methods are contemplated where an oligonucleotide molecule to be detected is cut to a shorter length prior to detection by mass spectrometry. The disclosed method would proceed as normal and, in the preferred embodiment, the surface that has the detector probes attached would be compatible with the source region of a matrix-assisted laser desorption ionization, time-of-flight mass spectrometer (MALDI-TOF-MS). The resultant fragment would look something like
HHH Surface 3 ' PPPPPPXXXXXXXXXXXXXXXXXXXXXXXX H-L xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx H
/ HHH
3' ...YYYYYYYYY
Where:
P are the bases of the detector probe, coupled to the fragment;
X are complementary bases of the hairpin primer and amplified fragment;
H are loop bases;
Y are the remaining nucleotides of the amplified fragment;
L is a label.
For fragments of greater than approximately 50 bases the performance of mass spectrometry techniques degrades for DNA samples. Chemical, biological, physical (thermal), and other cleaving reagents can be used to generate smaller, more optimal, sub-fragments to be analyzed in the mass spectrometer. The degree of fragmentation is somewhat tunable in instruments like the Q-TOF systems (Micromass, US head office at Suite 407N, 100 Cummings Center, Beverly, MA 01915-6101, USA) where one can look at the parent ion, then increase the fragmentation to see the decomposition fragments and thus the sequence; such a technique is contemplated to determine the full sized sub-fragment, and infer the sequence of the sub-fragment through these known tools. The detectable fragment can be top strand, bottom strand, or both strands depending upon the scheme. The label may be a cleavable mass tag or the strand need not be labeled.
There are several useful cleaving reagents for this purpose. For example, one technique is that of Szybalski (described elsewhere herein) where FokI is used to cut at a fixed distance from an arbitrary, specific, recognition site. This technique can be extended to other restriction enzymes of Type IIS or Type III. One could also use this technique twice, once to trim off the end nearer the surface, once to trim off the end further from the surface; preferably one would use a Type II enzyme to cut the end furthest from the surface. McrBC (New England Biolabs) can be used to cut at methylcytosine sites adjacent to G/A. The cut site is not well defined (approximately 30 bases) which may be used to advantage to generate the parent as well as the fragmentation set. Metal containing porphyrins attached to oligonucleotides have been shown to cut DNA very near the porphyrin when exposed to light (texaphyrins, US5607924). One could denature and use a hybridization texaphyrin and light to cleave the remaining strand. Another cleavage technology is that of Dervan (Cartwright et al., Cleavage of chromatin with methidiumpropyl-EDTA . iron(II). Proc Natl Acad Sci U S A, 80(11):3213-7 (1983); Schultz, P.G. and P.B. Dervan, Sequence-specific double-strand cleavage of DNA by penta-N-methylpyrrolecarboxamide-EDTA X Fe(II). Proc Natl Acad Sci U S A, 80(22):6834-7 (1983)). Techniques using photocleavable linkages are described by Olejnik et al. (Olejnik et al., Photocleavable peptide-DNA conjugates: synthesis and applications to DNA analysis using MALDI-MS. Nucleic Acids Res, 27(23):4626-31 (1999); Olejnik et al., Photocleavable affinity tags for isolation and detection of biomolecules. Methods Enzymol, 291:135-54 (1998); Olejnik et al., Photocleavable aminotag phosphoramidites for 5'-termini DNA/RNA labeling. Nucleic Acids Res, 26(15):3572-6 (1998); Olejnik et al., Photocleavable biotin derivatives: a versatile approach for the isolation of biomolecules. Proc Natl Acad Sci U S A, 92(16):7590-4 (1995)). These linkages can be cleaved using light to release the fragment from the surface, thus allowing one to provide a more gentle desorption. WO 0004036 describes photocleavable nucleotides and methods for their use.
In one embodiment, mass labels such as peptide nucleic acid (PNA) molecules (Hanvey et al., Science 258:1481-1485 (1992)) of different sequence and molecular weight can be used as labels that bind specifically to sequence in hairpin primers or second primers. Laser desorption of the samples is used to generate MALDI-TOF mass spectra of the PNA labels, which are released into the spectrometer and resolved by mass. The intensity of each PNA label reveals the relative amount of different components. In other words, the PNA spectra generate scalar values that are indirect indicators of the relative abundance of the labeled component at specific locations in an array.
Probability Detection
Sequencing by hybridization is known to produce mismatch errors (Lipshutz, Likelihood DNA sequencing by hybridization. J Biomol Struct Dyn, 11(3):637-53 (1993)). Database searching for sequence information currently is regular expression based and requires matched "letters" between the database entry and the search sequence. The disclosed method allows replacement of regular expression matching (match versus no-match per base) with a probability function to determine a confidence in the assignment of the identity of a sequence tag (that is, the fragments produced in the disclosed method).
The disclosed method uses covalent coupling to improve the specificity of the hybridization near the coupling site. Despite this improvement, there will remain a finite probability of a mismatch, particularly for nucleotides more removed from the coupling site. The error rate depends on at least two mismatch properties: the base pairing (for example, A with G) and the distance from the coupling site.
As an illustration of the process to determine the confidence value, consider the two bases in a hexamer probe furthest from the coupling site, numbering the bases as shown here.
<hexamer> surface-linker-spacer-NNNNNNnnnn-hairpin ligator
I I I I I I I I I I I I I I I I I I I I I I I I 3' -fragment .. NNNNNNNNNnnnn-hairpin ligator
123456 <position>
where, for this particular case, one has surface-linker-spacer-ATXXXX, focusing on the AT bases (positions 1 and 2) for purposes of the immediate illustration.
To evaluate the possible set of sequences represented, weight matrices are used, following the protein techniques of Dayhoff (Dayhoff et al., A model of evolutionary changes in proteins, in Atlas of Protein Sequence and Structure, M.O. Dayhoff, Editor. 1978, National Biomedical Research Foundation: Washington DC) and Venezia (Venezia and O'Hara, Rapid motif compliance scoring with match weight sets. Comput Appl Biosci, 9(1):65-9 (1993)). The coefficients in these matrices will be determined experimentally for the disclosed method. Below is an example of matrices (with illustrative coefficients) representing positions 1 and 2, where the columns represent the upper strand nucleotide and the rows represent the lower strand nucleotide. The actual coefficients can be determined empirically.
Position 1                     Position 2
     A    T    C    G               A    T    C    G
A [ .02, .90, .03, .05 ]       A [ .01, .97, .01, .01 ]
T [ .90, .02, .03, .05 ]       T [ .97, .01, .01, .01 ]
C [ .02, .03, .05, .90 ]       C [ .01, .01, .01, .97 ]
G [ .03, .02, .90, .05 ]       G [ .01, .01, .97, .01 ]
For the case of a perfect match detection on the hexamer ATXXXX, the score is the product of the relevant coefficients of the matrices (originally shown in bold, marked here with asterisks): 0.90 x 0.97 = 0.87.
Position 1                     Position 2
      A     T    C    G              A    T     C    G
A [  .02,  .90, .03, .05 ]     A [ .01, *.97*, .01, .01 ]
T [ *.90*, .02, .03, .05 ]     T [ .97,  .01,  .01, .01 ]
C [  .02,  .03, .05, .90 ]     C [ .01,  .01,  .01, .97 ]
G [  .03,  .02, .90, .05 ]     G [ .01,  .01,  .97, .01 ]
In a case where a single base mismatch occurs in one strand, for example A->G in position 1 on the hexamer side, the score is determined in a similar fashion to be 0.05 x 0.97 = 0.05 (the relevant coefficients, originally shown in bold, are marked here with asterisks).
Position 1                     Position 2
     A    T    C    G                A    T     C    G
A [ .02, .90, .03,  .05  ]     A [ .01, *.97*, .01, .01 ]
T [ .90, .02, .03, *.05* ]     T [ .97,  .01,  .01, .01 ]
C [ .02, .03, .05,  .90  ]     C [ .01,  .01,  .01, .97 ]
G [ .03, .02, .90,  .05  ]     G [ .01,  .01,  .97, .01 ]
This procedure can be extended to an arbitrary number of bases in a similar manner. For a given number of nucleotides the score can be computed for all possible mismatches and rank ordered to reveal the most probable identity. A cut-off score can be used to reduce the number of possible identities from the matrix estimation. For example, using the example matrices above, a threshold score of 0.50 would yield only one sequence, that being the sequence which matches the probe. This method of estimating sequences and their respective probability scores from the universe of mismatch events for a given probe can be extended from 1 to n, where n is the number of free bases available for hybridization.
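The scoring and ranking procedure described above can be sketched in Python using the illustrative coefficients from the example matrices. This is a minimal sketch, not the disclosed implementation: the names `MATRICES`, `pairing_score`, and `rank_upper_strand_identities` are hypothetical, the coefficients are the illustrative values from the text rather than measured ones, and both strands are written in position order (position 1 first).

```python
from itertools import product
from math import prod

BASES = "ATCG"  # column (upper strand) order used in the example matrices

# MATRICES[position][lower_base] holds the coefficients for upper base A, T, C, G.
# Illustrative values copied from the example matrices for positions 1 and 2.
MATRICES = [
    {"A": (.02, .90, .03, .05), "T": (.90, .02, .03, .05),
     "C": (.02, .03, .05, .90), "G": (.03, .02, .90, .05)},
    {"A": (.01, .97, .01, .01), "T": (.97, .01, .01, .01),
     "C": (.01, .01, .01, .97), "G": (.01, .01, .97, .01)},
]


def pairing_score(upper, lower):
    """Product of the per-position coefficients for an upper/lower strand pairing."""
    return prod(
        MATRICES[i][lo][BASES.index(up)]
        for i, (up, lo) in enumerate(zip(upper, lower))
    )


def rank_upper_strand_identities(lower, cutoff=0.0):
    """Score every possible upper-strand sequence against `lower`, best first,
    keeping only candidates whose score exceeds `cutoff`."""
    scored = [
        ("".join(candidate), pairing_score(candidate, lower))
        for candidate in product(BASES, repeat=len(lower))
    ]
    kept = [item for item in scored if item[1] > cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)


perfect = pairing_score("AT", "TA")    # probe AT against its complement
mismatch = pairing_score("GT", "TA")   # A->G in position 1 on the hexamer side
above_cutoff = rank_upper_strand_identities("TA", cutoff=0.50)
```

Here `perfect` is 0.90 x 0.97 and `mismatch` is 0.05 x 0.97, as in the worked cases above, and only the matching sequence AT survives the 0.50 cut-off. Extending the sketch to n positions would only require appending further matrices to the list.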
In an organism that has not been completely characterized (i.e. at least sequenced and consensus sequence assembled) one can compute a confidence value for uniqueness if one assumes a random distribution of bases. For example, if one has a candidate of 15 bases in length, in an organism which has an estimated 10^8 base genome, one expects the 15 base fragment to be unique because 10^8/4^15 = approximately 0.1, which is less than 1. The genome would have to be 10 times larger before one would expect an occurrence of two instances of the particular 15 base fragment.
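The uniqueness estimate above is a one-line calculation, sketched here in Python under the same assumption of a random distribution of bases. The function name `expected_occurrences` is hypothetical.

```python
def expected_occurrences(genome_size, tag_length):
    """Expected number of chance occurrences of one specific tag of length
    `tag_length` in a random genome of `genome_size` bases."""
    return genome_size / 4 ** tag_length


# 15 base candidate in an estimated 10^8 base genome
estimate = expected_occurrences(1e8, 15)

# the same candidate in a genome ten times larger
estimate_10x = expected_occurrences(1e9, 15)
```

For the example in the text, `estimate` is about 0.09, and `estimate_10x` is about 0.9, which is where a chance second occurrence starts to become plausible.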
The distributions, in known genomes, are known not to be completely random and the initial assumption of a random distribution can be improved as information is gathered. This new information can be used to assign and use confidence values. As an example, consider a fictitious gene family ABCD, whose members are ABCD1, ABCD2 and ABCD3. The three members were discovered following some event such as heat shock, and they are thus putatively assigned to belong to the heat shock family of genes and happen to have significant stretches of conserved sequence among the family of genes. Also consider the organism to be a plant, where ABCD1 was isolated from the plant root, ABCD2 was isolated from the plant leaf, and ABCD3 was isolated from the plant flower. The estimation matrix may look like
          1    2    3
ABCD1 [ .60, .15, .05 ]
ABCD2 [ .25, .60, .15 ]
ABCD3 [ .05, .15, .60 ]
where column 1 represents root, column 2 represents leaf and column 3 represents flower.
In a single experiment where one has high confidence in the sequence but the sequence may belong to one of the three known members of the family, the source of the sample (i.e. root, leaf or flower) allows estimation of the identity of the gene. For the fully mathematically closed treatment the matrix must contain all elements of the family. Here, to allow for a still to be found gene in this family, the rows and columns do not add to 1; all the other members are assigned a sum of 0.05, the values to be updated as the amount of information known about the organism increases.
One can extend this estimation to include organism homology. That is, if one were to search a database of all organisms for a given sequence from gene ABCD1 of Plant 1 there may be matches to Plant 2, Plant 3, Mammal 1, etc. The estimation matrix would be constructed from the known organism data in the database.
The calculations and analysis described above can be illustrated using the following example of construction of a catalog. Consider a two probe array, a control sample, and a tester sample. Consider the two probes to have the known sequences: A, <substrate--linker--AGGGAG-3'>, and B, <substrate--linker--ATGGAG>. These probes will capture their cognate sequences: AA, <...TCCCTC...>, and BB, <...TACCTC...>, from the control and tester samples, as well as some mismatched species with lower probability as described herein. Utilizing the estimation matrix technique as discussed above one calculates the probabilities of the correct matching. The disclosed method is conducted on the control and tester, resulting signals are collected from the probe array, and a catalog is made which contains the four signals:
      control                tester
     AA    BB               AA    BB
A   .30   .03          A   .80   .10
B   .03   .50          B   .03   .50
The catalog also contains the probabilities, and/or entries derived from the probabilities, for each probe/target combination, as discussed above. For purposes of illustration, let us assume that the probability of having probe sequence A paired with target sequence AA is 0.80, the probability of having probe sequence A paired with sequence BB is 0.10, the probability of having probe sequence B paired with target sequence AA is 0.05, and the probability of having probe sequence B paired with sequence BB is 0.75, giving the estimation matrix
     AA    BB
A   .80   .10
B   .05   .75
It is a simple matter of applying linear algebra to determine the signals corresponding to each target. Here, for example, the corresponding entries are multiplied together to convert the control and tester signals to the probabilistic pattern of the target of interest. For example, the total signal ascribed, in the control sample, to the AA target is 0.30 x 0.80 (on the A probe site, perfect match) + 0.03 x 0.05 (on the B probe site, imperfect match) = approximately 0.24. On the tester sample, the AA target signal is 0.80 x 0.80 + 0.03 x 0.05 = approximately 0.64. Comparison of the pattern for the control and tester, for the sequence corresponding to AA, exhibits an increase in the relative amount of AA from 0.24 to 0.64 for control to tester respectively. All other entries in the pattern are calculated in the same fashion.
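The catalog calculation for the AA target can be reproduced with a short Python sketch. It uses the signal and estimation values given in the example; the dictionary layout and the function name `target_signal` are hypothetical choices made for illustration only.

```python
# Signals collected from the two-probe array, indexed as [probe][target].
control = {"A": {"AA": 0.30, "BB": 0.03}, "B": {"AA": 0.03, "BB": 0.50}}
tester = {"A": {"AA": 0.80, "BB": 0.10}, "B": {"AA": 0.03, "BB": 0.50}}

# Assumed probabilities of each probe/target pairing (the estimation matrix).
estimation = {"A": {"AA": 0.80, "BB": 0.10}, "B": {"AA": 0.05, "BB": 0.75}}


def target_signal(signals, estimation, target):
    """Sum, over all probe sites, of signal times pairing probability for one target."""
    return sum(
        signals[probe][target] * estimation[probe][target]
        for probe in signals
    )


control_aa = target_signal(control, estimation, "AA")
tester_aa = target_signal(tester, estimation, "AA")
```

With the values above, `control_aa` is 0.30 x 0.80 + 0.03 x 0.05 and `tester_aa` is 0.80 x 0.80 + 0.03 x 0.05, which round to the 0.24 and 0.64 quoted in the text. The remaining entries of the pattern follow by passing a different target to the same function.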

Claims

We claim:
1. A method of identifying nucleic acid fragments in nucleic acid samples, the method comprising
(a) mixing one or more nucleic acid samples with one or more different hairpin primers, wherein each hairpin primer comprises a different primer sequence,
(b) incubating the samples under conditions that promote amplification of nucleic acids in the samples, wherein amplified nucleic acid fragments are formed which have hairpin primer sequences at one or both ends,
(c) incubating the samples under conditions that promote formation of hairpin structures by the hairpin primer sequences at the ends of the amplified fragments,
(d) hybridizing each sample with a plurality of detector probes and covalently coupling the hairpin structures to the probes, wherein each probe has a different sequence, and
(e) detecting, directly or indirectly, coupling of the amplified fragments to the detector probes.
2. The method of claim 1 wherein the probes are all of the same length.
3. The method of claim 2 wherein the detector probes are six, seven, or eight nucleotides long.
4. The method of claim 1 wherein the probes all have similar hybrid stability.
5. The method of claim 1 wherein the amplified fragments are covalently coupled to the detector probes by ligation.
6. The method of claim 1 further comprising, prior to step (a), dividing the sample into a plurality of index samples, wherein a different hairpin primer is mixed with each index sample, wherein steps (a) through (e) are performed with each index sample.
7. The method of claim 5 further comprising, prior to step (b), dividing each index sample into a set of two or more of secondary index samples, and mixing each secondary index sample in each set of secondary index samples with a different set of one or more second primers.
8. The method of claim 5 further comprising, prior to step (a), dividing each index sample into a set of two or more of secondary index samples, prior to, simultaneous with, or following step (a), mixing each secondary index sample in each set of secondary index samples with a different set of one or more second primers, wherein mixing each index sample with one or more different hairpin primers is accomplished by mixing the one or more different hairpin primers with each secondary index sample in a set of secondary index samples.
9. The method of claim 1 further comprising, following step (b), separating the strands of the amplified fragments and proceeding with step (c) using only one of the strands.
10. The method of claim 9 wherein the strands are separated using a capture tag incorporated into one of the strands.
11. The method of claim 10 wherein the capture tag is incorporated into the hairpin primers.
12. The method of claim 9 wherein the strands are separated by selective digestion of one of the strands.
13. The method of claim 12 wherein the linkages between a plurality of nucleotides at the 5' end of each hairpin primer are insensitive to nuclease digestion.
14. The method of claim 13 wherein the linkages between a plurality of nucleotides at the 5' end of each hairpin primer are phosphorothioate linkages.
15. The method of claim 1 wherein the concentration of the various nucleic acid fragments in the samples are normalized.
16. The method of claim 15 wherein the strands of the nucleic acid fragments are separated and the concentration of the nucleic acid fragments is normalized by immobilizing one strand of the nucleic acid fragments, denaturing the nucleic acid fragments, renaturing the nucleic acid fragments for a time greater than the c0/2 for abundant nucleic acid fragments and less than the c0/2 for rare nucleic acid fragments, and collecting the un-renatured nucleic acid fragments.
17. The method of claim 1 further comprising, prior to step (e), amplifying the amplified fragments coupled to the detector probes.
18. The method of claim 1 wherein each detector probe is immobilized on a substrate.
19. The method of claim 18 wherein all of the detector probes are immobilized on the same substrate.
20. The method of claim 18 wherein all of the detector probes are immobilized on a different substrate.
21. The method of claim 20 wherein the substrates are beads.
22. The method of claim 18 wherein the detector probes are immobilized on a plurality of different substrates such that at least one detector probe is immobilized on one substrate and at least one other detector probe, respectively, is immobilized on a different substrate.
23. The method of claim 18 wherein the detector probes are in an array.
24. The method of claim 1 wherein each detector probe is associated with a capture tag, sorting tag, or both.
25. The method of claim 24 wherein the detector probes are captured via the capture tags.
26. The method of claim 24 wherein the detector probes are sorted via the sorting tags.
27. The method of claim 24 wherein the detector probes are associated with a plurality of different capture tags or a plurality of different sorting tags.
28. The method of claim 1 wherein the detector probes are in an array, wherein each detector probe is immobilized at a different location in the array, and wherein detecting coupling of amplified fragments to detector probes is accomplished by detecting the presence of amplified fragments at different locations in the arrays.
29. The method of claim 28 wherein the location, amount, or location and amount of amplified fragments in the arrays constitutes a pattern of amplified fragments in the arrays, the method further comprising comparing the pattern of amplified fragments in the arrays with the pattern of amplified fragments in arrays determined in a separate procedure using a second nucleic acid sample.
30. The method of claim 29 further comprising comparing the pattern of amplified fragments in the arrays with the pattern of amplified fragments in arrays determined in a plurality of separate procedures using a plurality of different nucleic acid samples.
31. The method of claim 1 further comprising, following covalent coupling in step (d), incubating the samples with T4 endonuclease VII.
32. The method of claim 1 wherein the nucleic acid fragments are amplified by PCR.
33. The method of claim 1 wherein each hairpin primer contains a label, wherein coupling of the amplified fragments to the probes is detected via the label.
34. The method of claim 33 wherein the label is detectable by nuclear magnetic resonance, electron paramagnetic resonance, surface enhanced raman scattering, surface plasmon resonance, fluorescence, phosphorescence, chemiluminescence, resonance raman, microwave, or a combination.
35. The method of claim 1 wherein the presence of the amplified fragments is detected by rolling circle replication of an amplification target circle wherein replication is primed by primer sequences at the end of the amplified fragments.
36. The method of claim 1 wherein the pattern of the amount of amplified fragments coupled to different detector probes constitutes a catalog of nucleic acid fragments in the nucleic acid sample, wherein the pattern is compared to a predicted pattern based on probabilities of base mismatches of sequences hybridized to the detector probes.
37. The method of claim 1 wherein detecting coupling of the amplified fragments to the detector probes is accomplished by detecting mass labels associated with the coupled fragments, mass labels associated with the detector probes, or a combination, by mass spectroscopy.
38. The method of claim 37 wherein the mass labels associated with the coupled fragments and mass labels associated with the detector probes are detected by matrix-assisted laser desorption/ionization time-of-flight mass spectroscopy.
39. The method of claim 37 wherein the composition of the mass labels associated with the coupled fragments and mass labels associated with the detector probes are determined by analyzing the fragmentation pattern.
40. The method of claim 37 wherein uncoupled fragments are washed away from the detector probes prior to detection of the coupled fragments.
41. The method of claim 37 wherein the hairpin primers, the detector probes, or both, contain a photocleavable nucleotide, wherein the method further comprises, following coupling of the amplified fragments to the detector probes, photocleavage of the photocleavable nucleotides, and detection of one or both strands of the coupled amplified fragment by mass spectroscopy.
42. The method of claim 37 further comprising, following coupling of the amplified fragments to the detector probes, incubation of the coupled fragments and detector probes with one or more nucleic acid cleaving reagents, and detection of one or both strands of the coupled fragment by mass spectroscopy.
43. The method of claim 1 further comprising performing steps (a) through (c) on one or more control nucleic acid samples, wherein the hairpin primers used with the control nucleic acid samples contain a different label from the label of the hairpin primers used with the nucleic acid samples, mixing the control nucleic acid samples with corresponding nucleic acid samples and proceeding with step (d) by hybridizing the mixed samples with the detector probes, detecting coupling of both types of amplified fragments to different detector probes, and identifying differences in the pattern of coupling of amplified fragments to probes from the nucleic acid samples and the control nucleic acid samples.
44. The method of claim 1 further comprising performing steps (a) through (c) on one or more control nucleic acid samples, wherein the hairpin primers used with the nucleic acid samples contain a labile nucleotide in the loop, mixing the control nucleic acid samples with corresponding nucleic acid samples and proceeding with step (d) by hybridizing the mixed samples with the detector probes, detecting coupling of amplified fragments from both types of samples to different detector probes, treating the mixed nucleic acid samples to cleave the labile nucleotide, detecting coupling of amplified fragments from nucleic acid samples to different detector probes, and identifying differences in the pattern of coupling of amplified fragments to probes from the nucleic acid samples and the control nucleic acid samples.
45. The method of claim 1 wherein the detector probes are in an array, wherein each probe is immobilized at a different location in the array, wherein the location of amplified fragments in the array constitutes a pattern of coupling of amplified fragments to probes in the array, the method further comprising comparing the pattern of coupling of amplified fragments in the arrays with the pattern of amplified fragments in arrays determined in a separate procedure using a second nucleic acid sample.
46. The method of claim 1 wherein the pattern of the presence, amount, presence and amount, or absence of amplified fragments coupled to different detector probes constitutes a catalog of nucleic acid fragments in the nucleic acid sample.
47. The method of claim 46 further comprising preparing a second catalog of nucleic acid fragments in a second nucleic acid sample and comparing the first catalog and second catalog.
48. The method of claim 47 further comprising identifying or preparing nucleic acid fragments corresponding to the nucleic acid fragments present at a threshold amount in the first nucleic acid sample but not present at the threshold amount in the second nucleic acid sample.
49. The method of claim 47 wherein the second nucleic acid sample is a sample from the same type of organism as the first nucleic acid sample.
50. The method of claim 47 wherein the second nucleic acid sample is a sample from the same type of tissue as the first nucleic acid sample.
51. The method of claim 47 wherein the second nucleic acid sample is a sample from the same organism as the first nucleic acid sample.
52. The method of claim 51 wherein the second nucleic acid sample is obtained at a different time than the first nucleic acid sample.
53. The method of claim 47 wherein the second nucleic acid sample is a sample from a different organism than the first nucleic acid sample.
54. The method of claim 47 wherein the second nucleic acid sample is a sample from a different type of tissue than the first nucleic acid sample.
55. The method of claim 47 wherein the second nucleic acid sample is a sample from a different species of organism than the first nucleic acid sample.
56. The method of claim 47 wherein the second nucleic acid sample is a sample from a different strain of organism than the first nucleic acid sample.
57. The method of claim 47 wherein the second nucleic acid sample is a sample from a different cellular compartment than the first nucleic acid sample.
58. The method of claim 47 further comprising identifying or preparing nucleic acid fragments corresponding to the nucleic acid fragments present in the first nucleic acid sample but not present in the second nucleic acid sample.
59. The method of claim 58 further comprising using the nucleic acid fragments as probes.
60. The method of claim 59 wherein using the nucleic acid fragments as probes is accomplished by repeating steps (a) through (d) with a different nucleic acid sample, wherein the nucleic acid fragments are used as detector probes in steps (d) and (e).
61. The method of claim 1 further comprising determining the sequence of a portion of at least one of the amplified fragments.
62. The method of claim 61 wherein the portion of the amplified fragment corresponds to the sequence complementary to the primer sequence of the hairpin primer and the sequence adjacent to the sequence complementary to the primer sequence to which the detector probe hybridized.
63. The method of claim 62 further comprising detecting or amplifying a nucleic acid corresponding to a nucleic acid fragment in the nucleic acid sample using a probe or primer based on the determined sequence of the portion of the nucleic acid fragment.
64. The method of claim 1 wherein each hairpin primer or detector probe contains a label, wherein coupling of the amplified fragments to the detector probes is detected via the label.
65. The method of claim 64 wherein each hairpin primer contains a label, wherein detecting coupling of the amplified fragments to the detector probes is accomplished by separating coupled fragments from uncoupled fragments, and detecting the labels of the coupled fragments.
66. The method of claim 65 wherein each different hairpin primer contains a different label, wherein each detector probe is associated with a capture tag or a sorting tag, wherein separating coupled fragments from uncoupled fragments is accomplished by separating the detector probes from the uncoupled fragments using the capture tags or sorting tags, wherein the coupled fragments separate with the detector probes.
67. The method of claim 66 wherein the sorting tag is a fluorescent label, and wherein separating the detector probes from the uncoupled fragments is accomplished using a fluorescent label sorter.
68. The method of claim 64 wherein each detector probe contains a label, wherein detecting coupling of the amplified fragments to the detector probes is accomplished by separating coupled detector probes from uncoupled detector probes, and detecting the labels of the detector probes.
69. The method of claim 68 wherein each different detector probe contains a different label, wherein each amplified fragment is associated with a capture tag or a sorting tag, wherein separating coupled detector probes from uncoupled detector probes is accomplished by separating the amplified fragments from the uncoupled detector probes using the capture tags or sorting tags, wherein the coupled detector probes separate with the amplified fragments.
70. The method of claim 69 wherein the sorting tag is a fluorescent label, and wherein separating the amplified fragments from the uncoupled detector probes is accomplished using a fluorescent label sorter.
71. The method of claim 64 wherein the labels are fluorescent, phosphorescent, or chemiluminescent labels.
72. The method of claim 71 wherein at least two of the labels are distinguished temporally via different fluorescent, phosphorescent, or chemiluminescent emission lifetimes.
73. The method of claim 64 wherein the labels are detectable by nuclear magnetic resonance, electron paramagnetic resonance, surface enhanced raman scattering, surface plasmon resonance, fluorescence, phosphorescence, chemiluminescence, resonance raman, microwave, or a combination.
74. The method of claim 73 wherein the label is detected using nuclear magnetic resonance, electron paramagnetic resonance, surface enhanced raman scattering, surface plasmon resonance, fluorescence, phosphorescence, chemiluminescence, resonance raman, microwave, or a combination.
75. The method of claim 64 wherein the labels are beads comprising a label.
76. The method of claim 75 wherein the label is a molecular barcode.
77. The method of claim 64 wherein the labels are mass labels.
78. The method of claim 1 further comprising performing steps (a) through (e) on a control nucleic acid sample, identifying differences between the nucleic acid sample and the control nucleic acid sample in the pattern of amplified fragments coupled to different detector probes.
79. The method of claim 78 wherein the hairpin primers used with the control nucleic acid sample contain a different label from the label of the hairpin primers used with the nucleic acid sample, wherein the control nucleic acid samples are mixed with corresponding nucleic acid samples prior to step (d).
80. The method of claim 1 further comprising performing steps (a) through (e) on a plurality of nucleic acid samples.
81. The method of claim 80 further comprising performing steps (a) through (e) on a control nucleic acid sample, identifying differences between the nucleic acid samples and the control nucleic acid sample in the pattern of amplified fragments coupled to different detector probes.
82. The method of claim 80 further comprising identifying differences between the nucleic acid samples in the pattern of amplified fragments coupled to different detector probes.
83. A method of identifying a nucleic acid sequence in a nucleic acid sample, the method comprising
(a) mixing a nucleic acid sample with a hairpin primer and a second primer, wherein the hairpin primer and the second primer comprise primer sequences complementary to sequences flanking and on opposite strands of a nucleic acid sequence of interest,
(b) incubating the nucleic acid sample under conditions that promote amplification of the nucleic acid sequence of interest, wherein a nucleic acid fragment is formed which comprises the nucleic acid sequence of interest flanked by sequences of the hairpin primer and the second primer,
(c) incubating the nucleic acid sample under conditions that promote formation of a hairpin structure by the sequence of the hairpin primer,
(d) hybridizing the nucleic acid sample with a plurality of detector probes and coupling the hairpin structure to a probe, wherein each probe has a different sequence, and
(e) detecting, directly or indirectly, coupling of the nucleic acid fragment to a detector probe.
84. A method of identifying nucleic acid fragments in a nucleic acid sample, the method comprising
(a) dividing the sample into a plurality of index samples, wherein the index samples are organized into sets of index samples wherein each set comprises a plurality of index samples,
(b) mixing each index sample in a set of index samples with one or more different hairpin primers, wherein each hairpin primer comprises a different primer sequence, and mixing each index sample with one or more different second primers,
(c) incubating the index samples under conditions that promote amplification of nucleic acids in the samples, wherein amplified nucleic acid fragments are formed which are flanked by hairpin primer sequences and second primer sequences,
(d) incubating the index samples under conditions that promote formation of hairpin structures by the hairpin primer sequences at the ends of the amplified fragments,
(e) hybridizing each index sample with a plurality of detector probes and coupling the hairpin structures to the probes, wherein in a given set of detector probes each probe has a different sequence, and
(f) detecting, directly or indirectly, coupling of the amplified fragments to different detector probes.
85. A method of comparing nucleic acid samples, the method comprising
(a) comparing a catalog of nucleic acid fragments in a first nucleic acid sample with a catalog of nucleic acid fragments in a second nucleic acid sample, and
(b) identifying or preparing nucleic acid fragments corresponding to the nucleic acid fragments present in the first nucleic acid sample but not present in the second nucleic acid sample; wherein the catalogs of nucleic acid fragments are each produced by
(i) mixing the nucleic acid sample with one or more different hairpin primers, wherein each hairpin primer comprises a different primer sequence,
(ii) incubating the sample under conditions that promote amplification of nucleic acids in the sample, wherein amplified nucleic acid fragments are formed which have hairpin primer sequences at one or both ends,
(iii) incubating the sample under conditions that promote formation of hairpin structures by the hairpin primer sequences at the ends of the amplified fragments,
(iv) hybridizing the sample with a plurality of detector probes and covalently coupling the hairpin structures to the probes, wherein each probe has a different sequence, and
(v) detecting, directly or indirectly, coupling of the amplified fragments to the detector probes, wherein the pattern of the presence, amount, presence and amount, or absence of amplified fragments coupled to different detector probes constitutes the catalog of nucleic acid fragments in the nucleic acid sample.
86. The method of claim 85 wherein nucleic acid fragments present in the first nucleic acid sample but not present in the second nucleic acid sample are nucleic acid fragments present at a threshold amount in the first nucleic acid sample but not present at the threshold amount in the second nucleic acid sample.
87. The method of claim 85 wherein the pattern of the presence, amount, presence and amount, or absence of amplified fragments coupled to different detector probes is embodied by the sequences represented by the coupled amplified fragments.
88. The method of claim 87 wherein each represented sequence corresponds to sequence complementary to the primer sequence of the hairpin primer and sequence adjacent to the sequence complementary to the primer sequence to which the detector probe hybridized.
89. The method of claim 85 wherein the probes are all of the same length.
90. The method of claim 85 wherein the probes all have similar hybrid stability.
91. A method of identifying nucleic acid fragments in a nucleic acid sample, the method comprising
(a) incubating a nucleic acid sample with one or more nucleic acid cleaving reagents that collectively generate sticky ends having a plurality of different sequences to produce nucleic acid fragments with sticky ends,
(b) dividing the sample into a plurality of index samples,
(c) mixing a different adaptor-indexer with each index sample and covalently coupling the adaptor-indexers to the nucleic acid fragments, wherein each adaptor-indexer has a different sticky end, wherein each sticky end of the adaptor-indexers is compatible with a sticky end generated by the nucleic acid cleaving reagents, wherein each index sample has a different adaptor-indexer,
(d) mixing each index sample with one or more different hairpin primers, wherein each hairpin primer comprises a different primer sequence, wherein each primer sequence is complementary to all or part of the sequence of at least one of the adaptor-indexers,
(e) incubating the index samples under conditions that promote amplification of nucleic acids in the samples, wherein amplified nucleic acid fragments are formed which have hairpin primer sequences at one or both ends,
(f) incubating the index samples under conditions that promote formation of hairpin structures by the hairpin primer sequences at the ends of the amplified fragments,
(g) hybridizing each index sample with a plurality of detector probes and covalently coupling the hairpin structures to the probes, wherein each probe has a different sequence, and
(h) detecting coupling of the amplified fragments to different detector probes.
92. The method of claim 91 further comprising determining the sequence of a portion of at least one of the amplified fragments.
93. The method of claim 92 wherein the portion of the nucleic acid fragments corresponds to the sticky end sequence, the sequence adjacent to the sticky end sequence to which the detector probe hybridized, and the recognition sequence of the nucleic acid cleaving reagent.
94. The method of claim 93 wherein the portion includes a gap of known length but unknown sequence between the sequence adjacent to the sticky end and the recognition sequence of the nucleic acid cleaving reagent.
95. The method of claim 94 wherein the portion has the structure
A-B-C-D wherein A is the recognition sequence of the nucleic acid cleaving reagent, B is the gap of unknown sequence, C is the sequence to which the detector probe hybridized, and D is the sticky end sequence.
96. The method of claim 91 wherein, for at least one of the hairpin primer sequences, the sticky end sequence is involved in the stem of the hairpin structure but none of the fragment sequence adjacent to the sticky end sequence is involved in the stem of the hairpin structure.
97. The method of claim 91 wherein the probes are all of the same length.
98. The method of claim 91 wherein the probes all have similar hybrid stability.
99. The method of claim 91 further comprising, prior to step (c), incubating the index samples with one or more second nucleic acid cleaving reagents, and mixing a second adaptor with each digested index sample and covalently coupling the second adaptor to the nucleic acid fragments, wherein the second adaptor has an end compatible with the end generated by one of the second nucleic acid cleaving reagents.
100. The method of claim 99 further comprising, prior to digestion with the second nucleic acid cleaving reagents, dividing each index sample into a set of two or more of secondary index samples, wherein each secondary index sample in each set of secondary index samples is digested with a different set of one or more second nucleic acid cleaving reagents.
101. The method of 99 further comprising, simultaneous with step (d) mixing each secondary index sample in each set of secondary index samples with a different set of one or more second primers, wherein each secondary primer is complementary to all or part of the sequence of at least one of the second adaptors.
102. The method of claim 91 further comprising, following step (e), separating the strands of the amplified fragments and proceeding with step (d) using only one of the strands.
103. The method of claim 91 further comprising, following covalent coupling in step (e), incubating the index samples with T4 endonuclease VII.
104. The method of claim 91 wherein each hairpin primer contains a label, wherein the presence of the amplified fragments is detected via the label.
105. The method of claim 91 further comprising performing steps (a) through (f) on a control nucleic acid sample to produce control index samples, wherein the hairpin primers used with the control nucleic acid sample contain a different label from the label of the hairpin primers used with the nucleic acid sample, mixing the control index samples with corresponding index samples and proceeding with step (g) by hybridizing the mixed samples with the detector probes, identifying differences between the nucleic acid sample and the control nucleic acid sample in the pattern of amplified fragments coupled to different detector probes.
106. A kit comprising a set of hairpin primers wherein each hairpin primer has a different primer sequence, and a plurality of detector probes, wherein each probe has a different sequence.
107. The kit of claim 106 wherein the detector probes are six, seven, or eight nucleotides long.
108. The kit of claim 106 wherein at least one hairpin primer, at least one detector probe, or a combination, contains a label.
109. The kit of claim 108 wherein the labels are fluorescent, phosphorescent, or chemiluminescent labels.
110. The kit of claim 109 wherein at least two of the labels are distinguished temporally via different fluorescent, phosphorescent, or chemiluminescent emission lifetimes.
111. The kit of claim 108 wherein the labels are detectable by nuclear magnetic resonance, electron paramagnetic resonance, surface enhanced raman scattering, surface plasmon resonance, fluorescence, phosphorescence, chemiluminescence, resonance raman, microwave, or a combination.
112. The kit of claim 108 wherein the labels are beads comprising a label.
113. The method of claim 112 wherein the label is a molecular barcode.
114. The kit of claim 108 wherein the labels are mass labels.
115. The kit of claim 106 wherein each detector probe is immobilized on a substrate.
116. The kit of claim 106 wherein each hairpin primer or detector probe is associated with a capture tag, sorting tag, or both.
117. The kit of claim 106 wherein the detector probes are nucleic acid fragments prepared by
(a) mixing one or more nucleic acid samples with one or more different hairpin primers, wherein each hairpin primer comprises a different primer sequence,
(b) incubating the samples under conditions that promote amplification of nucleic acids in the samples, wherein amplified nucleic acid fragments are formed which have hairpin primer sequences at one or both ends,
(c) incubating the samples under conditions that promote formation of hairpin structures by the hairpin primer sequences at the ends of the amplified fragments,
(d) hybridizing each sample with a plurality of detector probes and covalently coupling the hairpin structures to the probes, wherein each probe has a different sequence, and
(e) detecting, directly or indirectly, coupling of the amplified fragments to the detector probes, wherein the pattern of amplified fragments coupled to different detector probes constitutes a catalog of nucleic acid fragments in the nucleic acid sample,
(f) preparing a second catalog of nucleic acid fragments in a second nucleic acid sample and comparing the first catalog and second catalog, and
(g) preparing nucleic acid fragments corresponding to the nucleic acid fragments present in the first nucleic acid sample but not present in the second nucleic acid sample.
118. The kit of claim 106 wherein the probes are all of the same length.
119. The kit of claim 106 wherein the probes all have similar hybrid stability.
120. The kit of claim 106 further comprising a set of adaptor-indexers wherein each adaptor-indexer has a different sticky end, wherein each sticky end of the adaptor-indexers is compatible with a sticky end generated by a restriction enzyme that generates sticky ends having a plurality of different sequences.
PCT/US2000/022246 1999-08-13 2000-08-11 Analysis of sequence tags with hairpin primers WO2001012856A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU67708/00A AU6770800A (en) 1999-08-13 2000-08-11 Analysis of sequence tags with hairpin primers

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US14887099P 1999-08-13 1999-08-13
US60/148,870 1999-08-13
US09/544,713 2000-04-06
US09/544,713 US6261782B1 (en) 1999-04-06 2000-04-06 Fixed address analysis of sequence tags

Publications (2)

Publication Number Publication Date
WO2001012856A2 true WO2001012856A2 (en) 2001-02-22
WO2001012856A3 WO2001012856A3 (en) 2002-01-24

Family

ID=26846245

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/US2000/022164 WO2001012855A2 (en) 1999-08-13 2000-08-11 Binary encoded sequence tags
PCT/US2000/022246 WO2001012856A2 (en) 1999-08-13 2000-08-11 Analysis of sequence tags with hairpin primers

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/US2000/022164 WO2001012855A2 (en) 1999-08-13 2000-08-11 Binary encoded sequence tags

Country Status (8)

Country Link
US (4) US6383754B1 (en)
EP (1) EP1206577B1 (en)
JP (1) JP2003527087A (en)
AT (1) ATE318932T1 (en)
AU (2) AU6770800A (en)
CA (1) CA2383264A1 (en)
DE (1) DE60026321D1 (en)
WO (2) WO2001012855A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1550718A1 (en) * 2002-07-30 2005-07-06 Tum Gene, Inc. Method of detecting base mutation
US8313905B2 (en) 2008-01-24 2012-11-20 Samsung Electronics Co., Ltd. Detection oligomer and method for controlling quality of biochip using detection oligomer

Families Citing this family (189)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1994003624A1 (en) * 1992-08-04 1994-02-17 Auerbach Jeffrey I Methods for the isothermal amplification of nucleic acid molecules
US6261808B1 (en) * 1992-08-04 2001-07-17 Replicon, Inc. Amplification of nucleic acid molecules via circular replicons
US20070269799A9 (en) * 1994-06-22 2007-11-22 Zhang David Y Nucleic acid amplification methods
US6593086B2 (en) 1996-05-20 2003-07-15 Mount Sinai School Of Medicine Of New York University Nucleic acid amplification methods
US5942391A (en) * 1994-06-22 1999-08-24 Mount Sinai School Of Medicine Nucleic acid amplification method: ramification-extension amplification method (RAM)
US5854033A (en) * 1995-11-21 1998-12-29 Yale University Rolling circle replication reporter systems
US6683173B2 (en) * 1998-04-03 2004-01-27 Epoch Biosciences, Inc. Tm leveling methods
JP2002540802A (en) * 1999-04-06 2002-12-03 イェール ユニバーシティ Fixed address analysis of sequence indicators
WO2001012855A2 (en) * 1999-08-13 2001-02-22 Yale University Binary encoded sequence tags
JP3596868B2 (en) * 2000-11-24 2004-12-02 キヤノン株式会社 Method for evaluating the amount of probe in end-labeled probe array, and method for evaluating amount of target substance using labeled probe array
DE10060827A1 (en) * 2000-12-07 2002-06-13 Basf Lynx Bioscience Ag Methods of coding hybridization probes
EP1381697B1 (en) * 2001-02-27 2009-03-18 Virco Bvba Circular probe amplification (cpa) of circularized nucleic acid molecules
KR20030031911A (en) * 2001-04-19 2003-04-23 싸이퍼젠 바이오시스템즈, 인코포레이티드 Biomolecule characterization using mass spectrometry and affinity tags
CA2344599C (en) * 2001-05-07 2011-07-12 Bioneer Corporation Selective polymerase chain reaction of dna of which base sequence is completely unknown
US7414111B2 (en) * 2001-09-19 2008-08-19 Alexion Pharmaceuticals, Inc. Engineered templates and their use in single primer amplification
DE60234369D1 (en) * 2001-09-19 2009-12-24 Alexion Pharma Inc MANIPULATED MATRICES AND THEIR USE IN SINGLE PRIMER AMPLIFICATION
US20110151438A9 (en) 2001-11-19 2011-06-23 Affymetrix, Inc. Methods of Analysis of Methylation
US20030119004A1 (en) * 2001-12-05 2003-06-26 Wenz H. Michael Methods for quantitating nucleic acids using coupled ligation and amplification
US7553619B2 (en) * 2002-02-08 2009-06-30 Qiagen Gmbh Detection method using dissociated rolling circle amplification
ATE423224T1 (en) * 2002-04-26 2009-03-15 Solexa Inc CONSTANT LENGTH SIGNATURES FOR PARALLEL SEQUENCING OF POLYNUCLEOTIDES
US7115370B2 (en) 2002-06-05 2006-10-03 Capital Genomix, Inc. Combinatorial oligonucleotide PCR
WO2003106672A2 (en) * 2002-06-12 2003-12-24 Riken METHOD FOR UTILIZING THE 5'END OF mRNA FOR CLONING AND ANALYSIS
US7108976B2 (en) * 2002-06-17 2006-09-19 Affymetrix, Inc. Complexity management of genomic DNA by locus specific amplification
US9388459B2 (en) * 2002-06-17 2016-07-12 Affymetrix, Inc. Methods for genotyping
CN1182256C (en) * 2002-08-09 2004-12-29 周国华 Gene expression amount comparing analysis method
US7619819B2 (en) 2002-08-20 2009-11-17 Illumina, Inc. Method and apparatus for drug product tracking using encoded optical identification elements
US7872804B2 (en) 2002-08-20 2011-01-18 Illumina, Inc. Encoded particle having a grating with variations in the refractive index
US7900836B2 (en) 2002-08-20 2011-03-08 Illumina, Inc. Optical reader system for substrates having an optically readable code
US7923260B2 (en) 2002-08-20 2011-04-12 Illumina, Inc. Method of reading encoded particles
US7901630B2 (en) 2002-08-20 2011-03-08 Illumina, Inc. Diffraction grating-based encoded microparticle assay stick
US7508608B2 (en) 2004-11-17 2009-03-24 Illumina, Inc. Lithographically fabricated holographic optical identification element
US7164533B2 (en) 2003-01-22 2007-01-16 Cyvera Corporation Hybrid random bead/chip based microarray
US7092160B2 (en) 2002-09-12 2006-08-15 Illumina, Inc. Method of manufacturing of diffraction grating-based optical identification element
US20100255603A9 (en) * 2002-09-12 2010-10-07 Putnam Martin A Method and apparatus for aligning microbeads in order to interrogate the same
AU2003267192A1 (en) * 2002-09-12 2004-04-30 Cyvera Corporation Method and apparatus for aligning elongated microbeads in order to interrogate the same
US7459273B2 (en) * 2002-10-04 2008-12-02 Affymetrix, Inc. Methods for genotyping selected polymorphism
US20040235005A1 (en) * 2002-10-23 2004-11-25 Ernest Friedlander Methods and composition for detecting targets
US20040110134A1 (en) * 2002-12-05 2004-06-10 Wenz H. Michael Methods for quantitating nucleic acids using coupled ligation and amplification
US7291459B2 (en) * 2002-12-10 2007-11-06 University Of Alabama At Huntsville Nucleic acid detector and method of detecting targets within a sample
US20040121338A1 (en) * 2002-12-19 2004-06-24 Alsmadi Osama A. Real-time detection of rolling circle amplification products
US9487823B2 (en) 2002-12-20 2016-11-08 Qiagen Gmbh Nucleic acid amplification
US7799576B2 (en) 2003-01-30 2010-09-21 Dh Technologies Development Pte. Ltd. Isobaric labels for mass spectrometric analysis of peptides and method thereof
US20040209299A1 (en) * 2003-03-07 2004-10-21 Rubicon Genomics, Inc. In vitro DNA immortalization and whole genome amplification using libraries generated from randomly fragmented DNA
US8206913B1 (en) 2003-03-07 2012-06-26 Rubicon Genomics, Inc. Amplification and analysis of whole genome and whole transcriptome libraries generated by a DNA polymerization process
US8043834B2 (en) 2003-03-31 2011-10-25 Qiagen Gmbh Universal reagents for rolling circle amplification and methods of use
EP1627074B1 (en) * 2003-05-09 2008-07-09 Peter Winter Use of a type iii restriction enzyme to isolate sequence tags comprising more than 25 nucleotides
US20040248103A1 (en) * 2003-06-04 2004-12-09 Feaver William John Proximity-mediated rolling circle amplification
WO2005001129A2 (en) * 2003-06-06 2005-01-06 Applera Corporation Mobility cassettes
CN1836050A (en) * 2003-07-07 2006-09-20 单细胞系统公司 Hairpin-labeled probes and methods of use
US8114978B2 (en) * 2003-08-05 2012-02-14 Affymetrix, Inc. Methods for genotyping selected polymorphism
EP2339024B1 (en) * 2003-10-21 2014-12-31 Orion Genomics, LLC Methods for quantitative determination of methylation density in a DNA locus
US20050186590A1 (en) * 2003-11-10 2005-08-25 Crothers Donald M. Nucleic acid detection method having increased sensitivity
US20050148087A1 (en) * 2004-01-05 2005-07-07 Applera Corporation Isobarically labeled analytes and fragment ions derived therefrom
GB0400584D0 (en) * 2004-01-12 2004-02-11 Solexa Ltd Nucleic acid chacterisation
WO2005080604A2 (en) * 2004-02-12 2005-09-01 Compass Genetics, Llc Genetic analysis by sequence-specific sorting
US7433123B2 (en) * 2004-02-19 2008-10-07 Illumina, Inc. Optical identification element having non-waveguide photosensitive substrate with diffraction grating therein
CA2559209C (en) * 2004-03-08 2016-06-07 Rubicon Genomics, Inc. Methods and compositions for generating and amplifying dna libraries for sensitive detection and analysis of dna methylation
US7622281B2 (en) * 2004-05-20 2009-11-24 The Board Of Trustees Of The Leland Stanford Junior University Methods and compositions for clonal amplification of nucleic acid
US7575863B2 (en) * 2004-05-28 2009-08-18 Applied Biosystems, Llc Methods, compositions, and kits comprising linker probes for quantifying polynucleotides
WO2005118877A2 (en) * 2004-06-02 2005-12-15 Vicus Bioscience, Llc Producing, cataloging and classifying sequence tags
AU2005255348B2 (en) * 2004-06-18 2009-12-17 Real Time Genomics, Ltd Data collection cataloguing and searching method and system
US20060000899A1 (en) * 2004-07-01 2006-01-05 American Express Travel Related Services Company, Inc. Method and system for dna recognition biometrics on a smartcard
US7620402B2 (en) * 2004-07-09 2009-11-17 Itis Uk Limited System and method for geographically locating a mobile device
US20060172319A1 (en) * 2004-07-12 2006-08-03 Applera Corporation Mass tags for quantitative analyses
US20070048752A1 (en) * 2004-07-12 2007-03-01 Applera Corporation Mass tags for quantitative analyses
US20060057613A1 (en) * 2004-07-26 2006-03-16 Nanosphere, Inc. Method for distinguishing methicillin resistant S. aureus from methicillin sensitive S. aureus in a mixed culture
US7867703B2 (en) * 2004-08-26 2011-01-11 Agilent Technologies, Inc. Element defined sequence complexity reduction
DE602005025782D1 (en) * 2004-09-21 2011-02-17 Life Technologies Corp TWO-TONE REAL-TIME / END POINT QUANTIFICATION OF MICRO-RNAS (MIRNAS)
WO2006055735A2 (en) * 2004-11-16 2006-05-26 Illumina, Inc Scanner having spatial light modulator
ATE459933T1 (en) 2004-11-16 2010-03-15 Illumina Inc METHOD AND APPARATUS FOR READING CODED MICROBALLS
WO2006065598A2 (en) * 2004-12-13 2006-06-22 Geneohm Sciences, Inc. Fluidic cartridges for electrochemical detection of dna
US7579153B2 (en) * 2005-01-25 2009-08-25 Population Genetics Technologies, Ltd. Isothermal DNA amplification
US7407757B2 (en) * 2005-02-10 2008-08-05 Population Genetics Technologies Genetic analysis by sequence-specific sorting
US8309303B2 (en) 2005-04-01 2012-11-13 Qiagen Gmbh Reverse transcription and amplification of RNA with simultaneous degradation of DNA
US7452671B2 (en) * 2005-04-29 2008-11-18 Affymetrix, Inc. Methods for genotyping with selective adaptor ligation
US7326772B2 (en) * 2005-05-12 2008-02-05 Penta Biotech, Inc. Peptide for assaying hERG channel binding
US20060292585A1 (en) * 2005-06-24 2006-12-28 Affymetrix, Inc. Analysis of methylation using nucleic acid arrays
US20070020640A1 (en) * 2005-07-21 2007-01-25 Mccloskey Megan L Molecular encoding of nucleic acid templates for PCR and other forms of sequence analysis
EP1924704B1 (en) 2005-08-02 2011-05-25 Rubicon Genomics, Inc. Compositions and methods for processing and amplification of dna, including using multiple enzymes in a single reaction
US8409804B2 (en) * 2005-08-02 2013-04-02 Rubicon Genomics, Inc. Isolation of CpG islands by thermal segregation and enzymatic selection-amplification method
US7871770B2 (en) 2005-08-09 2011-01-18 Maxwell Sensors, Inc. Light transmitted assay beads
US8232092B2 (en) * 2005-08-09 2012-07-31 Maxwell Sensors, Inc. Apparatus and method for digital magnetic beads analysis
US7858307B2 (en) * 2005-08-09 2010-12-28 Maxwell Sensors, Inc. Light transmitted assay beads
EP1762627A1 (en) 2005-09-09 2007-03-14 Qiagen GmbH Method for the activation of a nucleic acid for performing a polymerase reaction
GB0524069D0 (en) * 2005-11-25 2006-01-04 Solexa Ltd Preparation of templates for solid phase amplification
US8137936B2 (en) * 2005-11-29 2012-03-20 Macevicz Stephen C Selected amplification of polynucleotides
US11306351B2 (en) 2005-12-21 2022-04-19 Affymetrix, Inc. Methods for genotyping
US7932029B1 (en) * 2006-01-04 2011-04-26 Si Lok Methods for nucleic acid mapping and identification of fine-structural-variations in nucleic acids and utilities
US7368265B2 (en) * 2006-01-23 2008-05-06 Compass Genetics, Llc Selective genome amplification
WO2007101075A2 (en) * 2006-02-22 2007-09-07 Applera Corporation Double-ligation method for haplotype and large-scale polymorphism detection
US7901882B2 (en) * 2006-03-31 2011-03-08 Affymetrix, Inc. Analysis of methylation using nucleic acid arrays
US7830575B2 (en) 2006-04-10 2010-11-09 Illumina, Inc. Optical scanner with improved scan time
US7833716B2 (en) 2006-06-06 2010-11-16 Gen-Probe Incorporated Tagged oligonucleotides and their use in nucleic acid amplification methods
US20090002703A1 (en) * 2006-08-16 2009-01-01 Craig Edward Parman Methods and systems for quantifying isobaric labels and peptides
US20080293589A1 (en) * 2007-05-24 2008-11-27 Affymetrix, Inc. Multiplex locus specific amplification
US20090023190A1 (en) * 2007-06-20 2009-01-22 Kai Qin Lao Sequence amplification with loopable primers
WO2009049007A2 (en) * 2007-10-10 2009-04-16 Magellan Biosciences, Inc. Compositions, methods and systems for rapid identification of pathogenic nucleic acids
US9074244B2 (en) 2008-03-11 2015-07-07 Affymetrix, Inc. Array-based translocation and rearrangement assays
WO2009128938A1 (en) * 2008-04-17 2009-10-22 Maxwell Sensors, Inc. Hydrodynamic focusing for analyzing rectangular microbeads
US8029993B2 (en) 2008-04-30 2011-10-04 Population Genetics Technologies Ltd. Asymmetric adapter library construction
JP5378724B2 (en) * 2008-07-30 2013-12-25 株式会社日立ハイテクノロジーズ Expression mRNA identification method
US8383345B2 (en) 2008-09-12 2013-02-26 University Of Washington Sequence tag directed subassembly of short sequencing reads into long sequencing reads
JP5843614B2 (en) * 2009-01-30 2016-01-13 オックスフォード ナノポア テクノロジーズ リミテッド Adapters for nucleic acid constructs in transmembrane sequencing
KR20100089688A (en) * 2009-02-04 2010-08-12 삼성전자주식회사 Method for analyzing target nucleic acid
WO2010107825A2 (en) 2009-03-16 2010-09-23 Pangu Biopharma Limited Compositions and methods comprising histidyl-trna synthetase splice variants having non-canonical biological activities
CA2757289A1 (en) 2009-03-31 2010-10-21 Atyr Pharma, Inc. Compositions and methods comprising aspartyl-trna synthetases having non-canonical biological activities
US9085798B2 (en) 2009-04-30 2015-07-21 Prognosys Biosciences, Inc. Nucleic acid constructs and methods of use
US10787701B2 (en) 2010-04-05 2020-09-29 Prognosys Biosciences, Inc. Spatially encoded biological assays
US20190300945A1 (en) 2010-04-05 2019-10-03 Prognosys Biosciences, Inc. Spatially Encoded Biological Assays
PT2556171E (en) 2010-04-05 2015-12-21 Prognosys Biosciences Inc Spatially encoded biological assays
EP2563380B1 (en) 2010-04-26 2018-05-30 aTyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of cysteinyl-trna synthetase
AU2011248614B2 (en) 2010-04-27 2017-02-16 Pangu Biopharma Limited Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of isoleucyl tRNA synthetases
US8993723B2 (en) 2010-04-28 2015-03-31 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of alanyl-tRNA synthetases
WO2011135459A2 (en) 2010-04-29 2011-11-03 Medical Prognosis Institute A/S Methods and devices for predicting treatment efficacy
CA2797374C (en) 2010-04-29 2021-02-16 Pangu Biopharma Limited Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of asparaginyl trna synthetases
EP2563383B1 (en) 2010-04-29 2017-03-01 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of valyl trna synthetases
ES2623805T3 (en) 2010-05-03 2017-07-12 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic and antibody compositions related to phenylalanyl-alpha-tRNA synthetase protein fragments
CN103140233B (en) 2010-05-03 2017-04-05 Atyr 医药公司 Treatment, diagnosis and the discovery of antibody compositions related to the protein fragments of methionyl-tRNA synthetase
WO2011139986A2 (en) 2010-05-03 2011-11-10 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of arginyl-trna synthetases
JP6008844B2 (en) 2010-05-04 2016-10-19 エータイアー ファーマ, インコーポレイテッド Innovative discovery of therapeutic, diagnostic and antibody compositions related to protein fragments of the p38 MULTI-tRNA synthetase complex
AU2011252990B2 (en) 2010-05-14 2017-04-20 Pangu Biopharma Limited Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of phenylalanyl-beta-tRNA synthetases
CA2799480C (en) 2010-05-17 2020-12-15 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of leucyl-trna synthetases
EP2575856B1 (en) 2010-05-27 2017-08-16 aTyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of glutaminyl-trna synthetases
WO2011153277A2 (en) 2010-06-01 2011-12-08 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of lysyl-trna synthetases
CN102933721B (en) * 2010-06-09 2015-12-02 凯津公司 For the composite sequence barcode of high flux screening
WO2012021247A2 (en) 2010-07-12 2012-02-16 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of glycyl-trna synthetases
AU2011293294B2 (en) 2010-08-25 2016-03-24 Pangu Biopharma Limited Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of Tyrosyl-tRNA synthetases
GB201106254D0 (en) 2011-04-13 2011-05-25 Frisen Jonas Method and product
CA2836836A1 (en) 2011-06-01 2012-12-06 Medical Prognosis Institute A/S Methods and devices for prognosis of cancer relapse
JP6298404B2 (en) 2011-07-25 2018-03-20 オックスフォード ナノポール テクノロジーズ リミテッド Hairpin loop method for double-stranded polynucleotide sequencing using transmembrane pores
US20150329855A1 (en) * 2011-12-22 2015-11-19 Ibis Biosciences, Inc. Amplification primers and methods
EP2814514B1 (en) 2012-02-16 2017-09-13 Atyr Pharma, Inc. Histidyl-trna synthetases for treating autoimmune and inflammatory diseases
WO2013153911A1 (en) * 2012-04-12 2013-10-17 国立大学法人東京大学 Nucleic acid quantification method, detection probe, detection probe set, and nucleic acid detection method
US9695416B2 (en) 2012-07-18 2017-07-04 Siemens Healthcare Diagnostics Inc. Method of normalizing biological samples
US11155860B2 (en) 2012-07-19 2021-10-26 Oxford Nanopore Technologies Ltd. SSB method
WO2014085434A1 (en) 2012-11-27 2014-06-05 Pontificia Universidad Catolica De Chile Compositions and methods for diagnosing thyroid tumors
BR112015021788B1 (en) 2013-03-08 2023-02-28 Oxford Nanopore Technologies Plc METHODS FOR MOVING ONE OR MORE IMMOBILIZED HELICASES, FOR CONTROLLING THE MOVEMENT OF A TARGET POLYNUCLEOTIDE, FOR CHARACTERIZING A TARGET POLYNUCLEOTIDE, AND FOR CONTROLLING THE LOADING OF ONE OR MORE HELICASES INTO A TARGET POLYNUCLEOTIDE, USE OF A TRANSMEMBRANE PORE AND AN APPLIED POTENTIAL AND OF ONE OR MORE SPACERS, COMPLEX, AND, KIT
GB201314695D0 (en) 2013-08-16 2013-10-02 Oxford Nanopore Tech Ltd Method
WO2014172390A2 (en) 2013-04-15 2014-10-23 Cedars-Sinai Medical Center Methods for detecting cancer metastasis
WO2014195032A1 (en) 2013-06-07 2014-12-11 Medical Prognosis Institute A/S Methods and devices for predicting treatment efficacy of fulvestrant in cancer patients
WO2014210223A1 (en) 2013-06-25 2014-12-31 Prognosys Biosciences, Inc. Spatially encoded biological assays using a microfluidic device
EP4053560A1 (en) 2013-11-26 2022-09-07 The Brigham and Women's Hospital, Inc. Compositions and methods for modulating an immune response
ES2890078T3 (en) 2013-12-20 2022-01-17 Illumina Inc Conservation of genomic connectivity information in fragmented genomic DNA samples
GB201403096D0 (en) 2014-02-21 2014-04-09 Oxford Nanopore Tech Ltd Sample preparation method
CA2954764A1 (en) 2014-07-15 2016-01-21 Ontario Institute For Cancer Research Methods and devices for predicting anthracycline treatment efficacy
US10011861B2 (en) 2014-08-14 2018-07-03 Luminex Corporation Cleavable hairpin primers
CN107532207B (en) 2015-04-10 2021-05-07 空间转录公司 Spatially differentiated, multiplexed nucleic acid analysis of biological samples
GB201609220D0 (en) 2016-05-25 2016-07-06 Oxford Nanopore Tech Ltd Method
CA3030430A1 (en) 2016-07-15 2018-01-18 The Regents Of The University Of California Methods of producing nucleic acid libraries
KR101930835B1 (en) * 2016-11-29 2018-12-19 가천대학교 산학협력단 A method and a system for producing combinational logic network based on gene expression
KR101947138B1 (en) * 2017-05-02 2019-02-12 김성현 High-temperature Activation Primer useful for multiplex PCR and nucleic acid amplification method using the same
JP7296969B2 (en) 2018-01-12 2023-06-23 クラレット バイオサイエンス, エルエルシー Methods and compositions for analyzing nucleic acids
JP7096893B2 (en) * 2018-02-05 2022-07-06 エフ.ホフマン-ラ ロシュ アーゲー Preparation of single-stranded circular DNA templates for single molecules
GB201807793D0 (en) 2018-05-14 2018-06-27 Oxford Nanopore Tech Ltd Method
AU2019280712A1 (en) 2018-06-06 2021-01-07 The Regents Of The University Of California Methods of producing nucleic acid libraries and compositions and kits for practicing same
US11519033B2 (en) 2018-08-28 2022-12-06 10X Genomics, Inc. Method for transposase-mediated spatial tagging and analyzing genomic DNA in a biological sample
US11926867B2 (en) 2019-01-06 2024-03-12 10X Genomics, Inc. Generating capture probes for spatial analysis
US11649485B2 (en) 2019-01-06 2023-05-16 10X Genomics, Inc. Generating capture probes for spatial analysis
CN111855982B (en) * 2019-04-25 2022-07-22 武汉华大医学检验所有限公司 Method for detecting nucleic acid fragment length
EP3976820A1 (en) 2019-05-30 2022-04-06 10X Genomics, Inc. Methods of detecting spatial heterogeneity of a biological sample
WO2021091611A1 (en) 2019-11-08 2021-05-14 10X Genomics, Inc. Spatially-tagged analyte capture agents for analyte multiplexing
WO2021092433A2 (en) 2019-11-08 2021-05-14 10X Genomics, Inc. Enhancing specificity of analyte binding
SG11202106899SA (en) 2019-12-23 2021-09-29 10X Genomics Inc Methods for spatial analysis using rna-templated ligation
US11702693B2 (en) 2020-01-21 2023-07-18 10X Genomics, Inc. Methods for printing cells and generating arrays of barcoded cells
US11732299B2 (en) 2020-01-21 2023-08-22 10X Genomics, Inc. Spatial assays with perturbed cells
US11821035B1 (en) 2020-01-29 2023-11-21 10X Genomics, Inc. Compositions and methods of making gene expression libraries
US11898205B2 (en) 2020-02-03 2024-02-13 10X Genomics, Inc. Increasing capture efficiency of spatial assays
US11732300B2 (en) 2020-02-05 2023-08-22 10X Genomics, Inc. Increasing efficiency of spatial analysis in a biological sample
US11835462B2 (en) 2020-02-11 2023-12-05 10X Genomics, Inc. Methods and compositions for partitioning a biological sample
US11891654B2 (en) 2020-02-24 2024-02-06 10X Genomics, Inc. Methods of making gene expression libraries
US11926863B1 (en) 2020-02-27 2024-03-12 10X Genomics, Inc. Solid state single cell method for analyzing fixed biological cells
US11768175B1 (en) 2020-03-04 2023-09-26 10X Genomics, Inc. Electrophoretic methods for spatial analysis
EP4242325A3 (en) 2020-04-22 2023-10-04 10X Genomics, Inc. Methods for spatial analysis using targeted rna depletion
WO2021237087A1 (en) 2020-05-22 2021-11-25 10X Genomics, Inc. Spatial analysis to detect sequence variants
WO2021236929A1 (en) 2020-05-22 2021-11-25 10X Genomics, Inc. Simultaneous spatio-temporal measurement of gene expression and cellular activity
WO2021242834A1 (en) 2020-05-26 2021-12-02 10X Genomics, Inc. Method for resetting an array
EP4158054A1 (en) 2020-06-02 2023-04-05 10X Genomics, Inc. Spatial transcriptomics for antigen-receptors
WO2021247543A2 (en) 2020-06-02 2021-12-09 10X Genomics, Inc. Nucleic acid library methods
WO2021252499A1 (en) 2020-06-08 2021-12-16 10X Genomics, Inc. Methods of determining a surgical margin and methods of use thereof
WO2021252591A1 (en) 2020-06-10 2021-12-16 10X Genomics, Inc. Methods for determining a location of an analyte in a biological sample
CN116034166A (en) 2020-06-25 2023-04-28 10X基因组学有限公司 Spatial analysis of DNA methylation
US11761038B1 (en) 2020-07-06 2023-09-19 10X Genomics, Inc. Methods for identifying a location of an RNA in a biological sample
US11492662B2 (en) 2020-08-06 2022-11-08 Singular Genomics Systems, Inc. Methods for in situ transcriptomics and proteomics
US11926822B1 (en) 2020-09-23 2024-03-12 10X Genomics, Inc. Three-dimensional spatial analysis
US11827935B1 (en) 2020-11-19 2023-11-28 10X Genomics, Inc. Methods for spatial analysis using rolling circle amplification and detection probes
AU2021409136A1 (en) 2020-12-21 2023-06-29 10X Genomics, Inc. Methods, compositions, and systems for capturing probes and/or barcodes
EP4301870A1 (en) 2021-03-18 2024-01-10 10X Genomics, Inc. Multiplex capture of gene and protein expression from a biological sample
CN113552191B (en) * 2021-07-28 2023-11-21 江苏师范大学 Construction method of proportional electrochemical sensor for detecting methylated DNA based on multilayer DNA amplification loop
WO2023034489A1 (en) 2021-09-01 2023-03-09 10X Genomics, Inc. Methods, compositions, and kits for blocking a capture probe on a spatial array

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0427073A2 (en) * 1989-11-09 1991-05-15 Miles Inc. Nucleic acid amplification employing ligatable hairpin probe and transcription
WO1997031256A2 (en) * 1996-02-09 1997-08-28 Cornell Research Foundation, Inc. Detection of nucleic acid sequence differences using the ligase detection reaction with addressable arrays
US5710000A (en) * 1994-09-16 1998-01-20 Affymetrix, Inc. Capturing sequences adjacent to Type-IIs restriction sites for genomic library mapping
WO1998003673A1 (en) * 1996-07-19 1998-01-29 Cornell Research Foundation, Inc. High fidelity detection of nucleic acid differences by ligase detection reaction
WO1998023777A2 (en) * 1996-11-27 1998-06-04 Baylor College Of Medicine Assay for detecting apoptotic cells
US5770365A (en) * 1995-08-25 1998-06-23 Tm Technologies, Inc. Nucleic acid capture moieties
US5798210A (en) * 1993-03-26 1998-08-25 Institut Pasteur Derivatives utilizable in nucleic acid sequencing
US5800994A (en) * 1994-04-04 1998-09-01 Chiron Diagnostics Corporation Hybridization-ligation assays for the detection of specific nucleic acid sequences
WO1998040518A2 (en) * 1997-03-11 1998-09-17 Wisconsin Alumni Research Foundation Nucleic acid indexing
US5858656A (en) * 1990-04-06 1999-01-12 Queen's University Of Kingston Indexing linkers
WO2001012855A2 (en) * 1999-08-13 2001-02-22 Yale University Binary encoded sequence tags

Family Cites Families (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4883750A (en) 1984-12-13 1989-11-28 Applied Biosystems, Inc. Detection of specific sequences in nucleic acids
US5242794A (en) 1984-12-13 1993-09-07 Applied Biosystems, Inc. Detection of specific sequences in nucleic acids
US5871928A (en) 1989-06-07 1999-02-16 Fodor; Stephen P. A. Methods for nucleic acid analysis
US6074818A (en) * 1990-08-24 2000-06-13 The University Of Tennessee Research Corporation Fingerprinting of nucleic acids, products and methods
US5455166A (en) 1991-01-31 1995-10-03 Becton, Dickinson And Company Strand displacement amplification
US5607924A (en) 1992-01-21 1997-03-04 Pharmacyclics, Inc. DNA photocleavage using texaphyrins
DE69333650T2 (en) * 1992-02-19 2006-01-12 The Public Health Research Institute Of The City Of New York, Inc. NEW ARRANGEMENT OF OLIGONUCLEOTIDES AND THEIR BENEFITS FOR SORTING, ISOLATING, SEQUENCING AND MANIPULATING NUCLEIC ACIDS
GB9214873D0 (en) * 1992-07-13 1992-08-26 Medical Res Council Process for categorising nucleotide sequence populations
WO1994003624A1 (en) 1992-08-04 1994-02-17 Auerbach Jeffrey I Methods for the isothermal amplification of nucleic acid molecules
US5614389A (en) 1992-08-04 1997-03-25 Replicon, Inc. Methods for the isothermal amplification of nucleic acid molecules
US5733733A (en) 1992-08-04 1998-03-31 Replicon, Inc. Methods for the isothermal amplification of nucleic acid molecules
US5503980A (en) 1992-11-06 1996-04-02 Trustees Of Boston University Positional sequencing by hybridization
US5962221A (en) * 1993-01-19 1999-10-05 Univ Tennessee Res Corp Oligonucleotide constructs and methods for the generation of sequence signatures from nucleic acids
US5429307A (en) * 1993-06-14 1995-07-04 Apollo Sprayers International, Inc. Dual air supply spray gun
US6007987A (en) 1993-08-23 1999-12-28 The Trustees Of Boston University Positional sequencing by hybridization
US5429807A (en) 1993-10-28 1995-07-04 Beckman Instruments, Inc. Method and apparatus for creating biopolymer arrays on a solid support surface
US5552278A (en) * 1994-04-04 1996-09-03 Spectragen, Inc. DNA sequencing by stepwise ligation and cleavage
US5871918A (en) 1996-06-20 1999-02-16 The University Of North Carolina At Chapel Hill Electrochemical detection of nucleic acid hybridization
US5604097A (en) * 1994-10-13 1997-02-18 Spectragen, Inc. Methods for sorting polynucleotides using oligonucleotide tags
US5599695A (en) 1995-02-27 1997-02-04 Affymetrix, Inc. Printing molecular library arrays using deprotection agents solely in the vapor phase
DE69621507T2 (en) * 1995-03-28 2003-01-09 Japan Science & Tech Corp Method for molecular indexing of genes using restriction enzymes
US5968745A (en) 1995-06-27 1999-10-19 The University Of North Carolina At Chapel Hill Polymer-electrodes for detecting nucleic acid hybridization and method of use thereof
US5736330A (en) 1995-10-11 1998-04-07 Luminex Corporation Method and compositions for flow cytometric determination of DNA sequences
US5871697A (en) 1995-10-24 1999-02-16 Curagen Corporation Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing
AU714486B2 (en) 1995-11-21 2000-01-06 Yale University Unimolecular segment amplification and detection
US5854033A (en) 1995-11-21 1998-12-29 Yale University Rolling circle replication reporter systems
US5658736A (en) * 1996-01-16 1997-08-19 Genetics Institute, Inc. Oligonucleotide population preparation
US6613508B1 (en) * 1996-01-23 2003-09-02 Qiagen Genomics, Inc. Methods and compositions for analyzing nucleic acid molecules utilizing sizing techniques
US6312893B1 (en) * 1996-01-23 2001-11-06 Qiagen Genomics, Inc. Methods and compositions for determining the sequence of nucleic acid molecules
US5854413A (en) 1996-04-30 1998-12-29 Incyte Pharmaceuticals, Inc. Synaptogyrin homolog
HUP0003944A3 (en) * 1996-06-06 2003-08-28 Lynx Therapeutics Inc Hayward Sequencing by ligation of encoded adaptors
JP3996644B2 (en) 1996-06-07 2007-10-24 イムニベスト・コーポレイション Magnetic separation with external and internal gradients
US5866336A (en) * 1996-07-16 1999-02-02 Oncor, Inc. Nucleic acid amplification oligonucleotides with molecular energy transfer labels and methods based thereon
US6329180B1 (en) * 1996-09-13 2001-12-11 Alex M. Garvin Genetic analysis using peptide tagged in-vitro synthesized proteins
US5858671A (en) * 1996-11-01 1999-01-12 The University Of Iowa Research Foundation Iterative and regenerative DNA sequencing method
AU5794498A (en) * 1996-12-10 1998-07-03 Genetrace Systems, Inc. Releasable nonvolatile mass-label molecules
CA2286864A1 (en) * 1997-01-10 1998-07-16 Pioneer Hi-Bred International, Inc. Hybridization-based genetic amplification and analysis
US6023540A (en) 1997-03-14 2000-02-08 Trustees Of Tufts College Fiber optic sensor with encoded microspheres
US5888737A (en) * 1997-04-15 1999-03-30 Lynx Therapeutics, Inc. Adaptor-based sequence analysis
GB9707980D0 (en) * 1997-04-21 1997-06-11 Brax Genomics Ltd Characterising DNA
US6406845B1 (en) 1997-05-05 2002-06-18 Trustees Of Tufts College Fiber optic biosensor for selectively detecting oligonucleotide species in a mixed fluid sample
EP0994968A1 (en) * 1997-07-11 2000-04-26 Brax Group Limited Characterising nucleic acid
AU1080999A (en) 1997-10-14 1999-05-03 Luminex Corporation Precision fluorescently dyed particles and methods of making and using same
AU1603199A (en) 1997-12-03 1999-06-16 Curagen Corporation Methods and devices for measuring differential gene expression
KR100593712B1 (en) 1998-01-22 2006-06-30 루미넥스 코포레이션 Microparticles with Multiple Fluorescent Signals
US6562567B2 (en) * 1998-01-27 2003-05-13 California Institute Of Technology Method of detecting a nucleic acid
AU4318599A (en) 1998-06-08 1999-12-30 Xanthon, Inc. Electrochemical probes for detection of molecular interactions and drug discovery
US6294659B1 (en) 1998-07-15 2001-09-25 Duke University Photocleavable nucleoside base and nucleic acids including
US6037130A (en) * 1998-07-28 2000-03-14 The Public Health Institute Of The City Of New York, Inc. Wavelength-shifting probes and primers and their use in assays and kits
US6134586A (en) 1998-07-31 2000-10-17 Philips Electronics N.A. Corp. Striping data across disk zones
EP1102977A1 (en) 1998-08-08 2001-05-30 Imperial Cancer Research Technology Limited Fluorescence assay for biological systems
US6232067B1 (en) 1998-08-17 2001-05-15 The Perkin-Elmer Corporation Adapter directed expression analysis
US6629040B1 (en) * 1999-03-19 2003-09-30 University Of Washington Isotope distribution encoded tags for protein identification
JP2002540802A (en) 1999-04-06 2002-12-03 イェール ユニバーシティ Fixed address analysis of sequence indicators
US6613523B2 (en) * 2001-06-29 2003-09-02 Agilent Technologies, Inc. Method of DNA sequencing using cleavable tags
US7799576B2 (en) * 2003-01-30 2010-09-21 Dh Technologies Development Pte. Ltd. Isobaric labels for mass spectrometric analysis of peptides and method thereof

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0427073A2 (en) * 1989-11-09 1991-05-15 Miles Inc. Nucleic acid amplification employing ligatable hairpin probe and transcription
US5858656A (en) * 1990-04-06 1999-01-12 Queen's University Of Kingston Indexing linkers
US5798210A (en) * 1993-03-26 1998-08-25 Institut Pasteur Derivatives utilizable in nucleic acid sequencing
US5800994A (en) * 1994-04-04 1998-09-01 Chiron Diagnostics Corporation Hybridization-ligation assays for the detection of specific nucleic acid sequences
US5710000A (en) * 1994-09-16 1998-01-20 Affymetrix, Inc. Capturing sequences adjacent to Type-IIs restriction sites for genomic library mapping
US5770365A (en) * 1995-08-25 1998-06-23 Tm Technologies, Inc. Nucleic acid capture moieties
WO1997031256A2 (en) * 1996-02-09 1997-08-28 Cornell Research Foundation, Inc. Detection of nucleic acid sequence differences using the ligase detection reaction with addressable arrays
WO1998003673A1 (en) * 1996-07-19 1998-01-29 Cornell Research Foundation, Inc. High fidelity detection of nucleic acid differences by ligase detection reaction
WO1998023777A2 (en) * 1996-11-27 1998-06-04 Baylor College Of Medicine Assay for detecting apoptotic cells
WO1998040518A2 (en) * 1997-03-11 1998-09-17 Wisconsin Alumni Research Foundation Nucleic acid indexing
WO2001012855A2 (en) * 1999-08-13 2001-02-22 Yale University Binary encoded sequence tags

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CARPENTER WILLIAM R ET AL: "A transcriptionally amplified DNA probe assay with ligatable probes and immunochemical detection." CLINICAL CHEMISTRY, vol. 39, no. 9, 1993, pages 1934-1938, XP002137905 Twenty-fifth Annual Oak Ridge Conference on Return to the Future; Knoxville, Tennessee, USA; April 22-23, 1993 ISSN: 0009-9147 *
LANDEGREN ULF ET AL: "Detecting genes with ligases." METHODS (ORLANDO), vol. 9, no. 1, 1996, pages 84-90, XP000600242 ISSN: 1046-2023 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1550718A1 (en) * 2002-07-30 2005-07-06 Tum Gene, Inc. Method of detecting base mutation
EP1550718A4 (en) * 2002-07-30 2005-11-30 Tum Gene Inc Method of detecting base mutation
US7491492B2 (en) 2002-07-30 2009-02-17 Toppan Printing Co., Ltd. Method of detecting nucleotide mutations
US8313905B2 (en) 2008-01-24 2012-11-20 Samsung Electronics Co., Ltd. Detection oligomer and method for controlling quality of biochip using detection oligomer
GB2456670B (en) * 2008-01-24 2013-01-16 Samsung Electronics Co Ltd Detection oligomer and method for controlling quality of biochip using detection oligomer

Also Published As

Publication number Publication date
WO2001012855A3 (en) 2002-03-21
US6403319B1 (en) 2002-06-11
DE60026321D1 (en) 2006-04-27
US6383754B1 (en) 2002-05-07
CA2383264A1 (en) 2001-02-22
WO2001012856A3 (en) 2002-01-24
US6773886B2 (en) 2004-08-10
AU6770800A (en) 2001-03-13
ATE318932T1 (en) 2006-03-15
US20040265888A1 (en) 2004-12-30
EP1206577B1 (en) 2006-03-01
US20030082556A1 (en) 2003-05-01
AU6638000A (en) 2001-03-13
JP2003527087A (en) 2003-09-16
EP1206577A2 (en) 2002-05-22
WO2001012855A2 (en) 2001-02-22

Similar Documents

Publication Publication Date Title
US6403319B1 (en) Analysis of sequence tags with hairpin primers
AU778438B2 (en) Fixed address analysis of sequence tags
JP3863189B2 (en) Characterization to DNA
EP1124990B1 (en) Complexity management and analysis of genomic dna
US6156502A (en) Arbitrary sequence oligonucleotide fingerprinting
CA2307674C (en) Probe arrays and methods of using probe arrays for distinguishing dna
EP1259643B1 (en) Nucleic acid detection methods using universal priming
US5093245A (en) Labeling by simultaneous ligation and restriction
AU754849B2 (en) DNA polymorphism identity determination using flow cytometry
US20060223094A1 (en) Methods and compositions for producing labeled probe nucleic acids for use in array based comparative genomic hybridization applications
WO2007087312A2 (en) Molecular counting
JP2002518060A (en) Nucleotide detection method
MXPA03000575A (en) Methods for analysis and identification of transcribed genes, and fingerprinting.
AU733924B2 (en) Characterising DNA
US20030044827A1 (en) Method for immobilizing DNA
JP2004500062A (en) Methods for selectively isolating nucleic acids

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP