WO1995015400A1 - Genotyping by simultaneous analysis of multiple microsatellite loci - Google Patents

Genotyping by simultaneous analysis of multiple microsatellite loci

Info

Publication number
WO1995015400A1
WO1995015400A1 PCT/US1994/013945 US9413945W WO9515400A1 WO 1995015400 A1 WO1995015400 A1 WO 1995015400A1 US 9413945 W US9413945 W US 9413945W WO 9515400 A1 WO9515400 A1 WO 9515400A1
Authority
WO
WIPO (PCT)
Prior art keywords
pcr
dna
primer
primer pair
labelled
Prior art date
Application number
PCT/US1994/013945
Other languages
French (fr)
Inventor
Roy C. Levitt
Original Assignee
The Johns Hopkins University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Johns Hopkins University
Publication of WO1995015400A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/6858: Allele-specific amplification

Definitions

  • This invention is directed to semi-automated methods for linkage mapping of the genome by genotyping of multiple microsatellite loci.
  • Summary of Background Information
  • Each disease phenotype will still need to be "mapped" or associated with a particular location in the genome. This is usually carried out by analyzing DNA isolated from blood specimens collected from individuals within families affected by a genetic disorder. Once a disorder or abnormal phenotype has been linked to a particular region on a chromosome, the limited number of genes within this area will permit us to suggest a candidate gene that can contribute to the phenotype. Thus, once the localization of a major disease phenotype to a chromosomal region is confirmed, a few candidate genes can be examined for mutations as well as potential pathogenic mechanisms.
  • The chromosomes are the basic units of inheritance on which genes and DNA markers are organized in a linear fashion (see Figure 1). Linkage is evident when a gene(s) that produces a phenotypic trait, or a significant portion of the trait, and the surrounding DNA markers are inherited together (cosegregate at meiosis). In contrast, those markers that are not associated with the anomalous phenotype of interest will be randomly distributed among affected family members as a result of the independent assortment of chromosomes and crossing over during meiosis (see Figure 2, compare "A" markers to "B"-"F" markers).
  • The further a marker or gene is from the genetic locus of interest (for example, markers 1 and 4 as compared to markers 2 and 3 in Figure 1), the more likely they are to be separated by crossing over at meiosis.
  • The recombinant genotypes produced by crossing over between maternal and paternal chromosomes at meiosis allow us to predict the ordering of genes and markers through the interval under examination. Recombination between markers 1A and 3A, and between markers 2A and 4A, in the affected members in Figure 2 suggests that the mutant gene of interest lies between markers 1 and 4.
  • linkage to a marker of known chromosomal location allows placement of the phenotype on the chromosomal map.
  • DNA markers are used to recognize each of the parental chromosomes. Recall that in general each chromosome is inherited independently of any other; and the likelihood of inheriting either chromosome of a pair from each parent is 50:50. Therefore, when a marker is unlinked to the gene(s) producing an anomalous phenotype, one expects both the maternal and paternal chromosomes to be equally distributed in the affected offspring. Linkage in the human is established by the method of likelihood ratios (see Ott, 1992
  • to the probability that it would arise under an alternative hypothesis (typically, nonlinkage).
  • the ratio of these probabilities is called the odds ratio for one hypothesis relative to the other.
  • mammalian geneticists prefer the log of the odds ratio, or the lod score.
  • the maximum likelihood estimate is the recombination fraction where the likelihood ratio is largest.
  • Lod scores from multiple pedigrees are thus added until the score grows to 3 (signifying 1000:1 odds) or falls to -2 (indicating 1:100 odds). Linkage can be easily evaluated using likelihood ratios, even in complicated pedigrees, by testing on the computer for these competing hypotheses.
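The lod-score bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the text: every meiosis is phase-known and can be scored as recombinant or non-recombinant, and the recombination fraction is searched on a coarse grid. The function names (`lod_score`, `max_lod`, `interpret`) are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score at recombination fraction theta.

    Assumes phase-known meioses (a simplification): the likelihood under
    linkage is theta^k * (1 - theta)^(n - k), and under nonlinkage 0.5^n.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(pedigrees, grid=None):
    """Add lod scores across pedigrees and return (theta_hat, lod).

    pedigrees is a list of (recombinants, meioses) pairs; theta_hat is the
    grid value at which the summed lod score is largest.
    """
    grid = grid or [i / 100 for i in range(1, 51)]

    def total(theta):
        return sum(lod_score(k, n, theta) for k, n in pedigrees)

    best = max(grid, key=total)
    return best, total(best)


def interpret(lod: float) -> str:
    """Apply the conventional thresholds of 3 and -2 quoted in the text."""
    if lod >= 3.0:
        return "linkage accepted"
    if lod <= -2.0:
        return "linkage rejected"
    return "inconclusive"


theta_hat, lod = max_lod([(1, 10), (0, 8)])
print(theta_hat, round(lod, 2), interpret(lod))
```

Real pedigree analysis handles unknown phase, missing genotypes and penetrance, which this sketch deliberately ignores.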
  • Additional strategies have been devised that can handle genetic heterogeneity more effectively (Ott, 1974, Am. J. Hum. Genet., 26:588-597) as well as disorders caused by multiple genes (Lander, et al., 1986, Proc. Natl. Acad. Sci. USA, 83:7353-7357).
  • Microsatellite markers are highly polymorphic, simple sequence repeat (SSR) markers, generally defined as repeats of 6 bp or fewer running in tandem for up to 100 bp (Beckmann, et al., 1992, Genomics, 12:627-631). These repeat sequences are flanked by unique DNA sequences that can be identified for each marker location.
  • Using primers that correspond to the unique DNA sequence surrounding each marker, the polymerase chain reaction (PCR, see, e.g., Saiki, et al., 1988, Science, 222:489) can be used to detect each polymorphism.
  • This type of genetic marker is abundant and found throughout the genome. SSRs may be as frequent as one every 6 kb (Beckmann, et al., 1992). Where SSR markers show considerable polymorphism (differences in the number of repeats) between individuals, the markers can be particularly informative. Many such SSR markers have been isolated throughout the genome, and are well mapped (Weissenbach, et al., 1992). Many of these SSR markers are now available commercially for linkage studies (e.g., from Research Genetics, Huntsville, AL). Those markers which frequently allow the investigator to identify each parental chromosome as unique and to identify each crossover rapidly (see Figure 2) approach the ideal for linkage studies.
  • PCR polymerase chain reaction
  • SSR simple sequence repeat
  • this invention provides highly informative SSR markers, assembled into "SETS" that do not overlap in size when separated electrophoretically on an acrylamide gel and that can be labelled with different fluorophores.
  • Each SET contains 6 or more (preferably 7-8) pairs of primers that provide for amplification of markers. All primer pairs in a SET are labelled with the same fluorophore, which has a distinct color, and separate SETS carry different fluorophore labels (e.g., blue, green, or yellow).
  • PCR products corresponding to these SETS are combined into a GROUP for electrophoretic analysis in a single lane.
  • a GROUP of 18 or more, preferably 21 to 24 dinucleotide markers can be electrophoresed along with an internal size standard and analyzed simultaneously (multiplexing) in real-time for each individual studied.
  • the invention provides a kit for use in automated genotyping within a population comprising four or more GROUPS, each GROUP containing at least three SETS, and each SET in turn comprising at least 6 labelled pairs of primers for amplification of DNA by polymerase chain reaction (PCR), the sequence of each primer pair corresponding to a portion of the unique genomic sequence of a microsatellite sequence (which is made up of a nucleotide repeat sequence flanked by unique sequences), the nucleotide repeat sequence being polymorphic within the population.
  • Amplification of DNA from a human sample by the polymerase chain reaction (PCR) primed with a particular primer pair amplifies the nucleotide repeat sequence and at least some of the immediately adjacent unique sequences of the microsatellite sequence to produce a PCR product identified with the primer pair.
  • the distance in the genome between the microsatellite sequence amplified by one primer pair of the kit and the nearest other microsatellite sequence amplified by another primer pair of the kit is at least 2 centimorgans (cM) and no more than 50 cM.
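A spacing rule of this kind is easy to check mechanically once each marker has a map position. The sketch below is an illustration only: it assumes each marker is given as a `(name, chromosome, position_cM)` tuple, measures distance only between markers on the same chromosome, and uses hypothetical marker names. The 2 cM and 50 cM limits are the ones stated above.

```python
from collections import defaultdict


def nearest_neighbour_gaps(markers):
    """Return {marker name: distance in cM to the nearest other marker}.

    markers is a list of (name, chromosome, position_cM) tuples. A marker
    that is alone on its chromosome gets None, because no map distance
    to another marker is defined for it here.
    """
    by_chromosome = defaultdict(list)
    for name, chromosome, position in markers:
        by_chromosome[chromosome].append((position, name))

    gaps = {}
    for entries in by_chromosome.values():
        entries.sort()
        for i, (position, name) in enumerate(entries):
            distances = []
            if i > 0:
                distances.append(position - entries[i - 1][0])
            if i < len(entries) - 1:
                distances.append(entries[i + 1][0] - position)
            gaps[name] = min(distances) if distances else None
    return gaps


def spacing_violations(markers, min_cm=2.0, max_cm=50.0):
    """Names of markers whose nearest neighbour is closer than min_cm or
    farther than max_cm."""
    gaps = nearest_neighbour_gaps(markers)
    return sorted(name for name, gap in gaps.items()
                  if gap is not None and not (min_cm <= gap <= max_cm))


# Hypothetical markers: m2 and m3 sit only 1 cM apart on chromosome 1.
example = [("m1", "1", 0.0), ("m2", "1", 10.0),
           ("m3", "1", 11.0), ("m4", "2", 5.0)]
print(spacing_violations(example))  # ['m2', 'm3']
```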
  • Each SET consists of at least 6 of the primer pairs, where the length of the segment amplified by a particular primer pair (its PCR product) differs from the length of PCR products from all other primer pairs in the SET by at least 5 nucleotides for tetranucleotide repeats, at least 6 nucleotides for trinucleotide repeats and at least 9 nucleotides for dinucleotide repeats. At least one primer of each primer pair is labelled with a fluorescent label that is the same for all primer pairs in the SET.
  • Each GROUP consists of at least three SETS of primer pairs labelled with fluorescent labels, and primers from one SET in the GROUP are labelled with a fluorescent label which fluoresces at a wavelength which is substantially different from the wavelength at which the fluorescent labels on the primers in each of the other SETS in the GROUP fluoresce.
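The GROUP/SET hierarchy can be modelled as a small data structure with a validation step. The following is a minimal sketch, not the patent's own implementation: the class names, field names and the idea of representing a fluorophore by a short string such as "FAM" are assumptions made for illustration, and "substantially different wavelength" is reduced to "different dye name".

```python
from dataclasses import dataclass


@dataclass
class PrimerPair:
    locus: str  # marker locus designation (hypothetical values below)
    dye: str    # fluorophore on the labelled primer, e.g. "FAM"


@dataclass
class MarkerSet:
    name: str
    primer_pairs: list


@dataclass
class Group:
    name: str
    sets: list


def validate_group(group, min_sets=3, min_pairs=6):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(group.sets) < min_sets:
        problems.append(f"{group.name}: fewer than {min_sets} SETS")
    dyes_seen = []
    for marker_set in group.sets:
        if len(marker_set.primer_pairs) < min_pairs:
            problems.append(f"{marker_set.name}: fewer than {min_pairs} primer pairs")
        dyes = {pair.dye for pair in marker_set.primer_pairs}
        if len(dyes) > 1:
            problems.append(f"{marker_set.name}: primer pairs carry different dyes")
        elif dyes:
            dyes_seen.append(next(iter(dyes)))
    if len(dyes_seen) != len(set(dyes_seen)):
        problems.append(f"{group.name}: two SETS share the same dye")
    return problems


def make_set(name, dye, n=6):
    """Build a SET of n placeholder primer pairs, all with the same dye."""
    return MarkerSet(name, [PrimerPair(f"{name}-locus{i}", dye) for i in range(n)])


good = Group("GROUP 1", [make_set("A", "FAM"), make_set("B", "JOE"), make_set("C", "TMR")])
print(validate_group(good))  # []
```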
  • kits cover the entire genome with markers spaced approximately 10 cM apart in the genome
  • the kit will usually contain at least about 10 GROUPS.
  • a kit is provided for screening of the genome with individual markers spaced in the genome about 50 cM from the nearest other marker in the kit, and the kit contains at least 4 GROUPS.
  • the invention also provides kits containing fewer GROUPS with primers whose PCR products identify microsatellite sequences found in the genome spaced closely about the locations picked out by screening studies performed using the screening kit.
  • the invention also provides a method of analyzing genomic DNA for the presence of polymorphisms comprising: extracting DNA from a human sample; combining, in a polymerase chain reaction (PCR) vessel, an aliquot of the extracted DNA, at least one primer pair selected from one of the GROUPS described above, and PCR amplification enzymes; cycling the temperature of each PCR vessel to produce PCR products that can be identified with the primer pair whose sequence corresponds to unique sequence in the amplified DNA, using an annealing temperature at which non-specific annealing is minimized; then combining all PCR products from all PCR vessels containing primer pairs from a single GROUP into a mixture, and subsequently separating the mixture of PCR products electrophoretically by size; and detecting separated PCR products by fluorescence detection at wavelengths corresponding to the fluorescent wavelength for each of the fluorescent labels in the kit.
  • one primer of each primer pair is labelled with a fluorescent label and the other primer in the pair is labelled with biotin
  • a mixture containing all PCR products corresponding to the primer pairs from a single GROUP is prepared by binding the PCR products to a plurality of paramagnetic beads carrying on their surface a protein which specifically binds biotin (the beads being added to each PCR vessel after amplification), separating the magnetic beads from the PCR reaction medium, then separating the two strands of the amplified DNA segments and combining the strands labelled with a fluorescent label for all primer pairs from one GROUP into the mixture.
  • the invention also provides a method for selecting a SET of PCR primers for use in automated genotyping comprising selecting at least 6 microsatellite sequences, which contain dinucleotide, trinucleotide or tetranucleotide repeat sequences that are flanked by unique sequences in the human genome, and are polymorphic within the population, the microsatellite sequences being separated from each other by at least 2 centimorgans in the genome, and for each microsatellite sequence constructing primer pairs having the sequence of the unique sequences flanking the microsatellite sequences, so that the primer pairs will direct PCR amplification of DNA segments corresponding to each microsatellite sequence and the length of all polymorphs of the microsatellite sequence amplified by a particular primer pair is detectably different from the length of all polymorphs of other microsatellite sequences amplified by other primer pairs in the SET.
  • the invention also provides a kit for use in automated genotyping comprising at least 10 GROUPS of at least 3 SETS of PCR primers obtained by this method, and a method of analyzing genomic DNA for the presence of polymorphisms comprising amplifying DNA extracted from a human sample using PCR directed by these primer pairs to produce PCR products labelled with detectable labels that are the same for all PCR products from a single SET, followed by separating electrophoretically a mixture containing all PCR products amplified from the DNA sample by any primer pair of said SET and characterizing the detectably labelled PCR products by length.
  • the invention also provides a diagnostic method for detection by polymerase chain reaction of genomic rearrangement (including deletions, additions, crossovers and gene amplification), of a genomic region containing at least 6 known loci at which genetic rearrangement is diagnostic for a disease, using a kit comprising at least one SET containing at least 6 PCR primer pairs, the sequences of each primer pair corresponding to the unique sequences flanking one of the loci of genomic rearrangement.
  • the primer pairs in the SET are constructed so that the PCR product amplified by a particular pair of primers corresponds to a DNA segment surrounding one locus of rearrangement with length that is characteristic of a specific rearrangement, and the length of the PCR products amplified by a particular pair of primers differs from the length of all other PCR products amplified by other primers in the SET.
  • DNA from a sample is amplified in a PCR vessel using the polymerase chain reaction (PCR) primed with at least one of the primer pairs of the SET by cycling the temperature of the vessels with an annealing temperature that minimizes non-specific annealing to produce detectably labelled PCR products, and the PCR products for all primer pairs in the SET are detectably labelled with the same label.
  • Labelled PCR products are separated electrophoretically by size from a mixture containing all PCR products amplified from the DNA sample by any primer pair of the SET, and the separated, detectably labelled PCR products are characterized by length.
  • all primers in the SET have annealing temperatures within a 4°C range, and amplification for all primers in the SET is carried out simultaneously in the same vessel.
  • the inventor has created a kit comprising SETS of highly polymorphic fluorescent primers specific for microsatellite markers that cover the genome at approximately 10 cM intervals for linkage studies.
  • a fluorescence-based protocol based on these SETS has been developed for detection of multiple microsatellite markers, and the protocol is accurate as compared to a conventional radiolabeling method that depends on a known DNA sequence ladder and conventional autoradiography for detection. It has now been demonstrated that genotyping by semi-automated fluorescence-based techniques is both highly accurate and efficient. We routinely type 24 fluorescent markers simultaneously using these techniques in my laboratory.
  • Figure 1 shows the genetic map of the chromosomal region surrounding a putative GENETIC locus.
  • Figure 2 shows segregation data from a fabricated three generation family affected with a genetic disorder for the four markers illustrated in Figure 1.
  • Squares indicate males, circles indicate females.
  • Affected and unaffected family members are indicated by solid and open symbols, respectively.
  • Crossovers that have occurred during meiosis are indicated by the arrowheads.
  • Recombination with markers 1 and 4 from chromosome A excludes a localization for the gene causing this disorder in the region immediately above marker 1 and below marker 4.
  • the region from chromosome A between markers 1 and 4 (including markers 2 and 3) co-segregates with the abnormal phenotype in all the affected individuals in this family but is not found in any unaffected individuals.
  • Chromosomal region 4 of chromosome B from affected individual I-1 occurs in both affected and unaffected offspring in generation II, showing no linkage.
  • the markers used in this demonstration approach the ideal by providing maximal genetic information for every individual studied.
  • Figure 3 illustrates the most common form of simple sequence repeat.
  • the marker is heterozygous, or differs in the number of dinucleotides between the maternal and paternal chromosomes.
  • These PCR products would differ in length by 8 nucleotides, and are each easily detected using gel electrophoresis.
  • the solid bars indicate surrounding sequence that is unique (occurs only once in the human genome) and can be used to design PCR primers for amplifying this simple sequence repeat.
  • Figure 4 shows a cartoon of GROUP 1 markers. Each simple sequence repeat marker is identified on the left, and the size range for known alleles is noted on the right. Each marker covers a region of a chromosome to be examined for linkage with a genetic disorder.
  • the colored boxes refer to the region on the gel where alleles for each marker may be found.
  • the markers are chosen to avoid overlap between these regions. For increased efficiency each SET is labelled with one of three fluorophores: yellow, tetramethyl-6-carboxy-rhodamine (TMR); blue, 5-carboxy-fluorescein (FAM); or green, 2',7'-dimethoxy-4',5'-dichloro-6-carboxy-fluorescein (JOE).
  • ROX: red, 6-carboxy-rhodamine
  • Figure 5 shows a typical set of electrophoretograms for GROUP 2 using DNA from a single individual.
  • Figure 6 shows an electrophoretogram of SET A, GROUP 1 markers from one individual. The size (nucleotides) of each PCR product is given on the X-axis above the electrophoretogram.
  • Figure 7 A-M provides a listing of the markers in 13 GROUPS each containing 16-24 markers divided into three SETS.
  • the first column gives a locus designation for the marker to identify the entry in the Genbank Data Base which provides the unique sequences surrounding the markers.
  • the unique sequence information can be used to design primers that will direct PCR amplification of the marker.
  • the size range of the published alleles (in base pairs), the degree of heterozygosity in the population and the chromosomal location are listed, in that order, for each marker followed by the nucleotide sequences of preferred primer pairs, along with their annealing temperatures and preferred choice for labelled primer.
  • Figure 8 demonstrates the difference in autoradiographic image produced depending on whether the forward or reverse primer is labelled.
  • Figure 9 shows an autoradiograph of PCR-amplified DNA using the primers of GROUP 2, SET B.
  • the variation in intensity in products of this SET is typical of this type of marker.
  • Figure 10 shows the effect of varying the amount of paramagnetic beads in a magnetic bead-based recovery from PCR.
  • Fluorescent labeling of PCR-based markers provides many potential advantages over radio-labels (e.g., 32P) and other labels in common use for PCR markers. Fluorescent labels are nontoxic, stable, and can be combined and analyzed together in a single electrophoretic lane (multiplexing) to provide a many-fold increase in efficiency over standard methods of detection. Fluorescence signals are linear over a much greater range of intensity than conventional autoradiography and other methods of detection in use, providing a better means of distinguishing between alleles and artifact. Band intensity provides an objective method for distinguishing between alleles and artifacts and may also provide a better means for identifying the products of microsatellite markers that frequently vary significantly in intensity.
  • real-time fluorescence detection methods may provide a substantial increase in efficiency over standard methods of detection based on radiolabeling.
  • a much larger range of product sizes can be resolved on each gel run as compared to radiolabeling techniques because, with automated, real-time equipment such as that from Applied Biosystems, Inc., the PCR products pass by the detector toward the bottom of the gel where the band resolution is greatest.
  • Efficiency is further improved by the potential real-time semi-automated detection of alleles.
  • internal size standards are easily incorporated for reproducibility and the accurate sizing of alleles, avoiding day to day variability. Computerized data acquisition and handling further aid productivity and reduce errors in data entry and manipulation.
  • Example 1 demonstrates the accuracy of sizing microsatellite PCR products using a fluorescence-based approach as compared to a conventional radiation-based method using a known sequence ladder.
  • DNA templates may be obtained from the collection of Centre d'Etude du Polymorphisme Humaine, Paris (CEPH) for use as a standard set of alleles to compare these techniques, because there is little question of the genetic identity of each of the individuals in this collection.
  • fractional size estimates should preferably be accurate to within 0.5 nucleotides. Variation greater than this could lead to confusion during band matching, after rounding up or down for size estimates provided as a fraction of a nucleotide. Since our analysis suggests that the maximum variation is likely to be less than 0.5 nucleotides (and generally significantly less), the method will be useful in the intended applications.
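The role of the 0.5-nucleotide tolerance in band matching can be illustrated with a short sketch. It assumes, for illustration only, that a fragment-sizing step has already produced a fractional size estimate and that the known allele sizes for the marker are available as integers; the function name and the example sizes are hypothetical.

```python
def call_allele(size_estimate, known_alleles, tolerance=0.5):
    """Match a fractional size estimate to the nearest known allele.

    Returns the allele size when the estimate lies strictly within
    `tolerance` nucleotides of it, otherwise None (no confident match).
    """
    nearest = min(known_alleles, key=lambda allele: abs(allele - size_estimate))
    if abs(nearest - size_estimate) < tolerance:
        return nearest
    return None


# Hypothetical dinucleotide marker with alleles every 2 nucleotides.
alleles = [147, 149, 151, 153]
print(call_allele(151.3, alleles))  # 151
print(call_allele(150.0, alleles))  # None: a full nucleotide from either neighbour
```

With sizing error below 0.5 nucleotides, every estimate rounds to a single allele; a larger error would make estimates such as 150.0 ambiguous between neighbouring alleles.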
  • Each SSR marker is characterized by PCR primer pairs which have the same sequence as a portion of the unique DNA sequence on the 5' side of the sense and antisense strands, respectively, encoding the repeat sequence at a particular point in the genome.
  • the number of repeats of the simple sequence at a particular locus varies between individuals (polymorphism), and this polymorphism results in PCR products of varying size for different individuals.
  • the size of the PCR product can be used to determine if two individuals have an allele in common at the genetic locus of the SSR marker.
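The relationship between repeat number and PCR product size, and the comparison of two individuals by product size, can be written out as a worked example. The flanking length of 100 bp and the repeat counts below are invented for illustration; only the idea that product length equals flanking sequence plus repeat unit times repeat number comes from the text.

```python
def product_length(flank_bp, repeat_unit_bp, n_repeats):
    """Length of the PCR product: amplified unique flanking sequence
    (both sides, primers included) plus the tandem repeat itself."""
    return flank_bp + repeat_unit_bp * n_repeats


def shared_alleles(genotype_a, genotype_b):
    """Allele sizes (in nucleotides) that two individuals have in common."""
    return sorted(set(genotype_a) & set(genotype_b))


# A heterozygous dinucleotide marker with 20 and 24 repeats:
maternal = product_length(100, 2, 20)   # 140
paternal = product_length(100, 2, 24)   # 148
print(paternal - maternal)              # 8 nucleotides apart

# Two individuals typed at the same marker:
print(shared_alleles((140, 148), (148, 152)))  # [148]
```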
  • the spacing in the gel between PCR products identified with different markers is critical.
  • the PCR products corresponding to each marker in a SET are spaced a critical distance from surrounding markers such that none of the PCR products for the largest known alleles of one marker overlap in size with PCR products for the shortest known alleles of another marker in the SET when separated on a 6% denaturing acrylamide gel.
  • An additional safety margin should be provided, because rare undocumented alleles (larger or smaller) may occur for any given marker.
  • Size spacing of less than 9 nucleotides between dinucleotide SSR markers increases the likelihood for overlap because 2-4 stuttering bands (each 2 nucleotides apart) below the smallest allele of one marker may overlap with the largest allele of the marker below it.
  • PCR products for trinucleotide repeat sequences and tetranucleotide repeat sequences are not observed to exhibit stuttering bands, so the minimum separation distance above and below the largest and smallest known alleles can be less for tri- and tetranucleotide repeats.
  • PCR products for trinucleotide repeats in a SET will differ by at least 5 base pairs, and for tetranucleotide markers by at least 6 base pairs.
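The spacing requirement within a SET amounts to checking the gap between the largest known allele of one marker and the smallest known allele of the next marker up the gel. A minimal sketch follows. It assumes each marker is described by a hypothetical `(name, repeat_unit_bp, smallest_allele, largest_allele)` tuple and takes the minimum gaps from the two statements immediately above (9 nucleotides for dinucleotide, 5 for trinucleotide, 6 for tetranucleotide repeats); the required gap is looked up by the repeat type of the upper marker, since stutter bands fall below its smallest allele.

```python
# Minimum gap in nucleotides, keyed by repeat unit length of the upper marker.
MIN_GAP_NT = {2: 9, 3: 5, 4: 6}


def set_overlaps(markers, min_gap=MIN_GAP_NT):
    """Return (lower_name, upper_name, gap) for each adjacent pair of
    markers, in size order, whose allele ranges are too close together."""
    ordered = sorted(markers, key=lambda marker: marker[2])
    problems = []
    for lower, upper in zip(ordered, ordered[1:]):
        gap = upper[2] - lower[3]  # smallest upper allele minus largest lower allele
        if gap < min_gap[upper[1]]:
            problems.append((lower[0], upper[0], gap))
    return problems


# Hypothetical dinucleotide markers; B and C are only 5 nucleotides apart.
example_set = [("A", 2, 100, 120), ("B", 2, 129, 150), ("C", 2, 155, 170)]
print(set_overlaps(example_set))  # [('B', 'C', 5)]
```

An additional safety margin for rare undocumented alleles could be added by raising the values in `MIN_GAP_NT`.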
  • a SET will contain 7-9 SSR markers, most preferably 8-9 markers. The upper limit on the number of markers in a SET is dependent on the length of the electrophoretic separation.
  • the PCR product of each primer pair in the SET is tagged with the same label, preferably a fluorescent dye.
  • a fluorescent label is covalently attached to one of the primers in a primer pair.
  • the PCR product may be uniformly labelled by adding one or more fluorescently-labelled nucleoside triphosphates to the PCR reaction. Labelling of the primers may be accomplished by including a fluorescently-labelled nucleotide during synthesis of the primer or by linking a fluorescent label to the primer after synthesis. Fluorophore labels for attachment to nucleic acids, including PCR primers, are readily available in the art. (See, e.g., Nagaoka, et al., (1992) Chem. Pharm.
  • the labels contain coupling groups that react with modified nucleotides of the PCR primers to form covalent links. Attaching such fluorophores to the primers in the SETS of this invention is easily within the skill of the ordinary worker.
  • Fluorescent labels with non-overlapping emission spectra are also available commercially, for example, from Applied BioSystems, Inc., including 5-carboxy-fluorescein (FAM-blue), 2',7'-dimethoxy-4',5'-dichloro-6-carboxy-fluorescein (JOE-green), N,N,N',N'-tetramethyl-6-carboxy-rhodamine (TMR-yellow), and 6-carboxy-rhodamine (ROX-red); from Biological Detection Systems, Inc., Pittsburgh, PA (BDS) including nucleoside triphosphates coupled to cyanine dyes that fluoresce in the green or orange region, or Boehringer Mannheim Corporation Biochemical Products, Indianapolis, IN, including fluorescein-5(6)-carboxamidocaproxyl-d
  • primers or PCR products
  • biotin see, e.g., Innis, et al., "PCR Protocols," Academic Press, NY, 1990, pp. 100-103
  • streptavidin coupled to a particular fluorescent dye added to all of the PCR products of a particular SET. Variations of these labelling methods or similar methods known to those skilled in the art may be used, so long as all PCR product for markers in one SET are labelled with the same label.
  • SETS each labelled with a different fluorophore
  • markers that we have termed a "GROUP."
  • the number of SETS in a GROUP will depend on the availability of distinct labels.
  • PCR products for each SET in the GROUP will usually be labelled with fluorophores that emit light at a wavelength substantially different from the wavelengths emitted by fluorophore labels of the other SETS in the GROUP, where "substantially different" means sufficiently distinct to be distinguished by the detection means chosen for detecting PCR products after electrophoresis.
  • these fluorescent PCR products may be separated on an automated electrophoresis system, such as the Applied Biosystems 373 sequencer with internal size standards in each lane (labelled, for example, with ROX (red dye), Applied Biosystems) and analyzed using, e.g., GeneScan 672 software (Applied Biosystems) (Ziegle, et al., 1991, Miami Short Rep., 1:70) and scored using GENOTYPER software (Applied Biosystems), with data displayed as an electrophoretogram or in a spreadsheet format. Gel band fluorescent intensities and peak areas provide an objective method of distinguishing alleles from artifact (stuttering bands).
  • a typical electrophoretogram from a single individual for SET A GROUP 1 is illustrated in Figure 6.
  • Marker Selection and Development:
  • the human genome is estimated to be approximately 3000 cM in length. Therefore, to adequately "cover" the entire genome at 10 cM intervals will require approximately 300 highly informative, well-spaced markers.
  • An alternative estimate obtained by summing the meiotic maps from all the chromosomes suggests that the genome is approximately 5000 cM in length (NIH/CEPH Collaborative Mapping Group, 1992, Science, 252:67-86).
  • Adequate "coverage" of the entire genome based on this size estimate at 15 cM intervals (which would allow testing for linkage without using a prohibitively large number of families) will require about 333 highly informative well spaced markers.
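The two marker-count estimates above follow from dividing the genome length by the desired spacing. As a quick arithmetic check (the function name is hypothetical, and rounding to the nearest whole marker is an assumption):

```python
def markers_needed(genome_length_cm, spacing_cm):
    """Approximate number of evenly spaced markers needed for coverage."""
    return round(genome_length_cm / spacing_cm)


print(markers_needed(3000, 10))  # 300
print(markers_needed(5000, 15))  # 333
```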
  • Characteristics of preferred markers can be summarized as follows: unique sequence surrounding the marker is available for use in designing primers, they have been sized accurately, the heterozygosity value is known, and each marker has been carefully localized. Over 1000 SSR markers, including the surrounding unique sequence and chromosomal location, have been described to date in the Genome Data Base (GDB), October 19, 1993, The Johns Hopkins University, Baltimore, Maryland. In contrast to older approaches, such as RFLP, many of the preferred SSR markers are heterozygous (alleles differ at a particular locus) > 50% of the time and therefore are highly informative for linkage studies.
  • each allele of the markers used in the method of this invention will be easily detectable after amplification by PCR as a predictable component of a complex image or signature by 5' end labeling with 32P, labeling with fluorescence, or by a variety of other methods. Most preferably, the markers also produce an easily scored product or simple pattern of stutter bands that are the signature of mononucleotide and dinucleotide repeats.
  • stutter bands are artifacts produced during PCR, and are less common in PCR of tri- and tetranucleotide repeats. Although these stutter bands have been generally considered undesirable, they can be quite helpful to the investigator (or computer) during the scoring of genotypes by allowing for the identification of 'false' bands (background bands due to non-specific annealing). Each allele can then be easily scored by 5' end labeling with 32P or fluorescence after amplification by PCR, as a predictable component of a complex image. Background bands are generally not associated with stuttering artifacts.
  • dinucleotide SSR is preferred in the method of this invention, because the potential advantages for automated genotyping may not be so easily incorporated into practice for mono-, tri- and tetranucleotide repeats.
  • PCR products of trinucleotide and tetranucleotide repeats lack the unique "stuttering" signature of dinucleotide repeats, making it difficult for the computer to distinguish real alleles from artifacts produced by nonspecific annealing during PCR.
  • Although a simple set of PCR products is produced as alleles (little or no stuttering) from tri- or tetranucleotide SSRs, it is often difficult to eliminate other PCR artifacts completely.
  • PCR artifacts are not easily distinguished from “false” bands when large numbers of PCR products that vary significantly in intensity are combined as described by this method.
  • the unique signature derived from the stuttering bands of dinucleotide repeats provides a simple means of distinguishing real products (alleles) from artifactual bands.
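How a computer might use the stutter signature can be sketched as follows. This is a deliberately simplified illustration, not the scoring logic of any commercial software: peaks are assumed to arrive as `(size_in_nucleotides, intensity)` pairs, a stutter band is assumed to be a weaker peak one repeat unit (2 nucleotides) below a stronger one, and the matching tolerance is an invented parameter. A heterozygote whose alleles differ by exactly one repeat would need a more careful intensity model than this.

```python
def classify_peaks(peaks, repeat_nt=2, tol=0.5):
    """Label each peak as 'allele', 'stutter' or 'background'.

    - 'stutter': a stronger peak lies one repeat unit above it.
    - 'allele': no stronger peak above, and a weaker peak lies one
      repeat unit below (the stutter signature).
    - 'background': isolated peak with no stutter pattern, e.g. a band
      produced by non-specific annealing.
    """
    def peaks_near(size):
        return [p for p in peaks if abs(p[0] - size) <= tol]

    calls = {}
    for size, intensity in peaks:
        stronger_above = any(h > intensity for _, h in peaks_near(size + repeat_nt))
        weaker_below = any(h < intensity for _, h in peaks_near(size - repeat_nt))
        if stronger_above:
            calls[size] = "stutter"
        elif weaker_below:
            calls[size] = "allele"
        else:
            calls[size] = "background"
    return calls


# Invented trace: one allele at 150 with two stutter bands, plus a stray band.
trace = [(150.0, 1000), (148.0, 400), (146.0, 150), (137.0, 300)]
print(classify_peaks(trace))
```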
  • the cost of the hardware is generally considered the limiting factor when adopting the fluorescent approach.
  • Tri- and tetranucleotide markers generally require a significantly larger fraction of each gel because alleles span a much larger size range. Thus longer run time is required, and fewer markers can be resolved per gel.
  • the cost of the hardware becomes readily affordable if one considers the utility and throughput of such an instrument when used according to the method of this invention.
  • the use of fewer markers per lane i.e., tetranucleotide repeats
  • markers for inclusion in each SET is based on the need to: maximize heterozygosity values (genetic informativeness), place the marker within a SET based on the size of the PCR products (alleles produced must not overlap with those of the marker above or below it), and take account of the location of the marker in the genetic map (ideally we would have 450-500 markers placed 10 cM or less apart).
  • the PCR products corresponding to markers within a SET are sized to assure that infrequent alleles and stutter bands do not produce overlap between the markers (compare e.g., Figures 4 and 6).
  • PCR products for SETS of dinucleotide markers differ in length by approximately 9 nucleotides, preferably at least 10 nucleotides.
  • new oligonucleotide primers based on the unique sequence surrounding a polymorphic marker are designed and synthesized to assure that the PCR products do not overlap during electrophoresis.
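The spacing rule for markers within a SET can be checked mechanically. The sketch below is illustrative only: it assumes each marker is summarized by the smallest and largest expected PCR product (in nucleotides) and that the "gap" is measured from the largest product of one marker to the smallest product of the next. The function name and the `hyp` markers are hypothetical; the mfd size ranges are those quoted for Example 1.

```python
from typing import List, Tuple

# (marker name, smallest product in nt, largest product in nt)
Marker = Tuple[str, int, int]


def spacing_violations(markers: List[Marker], min_gap: int = 9) -> List[Tuple[str, str, int]]:
    """Return (lower marker, upper marker, gap) for neighbours closer than min_gap.

    A negative gap means the two size ranges overlap outright.
    """
    ordered = sorted(markers, key=lambda m: m[1])  # order the ladder by smallest product
    problems = []
    for (name_a, _, hi_a), (name_b, lo_b, _) in zip(ordered, ordered[1:]):
        gap = lo_b - hi_a
        if gap < min_gap:
            problems.append((name_a, name_b, gap))
    return problems


# Size ranges quoted for the three markers of Example 1: they overlap,
# so they could not share one SET (one dye) in a single lane.
example_1 = [("mfd 1", 176, 196), ("mfd 59", 175, 195), ("mfd 154", 186, 204)]

# Hypothetical ladder that satisfies the 9-nucleotide rule.
hyp_set = [("hypA", 100, 120), ("hypB", 130, 150), ("hypC", 159, 180)]

print(spacing_violations(example_1))
print(spacing_violations(hyp_set))
print(spacing_violations(hyp_set, min_gap=10))
```

Markers whose ranges overlap, like the three from Example 1, can still be run in one lane if each carries a different dye, which is how they were labelled there.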
  • Figures 7A-M show 289 SSR markers that have been selected and combined into 11 GROUPS of 21-24 markers and 2 incomplete GROUPS of 16 markers so that markers in each GROUP can be separated and analyzed simultaneously.
  • the selected markers cover the genetic map on average once every 10 cM. Most are heterozygous greater than 70% of the time.
  • each SET is composed of 8 markers from multiple linkage groups (see, e.g., Figure 7B-H).
  • SETS of markers are part of a single linkage group (i.e. a single chromosome), but this may require significant additional labor because fewer existing primers will be suitable.
  • GDB Additional or alternative SSR loci to assemble into GROUPS of markers may be found in GDB. Loci listed in GDB can be arranged on the genetic map by using map location information in GDB. Additional or alternative primers may then be designed using information
  • GROUP 1 markers are currently performing well in multiple laboratories.
  • new oligonucleotide primers must be designed from the sequence surrounding each marker to produce PCR products that fit between the products of the markers above and below it without overlap.
  • the unique primer 3' sequence should contain at least 7 nucleotides, the ΔG threshold should be at least -1.0 kcal/mol, most preferably -1.4 kcal/mol, and duplex formation should be avoided, the maximum length of duplex not exceeding 2 base pairs.
  • the sequence of preferred primers will also minimize or eliminate self-complementarity, hairpin formation, and false priming. Once the sequences of candidate primers are chosen, synthesis is readily accomplished by standard methods (see, e.g., Sambrook, et al.). Optimization of PCR Conditions and Appearance on the Gel: These new primers must be tested to assure that they produce an easily scored collection of products of the correct size.
  • Scoring may be easier if the label is on one primer rather than the other for particular markers (see, e.g., Figure 8).
  • Primers developed for dinucleotide markers may perform well in the PCR reaction, but produce products unacceptable for genotyping (single base stuttering bands, stuttering bands of equal intensity with true alleles, or stuttering bands that are larger than the correct allele), and such primers should be avoided.
  • the PCR conditions for each marker should be optimized to eliminate any artifactual PCR products due to nonspecific annealing that may complicate the analysis of a GROUP of combined markers.
  • the temperature of the annealing phase of each PCR cycle should be optimized for each primer pair. Accordingly, the annealing phase temperature is set relatively high, so that specific hybridization occurs, but non-specific hybridization between the template DNA and the primers is minimized.
  • the selectivity provided by this optimization is preserved in the method of this invention by limiting the number of primer pairs in any PCR reaction vessel to those whose optimized annealing temperature is the same or nearly the same.
  • all primer pairs in the same PCR vessel have annealing temperatures within 4°C of each other.
  • an entire 96 well plate is dedicated to PCR reactions using primers for a single marker.
  • each PCR vessel on a plate has only one primer pair, but the plate contains vessels having different primer pairs, so long as all primer pairs on the same plate have annealing temperatures within 4°C.
  • all of the primer pairs for a SET or even a GROUP are constructed to have optimized annealing temperatures in a narrow range, most preferably 4°C, and all of the primers are present in a single PCR reaction vessel, obviating the need to mix the individual PCR products prior to electrophoretic separation.
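The rule that primer pairs sharing a vessel or plate should have optimized annealing temperatures within 4°C of each other can be illustrated with a simple grouping pass. This is a minimal sketch, not the procedure used in the examples: the function name, the sorted sweep, and the `hyp X` marker are assumptions. The mfd temperatures are those given in Example 1.

```python
from typing import Dict, List


def group_by_annealing(temps: Dict[str, float], window: float = 4.0) -> List[List[str]]:
    """Partition primer pairs so each group spans at most `window` degrees C.

    Pairs are swept in order of increasing annealing temperature; a new group
    is opened whenever a pair lies more than `window` above the coolest pair
    already in the current group.
    """
    groups: List[List[tuple]] = []
    for name, t in sorted(temps.items(), key=lambda kv: kv[1]):
        if groups and t - groups[-1][0][1] <= window:
            groups[-1].append((name, t))
        else:
            groups.append([(name, t)])
    return [[name for name, _ in g] for g in groups]


# Annealing temperatures from Example 1, plus one hypothetical warmer pair.
annealing = {"mfd 1": 60, "mfd 59": 58, "mfd 154": 58, "hyp X": 64}
print(group_by_annealing(annealing))
```

Because thermalcyclers differ, the temperatures fed to such a grouping would be the ones re-verified on the machine actually in use.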
  • each marker should be evaluated to assure it is sized correctly within the SET and that the alleles can be easily scored as distinct products.
  • reported heterozygosity values are usually verified using a population of unrelated individuals.
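Verifying a reported heterozygosity against a panel of unrelated individuals amounts to a small calculation. The sketch below assumes genotypes are recorded as pairs of allele sizes in nucleotides; the genotype list and function names are hypothetical. It shows both the observed fraction of heterozygotes and the expected value 1 - Σp² computed from allele frequencies.

```python
from collections import Counter
from typing import List, Tuple

Genotype = Tuple[int, int]  # two allele sizes (nt) for one individual


def observed_heterozygosity(genotypes: List[Genotype]) -> float:
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes: List[Genotype]) -> float:
    """1 minus the sum of squared allele frequencies in the sample."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical panel of four unrelated individuals typed at one marker.
panel = [(176, 180), (178, 178), (176, 182), (180, 184)]
print(observed_heterozygosity(panel))  # 0.75
print(expected_heterozygosity(panel))  # 0.78125
```

A real check would use a much larger panel; the text elsewhere suggests a test pool of at least 50 chromosomes.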
  • the same DNA templates provided herein may be used as controls for verification of protocols and quality assurance.
  • Preferred controls include CEPH parents (BIOS corporation, New Haven, Conn.; Cell Repository, Camden, N.J.), such as families 1331, 1347, 884, for which reference alleles are known (see, Weber, et al., and Genethon Microsatellite Map Catalog, Genethon Human Genome Research Center, Evry, France). Pooled DNA from volunteers who have donated blood that has been purified as described in the EXAMPLES may be used as well.
  • This optimization process requires the synthesis of oligonucleotide primers, dilution and aliquoting of primers, identification of the appropriate annealing temperature (T°) and PCR protocol, electrophoresis of the products, autoradiography and data analysis. If labelled primers are used for detection of products, 5' end labeling of both primers should be tested to determine which one produces the best image. The size of the PCR products from each marker should be verified experimentally to assure that it does not overlap with the products of the surrounding markers in the same SET. As a control for this purpose, PCR products from a pool of DNA samples from a population of unrelated individuals may be electrophoresed against a DNA sequence ladder. In a preferred mode the test pool will contain at least 50 chromosomes.
  • Initial characterization of primers for each SSR marker may be performed with 32P labels because this is less costly, but the smooth adaptation of fluorescent-based techniques for genotyping with markers that have been optimized using 32P is also dependent on assuring the PCR products labelled with a fluorescent dye perform as expected during PCR and analysis. Therefore, the reliability of the developed protocol should be checked by electrophoresis of DNA samples labelled by PCR with the fluorescent labels.
  • the image produced by labeling one of the pair of primers is blurred, see, e.g., Figure 8.
  • the PCR products of different microsatellite markers frequently vary significantly in intensity (see, e.g., Figure 9).
  • the sizing of fluorescent PCR products of grossly different concentrations is potentially complicated by sample overloading, causing spectral interference between the dye labels during analysis. There was no interference in the detection of the overlapping products using the four dyes in Examples 1 or 5, because the concentration of each PCR product was determined and adjusted to prevent overloading. However in our experience this can become a problem when working routinely with 21 to 24 pooled markers.
  • PCR products are recovered and combined into a mixture containing the GROUP by a simple protocol that uses magnetic separation technology to purify the fluorescent PCR products and which restricts the total amount of product pooled to prevent overloading.
  • Magnetic separation provides simple separations based on specific binding interactions without the need for expensive centrifuges. Saturation binding to a limited amount of paramagnetic beads can be used to control the amount of labelled PCR product carried
  • Relative intensity may be adjusted by this means and overloading may be avoided.
  • one primer is labelled with a component that will bind to magnetic microbeads, for example biotin-labelled primers will bind to streptavidin-coated magnetic beads.
  • Magnetic beads coated with streptavidin are commercially available (Dynabeads™) and procedures for separation are described in, e.g., "Magnetic Separation Techniques Applied to Cellular and Molecular Biology," Kemshead, et al., eds., Wordsmiths' Conference Publications, Somerset, U.K., 1991.
  • a fixed amount of magnetic beads are added to the PCR reaction after amplification using primers that will bind to the magnetic beads.
  • the magnetic beads with the PCR product attached are separated from the remainder of the PCR reaction mixture, including salts and unused, detectably-labelled primer, and then the PCR product is recovered from the magnetic beads (for example, by separating the strands, leaving one strand attached to the bead and recovering the other strand whose primer carries the detectable label).
  • the entire PCR product may be labelled by including biotinylated UTP in the PCR reaction medium as described by Dennis, et al., 1990, in "PCR Protocols," Innis, et al., eds.
  • the PCR product can be bound to the beads for purification from the PCR reaction mix and excess primer, and subsequently recovered from the beads by, for example denaturation of streptavidin.
  • paramagnetic beads which have attached to their surfaces single stranded DNA corresponding to a part of the sequence of the PCR product may be added to the PCR reaction mix at the end of amplification, followed by cycling above the melting temperature, reannealing and then separating the paramagnetic beads and any other DNA strands annealed to the beads from the reaction mix. Labelled strands can then be recovered from the beads, as above.
  • SETS and GROUPS of fluorescent SSR markers covering the human genome can be completed in approximately 6-9 months, using the procedures provided herein.
  • additional fluorescent markers will be developed (approximately 500 SSR markers) providing a higher resolution tool for gene mapping.
  • the resolution of this marker collection will approach 10 cM and will preferably cover the telomeres which will better assure linkage detection in complex non-Mendelian disorders like asthma and diabetes.
  • the method of this invention offers several significant advantages over a similar strategy adopted by Diehl et al., 1991, Am. J. Hum. Genet., 47:177. Spacing markers in a SET according to this invention avoids overlap, providing improved discrimination among markers and between markers and artifacts. As many as eight or more markers may be incorporated into a SET. When necessary, new oligonucleotide primers based on the unique sequence surrounding a polymorphic marker can be designed and synthesized as taught herein to assure that the PCR products do not overlap during electrophoresis. Errors introduced by sample handling may also be minimized by storing DNA from each individual to be studied in a 96-well format.
  • the method of this invention can also increase the efficiency of diagnostic studies of the genome, when the desired diagnostic procedures involve the detection of genetic changes that affect the length of genomic DNA at 6 or more locations. Such changes include additions, deletions, intra-and interchromosomal crossover, gene amplification and similar gene rearrangements.
  • the loci of many such rearrangements are known and associated with many diseases, especially cancers and metabolic errors inherited recessively.
  • PCR using primer pairs which direct amplification of a DNA segment including one of these loci can be used diagnostically where the rearrangement associated with the disease causes a change in the length of the PCR product.
  • a SET of primers designed according to the principles above can be used in the production of PCR products that can be analyzed electrophoretically in a single lane, for more efficient use of electrophoresis and analysis equipment.
  • DNA from CEPH (Centre d'Etude du Polymorphisme Humain, Paris) families 884, 1331, 1332, 1333, 1362 was amplified for Marshfield markers mfd 1 (176-196 bp), mfd 59 (175-195 bp), and mfd 154 (186-204 bp) using the polymerase chain reaction (PCR). Fluorescent techniques: The forward and reverse primers were each labelled at the 5' end for detection by autoradiography with [32P]γATP (6000 Ci/mmole) using polynucleotide kinase. A primer was selected from each marker for fluorescent labeling on the basis of the image of the products (see Figure 8).
  • the optimal annealing temperature was selected for each marker empirically by selecting a temperature that eliminated nonspecific annealing or artifactual (background) PCR products. Fluorescent labels were attached at the 5' end via phosphoramidate derivatization using Aminolink 2 (Applied Biosystems). Primer B (see Figure 10) for mfd 1 was labelled yellow (TMR), primer A (see Figure 10) for mfd 59 was labelled blue (FAM), and primer B (see Figure 10) for mfd 154 was labelled green (JOE).
  • PCR conditions were: 0.4 μM primers, 1.5 mM MgCl2, 50 mM KCl, 200 μM dNTPs and 0.5 units Taq polymerase (final concentrations); 94°C for 10 min; followed immediately by 30 cycles of 94°C for 30 sec; 58°C (mfd 59, mfd 154) for 30 sec or 60°C (mfd 1) for 30 sec; and 72°C for 30 sec; followed by 72°C for 7 min.
  • PCR was carried out in a volume of 12.5 μl using 25 ng of CEPH DNA.
  • CEPH DNA was stored in a 96 well microtiter plate (Perkin Elmer/Cetus). Amplifications were
  • the supernatant was aspirated, the pellet was washed once with 1.5 volumes of ice cold ethanol (70%), and the plate centrifuged 30 minutes at 1400 x g at 4°C. The supernatant was aspirated and the plate was air dried. Pellets were resuspended in a volume of sterile ddH2O equal to the starting volume (pool).
  • Radiolabelled products were separated by conventional electrophoresis and scored manually from autoradiographs. Fluorescent PCR products were separated on a 373 sequencer with internal size standards in each lane (GeneScan 2500-ROX; Applied Biosystems) and analyzed using GeneScan™ 672 software (Applied Biosystems). Each sample (representing 0.5 μl of each product) was heated to 99°C after adding 1 μl of the internal lane size standards (GeneScan 2500-ROX, Applied Biosystems) and 2 μl formamide/EDTA loading buffer, until the total volume was reduced to 2-3 μl. Electrophoresis was carried out using 6% acrylamide (Biorad), 8 M urea (Ultrapure, USB) gels in 1X TBE. The reduced volume was loaded and run for 4-8 hours on a model 373 Sequencer (Applied Biosystems) using a 24 cm well-to-read distance.
  • the size of the PCR product is determined by reference to the internal lane size standards (Carrano et al. 1989, Genomics, 4:129-136).
  • the size standard ROX-2500 (Applied Biosystems) including fragments: 37, 94, 109, 116, 172, 186, 222, 233, 238, 269, 286, 361, and 479 nucleotides in length was used with modifications.
  • PCR fragments 61 and 68 nucleotides in length were gel purified, labelled by aminolinking with ROX, and added in equal volumes to the ROX-2500 standards. These fragments were added because desalting by ethanol precipitation recovers the unused PCR primers with the products.
  • the GeneScan 672 (version 1.0) software recognizes any peak labelled with ROX, computes a calibration curve based on a second-order least-squares fit, and uses these data to estimate the allele sizes of the PCR products (Ziegle et al. 1992). Data from each lane can be analyzed independently, or four lanes of data for a single fluorescent dye can be displayed simultaneously to compare individuals within a family. Allele sizes in nucleotide bases, the genotypes, are assigned by interactively distinguishing major peaks from background artifacts. The scale on the display can be adjusted to analyze peaks with differences in fluorescent intensity. The intensity of each fluorescent band and peak areas provide an objective method of distinguishing alleles from artifact (including stuttering bands).
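The idea of sizing alleles from an in-lane calibration curve can be sketched in a few lines. This is not the GeneScan implementation, only an illustration of a second-order least-squares fit: fragment sizes of the internal standard are regressed on their migration positions, and the fitted curve is then read at the position of an unknown peak. The scan positions below are hypothetical, the function name is an assumption, and positions are rescaled to [-1, 1] before fitting purely to keep the arithmetic well conditioned.

```python
from typing import Callable, Sequence


def make_sizer(positions: Sequence[float], sizes: Sequence[float]) -> Callable[[float], float]:
    """Fit size = a + b*u + c*u**2 by least squares and return a position -> size function.

    u is the position rescaled to [-1, 1]; at least three distinct positions are needed.
    """
    lo, hi = min(positions), max(positions)
    mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0
    us = [(x - mid) / half for x in positions]
    # Normal equations for the basis [1, u, u^2], stored as an augmented 3x4 matrix.
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(y * u ** k for u, y in zip(us, sizes)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [p - f * q for p, q in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))

    def size_of(position: float) -> float:
        u = (position - mid) / half
        return a + b * u + c * u * u

    return size_of


# Fragment sizes (nt) from the ROX-2500 standard; scan positions are hypothetical.
standard_sizes = [94, 109, 116, 172, 186, 222, 233, 238, 269, 286]
scan_positions = [1310, 1420, 1475, 1890, 1995, 2260, 2340, 2378, 2600, 2725]

size_of = make_sizer(scan_positions, standard_sizes)
print(round(size_of(2100), 1))  # estimated size of an unknown peak at scan 2100
```

Because every lane carries its own standards, a separate curve would be fitted per lane, which is what makes lane-to-lane mobility differences tolerable.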
  • Genomic DNA is isolated as described by M.J. Johns, et al., Analytical Biochem., 180:276-278 (1989).
  • DNA templates can be stored in a 96 well grid (e.g., Perkin Elmer/Cetus).
  • the integrity of the grid may be maintained throughout the protocol to avoid errors introduced by manual pipetting and sample handling.
  • Multichannel pipetting from a 96-well grid expedites sample handling while minimizing human errors.
  • PCR is performed in a reaction volume of 12.5 μl, containing 50 μM dATP, dGTP, dTTP, dCTP; 0.07 μM of the labelled oligonucleotide primer, and 4 μM of the unlabelled primer.
  • Taq polymerase (Perkin-Elmer/Cetus), 0.5 units, is added on ice. PCR will usually be
  • thermalcycler, e.g., a Perkin-Elmer/Cetus 9600 thermalcycler. Standard thermalcycler settings are 94°C for 10 minutes, followed by 30 cycles of 94°C for 30 seconds, 30 seconds at the average annealing temperature for the primers and 72°C for 30 seconds; final extension is at 72°C for 7 minutes. Labelled PCR products are purified by co-precipitation in EtOH. 24 markers may be co-precipitated simultaneously in the 96-well format using ethanol. Ethanol precipitation desalts the products but copurifies the primers.
  • the labelled primer peak produces an enormous signal that complicates the analysis of products under 93 nucleotides in length because it interferes with the 37 nucleotide ROX GeneScan-2500 standard.
  • internal standards may incorporate fragments that are 50, 60, and/or 70 nucleotides in length in addition to the GeneScan 2500 standard fragments or an equivalent set of fragments.
  • the amplified products are analyzed by denaturing gel electrophoresis (Sambrook, et al.). Loading buffer (2X concentration) is added to an equal volume of the PCR reaction, and the PCR reaction is loaded on a 6% polyacrylamide gel. Radioactive products will be sized against a sequence ladder; the gels are dried and then exposed to Kodak XAR film for 4-24 hours with or without intensifying screens. Fluorescent labelled PCR products may alternatively be analyzed by semi-automated detection using, e.g., an ABI 373A automated sequencer and GeneScan 672 software from Applied Biosystems, Inc. EXAMPLE 3
  • PCR products are produced as in Example 2 and then purified and combined for electrophoresis using a magnetic bead protocol in place of EtOH precipitation.
  • One of each pair of primers is labelled with biotin and the other with a fluorescent label as above.
  • Double stranded PCR products are purified using streptavidin conjugated to paramagnetic beads to bind the primer 5' labelled with biotin. This procedure may be easily adapted to the 96-well format in any laboratory without expensive centrifuges.
  • the DNA bound to magnetic beads is separated from the PCR reaction media, the two strands are melted and separated, and the strand labelled with the fluorescent primer is pooled with other labelled strands of its GROUP for electrophoresis.
  • the result of increasing the amount of beads used for separation of a single PCR product from its PCR reaction mix is shown in Figure 12.
  • each marker may be optimized, using 12 wells or less of a 96-well plate. Eight markers are amplified per plate at a single temperature. Alternatively, a thermalcycler with a smaller sample capacity may be used.
  • the 5' end of the primers to be tested is labelled with 32P using the polynucleotide kinase reaction.
  • 2(A+T) + (G+C) (If the calculated temperatures for 2 primers differ greatly, for example 54° and 64°, begin closer to lower T°)
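A starting annealing temperature can be estimated from base composition before empirical optimization. The sketch below assumes the widely used Wallace rule, 2(A+T) + 4(G+C) in °C. The weight of 4 on G+C comes from that rule and is an assumption here, since the expression as extracted above shows no coefficient on (G+C). The primer sequences and function name are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Estimate a primer melting temperature (deg C) as 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


# Hypothetical primer pair; real sequences would come from the marker tables.
forward = "AGCTAGCTAGCTAGCTAGCT"   # 10 A/T and 10 G/C
reverse = "ATATGCATATGCATATGCAT"   # 14 A/T and 6 G/C

tm_forward = wallace_tm(forward)
tm_reverse = wallace_tm(reverse)
print(tm_forward, tm_reverse)  # 60 52

# When the two estimates differ greatly, the text advises starting
# the empirical search closer to the lower of the two values.
print(min(tm_forward, tm_reverse))
```

The estimate is only a starting point; the final annealing temperature is the one that eliminates background products on the gel.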
  • a protocol extending this approach to include up to 24 microsatellite markers in each electrophoretic lane was tested as follows. The selection of markers was based on the need to: maximize heterozygosity (genetic informativeness), distribute markers across the entire genetic map, and place each marker within a SET based on the known size of the PCR products (alleles and stuttering bands produced must not overlap with those of the marker above or below it). Highly informative microsatellite markers were assembled into a ladder or "SET". Each marker in a SET is spaced a distance of at least 9 nucleotides from surrounding markers such that none of the PCR products overlap in size when separated on a 6% denaturing acrylamide gel.
  • Each SET was labelled with one of three different commercially available fluorophores (TMR, FAM, and JOE; Applied Biosystems). The fourth fluorophore (ROX) was reserved for the internal size standard. Three SETS each labelled with a different fluorophore were pooled into a collection of markers we have termed a "GROUP".
  • New primers were designed as necessary using OLIGO 4.0 (Research Genetics, Huntsville, AL) to fit within the marker ladder.
  • Each GROUP was constructed to avoid overlap between markers within SETS but to allow overlap between SETS.
  • thermocycler block may be used with a lower capacity. Variability among thermalcycler operating temperatures may require adjusting the annealing temperature when switching from one machine to another. Therefore the use of the protocols described for marker GROUPS 1 and 2 should be preceded by a reevaluation of the suggested annealing temperatures for optimal performance. This can generally be carried out once on a few markers and when necessary the annealing temperatures can be adjusted up or down for all the markers for that machine.
  • Marker GROUPS 1 and 2 are described in Figures 7A and B, respectively.
  • the primer sequences, chromosomal location, choice of labelled primer, and optimal annealing temperature are listed for each locus.
  • GROUP 1 is composed of a combination of 21 di-, tri-, and tetranucleotide markers from multiple linkage groups. The product sizes range from 66 to 322 nucleotides.
  • Group 2 is composed of 24 dinucleotide markers with products ranging in size from 75 to 349 nucleotides.
  • the mean heterozygosity for both GROUPS is 74%. Scoring of the fluorescent products using the ABI 373 sequencer and GeneScan 672 software was unambiguous in samples that were desalted by ethanol precipitation. Desalting was carried out as follows: 5 μl of each PCR product from the same SET (like color) was combined.
  • a typical set of electrophoretograms of each SET from GROUP 2 for a single individual is illustrated in Figure 5.
  • Each of the alleles can be easily recognized by the unique signature of the stuttering bands for these dinucleotide repeat markers amplified by PCR.
  • Samples that were not desalted were difficult to score because the mobilities of the products and the ROX-2500 internal lane standards were altered.
  • Salt and primer loads become a problem when combining multiple products for electrophoresis because the necessary volume reduction results in sample concentration. The salt concentration rises with the product concentration and interferes with the separation of the products and standards. This becomes critical when pooling 21 to 24 markers.

Abstract

Despite the introduction of molecular methods, such as the polymerase chain reaction, and the discovery of highly polymorphic microsatellite markers, genotyping remains a rate limiting factor in our ability to localize disease genes by linkage. This invention provides a method for genotyping multiple loci by semi-automated fluorescence-based techniques that is both highly accurate and efficient as compared to conventional techniques. The semi-automated techniques developed will be useful in high resolution genomic analyses including: linkage studies, cancer genetics, forensics, and cytogenetics including studies of uniparental disomy or other patterns of chromosomal inheritance.

Description

GENOTYPING BY SIMULTANEOUS ANALYSIS OF MULTIPLE MICROSATELLITE LOCI
The work leading to this invention was supported in part by Grant No. GM 47145 from the National Institutes of Health. The United States Government may retain certain rights in this invention.
BACKGROUND OF THE INVENTION
Field of the Invention
This invention is directed to semi-automated methods for linkage mapping of the genome by genotyping of multiple microsatellite loci.
Summary of Background Information
For most genetic disorders, there is no known biochemical defect. Consequently, the mutant genes associated with the disease and their disease-causing abnormal gene products are recognized solely by the anomalous phenotype they produce. Identifying the chromosomal localization for the gene(s) that produce these disease phenotypes is often the first crucial step toward isolation and characterization of the mutation(s) by recombinant DNA techniques. The significance of mapping a gene is perhaps better appreciated when put into context with the human genome project. Consider for a moment that even after every base of the DNA in the entire human genome has been sequenced through the Human Genome Initiative (HGI), and every gene has been localized in this sequence, it may still not be clear which disorder(s) arise from which gene(s). Each disease phenotype will still need to be "mapped" or associated with a particular location in the genome. This is usually carried out by analyzing DNA isolated from blood specimens collected from individuals within families affected by a genetic disorder. Once a disorder or abnormal phenotype has been linked to a particular region on a chromosome, the limited number of genes within this area will permit us to suggest a candidate gene that can contribute to the phenotype. Thus, once the localization of a major disease phenotype to a chromosomal region is confirmed, a few candidate genes can be examined for mutations as well as potential pathogenic mechanisms.
If no genes have been mapped to the region, then linkage studies with closely-spaced surrounding markers can often be used to delineate a large chromosomal interval (1-2 Mb) in which to search for transcribed sequences. This approach (originally termed "reverse genetics") is now generally referred to as "positional cloning". In the past the isolation of candidate genes from these large genomic regions was the rate-limiting step in positional cloning, requiring years of intensive work. However, recent improvements in methods to capture expressed sequences encoded within large genomic segments have been described. Thus, there is now a need for advances in the molecular genetic methods employed in the linkage mapping of disease genes.
Linkage
The chromosomes are the basic units of inheritance on which genes and DNA markers are organized in a linear fashion (see Figure 1). Linkage is evident when a gene(s) that produces a phenotypic trait, or a significant portion of the trait, and the surrounding DNA markers are inherited together (cosegregate at meiosis). In contrast, those markers that are not associated with the anomalous phenotype of interest will be randomly distributed among affected family members as a result of the independent assortment of chromosomes and crossing over during meiosis (see Figure 2, compare "A" markers to "B"-"F" markers).
In general, the farther a marker, or gene, is from the genetic locus of interest (for example, markers 1 and 4 as compared to markers 2 and 3 in Figure 1), the more likely they will be separated by crossing over at meiosis. The recombinant genotypes produced by crossing over between maternal and paternal chromosomes at meiosis allow us to predict the ordering of genes and markers through the interval under examination. Recombination between the markers 1A and 3A, and 2A and 4A in the affected members in Figure 2, suggests that the mutant gene of interest lies between markers 1 and 4. Thus linkage to a marker of known chromosomal location allows placement of the phenotype on the chromosomal map.
Analysis for testing linkage with use of DNA markers is based on standard likelihood theory. The DNA markers are used to recognize each of the parental chromosomes. Recall that in general each chromosome is inherited independently of any other, and the likelihood of inheriting either chromosome of a pair from each parent is 50:50. Therefore, when a marker is unlinked to the gene(s) producing an anomalous phenotype, one expects both the maternal and paternal chromosomes to be equally distributed in the affected offspring. Linkage in the human is established by the method of likelihood ratios (see Ott, 1992, "Analysis of Human Genetic Linkage," The Johns Hopkins University Press, Baltimore, for a review). One compares the probability that observed family data, such as that in Figure 2, would arise under one hypothesis (for instance, linkage with no recombination with marker 2 or 3) to the probability that it would arise under an alternative hypothesis (typically, nonlinkage). The ratio of these probabilities is called the odds ratio for one hypothesis relative to the other. By convention, mammalian geneticists prefer the log of the odds ratio, or the lod score. Generally, linkage is considered proven when the odds in favor of linkage versus nonlinkage become overwhelming, or reach 1000:1 (LOD = 3) (see Morton, 1955, Am. J. Hum. Genet., 7:277-318). Linkage is rejected when the odds drop to 100:1 against this hypothesis (LOD = -2). The maximum likelihood estimate is the recombination fraction where the likelihood ratio is largest. Lod scores from multiple pedigrees are thus added until the score grows to 3 (signifying 1000:1 odds) or falls to -2 (indicating 1:100 odds). Linkage can be easily evaluated using likelihood ratios, even in complicated pedigrees, by testing on the computer for these competing hypotheses. Recently, additional strategies have been devised that can handle genetic heterogeneity more effectively (Ott, 1974, Am. J. Hum. Genet., 26:588-597) as well as disorders caused by multiple genes (Lander, et al., 1986, Proc. Natl. Acad. Sci. USA, 32:7353-7357).
Genotyping With Molecular Genetic Methods
The descriptions of many types of DNA sequence polymorphisms have provided the fundamental basis for our understanding of the structure of the mammalian genome (CEPH consortium map, 1992, Science, 252:67-86; Weissenbach et al., 1992, Nature, 252:794). The construction of extensive framework linkage maps has been greatly facilitated by the use of these DNA polymorphisms, and has provided a practical means for the localization of disease genes by linkage. The process of linkage mapping in Mendelian and complex disorders using these techniques has been further facilitated by the recent description of a detailed "second-generation" linkage map of the human genome (Weissenbach et al., 1992).
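The lod-score convention described in the linkage discussion above can be made concrete with the simplest textbook case. The sketch assumes phase-known, fully informative meioses, so that the likelihood at recombination fraction θ is θ^k (1-θ)^(n-k) for k recombinants among n meioses, compared against θ = 0.5 (nonlinkage). Real pedigrees need the full likelihood machinery reviewed by Ott; the function name and the counts used are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    k, n = recombinants, meioses
    likelihood = (theta ** k) * ((1.0 - theta) ** (n - k))
    if likelihood == 0.0:
        return float("-inf")  # e.g. theta = 0 is impossible once a recombinant is seen
    return math.log10(likelihood) - n * math.log10(0.5)


# Ten informative meioses with no recombinants already exceed the LOD = 3 threshold.
print(round(lod_score(0, 10, 0.0), 2))  # 3.01

# Lod scores from separate (hypothetical) pedigrees add at a common theta.
pedigrees = [(1, 8), (0, 5), (2, 12)]  # (recombinants, meioses)
total = sum(lod_score(k, n, 0.1) for k, n in pedigrees)
print(round(total, 2))

# The maximum likelihood estimate of theta is where the summed lod is largest.
grid = [i / 100 for i in range(0, 51)]
best = max(grid, key=lambda th: sum(lod_score(k, n, th) for k, n in pedigrees))
print(best)
```

A lod of 3 corresponds to odds of 10^3 = 1000:1 in favor of linkage, and a lod of -2 to odds of 100:1 against, matching the thresholds quoted in the text.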
In particular the recent description of highly polymorphic PCR-based microsatellite markers for genotyping has greatly advanced the construction of high resolution linkage maps (Weber and May, 1989, Am. J. Hum. Genet., 44:388-396; Litt and Luty, 1989, Am. J. Hum. Genet., 44:397-401).
The microsatellite markers are highly polymorphic, simple sequence repeat (SSR) markers, generally defined as repeats of 6 bp or less running in tandem for up to 100 bp long (Beckmann, et al., 1992, Genomics, 12:627-631). These repeat sequences are flanked by unique DNA sequences that may be identified for each marker location. With primers that correspond to the unique DNA sequence surrounding each marker, the polymerase chain reaction (PCR, see, e.g., Saiki, et al., 1988, Science, 222:489) can be used to detect each polymorphism.
This type of genetic marker is abundant and found throughout the genome. SSRs may be as frequent as one every 6 kb (Beckmann, et al., 1992). Where SSR markers show considerable polymorphism (differences in the number of repeats) between individuals, the markers can be particularly informative. Many such SSR markers have been isolated throughout the genome, and are well mapped (Weissenbach, et al., 1992). Many of these SSR markers are now available commercially for linkage studies (e.g., from Research Genetics, Huntsville, AL). Those markers which frequently allow the investigator to identify each parental chromosome as unique and to identify each crossover rapidly (see Figure 2) approach the ideal for linkage studies.
Most SSRs are (GT)n dinucleotide repeat length polymorphisms (see Figure 3). It is estimated that there are about 100,000 of the (GT)n type SSRs, or approximately one every 30 kb (Beckmann, et al., 1992). Over 1,000 SSR markers have been described to date in the Genome Data Base, October 19, 1993, The Johns Hopkins University, Baltimore, Maryland, and thousands of additional markers are now in development.
It is now well accepted that methods based on the polymerase chain reaction (PCR) and highly polymorphic simple sequence repeat (SSR) markers (e.g., Figure 3) are the techniques of choice for genotyping in linkage studies (Weber, et al., 1989; Litt, et al., 1989; Edwards, et al., 1991, Am. J. Hum. Genet., 49:746-56). PCR-based methods are faster and therefore less costly than restriction fragment length polymorphism (RFLP) methods; moreover, they do not require nucleic acid probes, and are more informative in linkage studies. Efforts are underway to develop automated techniques for genotyping that will further improve the efficiency of linkage studies utilizing this type of microsatellite marker polymorphism. The advantages of analyzing multiple polymorphic loci using an automated DNA sequencer were first described by Skolnick and Wallace in 1988 (Genomics, 2:273-279). Building on techniques reported by Connell, et al. (1987, Biotechniques, 5:342-348), Ziegle, et al. (1992, Genomics, 14:1026-1031) extended this approach to incorporate automated DNA sizing technology for genotyping microsatellite loci using four-color fluorescence-based techniques.
However, the analysis of microsatellite markers still relies on gel electrophoresis, which has limited sample handling capacity. Furthermore, the gel electrophoresis of DNA fragments is complicated by problems with gel distortion, such as band shifting, that warrant internal size standards and band-matching software (Lander, 1991, Am. J. Hum. Genet., 48:819-823). Crosstalk, or interference during analysis between multiple dyes with spectral overlap, is another potential problem when multiple PCR fragments of the same size are to be identified within the same gel lane. Since the processing of gels and the scoring of autoradiographs remain the rate-limiting steps in genotyping, methods are being sought that improve the efficiency of sample handling while minimizing errors in data transcription and analysis. The challenge of mapping the major genes in complex disorders requires efficient and highly accurate methods of genotyping. Recent technological enhancements in molecular genetics have significantly improved our ability to locate disease genes by linkage analysis. However, despite the introduction of molecular methods, such as PCR, and the discovery of highly polymorphic SSRs, genotyping is still rate-limiting for localizing disease genes by linkage. The present methods remain highly technical, time-consuming, and expensive.
SUMMARY OF THE INVENTION
It is an object of this invention to provide a robust semi-automated protocol for genotyping using multiplex analysis of many microsatellite loci while maintaining, or improving, typing accuracy as compared to traditional methods. It is also an object of this invention to provide a collection of highly reproducible microsatellite markers at approximately 10-50 cM intervals throughout the human genome which can be detectably-labelled.
It is a further object to provide protocols for the reliable use of these marker systems in automated genotyping. To meet these and other objects, and to better exploit the inherent advantages of fluorescence-based genotyping techniques, this invention provides highly informative SSR markers, assembled into "SETS" that do not overlap in size when separated electrophoretically on an acrylamide gel and that can be labelled with different fluorophores. Each SET contains 6 or more pairs of primers (preferably 7-8 pairs) that provide for amplification of markers and that have been labelled with the same fluorophore having a distinct color, separate SETS having different fluorophore labels (e.g., blue, green, or yellow). PCR products corresponding to these SETS are combined into a GROUP for electrophoretic analysis in a single lane. Using this methodology, a GROUP of 18 or more, preferably 21 to 24, dinucleotide markers can be electrophoresed along with an internal size standard and analyzed simultaneously (multiplexing) in real time for each individual studied.
In particular, the invention provides a kit for use in automated genotyping within a population comprising four or more GROUPS, each GROUP containing at least three SETS, and each SET in turn comprising at least 6 labelled pairs of primers for amplification of DNA by polymerase chain reaction (PCR), the sequence of each primer pair corresponding to a portion of the unique genomic sequence of a microsatellite sequence (which is made up of a nucleotide repeat sequence flanked by unique sequences), the nucleotide repeat sequence being polymorphic within the population. Amplification of DNA from a human sample by the polymerase chain reaction (PCR) primed with a particular primer pair amplifies the nucleotide repeat sequence and at least some of the immediately adjacent unique sequences of the microsatellite sequence to produce a PCR product identified with the primer pair. The distance in the genome between the microsatellite sequence amplified by one primer pair of the kit and the nearest other microsatellite sequence amplified by another primer pair of the kit is at least 2 centimorgans (cM) and no more than 50 cM. Each SET consists of at least 6 of the primer pairs, where the length of the segment amplified by a particular primer pair (its PCR product) differs from the length of PCR products from all other primer pairs in the SET by at least 5 nucleotides for tetranucleotide repeats, at least 6 nucleotides for trinucleotide repeats and at least 9 nucleotides for dinucleotide repeats. At least one primer of each primer pair is labelled with a fluorescent label that is the same for all primer pairs in the SET. Each GROUP consists of at least three SETS of primer pairs labelled with fluorescent labels, and primers from one SET in the GROUP are labelled with a fluorescent label which fluoresces at a wavelength which is substantially different from the wavelength at which the fluorescent labels on the primers in each of the other SETS in the GROUP fluoresce.
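The SET and GROUP rules in the kit summary above can be expressed as a small validity check. The sketch below is illustrative only: the `Marker` fields, the marker names and the allele ranges are hypothetical, and it interprets the length-difference requirement as the gap between the largest known allele of one marker and the smallest known allele of the next. The minimum gaps are those quoted in the summary; the detailed description quotes the tri- and tetranucleotide figures the other way round, so the table is best treated as adjustable.

```python
from dataclasses import dataclass

# Minimum size gap (nucleotides) between neighbouring markers in a SET, keyed by
# repeat-unit length, as quoted in the kit summary (di: 9, tri: 6, tetra: 5).
MIN_GAP_NT = {2: 9, 3: 6, 4: 5}


@dataclass(frozen=True)
class Marker:
    name: str         # hypothetical label, not a real locus designation
    repeat_unit: int  # 2, 3 or 4 nucleotides
    min_size: int     # smallest known allele (nt)
    max_size: int     # largest known allele (nt)
    dye: str          # fluorophore on the labelled primer


def check_set(markers):
    """Return a list of rule violations for one SET (empty means it passes)."""
    problems = []
    if len(markers) < 6:
        problems.append("fewer than 6 primer pairs")
    if len({m.dye for m in markers}) > 1:
        problems.append("more than one fluorophore in the SET")
    ordered = sorted(markers, key=lambda m: m.min_size)
    for lower, upper in zip(ordered, ordered[1:]):
        gap = upper.min_size - lower.max_size
        # Assumption: when repeat types are mixed, apply the stricter gap.
        required = max(MIN_GAP_NT[lower.repeat_unit], MIN_GAP_NT[upper.repeat_unit])
        if gap < required:
            problems.append(f"{lower.name}/{upper.name}: gap {gap} nt < {required} nt")
    return problems


def check_group(sets):
    """A GROUP needs at least three SETS, each with its own fluorophore."""
    problems = []
    if len(sets) < 3:
        problems.append("fewer than 3 SETS")
    dyes = [s[0].dye for s in sets if s]
    if len(dyes) != len(set(dyes)):
        problems.append("two SETS share a fluorophore")
    return problems


def make_set(dye, first_size=100, width=20, gap=9, count=6):
    """Build an illustrative dinucleotide SET with evenly spaced allele ranges."""
    markers, start = [], first_size
    for i in range(count):
        markers.append(Marker(f"{dye}-{i + 1}", 2, start, start + width, dye))
        start += width + gap
    return markers


if __name__ == "__main__":
    group = [make_set(dye) for dye in ("FAM", "JOE", "TMR")]
    print([check_set(s) for s in group], check_group(group))
```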
Where the primers in a single kit cover the entire genome with markers spaced approximately 10 cM apart in the genome, the kit will usually contain at least about 10 GROUPS. In another embodiment, a kit is provided for screening of the genome with individual markers spaced in the genome about 50 cM from the nearest other marker in the kit, and the kit contains at least 4 GROUPS. The invention also provides kits containing fewer GROUPS with primers whose PCR products identify microsatellite sequences found in the genome spaced closely about the locations picked out by screening studies performed using the screening kit.
The invention also provides a method of analyzing genomic DNA for the presence of polymorphisms comprising: extracting DNA from a human sample; combining, in a polymerase chain reaction (PCR) vessel, an aliquot of the extracted DNA, at least one primer pair selected from one of the GROUPS described above, and PCR amplification enzymes; cycling the temperature of each PCR vessel to produce PCR products that can be identified with the primer pair whose sequence corresponds to unique sequence in the amplified DNA, using an annealing temperature at which non-specific annealing is minimized; then combining all PCR products from all PCR vessels containing primer pairs from a single GROUP into a mixture, and subsequently separating the mixture of PCR products electrophoretically by size; and detecting separated PCR products by fluorescence detection at wavelengths corresponding to the fluorescent wavelength for each of the fluorescent labels in the kit. In a preferred embodiment, one primer of each primer pair is labelled with a fluorescent label and the other primer in the pair is labelled with biotin, and a mixture containing all PCR products corresponding to the primer pairs from a single GROUP is prepared by binding the PCR products to a plurality of paramagnetic beads carrying on their surface a protein which specifically binds biotin (the beads being added to each PCR vessel after amplification), separating the magnetic beads from the PCR reaction medium, then separating the two strands of the amplified DNA segments and combining the strands labelled with a fluorescent label for all primer pairs from one GROUP into the mixture.
The invention also provides a method for selecting a SET of PCR primers for use in automated genotyping comprising selecting at least 6 microsatellite sequences, which contain dinucleotide, trinucleotide or tetranucleotide repeat sequences that are flanked by unique sequences in the human genome, and are polymorphic within the population, the microsatellite sequences being separated from each other by at least 2 centimorgans in the genome, and for each microsatellite sequence constructing primer pairs having the sequence of the unique sequences flanking the microsatellite sequences, so that the primer pairs will direct PCR amplification of DNA segments corresponding to each microsatellite sequence and the length of all polymorphs of the microsatellite sequence amplified by a particular primer pair is detectably different from the length of all polymorphs of other microsatellite sequences amplified by other primer pairs in the SET. The invention also provides a kit for use in automated genotyping comprising at least 10 GROUPS of at least 3 SETS of PCR primers obtained by this method, and a method of analyzing genomic DNA for the presence of polymorphisms comprising amplifying DNA extracted from a human sample using PCR directed by these primer pairs to produce PCR products labelled with detectable labels that are the same for all PCR products from a single SET, followed by separating electrophoretically a mixture containing all PCR products amplified from the DNA sample by any primer pair of said SET and characterizing the detectably labelled PCR products by length.
The invention also provides a diagnostic method for detection by polymerase chain reaction of genomic rearrangement (including deletions, additions, crossovers and gene amplification) of a genomic region containing at least 6 known loci at which genetic rearrangement is diagnostic for a disease, using a kit comprising at least one SET containing at least 6 PCR primer pairs, the sequences of each primer pair corresponding to the unique sequences flanking one of the loci of genomic rearrangement. The primer pairs in the SET are constructed so that the PCR product amplified by a particular pair of primers corresponds to a DNA segment surrounding one locus of rearrangement with length that is characteristic of a specific rearrangement, and the length of the PCR products amplified by a particular pair of primers differs from the length of all other PCR products amplified by other primers in the SET. DNA from a sample is amplified in a PCR vessel using the polymerase chain reaction (PCR) primed with at least one of the primer pairs of the SET by cycling the temperature of the vessels with an annealing temperature that minimizes non-specific annealing to produce detectably labelled PCR products, and the PCR products for all primer pairs in the SET are detectably labelled with the same label. Labelled PCR products are separated electrophoretically by size from a mixture containing all PCR products amplified from the DNA sample by any primer pair of the SET, and the separated, detectably labelled PCR products are characterized by length. In a preferred mode, all primers in the SET have annealing temperatures within a 4°C range, and amplification for all primers in the SET is carried out simultaneously in the same vessel.
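The preferred mode above co-amplifies primer pairs whose annealing temperatures fall within a 4°C range. As a minimal sketch of how that constraint might be applied in practice, the following function sorts primer pairs by annealing temperature and greedily splits them into batches that each span no more than the window. The batching step, the function name and the primer names are illustrative assumptions; the source states only the 4°C criterion for single-vessel amplification.

```python
def batch_by_annealing_temperature(primer_temps_c, window_c=4.0):
    """Split primer pairs into batches whose annealing temperatures span <= window_c.

    primer_temps_c maps a (hypothetical) primer-pair name to its annealing
    temperature in degrees Celsius. Greedy batching over the sorted temperatures
    yields the smallest number of batches for this one-dimensional problem.
    """
    batches = []
    current = []
    for name, temp in sorted(primer_temps_c.items(), key=lambda item: item[1]):
        # Start a new batch when this pair is too far from the coolest pair.
        if current and temp - current[0][1] > window_c:
            batches.append([n for n, _ in current])
            current = []
        current.append((name, temp))
    if current:
        batches.append([n for n, _ in current])
    return batches


if __name__ == "__main__":
    temps = {"A": 55, "B": 56, "C": 58, "D": 59, "E": 60, "F": 64}
    print(batch_by_annealing_temperature(temps))
```

A SET satisfies the preferred mode when this function returns a single batch.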
The inventor has created a kit comprising SETS of highly polymorphic fluorescent primers specific for microsatellite markers that cover the genome at approximately 10 cM intervals for linkage studies. A fluorescence-based protocol based on these SETS has been developed for detection of multiple microsatellite markers, and the protocol is accurate as compared to a conventional radiolabeling method that depends on a known DNA sequence ladder and conventional autoradiography for detection. It has now been demonstrated that genotyping by semi-automated fluorescence-based techniques is both highly accurate and efficient. Twenty-four fluorescent markers are routinely typed simultaneously using these techniques in the inventor's laboratory. The combined analysis of 24 dinucleotide markers in a single gel maximizes the use of automated analysis equipment, such as the Applied Biosystems 373A hardware, by producing PCR products sufficiently small to run the instrument at least twice daily. The methods provided herein may improve productivity by more than an order of magnitude and can be easily adapted to most linkage studies.
BRIEF DESCRIPTION OF THE FIGURES
Figure 1 shows the genetic map of the chromosomal region surrounding a putative GENETIC locus. In this example the greater the spacing between markers the more likely recombination will occur during meiosis.
Figure 2 shows segregation data from a fabricated three generation family affected with a genetic disorder for the four markers illustrated in Figure 1. Squares indicate males, circles indicate females. Affected and unaffected family members are indicated by solid and open symbols, respectively. Crossovers that have occurred during meiosis are indicated by the arrowheads. Recombination with markers 1 and 4 from chromosome A excludes a localization for the gene causing this disorder in the region immediately above marker 1 and below marker 4. The region from chromosome A between markers 1 and 4 (including markers 2 and 3) co-segregates with the abnormal phenotype in all the affected individuals in this family but is not found in any unaffected individuals. These data confirm a localization for the GENETIC locus under study to this chromosomal region.
Chromosomal region 4 of chromosome B from affected individual I-1 occurs in both affected and unaffected offspring in generation II, showing no linkage. The markers used in this demonstration approach the ideal by providing maximal genetic information for every individual studied.
Figure 3 illustrates the most common form of simple sequence repeat. In this individual the marker is heterozygous, or differs in the number of dinucleotides between the maternal and paternal chromosomes. These PCR products would differ in length by 8 nucleotides, and are each easily detected using gel electrophoresis. The solid bars indicate surrounding sequence that is unique (occurs only once in the human genome) and can be used to design PCR primers for amplifying this simple sequence repeat.
Figure 4 shows a cartoon of GROUP 1 markers. Each simple sequence repeat marker is identified on the left, and the size range for known alleles is noted on the right. Each marker covers a region of a chromosome to be examined for linkage with a genetic disorder. The colored boxes refer to the region on the gel where alleles for each marker may be found. The markers are chosen to avoid overlap between these regions. For increased efficiency each SET is labelled with one of three fluorophores (Applied Biosystems): yellow, tetramethyl-6-carboxy-rhodamine (TMR); blue, 5-carboxy-fluorescein (FAM); and green, 2',7'-dimethoxy-4',5'-dichloro-6-carboxy-fluorescein (JOE). Red, 6-carboxy-rhodamine (ROX), is reserved for internal size standards. The products of the PCR amplifications are pooled and subjected to electrophoresis together. Marker data are derived from the Genome Data Base (GDB), The Johns Hopkins University, Baltimore, Maryland.
Figure 5 shows a typical set of electrophoretograms for GROUP 2 using DNA from a single individual.
Figure 6 shows an electrophoretogram of SET A, GROUP 1 markers from one individual. The size (nucleotides) of each PCR product is given on the X-axis above the electrophoretogram.
Figure 7 A-M provides a listing of the markers in 13 GROUPS each containing 16-24 markers divided into three SETS. The first column gives a locus designation for the marker to identify the entry in the GenBank Data Base which provides the unique sequences surrounding the markers. The unique sequence information can be used to design primers that will direct PCR amplification of the marker. After the locus designation, the size range of the published alleles (in base pairs), the degree of heterozygosity in the population and the chromosomal location are listed, in that order, for each marker, followed by the nucleotide sequences of preferred primer pairs, along with their annealing temperatures and preferred choice for labelled primer.
Figure 8 demonstrates the difference in autoradiographic image produced depending on whether the forward or reverse primer is labelled.
Figure 9 shows an autoradiograph of PCR-amplified DNA using the primers of GROUP 2, SET B. The variation in intensity in products of this SET is typical of this type of marker.
Figure 10 shows the effect of varying the amount of paramagnetic beads in a magnetic bead-based recovery from PCR.
DETAILED DESCRIPTION OF THE INVENTION
Methods for sequencing DNA, for synthesizing oligodeoxynucleotides of defined sequence, and for separating nucleic acid segments by molecular weight using, e.g., electrophoresis are well known to those skilled in the art and well described in the literature, in, for example, "Molecular Cloning: A Laboratory Manual," Sambrook, et al., eds., Cold Spring Harbor Laboratory Press, 1989. General methods of analyzing DNA by the polymerase chain reaction (PCR), including isolation and preparation of DNA templates, synthesis and labelling of primers, amplification, and analysis of PCR products, are also well known and described in the literature, for example in Sambrook, et al., 1989, or in "PCR Protocols: A Guide to Methods and Applications," Innis, et al., eds., Academic Press, 1990. The skilled worker in this art is familiar with these and other methods of manipulating and analyzing DNA, and routine application of such methods within the skill of the ordinary skilled worker is assumed in the following description.
Semi-Automated Genotyping:
Despite the improvements in linkage techniques introduced by PCR and SSRs, genotyping remains highly technical, time-consuming, and expensive. The application of fluorescence-based technology is one way to further reduce the cost and increase the efficiency of this type of project. Fluorescent labeling of PCR-based markers provides many potential advantages over radiolabels (e.g., 32P) and other labels in common use for PCR markers. Fluorescent labels are nontoxic, stable, and can be combined and analyzed together in a single electrophoretic lane (multiplexing) to provide a many-fold increase in efficiency over standard methods of detection. Fluorescence signals are linear over a much greater range of intensity than conventional autoradiography and other methods of detection in use, providing a better means of distinguishing between alleles and artifact. Band intensity provides an objective method for distinguishing between alleles and artifacts and may also provide a better means for identifying the products of microsatellite markers that frequently vary significantly in intensity.
Ultimately, real-time fluorescence detection methods may provide a substantial increase in efficiency over standard methods of detection based on radiolabeling. A much larger range of product sizes can be resolved on each gel run as compared to radiolabeling techniques because, with automated, real-time equipment such as that from Applied Biosystems, Inc., the PCR products pass by the detector toward the bottom of the gel, where the band resolution is greatest. Efficiency is further improved by the potential real-time semi-automated detection of alleles. In addition, internal size standards are easily incorporated for reproducibility and the accurate sizing of alleles, avoiding day-to-day variability. Computerized data acquisition and handling further aid productivity and reduce errors in data entry and manipulation. Ultimately, automation is likely to occur more rapidly with fluorescence-based techniques than with other methods of labeling and detection.
As an initial test of the fluorescence technology, a study was conducted comparing the accuracy and reliability of these methods with 32P end-labeling (see Example 1). Three markers were chosen because they produce PCR products of the same size range. Products of PCR reactions run with primers complementary to the unique sequences on either side of the SSR for these markers were obtained using primer pairs in which one primer of each pair was conjugated to a fluorescent label. These PCR products were electrophoresed simultaneously in a single electrophoretic lane to test whether these genotypes could be accurately determined. Similar to the report by Ziegle, et al., 1992, there was no difficulty in discerning PCR fragments of the same size labelled with different fluorophores.
Determining the size of DNA fragments accurately is critical to genotyping in a number of applications. When parental alleles are available, a simple comparison can determine which, if either, parental allele has been passed on to a child.
However, frequently in linkage studies the parental alleles are not available for comparison, and paternity must be questioned. This is also true in DNA forensics, where an unknown must be compared with many others and its size determined unambiguously. The analysis of PCR products that differ grossly in concentration is complicated by bandshifting and other gel-related artifacts. The accuracy of this typing procedure must be based on empiric studies of reproducibility using "known" samples as standards. Non-polymorphic internal size standards can be used to remedy these problems (Lander, 1991).
Example 1 demonstrates the accuracy of sizing microsatellite PCR products using a fluorescence-based approach as compared to a conventional radiation-based method using a known sequence ladder. DNA templates may be obtained from the collection of Centre d'Etude du Polymorphisme Humaine, Paris (CEPH) for use as a standard set of alleles to compare these techniques, because there is little question of the genetic identity of each of the individuals in this collection. To avoid ambiguity in genotyping with the fluorescent method, fractional size estimates should preferably be accurate to within 0.5 nucleotides. Variation greater than this could lead to confusion during band matching, after rounding up or down for size estimates provided as a fraction of a nucleotide. Since our analysis suggests that the maximum variation is likely to be less than 0.5 nucleotides (and generally significantly less), the method will be useful in the intended applications.
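The 0.5-nucleotide criterion above can be illustrated with a short allele-calling sketch. It assumes that each fractional size estimate is matched to the nearest known allele and that a call is accepted only when the estimate lies strictly within the tolerance; the function names, the strict inequality and the example sizes are illustrative assumptions rather than the scoring software described in this specification.

```python
def call_allele(size_estimate, known_alleles, tolerance=0.5):
    """Match a fractional size estimate (nt) to the nearest known allele size.

    Returns the matched allele, or None when the estimate is not within
    `tolerance` nucleotides of any known allele (an ambiguous band).
    """
    nearest = min(known_alleles, key=lambda allele: abs(allele - size_estimate))
    if abs(nearest - size_estimate) < tolerance:
        return nearest
    return None


def replicate_spread(size_estimates):
    """Largest difference (nt) between repeated size estimates of one allele."""
    return max(size_estimates) - min(size_estimates)


if __name__ == "__main__":
    known = [140, 142, 144]               # hypothetical dinucleotide allele sizes
    runs = [142.31, 141.95, 142.12]       # hypothetical estimates from three gels
    print([call_allele(x, known) for x in runs], replicate_spread(runs))
```

An estimate that falls midway between two dinucleotide alleles, such as 143.0 here, is returned as `None` rather than being rounded up or down, which is the band-matching confusion the criterion is meant to avoid.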
As shown in Example 1, no sizing errors occurred with the use of the multi-color fluorescence-based technique, showing that this methodology is highly accurate and reproducible for scoring microsatellite markers. Since the only sizing error resulted from the use of the conventional radiolabeling technique, the fluorescence-based protocol appears at least as accurate as the conventional method. Therefore, this approach appears to adequately compensate for gel distortion and dye-related artifacts as compared to radiation labeling techniques.
Accordingly, the advantages demonstrated for fluorescence-based techniques may be exploited by the method of this invention, which uses at least 6 highly informative SSR markers assembled into a ladder which we have designated a "SET". Each SSR marker is characterized by PCR primer pairs which have the same sequence as a portion of the unique DNA sequence on the 5' side of the sense and antisense strands, respectively, encoding the repeat sequence at a particular point in the genome. When the genetic material of a particular individual is amplified by PCR using one of these primer pairs, a segment of DNA corresponding to the sequence of the particular SSR and its unique flanking sequences is produced (the PCR product). The size of the PCR product is dependent both on how much of the unique sequences are covered by the primers in the pair and on the number of times the repeat sequence is repeated. The number of repeats of the simple sequence at a particular locus varies between individuals (polymorphism), and this polymorphism results in PCR products of varying size for different individuals. Thus the size of the PCR product can be used to determine if two individuals have an allele in common at the genetic locus of the SSR marker.
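The dependence of PCR product size on flanking sequence and repeat number described above amounts to simple arithmetic, sketched below. The flank lengths, repeat counts and function names are hypothetical; each flank length is taken to include the primer and any unique sequence between the primer and the repeat.

```python
def pcr_product_length(left_flank_nt, right_flank_nt, repeat_unit_nt, n_repeats):
    """Length of the amplified segment: both flanks plus the repeat tract."""
    return left_flank_nt + repeat_unit_nt * n_repeats + right_flank_nt


def repeat_difference(size_a, size_b, repeat_unit_nt):
    """Number of repeat units separating two alleles of the same marker."""
    diff = abs(size_a - size_b)
    if diff % repeat_unit_nt:
        raise ValueError("size difference is not a whole number of repeat units")
    return diff // repeat_unit_nt


def share_allele(genotype_a, genotype_b):
    """True when two individuals have at least one allele size in common."""
    return bool(set(genotype_a) & set(genotype_b))


if __name__ == "__main__":
    # Hypothetical (GT)n marker with 40 nt and 45 nt of flanking sequence.
    maternal = pcr_product_length(40, 45, 2, 20)
    paternal = pcr_product_length(40, 45, 2, 16)
    print(maternal, paternal, repeat_difference(maternal, paternal, 2))
    print(share_allele((maternal, paternal), (maternal, 131)))
```

With these example values the two alleles differ by 8 nucleotides, that is, four dinucleotide repeats, the same size difference as in the heterozygous marker of Figure 3.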
The spacing in the gel between PCR products identified with different markers is critical. By carefully selecting the length of the primer sequences for each marker, the PCR products corresponding to each marker in a SET are spaced a critical distance from surrounding markers such that none of the PCR products for the largest known alleles of one marker overlap in size with PCR products for the shortest known alleles of another marker in the SET when separated on a 6% denaturing acrylamide gel. An additional safety margin should be provided, because rare undocumented alleles (larger or smaller) may occur for any given marker. Size spacing of less than 9 nucleotides between dinucleotide SSR markers increases the likelihood for overlap because 2-4 stuttering bands (each 2 nucleotides apart) below the smallest allele of one marker may overlap with the largest allele of the marker below it. PCR products for trinucleotide repeat sequences and tetranucleotide repeat sequences are not observed to exhibit stuttering bands, so the minimum separation distance above and below the largest and smallest known alleles can be less for tri- and tetranucleotide repeats. Usually, PCR products for trinucleotide repeats in a SET will differ by at least 5 base pairs, and for tetranucleotide markers by at least 6 base pairs. Preferably a SET will contain 7-9 SSR markers, most preferably 8-9 markers. The upper limit on the number of markers in a SET is dependent on the length of the electrophoretic separation.
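The 9-nucleotide spacing for dinucleotide markers follows from the stutter pattern described above: up to four stutter bands, each 2 nucleotides apart, reach 8 nucleotides below the smallest allele, so a gap of 9 nucleotides leaves the lowest stutter band just clear of the largest allele of the marker below. The sketch below works through that reasoning; the function names and allele sizes are illustrative assumptions.

```python
def stutter_positions(allele_size, repeat_unit_nt=2, n_bands=4):
    """Sizes (nt) of the stutter bands expected below an allele."""
    return [allele_size - repeat_unit_nt * k for k in range(1, n_bands + 1)]


def stutter_collides(lower_marker_max, upper_marker_min, repeat_unit_nt=2, n_bands=4):
    """True when the lowest stutter band of the upper marker reaches the
    largest known allele of the marker below it."""
    lowest_stutter = upper_marker_min - repeat_unit_nt * n_bands
    return lowest_stutter <= lower_marker_max


if __name__ == "__main__":
    # Hypothetical neighbouring markers: upper marker's smallest allele is 150 nt.
    print(stutter_positions(150))
    for lower_max in (141, 142):
        gap = 150 - lower_max
        print(f"gap {gap} nt, collision: {stutter_collides(lower_max, 150)}")
```

The worst case of four stutter bands is used here; with only two bands a smaller gap would suffice, but that would leave no margin for undocumented alleles.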
The PCR product of each primer pair in the SET is tagged with the same label, preferably a fluorescent dye. Usually a fluorescent label is covalently attached to one of the primers in a primer pair. Alternatively, the PCR product may be uniformly labelled by adding one or more fluorescently-labelled nucleoside triphosphates to the PCR reaction. Labelling of the primers may be accomplished by including a fluorescently-labelled nucleotide during synthesis of the primer or by linking a fluorescent label to the primer after synthesis. Fluorophore labels for attachment to nucleic acids, including PCR primers, are readily available in the art. (See, e.g., Nagaoka, et al., (1992) Chem. Pharm. Bull., 40:2559-2561; Giusti, et al., (1993) PCR Methods Appl., 2:223-227; Alexandrova, et al., Nucleic Acids Symp. Ser. 1991, p. 277; Schubert, et al., (1992) DNA Seq., 7:273-279; Vu, et al., (1990) Tetrahedron Lett., 21:7269-7272.) Usually the labels contain coupling groups that react with modified nucleotides of the PCR primers to form covalent links. Attaching such fluorophores to the primers in the SETS of this invention is easily within the skill of the ordinary worker. See, e.g., Levenson and Chang, 1990, "Nonisotopically Labelled Probes and Primers," in PCR Protocols, Innis, et al., eds., Academic Press, NY.
Fluorescent labels with non-overlapping emission spectra are also available commercially, for example, from Applied Biosystems, Inc., including 5-carboxy-fluorescein (FAM-blue), 2',7'-dimethoxy-4',5'-dichloro-6-carboxy-fluorescein (JOE-green), N,N,N',N'-tetramethyl-6-carboxy-rhodamine (TMR-yellow), and 6-carboxy-rhodamine (ROX-red); from Biological Detection Systems, Inc., Pittsburgh, PA (BDS), including nucleoside triphosphates coupled to cyanine dyes that fluoresce in the green or orange region; or from Boehringer Mannheim Corporation Biochemical Products, Indianapolis, IN, including fluorescein-5(6)-carboxamidocaproyl-dUTP (yellow), 7-hydroxy-coumarin-3-carboxyl-dUTP (blue), and tetramethylrhodamine-5(6)-amino-thiono-dUTP (red).
Additional suggestions for selecting labels with non-overlapping fluorescent spectra and derivatizing oligonucleotides with them can be found in Smith, et al., 1986, Nature, 321:674-679, incorporated herein by reference. Alternatively, primers (or PCR products) may be labelled with biotin (see, e.g., Innis, et al., "PCR Protocols," Academic Press, NY, 1990, pp. 100-103) and then streptavidin coupled to a particular fluorescent dye added to all of the PCR products of a particular SET. Variations of these labelling methods or similar methods known to those skilled in the art may be used, so long as all PCR products for markers in one SET are labelled with the same label.
SETS, each labelled with a different fluorophore, can be pooled into a collection of markers that we have termed a "GROUP." The number of SETS in a GROUP will depend on the availability of distinct labels. PCR products for each SET in the GROUP will usually be labelled with fluorophores that emit light at a wavelength substantially different from the wavelengths emitted by fluorophore labels of the other SETS in the GROUP, where "substantially different" means sufficiently distinct to be distinguished by the detection means chosen for detecting PCR products after electrophoresis. For example, three commercially available fluorophores, referred to as TMR, FAM, and JOE (Applied Biosystems), have different colors: yellow, blue, and green, respectively.
Using this approach we have analyzed as many as 24 SSR markers in a single electrophoretic lane using three distinct fluorescent labels to label three SETS in the GROUP (see, e.g., Fig. 4). In a preferred mode, these fluorescent PCR products may be separated on an automated electrophoresis system, such as the Applied Biosystems 373 sequencer, with internal size standards in each lane (labelled, for example, with ROX (red dye), Applied Biosystems), analyzed using, e.g., GeneScan 672 software (Applied Biosystems) (Ziegle, et al., 1991, Miami Short Rep., 1:70), and scored using GENOTYPER software (Applied Biosystems), with data displayed as an electrophoretogram or in a spreadsheet format. Gel band fluorescent intensities and peak areas provide an objective method of distinguishing alleles from artifact (stuttering bands). A typical electrophoretogram from a single individual for SET A GROUP 1 is illustrated in Figure 6.
Marker Selection and Development:
The human genome is estimated to be approximately 3000 cM in length. Therefore, to adequately "cover" the entire genome at 10 cM intervals will require approximately 300 highly informative well spaced markers. An alternative estimate obtained by summing the meiotic maps from all the chromosomes suggests that the genome is approximately 5000 cM in length (NIH/CEPH Collaborative Mapping Group, 1992, Science, 252:67-86). Adequate "coverage" of the entire genome based on this size estimate at 15 cM intervals (which would allow testing for linkage without using a prohibitively large number of families) will require about 333 highly informative well spaced markers.
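The marker counts quoted above are genome length divided by marker spacing, and the same arithmetic can be carried one step further to estimate how many GROUPS a kit needs. The sketch below is a back-of-the-envelope calculation under the assumption of evenly spaced markers; the function names are illustrative, and the GROUP estimates are a cross-check added here, not figures taken from the specification.

```python
import math


def markers_needed(genome_length_cm, spacing_cm):
    """Approximate number of evenly spaced markers needed to cover the genome."""
    return genome_length_cm / spacing_cm


def groups_needed(n_markers, markers_per_group):
    """Whole number of GROUPS (gel lanes per individual) needed for n_markers."""
    return math.ceil(n_markers / markers_per_group)


if __name__ == "__main__":
    print(markers_needed(3000, 10))          # 3000 cM genome, 10 cM spacing
    print(round(markers_needed(5000, 15)))   # 5000 cM genome, 15 cM spacing
    print(groups_needed(300, 24))            # 10 cM coverage, 24 markers per GROUP
    print(groups_needed(60, 18))             # 50 cM screen, 18 markers per GROUP
```

These evaluate to 300 and about 333 markers, matching the figures in the text, and to 13 and 4 GROUPS respectively, which is in line with the 13 GROUPS listed in Figure 7 and with the screening kit of at least 4 GROUPS described in the summary.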
Characteristics of preferred markers can be summarized as follows: unique sequence surrounding the marker is available for use in designing primers, they have been sized accurately, the heterozygosity value is known, and each marker has been carefully localized. Over 1000 SSR markers, including the surrounding unique sequence and chromosomal location, have been described to date in the Genome Data Base (GDB), October 19, 1993, The Johns Hopkins University, Baltimore, Maryland. In contrast to older approaches, such as RFLP, many of the preferred SSR markers are heterozygous (alleles differ at a particular locus) > 50% of the time and therefore are highly informative for linkage studies. Each allele of the markers used in the method of this invention will be easily detectable after amplification by PCR as a predictable component of a complex image or signature by 5' end labeling with 32P, labeling with fluorescence, or by a variety of other methods. Most preferably, the markers also produce an easily scored product or simple pattern of stutter bands that are the signature of mononucleotide and dinucleotide repeats.
Most dinucleotide repeats produce two or three smaller less intense products or "stutter bands" (Weber, 1989). These are artifacts produced during PCR, and are less common in PCR of tri- and tetranucleotide repeats. Although these stutter bands have been generally considered undesirable, they can be quite helpful to the investigator (or computer) during the scoring of genotypes by allowing for the identification of 'false' bands (background bands due to non-specific annealing). Each allele can then be easily scored by 5' end labeling with 32P or fluorescence after amplification by PCR, as a predictable component of a complex image. Background bands are generally not associated with stuttering artifacts. Because artifacts due to nonspecific annealing are difficult to eliminate entirely from a PCR reaction, the adaptation of a similar protocol for the multiplex semi-automated genotyping of tri- and tetranucleotide repeats may be more problematic. The method of this invention reduces artifacts due to non-specific annealing by control of the annealing temperature for respective primers during temperature cycling.
The use of dinucleotide SSR is preferred in the method of this invention, because the potential advantages for automated genotyping may not be so easily incorporated into practice for mono-, tri- and tetranucleotide repeats. PCR products of trinucleotide and tetranucleotide repeats lack the unique "stuttering" signature of dinucleotide repeats, making it difficult for the computer to distinguish real alleles from artifacts produced by nonspecific annealing during PCR. Although a simple set of PCR products are produced as alleles (little or no stuttering) from tri- or tetranucleotide SSRs, it is often difficult to eliminate other PCR artifacts completely. These PCR artifacts are not easily distinguished from "false" bands when large numbers of PCR products that vary significantly in intensity are combined as described by this method. The unique signature derived from the stuttering bands of dinucleotide repeats provides a simple means of distinguishing real products (alleles) from artifactual bands.
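The idea of using the stutter signature to separate real alleles from background bands can be sketched in code. The following is only an illustration under stated assumptions: the function name, the 0.5 nt sizing tolerance, the requirement of at least two stutter bands, and the peak values are all hypothetical, and the sketch ignores heterozygotes whose alleles lie within the stutter range of each other.

```python
def has_stutter_signature(peaks, candidate_size, repeat_nt=2,
                          min_stutters=2, tolerance_nt=0.5):
    """Return True if a candidate peak is trailed by smaller, weaker stutter peaks.

    peaks: list of (size_in_nt, intensity) tuples from one dye in one lane.
    """
    candidate = [i for s, i in peaks if abs(s - candidate_size) <= tolerance_nt]
    if not candidate:
        return False
    candidate_intensity = max(candidate)
    found = 0
    # Look one, two and three repeat units below the candidate allele.
    for k in range(1, 4):
        expected = candidate_size - k * repeat_nt
        weaker = [i for s, i in peaks
                  if abs(s - expected) <= tolerance_nt and i < candidate_intensity]
        if weaker:
            found += 1
    return found >= min_stutters


# Hypothetical lane: an allele at 180 nt with stutter at 178 and 176 nt,
# plus an isolated background band at 165 nt.
lane = [(180.0, 1000), (178.0, 400), (176.0, 150), (165.0, 900)]
print(has_stutter_signature(lane, 180.0))
print(has_stutter_signature(lane, 165.0))
```

The isolated band is rejected even though it is intense, which is the property the text attributes to dinucleotide repeats: intensity alone does not identify an allele, but the trailing stutter pattern does.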
Furthermore, the cost of the hardware is generally considered the limiting factor when adopting the fluorescent approach. Tri- and tetranucleotide markers generally require a significantly larger fraction of each gel because alleles span a much larger size range. Thus longer run time is required, and fewer markers can be resolved per gel. The cost of the hardware becomes readily affordable if one considers the utility and throughput of such an instrument when used according to the method of this invention. However, the use of fewer markers per lane (i.e., tetranucleotide repeats) would substantially reduce the cost effectiveness of the hardware by reducing efficiency.
Finally, far fewer tri- and tetranucleotide markers have been fully characterized at present. Thus, the availability of well-characterized primers which can be assembled into SETS and GROUPS remains another limiting factor at present.
Construction of Marker SETS:
The selection of markers for inclusion in each SET is based on the need to: maximize heterozygosity values (genetic informativeness), place the marker within a SET based on the size of the PCR products (alleles produced must not overlap with those of the marker above or below it), and the location of the marker in the genetic map (ideally we would have 450-500 markers placed 10 cM or less apart). The PCR products corresponding to markers within a SET are sized to assure that infrequent alleles and stutter bands do not produce overlap between the markers (compare e.g., Figures 4 and 6). PCR products for SETS of dinucleotide markers differ by approximately 9 nucleotides, preferably, at least 10 nucleotides, in length. When necessary, new oligonucleotide primers based on the unique sequence surrounding a polymorphic marker are designed and synthesized to assure that the PCR products do not overlap during electrophoresis.
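The size-spacing rule for a SET can be checked mechanically. The sketch below assumes each marker is described by the smallest and largest product it is expected to give (alleles plus stutter bands); the function name `find_spacing_problems` and the markers marker_A to marker_C are hypothetical, while the mfd size ranges are those quoted in Example 1.

```python
def find_spacing_problems(markers, min_gap_nt=10):
    """Return adjacent marker pairs whose product size ranges are too close.

    markers: list of (name, smallest_product_nt, largest_product_nt).
    """
    ordered = sorted(markers, key=lambda m: m[1])
    problems = []
    for lower, upper in zip(ordered, ordered[1:]):
        gap = upper[1] - lower[2]  # negative when the ranges overlap
        if gap < min_gap_nt:
            problems.append((lower[0], upper[0], gap))
    return problems


# Hypothetical SET: marker_B and marker_C are only 5 nt apart.
candidate_set = [("marker_A", 100, 120), ("marker_B", 130, 150), ("marker_C", 155, 175)]
print(find_spacing_problems(candidate_set))

# The three markers of Example 1 overlap in size, so they cannot share one dye.
example_1 = [("mfd 1", 176, 196), ("mfd 59", 175, 195), ("mfd 154", 186, 204)]
print(find_spacing_problems(example_1))
```

A flagged pair would be resolved as the text describes: by redesigning primers to shift one product range, or by moving one marker to a SET labelled with a different fluorophore.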
Figures 7A-M show 289 SSR markers that have been selected and combined into 11 GROUPS of 21-24 markers and 2 incomplete GROUPS of 16 markers so that markers in each GROUP can be separated and analyzed simultaneously. The selected markers cover the genetic map on average once every 10 cM. Most are heterozygous greater than 70% of the time. In a preferred embodiment, each SET is composed of 8 markers from multiple linkage groups (see, e.g., Figure 7B-H). Most preferably, SETS of markers are part of a single linkage group (i.e. a single chromosome), but this may require significant additional labor because fewer existing primers will be suitable.
Additional or alternative SSR loci to assemble into GROUPS of markers may be found in GDB. Loci listed in GDB can be arranged on the genetic map by using map location information in GDB. Additional or alternative primers may then be designed using information on the surrounding DNA sequence available in Genbank, based on the locus designations from GDB. GROUP 1 markers (Figure 7A) are currently performing well in multiple laboratories. In many cases new oligonucleotide primers must be designed from the sequence surrounding each marker to produce PCR products that fit between the products of the markers above and below it without overlap. The new primers can readily be designed from the known sequence surrounding the SSR. Criteria for selecting a sequence to be synthesized as a PCR primer are well known (see, e.g., Sambrook, et al., and Innis, et al., especially p. 9). Preferably, the unique primer 3' sequence should contain at least 7 nucleotides, the ΔG threshold should be at least -1.0 kcal/mol, most preferably -1.4 kcal/mol, and duplex formation should be avoided, the maximum length of duplex not exceeding 2 base pairs. The sequence of preferred primers will also minimize or eliminate self-complementarity, hairpin formation, and false priming. Once the sequences of candidate primers are chosen, synthesis is readily accomplished by standard methods (see, e.g., Sambrook, et al.).
Optimization of PCR Conditions and Appearance on the Gel:
These new primers must be tested to assure that they produce an easily scored collection of products of the correct size. Scoring may be easier if the label is on one primer rather than the other for particular markers (see, e.g., Figure 8). Primers developed for dinucleotide markers may perform well in the PCR reaction, but produce products unacceptable for genotyping (single base stuttering bands, stuttering bands of equal intensity with true alleles, or stuttering bands that are larger than the correct allele), and such primers should be avoided.
For best results, the PCR conditions for each marker should be optimized to eliminate any artifactual PCR products due to nonspecific annealing that may complicate the analysis of a GROUP of combined markers. In particular, the temperature of the annealing phase of each PCR cycle should be optimized for each primer pair. Accordingly, the annealing phase temperature is set relatively high, so that specific hybridization occurs, but non-specific hybridization between the template DNA and the primers is minimized. Usually, the selectivity provided by this optimization is preserved in the method of this invention by limiting the number of primer pairs in any PCR reaction vessel to those whose optimized annealing temperature is the same or nearly the same. Preferably, all primer pairs in the same PCR vessel have annealing temperatures within 4°C of each other. At one extreme, an entire 96 well plate is dedicated to PCR reactions using primers for a single marker. (When genotyping is performed for a large number of individuals, using a separate plate for PCR reactions for each marker will not reduce efficiency.) Alternatively, each PCR vessel on a plate has only one primer pair, but the plate contains vessels having different primer pairs, so long as all primer pairs on the same plate have annealing temperatures within 4°C. In a preferred mode, all of the primer pairs for a SET or even a GROUP are constructed to have optimized annealing temperatures in a narrow range, most preferably 4°C, and all of the primers are present in a single PCR reaction vessel, obviating the need to mix the individual PCR products prior to electrophoretic separation.
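The 4°C rule lends itself to a simple batching step when planning plates or multiplex vessels. The sketch below is one possible greedy approach, not a procedure given in the text: the function name is illustrative, marker_X and its temperature are hypothetical, and the mfd temperatures are the annealing temperatures reported in Example 1.

```python
def batch_by_annealing_temperature(optimized_temps, max_spread_c=4.0):
    """Group primer pairs so that temperatures within a batch span at most max_spread_c.

    optimized_temps: dict mapping marker name to optimized annealing temperature (deg C).
    """
    batches = []
    for name, temp in sorted(optimized_temps.items(), key=lambda item: item[1]):
        # Compare against the coolest primer pair already in the current batch.
        if batches and temp - batches[-1][0][1] <= max_spread_c:
            batches[-1].append((name, temp))
        else:
            batches.append([(name, temp)])
    return batches


temps = {"mfd 1": 60, "mfd 59": 58, "mfd 154": 58, "marker_X": 64}
for batch in batch_by_annealing_temperature(temps):
    print(batch)
```

Each resulting batch can share a plate, or a single vessel in the preferred mode, because every primer pair in it lies within the stated window of the others.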
In addition, each marker should be evaluated to assure it is sized correctly within the SET and that the alleles can be easily scored as distinct products. Furthermore, reported heterozygosity values are usually verified using a population of unrelated individuals. The same DNA templates provided herein may be used as controls for verification of protocols and quality assurance. Preferred controls include CEPH parents (BIOS corporation, New Haven, Conn.; Cell Repository, Camden, N.J.), such as families 1331, 1347, 884, for which reference alleles are known (see, Weber, et al., and Genethon Microsatellite Map Catalog, Genethon Human Genome Research Center, Evry, France). Pooled DNA from volunteers who have donated blood that has been purified as described in the EXAMPLES may be used as well.
This optimization process requires the synthesis of oligonucleotide primers, dilution and aliquoting of primers, identification of the appropriate annealing temperature (T°) and PCR protocol, electrophoresis of the products, autoradiography and data analysis. If labelled primers are used for detection of products, 5' end labeling of both primers should be tested to determine which one produces the best image. The size of the PCR products from each marker should be verified experimentally to assure that it does not overlap with the products of the surrounding markers in the same SET. As a control for this purpose, PCR products from a pool of DNA samples from a population of unrelated individuals may be electrophoresed against a DNA sequence ladder. In a preferred mode the test pool will contain at least 50 chromosomes.
Initial characterization of primers for each SSR marker may be performed with 32P labels because this is less costly, but the smooth adaptation of fluorescent-based techniques for genotyping with markers that have been optimized using 32P is also dependent on assuring the PCR products labelled with a fluorescent dye perform as expected during PCR and analysis. Therefore, the reliability of the developed protocol should be checked by electrophoresis of DNA samples labelled by PCR with the fluorescent labels.
Frequently the image produced by labeling one of the pair of primers is blurred, see, e.g., Figure 8. The PCR products of different microsatellite markers frequently vary significantly in intensity (see, e.g., Figure 9). The sizing of fluorescent PCR products of grossly different concentrations is potentially complicated by sample overloading, causing spectral interference between the dye labels during analysis. There was no interference in the detection of the overlapping products using the four dyes in Examples 1 or 5, because the concentration of each PCR product was determined and adjusted to prevent overloading. However in our experience this can become a problem when working routinely with 21 to 24 pooled markers.
Overloading can lead to artifacts that become especially troublesome when they are interpreted as internal size standards. To prevent the inaccurate sizing of the products by the GeneScan 672 software, we have found that the selection of the standard peaks must be carried out manually. During large scale applications, such as in our linkage studies, this may become a serious problem. Moreover, it is often impractical to estimate the concentration of each of the fluorescent products in order to adjust the concentration of the individual samples to be pooled. Generally adjustments in the volumes for each marker can be made for all the samples by estimating the relative intensity of the marker within a SET. This is easily accomplished by referring to the data table of fluorescent band intensities or by viewing the electrophoretogram directly.
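The volume adjustment described above can be illustrated with a small calculation. This is a sketch under explicit assumptions: signal is taken to scale linearly with the volume pooled, the weakest marker is pooled at the 5 μl used in Example 1, and the function name and intensity values are hypothetical.

```python
def pooling_volumes(relative_intensity, base_volume_ul=5.0):
    """Scale each marker's pooled volume down in proportion to its band intensity.

    relative_intensity: dict mapping marker name to a typical band intensity
    read from the data table or electrophoretogram (arbitrary units).
    """
    weakest = min(relative_intensity.values())
    # The weakest marker keeps the base volume; brighter markers get less.
    return {marker: round(base_volume_ul * weakest / intensity, 2)
            for marker, intensity in relative_intensity.items()}


intensities = {"marker_A": 1000, "marker_B": 4000, "marker_C": 2000}
print(pooling_volumes(intensities))
```

Because the correction is estimated once per marker rather than once per sample, the same volumes can be applied across a whole 96-well plate with a multichannel pipette.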
In a preferred mode, PCR products are recovered and combined into a mixture containing the GROUP by a simple protocol that uses magnetic separation technology to purify the fluorescent PCR products and which restricts the total amount of product pooled to prevent overloading. Magnetic separation provides simple separations based on specific binding interactions without the need for expensive centrifuges. Saturation binding to a limited amount of paramagnetic beads can be used to control the amount of labelled PCR product carried
forward in the analysis. Relative intensity may be adjusted by this means and overloading may be avoided.
In a preferred embodiment, one primer is labelled with a component that will bind to magnetic microbeads, for example biotin-labelled primers will bind to streptavidin-coated magnetic beads. Methods for labelling primers with biotin are taught in, e.g., Innis, et al., "PCR Protocols," 1990, pp. 100-103 and references cited therein. Magnetic beads coated with streptavidin are commercially available (Dynabeads™) and procedures for separation are described in, e.g., "Magnetic Separation Techniques Applied to Cellular and Molecular Biology," Kemshead, et al., eds., Wordsmiths' Conference Publications, Somerset, U.K., 1991. A fixed amount of magnetic beads is added to the PCR reaction after amplification using primers that will bind to the magnetic beads. The magnetic beads with the PCR product attached are separated from the remainder of the PCR reaction mixture, including salts and unused, detectably-labelled primer, and then the PCR product is recovered from the magnetic beads (for example, by separating the strands, leaving one strand attached to the bead and recovering the other strand whose primer carries the detectable label).
Alternatively, the entire PCR product may be labelled by including biotinylated UTP in the PCR reaction medium as described by Dennis, et al., 1990, in "PCR Protocols," Innis, et al., eds. The PCR product can be bound to the beads for purification from the PCR reaction mix and excess primer, and subsequently recovered from the beads by, for example denaturation of streptavidin. In another alternative mode, paramagnetic beads which have attached to their surfaces single stranded DNA corresponding to a part of the sequence of the PCR product may be added to the PCR reaction mix at the end of amplification, followed by cycling above the melting temperature, reannealing and then separating the paramagnetic beads and any other DNA strands annealed to the beads from the reaction mix. Labelled strands can then be recovered from the beads, as above.
Selection of SETS and GROUPS of fluorescent SSR markers covering the human genome (approximately 300) can be completed in approximately 6-9 months, using the procedures provided herein. Preferably, additional fluorescent markers will be developed (approximately 500 SSR markers) providing a higher resolution tool for gene mapping. The resolution of this marker collection will approach 10 cM and will preferably cover the telomeres which will better assure linkage detection in complex non-Mendelian disorders like asthma and diabetes.
The development of a common index set of fluorescent markers that can be used in multiple laboratories simultaneously should provide certain advantages in genomic studies.
Typing these common index loci in a number of different populations afflicted with the same disorder will facilitate the comparison of linkage results and provide the information required for the eventual application of these techniques to forensic medicine.
The method of this invention offers several significant advantages over a similar strategy adopted by Diehl et al., 1991, Am. J. Hum. Genet., 47:177. Spacing markers in a SET according to this invention avoids overlap, providing improved discrimination among markers and between markers and artifacts. As many as eight or more markers may be incorporated into a SET. When necessary, new oligonucleotide primers based on the unique sequence surrounding a polymorphic marker can be designed and synthesized as taught herein to assure that the PCR products do not overlap during electrophoresis. Errors introduced by sample handling may also be minimized by storing DNA from each individual to be studied in a 96-well format. Our protocol preserves the integrity of a 96 well format including PCR amplifications, product pooling, and sample purification, thereby minimizing sample handling and errors introduced by excessive sample manipulations. In a preferred mode, efficiency is further aided by the transfer of a row of samples by multichannel pipette.
The combined analysis of multiple markers maximizes the use of the Applied Biosystems 373 sequencer or similar automated analysis hardware. Since the capacity of the 373 sequencer is 36 lanes per gel, 864 genotypes (1728 alleles) can be analyzed routinely from one gel using the semi-automated method of this invention. A typical linkage study would include about 100
families or about 500 individuals. For a 5-year study including about 300 markers, approximately 180 gels, or about 3 gels per month, will be required. By using the method of this invention, at least 2 gels per day can be run per 373 sequencer. Thus, up to 12 investigators can be accommodated on one instrument, which substantially reduces the cost per investigator.
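The throughput figures in this passage follow from a few multiplications, reproduced here as a check. The constant and function names are illustrative; all input numbers are those given in the text.

```python
import math

LANES_PER_GEL = 36      # capacity of the 373 sequencer
MARKERS_PER_LANE = 24   # one full GROUP per lane

genotypes_per_gel = LANES_PER_GEL * MARKERS_PER_LANE
alleles_per_gel = 2 * genotypes_per_gel  # two alleles per genotype


def gels_required(individuals, markers):
    """Gels needed to type every individual at every marker."""
    return math.ceil(individuals * markers / genotypes_per_gel)


study_gels = gels_required(500, 300)          # text: "approximately 180 gels"
gels_per_month = study_gels / (5 * 12)        # 5-year study
# Twelve investigators at about 3 gels per month, run at 2 gels per day:
instrument_days_per_month = 12 * 3 / 2

print(genotypes_per_gel, alleles_per_gel)
print(study_gels, round(gels_per_month, 1))
print(instrument_days_per_month)
```

The exact quotient is slightly below the rounded figures in the text, which presumably allow for controls, size standards and repeated lanes.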
The method of this invention can also increase the efficiency of diagnostic studies of the genome, when the desired diagnostic procedures involve the detection of genetic changes that affect the length of genomic DNA at 6 or more locations. Such changes include additions, deletions, intra- and interchromosomal crossover, gene amplification and similar gene rearrangements. The loci of many such rearrangements are known and associated with many diseases, especially cancers and metabolic errors inherited recessively. PCR using primer pairs which direct amplification of a DNA segment including one of these loci can be used diagnostically where the rearrangement associated with the disease causes a change in the length of the PCR product. A SET of primers designed according to the principles above can be used in the production of PCR products that can be analyzed electrophoretically in a single lane, for more efficient use of electrophoresis and analysis equipment.
EXAMPLES
The following examples describe particular embodiments within the broader invention. These embodiments are described for illustrative purposes only, without intention to limit the invention.
EXAMPLE 1
As an initial test of the fluorescence technology, a study was conducted to compare the accuracy and efficiency of these methods with a conventional radiation-based method. Three microsatellite loci producing PCR products that overlap in size were chosen to compare the accuracy of genotyping by fluorescence versus radiolabeling. Discrepancies between the genotypes derived from each technique were resolved by repetition. To estimate the variation in sizing of the fluorescence-based technique certain samples were loaded on 3 or more gels for comparison. DNA from CEPH (Centre d'Etude du Polymorphisme Humaine, Paris) families 884, 1331, 1332, 1333, 1362 was amplified for Marshfield markers, mfd 1 (176-196 bp), mfd 59 (175-195 bp), and mfd 154 (186-204 bp) using the polymerase chain reaction (PCR). Fluorescent techniques: The forward and reverse primers were each labelled at the 5' end for detection by autoradiography with [32P] γATP (6000 Ci/mmol) using polynucleotide kinase. A primer was selected from each marker for fluorescent labeling on the basis of the image of the products (see Figure 8). The optimal annealing temperature was selected for each marker empirically by selecting a temperature that eliminated nonspecific annealing or artifactual (background) PCR products. Fluorescent labels were attached at the 5' end via phosphoramidate derivatization using Aminolink 2 (Applied Biosystems). Primer B (see Figure 10) for mfd 1 was labelled yellow (TMR), primer A (see Figure 10) for mfd 59 was labelled blue (FAM), and primer B (see Figure 10) for mfd 154 was labelled green (JOE). PCR conditions were: 0.4 μM primers, 1.5 mM MgCl2, 50 mM KCl, 200 μM dNTPs and 0.5 units Taq polymerase (final concentrations); 94°C for 10 min; followed immediately by 30 cycles of 94°C for 30 sec; 58°C (mfd 59, mfd 154) for 30 sec or 60°C (mfd 1) for 30 sec; and 72°C for 30 sec; followed by 72°C for 7 min. PCR was carried out in a volume of 12.5 μl using 25 ng of CEPH DNA.
CEPH DNA was stored in a 96 well microtiter plate (Perkin Elmer/Cetus). Amplifications were performed in 96 well microtiter plates using a Perkin Elmer/Cetus Model 9600 thermalcycler and accessories, maintaining the integrity of the 96 well template. Five microliters were combined from each marker for each CEPH individual using a multichannel pipette (Transferpette-8, Brinkman). The pooled PCR products were desalted by adding 2 volumes of sterile deionized distilled water (ddH2O), ice cold ethanol (100%) equal to the total volume, and chilling for 30 minutes at -70°C. The microtiter plate was spun at 4°C at 1400 × g for 2 hours in a Beckman Model GS6R centrifuge. The supernatant was aspirated, the pellet was washed once with 1.5 volumes of ice cold ethanol (70%), and the plate centrifuged 30 minutes at 1400 × g at 4°C. The supernatant was aspirated and the plate was air dried. Pellets were resuspended in a volume of sterile ddH2O equal to the starting volume (pool).
Radiolabelled products were separated by conventional electrophoresis and scored manually from autoradiographs. Fluorescent PCR products were separated on a 373 sequencer with internal size standards in each lane (GeneScan 2500-ROX; Applied Biosystems) and analyzed using GeneScan™ 672 software (Applied Biosystems). Each sample (representing 0.5 μl of each product) was heated to 99°C after adding 1 μl of the internal lane size standards (GeneScan 2500-ROX, Applied Biosystems) and 2 μl formamide/EDTA loading buffer, until the total volume was reduced to 2-3 μl. Electrophoresis was carried out using 6% acrylamide (Biorad), 8 M urea (Ultrapure, USB) gels in 1X TBE. The reduced volume was loaded and run for 4-8 hours on a model 373 Sequencer (Applied Biosystems) using a 24 cm well-to-read distance.
The size of the PCR product is determined by reference to the internal lane size standards (Carrano et al. 1989, Genomics, 4:129-136). The size standard ROX-2500 (Applied Biosystems) including fragments: 37, 94, 109, 116, 172, 186, 222, 233, 238, 269, 286, 361, and 479 nucleotides in length was used with modifications. PCR fragments 61 and 68 nucleotides in length were gel purified, labelled by aminolinking with ROX, and added in equal volumes to the ROX-2500 standards. These fragments were added because desalting by ethanol precipitation recovers the unused PCR primers with the products. The intense peak produced by the unincorporated labelled primer is seen in the standards because of interference between dyes and obscures the detection of the 37 nucleotide standard fragment. Therefore, we have modified the GeneScan-2500 standards to provide a fragment of known size labelled with ROX to accurately estimate the length of the smallest alleles.
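Sizing against internal lane standards amounts to fitting a calibration curve to the standard fragments and reading unknown products off that curve. The sketch below shows a second-order least-squares calibration of the kind the GeneScan software is described as computing in this example; it is not the vendor's implementation. The fragment lengths are those listed above, while the migration times are synthetic values generated from an invented quadratic model purely so the example can run.

```python
import math

# Fragment lengths (nt) of the ROX-2500 size standard as listed in the text.
STANDARD_SIZES = [37, 94, 109, 116, 172, 186, 222, 233, 238, 269, 286, 361, 479]


def synthetic_migration_time(size_nt):
    """Hypothetical migration time (arbitrary units); invented for illustration only.

    Inverts size = 20 + 60*t + 5*t**2, so the data are exactly quadratic.
    """
    return (-60.0 + math.sqrt(3600.0 + 20.0 * (size_nt - 20.0))) / 10.0


def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the normal equations."""
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    a = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for k in range(col, 3):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def estimate_size(coeffs, migration_time):
    """Read a fragment size off the fitted calibration curve."""
    c0, c1, c2 = coeffs
    return c0 + c1 * migration_time + c2 * migration_time ** 2


standard_times = [synthetic_migration_time(s) for s in STANDARD_SIZES]
calibration = fit_quadratic(standard_times, STANDARD_SIZES)
unknown_time = synthetic_migration_time(180)  # a product that should size at 180 nt
print(round(estimate_size(calibration, unknown_time), 2))
```

Because each lane carries its own standards, a separate curve is fitted per lane, which is why lane-to-lane migration differences do not propagate into the allele sizes.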
The GeneScan 672 (version 1.0) software recognizes any peak labelled with ROX, computes a calibration curve based on a second-order least-squares fit, and uses these data to estimate the allele sizes of the PCR products (Ziegle et al. 1992). Data from each lane can be analyzed independently, or four lanes of data for a single fluorescent dye can be displayed simultaneously to compare individuals within a family. Allele sizes in nucleotide bases, the genotypes, are assigned by interactively distinguishing major peaks from background artifacts. The scale on the display can be adjusted to analyze peaks with differences in fluorescent intensity. The intensity of each fluorescent band and peak areas provide an objective method of distinguishing alleles from artifact (including stuttering bands). Allele sizes can be transferred to a spreadsheet database for linkage or a multicolor electrophoretogram. mfd 1, mfd 59, and mfd 154 PCR products overlap in size (175-204 bp) (see Figure 10). There was no evidence of interference between the dyes even when there was complete overlap during the electrophoresis of PCR products, similar to that reported by Ziegle et al., 1992. In our experience, interference between dyes does become a problem with overloaded samples. A comparison of the genotyping results of the radioactive and fluorescent labeling methods
revealed 4 discrepancies out of 462 possible comparisons (alleles) (see Table 1). One transcription error occurred in the manual data manipulation of the fluorescently labelled products. There was no interference between fluorophores with the detection of the overlapping products using the four dyes. No sizing errors were attributed to the fluorescence-based technique and each marker displayed Mendelian inheritance. The average size variation across all comparisons was 0.28 nucleotides. However, the maximum difference (range) found for any of the 462 comparisons was 0.47 nucleotides (see Table 2). Generally sizing varied less within a gel than between gels. The variation in the size of the alleles was similar when comparing each of the individual markers. The remaining discrepancies occurred with the use of the standard radioactive-based protocol and represented an error rate of less than 1%. Inaccurately sized PCR products and sample misloadings produced mistypings with the conventional technique (see Table 1). In general, fluorescent internal size standards provided more precise sizing than did radiolabeling. These data demonstrate both improved accuracy and efficiency for typing SSR markers with use of fluorescence-based techniques.
TABLE 1
CEPH DNA/Marker      Genotype (Radiolabelled)   Genotype (Fluorescence)   Explanation
884-18/mfd 1         178,192                    178,194*                  size estimate error
1331-16/mfd 59       179,179                    179,185*                  gel loading error
1331-17/mfd 59       179,170                    179,185*                  gel loading error
1332-15/mfd 154      185,200*                   200,200                   recording error
* indicates correct score by length in nucleotide residues
TABLE 2
COMPARISON        RANGE (in nucleotides)
                  Maximum   Average   Standard Deviation
intergel (571)    0.47      0.28      0.08
intragel ("1)     0.42      0.18      0.07
mfd 1 (247)       0.35      0.19      0.1
mfd 59 (177)      0.37      0.15      0.08
mfd 154 (147)     0.42      0.23      0.06
Numbers in parentheses indicate number of samples.
EXAMPLE 2
Mapping with Fluorescent Primers
Genomic DNA is isolated as described by M.J. Johns, et al., Analytical Biochem., 180:276-278 (1989).
To minimize sample handling, DNA templates can be stored in a 96 well grid (e.g., Perkin Elmer/Cetus). The integrity of the grid may be maintained throughout the protocol to avoid errors introduced by manual pipetting and sample handling. Multichannel pipetting from a 96-well grid expedites sample handling while minimizing human errors.
PCR is performed in a reaction volume of 12.5 μl, containing 50 μM dATP, dGTP, dTTP, dCTP; 0.07 μM of the labelled oligonucleotide primer, and 4 μM of the unlabelled primer. Taq polymerase (Perkin-Elmer/Cetus) 0.5 units is added on ice. PCR will usually be performed in a thermalcycler, e.g., a Perkin-Elmer/Cetus 9600 thermalcycler. Standard thermalcycler settings are 94°C for 10 minutes, followed by 30 cycles of 94°C for 30 seconds, 30 seconds at average annealing temperature for the primers and 72°C for 30 seconds; final extension is at 72°C for 7 minutes. Labelled PCR products are purified by co-precipitation in EtOH. 24 markers may be co-precipitated simultaneously in the 96-well format using ethanol. Ethanol precipitation desalts the products but copurifies the primers. The labelled primer peak produces an enormous signal that complicates the analysis of products under 93 nucleotides in length because it interferes with the 37 nucleotide ROX GeneScan-2500 standard. As an alternative, internal standards may incorporate fragments that are 50, 60, and/or 70 nucleotides in length in addition to the GeneScan 2500 standard fragments or an equivalent set of fragments.
The amplified products are analyzed by denaturing gel electrophoresis (Sambrook, et al.). Loading buffer (2X concentration) is added to an equal volume of the PCR reaction, and the PCR reaction is loaded on a 6% polyacrylamide gel. Radioactive products will be sized against a sequence ladder; the gels are dried and then exposed to Kodak XAR film for 4-24 hours with or without intensifying screens. Fluorescent labelled PCR products may alternatively be analyzed by semi-automated detection using, e.g., an ABI 373A automated sequencer and GeneScan 672 software from Applied Biosystems, Inc.
EXAMPLE 3
PCR products are produced as in Example 2 and then purified and combined for electrophoresis using a magnetic bead protocol in place of EtOH precipitation. One of each pair of primers is labelled with biotin and the other with a fluorescent label as above. Double stranded PCR products are purified using streptavidin conjugated to paramagnetic beads to bind the primer 5' labelled with biotin. This procedure may be easily adapted to the 96-well format in any laboratory without expensive centrifuges. After the DNA bound to magnetic beads is separated from the PCR reaction media, the two strands are melted and separated, and the strand labelled with the fluorescent primer is pooled with other labelled strands of its GROUP for electrophoresis. The result of increasing the amount of beads used for separation of a single PCR product from its PCR reaction mix is shown in Figure 12.
EXAMPLE 4
32P OPTIMIZATION OF PRIMER SETS
DNA Templates
CEPH parents and/or unrelated volunteers as controls may be tested. In addition, we usually include one "no DNA" control and one reference individual (alleles known) on each plate. To maximize the use of resources, each marker may be optimized using 12 wells or less of a 96-well plate. Eight markers are amplified per plate at a single temperature. Alternatively, a thermalcycler with a smaller sample capacity may be used. The 5' end of the primers to be tested is labelled with 32P using the polynucleotide kinase reaction. Mix 5 μl sterile ddH2O, 2.8 μl 5x kinase buffer (250 mM Tris, 50 mM MgCl2, 50 mM DTT, 0.25 mg/ml BSA), 6.0 μl 10 μM primer, 0.8 μl T4 polynucleotide kinase, and 3.0 μl γ-32P ATP (6000 Ci/mmol). Incubate at 37°C for 1 hour, then add 26 μl sterile ddH2O, spin through select D column (Five Prime Three Prime) loaded with P4 Biogel (BIORAD) according to the manufacturer's recommendations. The labelled primers may be stored at -20°C.
For optimization, set up simultaneous PCR reactions as described in Example 2, using DNA templates described above (e.g., 2 CEPH (1331-1, 1347-02), 1 pooled sample (50 chromosomes), 1 no DNA). Perform PCR at the annealing temperature (T°) calculated as follows:
T° = 2(A+T) + 4(G+C)
where A, T, G and C are the numbers of the respective bases in the primer. (If the calculated temperatures for the 2 primers differ greatly, for example 54° and 64°, begin closer to the lower T°.) Check the amplified PCR product for artifacts by electrophoresis on a 6% gel. Continue optimization of the selected 32P-labelled primer with control individuals, increasing the annealing temperature in 2° increments until nonspecific products are eliminated. On average, determinations at approximately 4 T° values are required to optimize each primer.
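The calculation and the stepwise increase can be sketched in a few lines of Python. This is only an illustration: it assumes the familiar rule of thumb of 2° per A or T and 4° per G or C (both weights are parameters), it starts at the lower of the two primer estimates as one reading of "begin closer to the lower T°", and the function names and primer sequences are hypothetical.

```python
def rule_of_thumb_temp(primer: str, at_weight: int = 2, gc_weight: int = 4) -> int:
    """Estimate an annealing temperature (degrees C) from base composition."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return at_weight * at + gc_weight * gc


def starting_temp(forward: str, reverse: str) -> int:
    """Assumption: start at the lower of the two primer estimates."""
    return min(rule_of_thumb_temp(forward), rule_of_thumb_temp(reverse))


def temperature_series(start: int, steps: int = 4, increment: int = 2) -> list[int]:
    """Temperatures to try, rising in fixed increments from the start value."""
    return [start + i * increment for i in range(steps)]


# Hypothetical 20-mers, not taken from Figures 7A or 7B.
forward = "AGCTAGCTAGCTAGCTAGCT"   # 10 A/T, 10 G/C -> 60
reverse = "ATATATATGCGCATATATAT"   # 16 A/T,  4 G/C -> 48
start = starting_temp(forward, reverse)
print(start, temperature_series(start))  # 48 [48, 50, 52, 54]
```

The default of four steps mirrors the average of about 4 T° values per primer noted above; in practice the series stops as soon as nonspecific products disappear.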
When all markers from a SET are optimized (usually 8 markers), 3 μl from a pool of PCR product of DNA from unrelated individuals using primers for each marker in the SET is combined with an equal volume of loading buffer (2X concentration). Seven μl (or maximum well volume) of the combined mixture is loaded on a gel and electrophoresed. This last check on size and product intensity assures that the markers are robust and are spaced about 10 nucleotides apart. The primer sequences may then be used to synthesize fluorescent/biotinylated products.
EXAMPLE 5
A protocol extending this approach to include up to 24 microsatellite markers in each electrophoretic lane was tested as follows. The selection of markers was based on the need to maximize heterozygosity (genetic informativeness), to distribute markers across the entire genetic map, and to place each marker within a SET according to the known size of its PCR products (alleles and stuttering bands produced must not overlap with those of the marker above or below it). Highly informative microsatellite markers were assembled into a ladder or "SET". Each marker in a SET is spaced a distance of at least 9 nucleotides from surrounding markers such that none of the PCR products overlap in size when separated on a 6% denaturing acrylamide gel. Since many dinucleotide repeats produce a complex pattern of 3 or more stutter bands, this spacing is critical to assure that more intense stutter bands from an upper marker will not be misinterpreted as a product from a lower marker. In addition, new alleles both larger and smaller than the reported product sizes for this type of marker have occasionally been discovered. Each SET was labelled with one of three different commercially available fluorophores (TMR, FAM, and JOE; Applied Biosystems). The fourth fluorophore (ROX) was reserved for the internal size standard. Three SETS each labelled with a different fluorophore were pooled into a collection of markers we have termed a "GROUP".
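The text does not say how heterozygosity is computed. A common estimate, used here purely as an assumption, is the expected heterozygosity 1 - Σp², where p is the population frequency of each allele. The sketch below ranks candidate markers by that value; the marker names and allele frequencies are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), from allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def rank_markers(freqs_by_marker):
    """Marker names ordered from most to least informative."""
    return sorted(
        freqs_by_marker,
        key=lambda name: expected_heterozygosity(freqs_by_marker[name]),
        reverse=True,
    )


# Hypothetical candidates: two equally frequent alleles versus four.
candidates = {
    "M1": [0.5, 0.5],                # H = 0.50
    "M2": [0.25, 0.25, 0.25, 0.25],  # H = 0.75
}
print(rank_markers(candidates))  # ['M2', 'M1']
```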
New primers were designed as necessary using OLIGO 4.0 (Research Genetics, Huntsville, AL) to fit within the marker ladder. Each GROUP was constructed to avoid overlap between markers within SETS but to allow overlap between SETS.
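The layout rule (no overlap within a SET, overlap permitted between SETS that carry different fluorophores) lends itself to a simple check. The sketch below is one possible reading: it takes the gap between two markers to be the distance from the largest expected product of the smaller marker to the smallest expected product of the larger one, and flags any pair in the same SET closer than 9 nucleotides. The class, function names, and size ranges are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Marker:
    name: str
    min_size: int  # smallest expected product, in nucleotides
    max_size: int  # largest expected product, in nucleotides


def spacing_conflicts(markers, min_gap=9):
    """Pairs of markers in one SET whose products come closer than min_gap."""
    ordered = sorted(markers, key=lambda m: m.min_size)
    return [
        (lower.name, upper.name)
        for lower, upper in combinations(ordered, 2)
        if upper.min_size - lower.max_size < min_gap
    ]


def check_group(sets_by_dye, min_gap=9):
    """Check each SET separately; SETS with different dyes may overlap in size."""
    return {dye: spacing_conflicts(ms, min_gap) for dye, ms in sets_by_dye.items()}


# Hypothetical size ranges, not taken from Figures 7A or 7B.
group = {
    "TMR": [Marker("M1", 100, 110), Marker("M2", 119, 131), Marker("M3", 135, 150)],
    "FAM": [Marker("M4", 105, 118), Marker("M5", 127, 140)],
}
print(check_group(group))  # {'TMR': [('M2', 'M3')], 'FAM': []}
```

Here M4 overlaps M1 in size, which is acceptable because the two markers are in different SETS, whereas M2 and M3 are only 4 nucleotides apart in the same SET and would call for a redesigned primer or a different marker.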
The autoradiographic image produced by many markers varied depending on whether the forward or reverse primer was labelled (see Figure 8). Therefore, both primers from each marker were evaluated for image clarity and the ability to distinguish the most intense product(s) or alleles. The appropriate primer was then selected for further use. Optimization of the PCR conditions for each marker was also accomplished using radiolabeling. The strategy of developing a ladder of markers warranted that the conditions for PCR eliminate nonspecific annealing and background bands. When nonspecific annealing could not be eliminated by raising the annealing temperature, a new marker was chosen for use. Thus uniform PCR conditions as described in Example 1 were used for all the markers chosen except that the annealing temperature was specific to each marker. GROUPS 1 and 2 have 6 and 9 different annealing temperatures, respectively (see Figures 7A and B). An entire microtiter plate containing DNA from a number of different individuals will usually be amplified for a given marker at one temperature at a time, so this should not reduce the overall efficiency of the protocol. For studies with fewer samples a thermalcycler block may be used with a lower capacity. Variability among thermalcycler operating temperatures may require adjusting the annealing temperature when switching from one machine to another. Therefore the use of the protocols described for marker GROUPS 1 and 2 should be preceded by a reevaluation of the suggested annealing temperatures for optimal performance. This can generally be carried out once on a few markers and when necessary the annealing temperatures can be adjusted up or down for all the markers for that machine.
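As a rough illustration of the bookkeeping this implies, the sketch below groups markers by annealing temperature, so that each temperature corresponds to one cycling condition, and applies a single machine-specific offset to every marker. The function name, marker names, temperatures, and offset are hypothetical.

```python
from collections import defaultdict


def runs_by_temperature(annealing_temps, machine_offset=0):
    """Group markers by annealing temperature after a uniform machine offset."""
    runs = defaultdict(list)
    for marker, temp in annealing_temps.items():
        runs[temp + machine_offset].append(marker)
    # Sort temperatures and marker names so the schedule is reproducible.
    return {temp: sorted(markers) for temp, markers in sorted(runs.items())}


# Hypothetical temperatures; this machine is assumed to need a 2 degree reduction.
suggested = {"M1": 55, "M2": 57, "M3": 55, "M4": 61}
print(runs_by_temperature(suggested, machine_offset=-2))
# {53: ['M1', 'M3'], 55: ['M2'], 59: ['M4']}
```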
The intensity of the products varied considerably from marker to marker. When markers were radiolabelled and a SET was run on the same gel, detecting all of the products on the gel with a single film exposure was often impossible. Attempts to score on a single gel the larger products in each SET using radioactive-based techniques were unsuccessful. Although gradient gels improved the band spacing, a maximum of 4-5 markers could be resolved per gel on autoradiographs. An autoradiograph of GROUP 2 SET B is shown in Figure 9. The range of intensity in the products of this SET is typical of this type of marker and multiple autoradiographs are required for genotyping. These problems are partially overcome by the use of fluorescent labels (Ziegle et al., 1992). Fluorescent signal detection is linear over a greater range, so that the markers with the weakest product intensity are more readily typed in real-time along with the most intense products from other markers.
Marker GROUPS 1 and 2 are described in Figures 7A and B, respectively. The primer sequences, chromosomal location, choice of labelled primer, and optimal annealing temperature are listed for each locus. GROUP 1 is composed of a combination of 21 di-, tri-, and tetranucleotide markers from multiple linkage groups. The product sizes range from 66 to 322 nucleotides. GROUP 2 is composed of 24 dinucleotide markers with products ranging in size from 75 to 349 nucleotides. The mean heterozygosity for both GROUPS is 74%. Scoring of the fluorescent products using the ABI 373 sequencer and GeneScan 672 software was unambiguous in samples that were desalted by ethanol precipitation. Desalting was carried out as follows: 5 μl of each PCR product from the same SET (like color) was combined.
Then 1.0 μl per marker per SET was combined for each of the 3 SETS giving a final volume equal to the total number of markers in the GROUP. Sample handling was otherwise exactly as described above for the individual fluorescent markers.
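The pooling arithmetic can be written out as follows. This is a sketch only: the function name is hypothetical, and the even split of the 24 GROUP 2 markers into three SETS of 8 (labelled A, B and C here) is an assumption made for the example.

```python
def pooling_volumes(markers_per_set, per_product_ul=5.0, per_marker_ul=1.0):
    """Volumes (in microlitres) for the two pooling steps of one GROUP."""
    # Step 1: combine a fixed volume of every PCR product within each SET.
    set_pool = {s: n * per_product_ul for s, n in markers_per_set.items()}
    # Step 2: take a fixed volume per marker from each SET pool.
    aliquot = {s: n * per_marker_ul for s, n in markers_per_set.items()}
    return set_pool, aliquot, sum(aliquot.values())


# Assumed composition: 24 markers split evenly over three SETS.
set_pool, aliquot, final_volume = pooling_volumes({"A": 8, "B": 8, "C": 8})
print(set_pool)      # {'A': 40.0, 'B': 40.0, 'C': 40.0}
print(aliquot)       # {'A': 8.0, 'B': 8.0, 'C': 8.0}
print(final_volume)  # 24.0
```

With these assumptions the final volume in microlitres matches the number of markers in the GROUP, as stated above.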
A typical set of electrophoretograms of each SET from GROUP 2 for a single individual is illustrated in Figure 5. Each of the alleles can be easily recognized by the unique signature of the stuttering bands for these dinucleotide repeat markers amplified by PCR. Samples that were not desalted were difficult to score because the mobilities of the products and the ROX-2500 internal lane standards were altered. Salt and primer loads become a problem when combining multiple products for electrophoresis because the necessary volume reduction results in sample concentration. The salt concentration rises with the product concentration and interferes with the separation of the products and standards. This becomes critical when pooling 21 to 24 markers.
It will be understood that while the invention has been described in conjunction with specific embodiments thereof, the foregoing description and examples are intended to illustrate, but not limit, the scope of the invention. Other aspects, advantages and modifications will be apparent to those skilled in the art to which the invention pertains, and these aspects and modifications are within the scope of the invention, which is limited only by the appended claims.

Claims

1. A kit for use in automated genotyping within a population comprising at least 4 GROUPS of at least three SETS each comprising labelled pairs of primers for amplification of DNA by polymerase chain reaction (PCR), each primer pair having unique sequence found in the flanking sequences of a microsatellite sequence comprising a nucleotide repeat sequence flanked by unique sequences, such that a polymerase chain reaction (PCR) primed with the primer pair amplifies the nucleotide repeat sequence and at least some immediately adjacent unique sequences of the microsatellite sequence to produce a PCR product identified with the primer pair, wherein the microsatellite sequences are nucleotide repeat sequences that are polymorphic within the population, each SET consisting of at least 6 primer pairs, each primer having the sequence of unique sequences respectively flanking at least 6 microsatellite sequences in the genome, such that the length of the segment amplified by a particular primer pair differs from the length of all other segments in the SET by at least 5 nucleotides, and at least one primer of each primer pair is labelled with a fluorescent label that is the same fluorescent label for all primer pairs in the SET, each GROUP consisting of at least three SETS of primer pairs labelled with fluorescent labels, wherein the wavelength at which the respective fluorescent labels fluoresce is substantially different for the labelled primers in each of the respective SETS, wherein the distance in the genome between one microsatellite sequence amplified by a primer pair of the kit and the nearest other microsatellite sequence amplified by another primer pair of the kit is at least 2 centimorgans (cM) and no more than 50 cM.
2. The kit of claim 1, wherein the PCR products identified with any primer pair amplifying microsatellite sequences containing dinucleotide repeats differ in length from PCR products identified with all other primer pairs of the same SET by at least 9 nucleotides.
3. The kit of claim 1, wherein one of said GROUPS consists of the three SETs of Figure 7A.
4. The kit of claim 1, wherein one of said GROUPS consists of the three SETs of Figure 7B.
5. The kit of claim 1, containing the 6 SETs shown in Figures 7A and 7B.
6. A method of analyzing genomic DNA for the presence of polymorphisms comprising a) extracting DNA from a human sample; b) combining, in a polymerase chain reaction (PCR) vessel, an aliquot of said DNA from a human sample, at least one primer pair selected from a GROUP in the kit of claim 1, and PCR amplification enzymes; c) cycling the temperature of each PCR vessel so that PCR products identified with said at least one primer pair are produced by PCR amplification of segments from said DNA from a human sample, each vessel being cycled at an annealing temperature wherein non-specific annealing of the primers to said DNA from a human sample is minimized; d) then combining all PCR products from all PCR vessels containing primer pairs from one GROUP into a mixture, and subsequently separating the mixture of PCR products electrophoretically by size; e) detecting separated PCR products by fluorescence detection at wavelengths corresponding to the fluorescent wavelength for each of the fluorescent labels in the kit.
7. The method of claim 6, wherein the step of combining amplified DNA further comprises: i) contacting each vessel with a plurality of paramagnetic beads carrying on the surface a protein which specifically binds biotin, further wherein one primer of each primer pair is labelled with a fluorescent label and the other with biotin, for a period sufficient for said protein to bind biotin; ii) separating the magnetic beads from the PCR reaction medium; iii) separating the two strands of the amplified DNA segments and combining the strands labelled with a fluorescent label for all primer pairs from one GROUP into a mixture.
8. The method of claim 6, wherein the step of combining amplified DNA from the PCR vessels further comprises: i) contacting each vessel with a plurality of magnetic beads carrying DNA complementary to the sequence of one primer of the primer pair in the vessel for a period sufficient to allow annealing between the primer and the DNA on the magnetic beads; ii) separating the magnetic beads from the PCR reaction medium; and iii) eluting the PCR product from the magnetic beads.
9. The method of claim 6, wherein each primer pair of said kit is added to a different PCR vessel in step (b), such that the annealing temperature for temperature cycling in step (c) is the temperature wherein non-specific annealing of the unique primer pair is minimized and PCR product from all PCR vessels containing at least one primer pair from the same GROUP are combined in a single mixture before electrophoretic separation.
10. A method for selecting a SET of PCR primers for use in automated genotyping comprising selecting at least 6 microsatellite sequences in the human genome, wherein the microsatellite sequences are selected from dinucleotide, trinucleotide and tetranucleotide repeat sequences that are flanked by unique sequences, said microsatellite sequences being separated from each other by at least 2 centimorgans in the genome and being polymorphic within the population; constructing primer pairs for each microsatellite sequence, said primers having the sequence of the unique sequences flanking the microsatellite sequences, such that the length of all polymorphs of the DNA segment amplified by a particular primer pair is detectably different from the length of all polymorphs of other segments amplified by primers in the SET.
11. A kit for use in automated genotyping comprising at least 4 GROUPS of at least 3 SETS of PCR primers obtained by the method of claim 10.
12. The kit of claim 11, wherein at least one primer of each primer pair in the SET is labelled with a fluorescent label that is the same fluorescent label for all primer pairs in the SET.
13. The kit of claim 11, wherein the length of all polymorphs of the DNA segment amplified by any primer pair amplifying microsatellite sequences containing dinucleotide repeats differs in length from the DNA segment amplified by all other primer pairs of the same SET by at least 9 nucleotides.
14. A method of analyzing genomic DNA for the presence of polymorphisms comprising a) extracting DNA from a human sample; b) combining, in a polymerase chain reaction (PCR) vessel, an aliquot of said DNA from a human sample, at least one primer pair selected from a GROUP in the kit of claim 11, and PCR amplification enzymes; c) cycling the temperature of each PCR vessel so that PCR products consisting essentially of amplified DNA segments labelled with detectable labels are produced by PCR amplification and the PCR products for all primer pairs in the SET are detectably labelled with the same label, each vessel being cycled at an annealing temperature wherein non-specific annealing is minimized; d) separating electrophoretically by size a mixture containing all PCR products amplified from said DNA from a human sample by any primer pair of said SET; e) detecting separated detectably labelled PCR products and characterizing them by length.
15. The method of claim 14, wherein the mixture in step (d) containing all PCR products amplified from said DNA from a human sample by any primer pair of said SET is obtained by: i) contacting each vessel with a plurality of paramagnetic beads carrying on the surface a protein which specifically binds biotin, further wherein one primer of each primer pair is labelled with a fluorescent label and the other with biotin, for a period sufficient for said protein to bind biotin; ii) separating the magnetic beads from the PCR reaction medium; iii) separating the two strands of the amplified DNA segments and combining the strands labelled with a fluorescent label for all primer pairs from one GROUP into a mixture.
16. The method of claim 14, wherein the mixture in step (d) containing all PCR products amplified from said DNA from a human sample by any primer pair of said SET is obtained by: i) contacting each vessel with a plurality of magnetic beads carrying DNA complementary to the sequence of one primer of the primer pair in the vessel for a period sufficient to allow annealing between the primer and the DNA on the magnetic beads; ii) separating the magnetic beads from the PCR reaction medium; and iii) eluting the PCR product from the magnetic beads.
17. A kit for analysis by polymerase chain reaction (PCR) of a genomic region containing at least 6 known loci at which genetic rearrangement is diagnostic for a disease, comprising at least one SET containing at least 6 PCR primer pairs, each primer pair having the sequence of unique sequences flanking one of said at least 6 loci of genomic rearrangement, such that a polymerase chain reaction (PCR) primed with the primer pair amplifies the DNA segment surrounding the locus of rearrangement to produce a PCR product of characteristic length, wherein the length of the PCR product is associated with specific diagnostic information, and wherein the length of the PCR product amplified by a particular pair of primers differs from the length of all other PCR products amplified by other primers in the SET and the PCR products for all primer pairs in the SET are detectably labelled with the same label.
18. A diagnostic method for detection by polymerase chain reaction (PCR) of genomic rearrangement in a genomic region containing at least 6 known loci at which genetic rearrangement is diagnostic for a disease, comprising (a) extracting DNA from a human sample; (b) combining, in a polymerase chain reaction (PCR) vessel, an aliquot of said DNA from a human sample, at least one pair of amplification primers selected from a SET of at least 6 primer pairs, and PCR amplification enzymes, each primer pair of said SET having the sequence of unique sequences flanking one of said at least 6 loci of genomic rearrangement, such that a polymerase chain reaction (PCR) primed with the primer pair amplifies the DNA segment surrounding the locus of rearrangement to produce a PCR product of characteristic length, wherein change in the length of the PCR product is associated with rearrangement at the locus of rearrangement, and wherein the length of PCR products amplified by a particular pair of primers differs from the length of all other PCR products amplified by other primers in the SET; (c) cycling the temperature of each PCR vessel so that PCR products consisting essentially of amplified DNA segments labelled with detectable labels are produced by PCR amplification and the PCR products for all primer pairs in the SET are detectably labelled with the same label, each vessel being cycled at an annealing temperature wherein non-specific annealing is minimized; (d) separating electrophoretically by size a mixture containing all PCR products amplified from said DNA from a human sample by any primer pair of said SET; (e) detecting separated detectably labelled PCR products and characterizing them by length.
19. The method of claim 14, wherein each primer pair of said SET is added to a different PCR vessel in step (b), such that the annealing temperature for temperature cycling in step (c) is the temperature wherein non-specific annealing of the unique primer pair is minimized and PCR product from all PCR vessels containing at least one primer pair from said SET are combined in a single mixture before electrophoretic separation.
PCT/US1994/013945 1993-12-03 1994-12-05 Genotyping by simultaneous analysis of multiple microsatellite loci WO1995015400A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16083793A 1993-12-03 1993-12-03
US160,837 1993-12-03

Publications (1)

Publication Number Publication Date
WO1995015400A1 (en) 1995-06-08

Family

ID=22578668

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1994/013945 WO1995015400A1 (en) 1993-12-03 1994-12-05 Genotyping by simultaneous analysis of multiple microsatellite loci

Country Status (1)

Country Link
WO (1) WO1995015400A1 (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1996017082A2 (en) * 1994-11-28 1996-06-06 E.I. Du Pont De Nemours And Company Compound microsatellite primers for the detection of genetic polymorphisms
US5670325A (en) * 1996-08-14 1997-09-23 Exact Laboratories, Inc. Method for the detection of clonal populations of transformed cells in a genomically heterogeneous cellular sample
US5741650A (en) * 1996-01-30 1998-04-21 Exact Laboratories, Inc. Methods for detecting colon cancer from stool samples
WO1998020166A2 (en) * 1996-11-06 1998-05-14 Sequenom, Inc. Dna diagnostics based on mass spectrometry
WO1999001576A1 (en) * 1997-07-02 1999-01-14 University Of Bristol Method of determining the genotype of an organism using an allele specific oligonucleotide probe which hybridises to microsatellite flanking sequences
WO1999028466A1 (en) * 1997-12-04 1999-06-10 Board Of Regents, The University Of Texas System Compositions and methods of use of het
US5928870A (en) * 1997-06-16 1999-07-27 Exact Laboratories, Inc. Methods for the detection of loss of heterozygosity
US5952178A (en) * 1996-08-14 1999-09-14 Exact Laboratories Methods for disease diagnosis from stool samples
US6020137A (en) * 1996-08-14 2000-02-01 Exact Laboratories, Inc. Methods for the detection of loss of heterozygosity
US6100029A (en) * 1996-08-14 2000-08-08 Exact Laboratories, Inc. Methods for the detection of chromosomal aberrations
WO2000065088A2 (en) * 1999-04-26 2000-11-02 Amersham Pharmacia Biotech Ab Primers for identifying typing or classifying nucleic acids
US6146828A (en) * 1996-08-14 2000-11-14 Exact Laboratories, Inc. Methods for detecting differences in RNA expression levels and uses therefor
US6203993B1 (en) 1996-08-14 2001-03-20 Exact Science Corp. Methods for the detection of nucleic acids
WO2001029248A2 (en) * 1999-10-19 2001-04-26 Bionex, Inc. Method for amplifying and detecting nucleic acid
US6258538B1 (en) 1995-03-17 2001-07-10 Sequenom, Inc. DNA diagnostics based on mass spectrometry
US6280947B1 (en) 1999-08-11 2001-08-28 Exact Sciences Corporation Methods for detecting nucleotide insertion or deletion using primer extension
US6300077B1 (en) 1996-08-14 2001-10-09 Exact Sciences Corporation Methods for the detection of nucleic acids
WO2002022879A2 (en) * 2000-09-15 2002-03-21 Promega Corporation Detection of microsatellite instability and its use in diagnosis of tumors
DE10236711A1 (en) * 2002-08-09 2004-02-26 Universität Hohenheim Typing genes that contain polymorphic microsatellite loci, useful for identifying predisposition to disease, by amplification and determining length of amplicons
US7074769B2 (en) 1998-07-02 2006-07-11 The Trustees Of Columbia University In The City Of New York Oligonucleotide inhibitors of bcl-xL
US7202031B2 (en) 2000-09-15 2007-04-10 Promega Corporation Detection of microsatellite instability and its use in diagnosis of tumors
US7468248B2 (en) 2002-12-31 2008-12-23 Cargill, Incorporated Methods and systems for inferring bovine traits
EP2210945B1 (en) * 1998-01-14 2013-06-26 Novartis Vaccines and Diagnostics S.r.l. Neisseria meningitidis antigens
US9109256B2 (en) 2004-10-27 2015-08-18 Esoterix Genetic Laboratories, Llc Method for monitoring disease progression or recurrence
WO2017051439A1 (en) * 2015-09-21 2017-03-30 UNIVERSITA' DEGLI STUDI Dl MESSINA Haplotypes of d7s 6440 microsatellite internal to the hipk2 gene as markers of autoimmune thyroiditis
US9777314B2 (en) 2005-04-21 2017-10-03 Esoterix Genetic Laboratories, Llc Analysis of heterogeneous nucleic acid samples
WO2023039509A1 (en) * 2021-09-10 2023-03-16 Cold Spring Harbor Laboratory Method of measuring microsatellite length variations

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5075217A (en) * 1989-04-21 1991-12-24 Marshfield Clinic Length polymorphisms in (dC-dA)n ·(dG-dT)n sequences
GB2259138A (en) * 1991-08-27 1993-03-03 Ici Plc Method of characterising genomic DNA

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
AMERICAN JOURNAL OF HUMAN GENETICS, Volume 44, issued 1989, LITT et al., "A Hypervariable Microsatellite Revealed by in Vitro Amplification of a Dinucleotide Repeat Within the Cardiac Muscle Actin Gene", pages 397-401. *
AMERICAN JOURNAL OF HUMAN GENETICS, Volume 47, issued 1990, DIEHL et al., "Automated Genotyping of Human DNA Polymorphisms", page A177. *
GENOMICS, Volume 14, issued 1992, ZIEGLE et al., "Application of Automated DNA Sizing Technology for Genotyping Microsatellite Loci", pages 1026-1031. *
GENOMICS, Volume 2, issued 1988, SKOLNICK et al., "Simultaneous Analysis of Multiple Polymorphic Loci Using Amplified Sequence Polymorphisms (ASPs)", pages 273-279. *
NATURE, Volume 359, issued 29 October 1992, WEISSENBACH et al., "A Second-Generation Linkage Map of the Human Genome", pages 794-801. *

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1996017082A3 (en) * 1994-11-28 1996-08-08 Du Pont Compound microsatellite primers for the detection of genetic polymorphisms
WO1996017082A2 (en) * 1994-11-28 1996-06-06 E.I. Du Pont De Nemours And Company Compound microsatellite primers for the detection of genetic polymorphisms
US6258538B1 (en) 1995-03-17 2001-07-10 Sequenom, Inc. DNA diagnostics based on mass spectrometry
US5741650A (en) * 1996-01-30 1998-04-21 Exact Laboratories, Inc. Methods for detecting colon cancer from stool samples
US6146828A (en) * 1996-08-14 2000-11-14 Exact Laboratories, Inc. Methods for detecting differences in RNA expression levels and uses therefor
US5670325A (en) * 1996-08-14 1997-09-23 Exact Laboratories, Inc. Method for the detection of clonal populations of transformed cells in a genomically heterogeneous cellular sample
US6300077B1 (en) 1996-08-14 2001-10-09 Exact Sciences Corporation Methods for the detection of nucleic acids
US6303304B1 (en) 1996-08-14 2001-10-16 Exact Laboratories, Inc. Methods for disease diagnosis from stool samples
US6203993B1 (en) 1996-08-14 2001-03-20 Exact Science Corp. Methods for the detection of nucleic acids
US5952178A (en) * 1996-08-14 1999-09-14 Exact Laboratories Methods for disease diagnosis from stool samples
US6020137A (en) * 1996-08-14 2000-02-01 Exact Laboratories, Inc. Methods for the detection of loss of heterozygosity
US6100029A (en) * 1996-08-14 2000-08-08 Exact Laboratories, Inc. Methods for the detection of chromosomal aberrations
WO1998020166A3 (en) * 1996-11-06 1998-10-22 Sequenom Inc Dna diagnostics based on mass spectrometry
EP1164203A3 (en) * 1996-11-06 2002-03-13 Sequenom, Inc. DNA Diagnostics based on mass spectrometry
EP1164203A2 (en) * 1996-11-06 2001-12-19 Sequenom, Inc. DNA Diagnostics based on mass spectrometry
WO1998020166A2 (en) * 1996-11-06 1998-05-14 Sequenom, Inc. Dna diagnostics based on mass spectrometry
US5928870A (en) * 1997-06-16 1999-07-27 Exact Laboratories, Inc. Methods for the detection of loss of heterozygosity
WO1999001576A1 (en) * 1997-07-02 1999-01-14 University Of Bristol Method of determining the genotype of an organism using an allele specific oligonucleotide probe which hybridises to microsatellite flanking sequences
WO1999028466A1 (en) * 1997-12-04 1999-06-10 Board Of Regents, The University Of Texas System Compositions and methods of use of het
EP2210945B1 (en) * 1998-01-14 2013-06-26 Novartis Vaccines and Diagnostics S.r.l. Neisseria meningitidis antigens
US7074769B2 (en) 1998-07-02 2006-07-11 The Trustees Of Columbia University In The City Of New York Oligonucleotide inhibitors of bcl-xL
WO2000065088A2 (en) * 1999-04-26 2000-11-02 Amersham Pharmacia Biotech Ab Primers for identifying typing or classifying nucleic acids
WO2000065088A3 (en) * 1999-04-26 2001-08-09 Amersham Pharm Biotech Ab Primers for identifying typing or classifying nucleic acids
US6280947B1 (en) 1999-08-11 2001-08-28 Exact Sciences Corporation Methods for detecting nucleotide insertion or deletion using primer extension
WO2001029248A3 (en) * 1999-10-19 2001-09-20 Bionex Inc Method for amplifying and detecting nucleic acid
WO2001029248A2 (en) * 1999-10-19 2001-04-26 Bionex, Inc. Method for amplifying and detecting nucleic acid
WO2002022879A2 (en) * 2000-09-15 2002-03-21 Promega Corporation Detection of microsatellite instability and its use in diagnosis of tumors
US7749706B2 (en) 2000-09-15 2010-07-06 Promega Corporation Detection of microsatellite instability and its use in diagnosis of tumors
US7202031B2 (en) 2000-09-15 2007-04-10 Promega Corporation Detection of microsatellite instability and its use in diagnosis of tumors
US7364853B2 (en) 2000-09-15 2008-04-29 Promega Corporation Detection of microsatellite instability and its use in diagnosis of tumors
WO2002022879A3 (en) * 2000-09-15 2003-08-21 Promega Corp Detection of microsatellite instability and its use in diagnosis of tumors
US6844152B1 (en) 2000-09-15 2005-01-18 Promega Corporation Detection of microsatellite instability and its use in diagnosis of tumors
US7902343B2 (en) 2000-09-15 2011-03-08 Promega Corporation Detection of microsatellite instability and its use in diagnosis of tumors
DE10236711A1 (en) * 2002-08-09 2004-02-26 Universität Hohenheim Typing genes that contain polymorphic microsatellite loci, useful for identifying predisposition to disease, by amplification and determining length of amplicons
US8450064B2 (en) 2002-12-31 2013-05-28 Cargill Incorporated Methods and systems for inferring bovine traits
US9206478B2 (en) 2002-12-31 2015-12-08 Branhaven LLC Methods and systems for inferring bovine traits
US8026064B2 (en) 2002-12-31 2011-09-27 Metamorphix, Inc. Compositions, methods and systems for inferring bovine breed
US7511127B2 (en) 2002-12-31 2009-03-31 Cargill, Incorporated Compositions, methods and systems for inferring bovine breed
US7468248B2 (en) 2002-12-31 2008-12-23 Cargill, Incorporated Methods and systems for inferring bovine traits
US8669056B2 (en) 2002-12-31 2014-03-11 Cargill Incorporated Compositions, methods, and systems for inferring bovine breed
US11053547B2 (en) 2002-12-31 2021-07-06 Branhaven LLC Methods and systems for inferring bovine traits
US7709206B2 (en) 2002-12-31 2010-05-04 Metamorphix, Inc. Compositions, methods and systems for inferring bovine breed or trait
US10190167B2 (en) 2002-12-31 2019-01-29 Branhaven LLC Methods and systems for inferring bovine traits
US9982311B2 (en) 2002-12-31 2018-05-29 Branhaven LLC Compositions, methods, and systems for inferring bovine breed
US9109256B2 (en) 2004-10-27 2015-08-18 Esoterix Genetic Laboratories, Llc Method for monitoring disease progression or recurrence
US9777314B2 (en) 2005-04-21 2017-10-03 Esoterix Genetic Laboratories, Llc Analysis of heterogeneous nucleic acid samples
WO2017051439A1 (en) * 2015-09-21 2017-03-30 UNIVERSITA' DEGLI STUDI Dl MESSINA Haplotypes of d7s 6440 microsatellite internal to the hipk2 gene as markers of autoimmune thyroiditis
WO2023039509A1 (en) * 2021-09-10 2023-03-16 Cold Spring Harbor Laboratory Method of measuring microsatellite length variations

Similar Documents

Publication Publication Date Title
WO1995015400A1 (en) Genotyping by simultaneous analysis of multiple microsatellite loci
JP4422897B2 (en) Primer extension method for detecting nucleic acids
EP0960207B1 (en) Multiplex amplification of short tandem repeat loci
US5674686A (en) Allelic ladders for short tandem repeat loci
Ponce et al. High-throughput genetic mapping in Arabidopsis thaliana
US20080286773A1 (en) Method for Typing an Individual Using Short Tandem Repeat (Str) Loci of the Genomic Dna
US20020182609A1 (en) Microsphere based oligonucleotide ligation assays, kits, and methods of use, including high-throughput genotyping
US20140243229A1 (en) Methods and products related to genotyping and dna analysis
US20030096277A1 (en) Allele specific PCR for genotyping
KR20040105744A (en) Rapid analysis of variations in a genome
AU8162498A (en) Methods for the detection of multiple single nucleotide polymorphisms in a single reaction
EP1056889B1 (en) Methods related to genotyping and dna analysis
WO1996030545A9 (en) Mutation detection by differential primer extension of mutant and wildtype target sequences
WO2005093101A1 (en) Nucleic acid sequencing
WO1999061659A1 (en) A novel str marker system for dna fingerprinting
Dearlove High throughput genotyping technologies
WO1996034979A2 (en) Primers and methods for simultaneous amplification of multiple markers for dna fingerprinting
JP2011500062A (en) Detection of blood group genes
US20030113723A1 (en) Method for evaluating microsatellite instability in a tumor sample
CN111041079A (en) Flight mass spectrum genotyping detection method
US20080305470A1 (en) Nucleic Acid Sequencing
US20030077584A1 (en) Methods and compositions for bi-directional polymorphism detection
US8008002B2 (en) Nucleic acid sequencing
CN112680530B (en) Highly-degraded test material detection kit based on 18 multiple insertion deletion genetic markers
CN109312397A (en) The identification of Penta E locus polymorphic human body

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA