WO2013134341A1 - Primer designing pipeline for targeted sequencing - Google Patents

Primer designing pipeline for targeted sequencing


Publication number
WO2013134341A1
WO2013134341A1 PCT/US2013/029268 US2013029268W WO2013134341A1 WO 2013134341 A1 WO2013134341 A1 WO 2013134341A1 US 2013029268 W US2013029268 W US 2013029268W WO 2013134341 A1 WO2013134341 A1 WO 2013134341A1
Authority
WO
WIPO (PCT)
Prior art keywords
primer
primers
sequence
module
computerized system
Application number
PCT/US2013/029268
Other languages
French (fr)
Inventor
Adam J. THOMAS
Ramesh BUYYARAPU
Premchand GANDRA
Kanika ARORA
Navin ELANGO
Rajesh PERIANAYAGAM
Fang Lu
Original Assignee
Dow Agrosciences Llc
Application filed by Dow Agrosciences Llc
Publication of WO2013134341A1


Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20: Sequence assembly

Definitions

  • This invention is generally related to the field of molecular biology, and more specifically the field of primer design for targeted/high-throughput sequencing.
  • Primer designing for DNA sequencing is a vital part of modern biological research.
  • Traditional primer designing programs, for example the publicly available "Primer3," use a single input DNA sequence to process and design optimal primers under specified parameters including primer length, GC content, melting temperature (Tm) and others.
  • these traditional primer designing programs are not suitable for handling large files with long or multiple sequences.
  • the target region can range from a small region (<1 kilobase, or kb) to large contiguous or repeat-masked regions (even entire genomes, provided there is sufficient hardware and memory to handle the processes).
  • the systems and methods provided herein can be used to design multiple sets of overlapping or non-overlapping primers/amplicons for a target region.
  • the primers designed using the systems and methods provided herein can also be used for multiplex PCR analysis.
  • a computerized system for primer/amplicon design for sequencing comprises:
  • the system further comprises at least one of format output module, and a BLAST verification module. In a further embodiment, the system further comprises at least one of format output module, BLAST verification module, and adaptor verification module.
  • the input device is selected from the group consisting of automated sequencer, sequencing data input device, and sequencing data storage device.
  • the output interface comprises an interface for WebGBrowse or GenomeBrowser.
  • the database described herein contains information selected from the group consisting of genomic sequences, previously generated primers, and sequences for BLAST analysis.
  • the load sequence module processes sequences in FASTA format.
  • the load sequence module uses random file access.
  • the load sequence module does not use sequential file access.
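  • the primer design module performs at least one of (1) automatically adding standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplexed amplifications; (4) designing a tiling of amplicons across a sequence; (5) picking primers from a reverse-translated amino acid sequence; and (6) selection from multiple primer sets.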
  • the primer design module processes primer design in parallel. In another embodiment, the primer design module does not design primers in a non-parallel or sequential manner. In another embodiment, the primer design module generates primers or processes primer design at a speed greater than 10 primers per minute. In another embodiment, the primer design module generates primers or processes primer design at a speed greater than 100 primers per minute.
  • the primer design module generates primers or processes primer design at a speed between 200 and 500 primers per minute.
  • the primers constitute overlapping amplicons for sequence assembly.
  • the primers constitute overlapping amplicons for sequencing and assembly.
  • the overlapping region of amplicons comprises at least 50 bp or minimal overlap.
  • the overlapping region of amplicons comprises at least 100 bp.
  • the overlapping region of amplicons comprises between 100 bp and 1000 bp.
  • a method for use in a computerized system for primer/amplicon design for sequencing comprises:
  • the method further comprises pre-processing sequences by modifying sequences before primer design.
  • the method further comprises defining target regions/sequences.
  • the method further comprises defining windows for primer design.
  • the computerized system of the method comprises a system described herein.
  • the sequence data is larger than 100 kilobases (kb).
  • the sequence data is larger than 10 megabases (Mb).
  • the sequence data is between 10 Mb and 1 gigabase (Gb).
  • the load sequence module processes sequences in FASTA format.
  • the load sequence module uses random file access.
  • the method provided further comprises at least one of (1) automatically adding standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplexed amplifications; (4) designing a tiling of amplicons across a sequence; (5) picking primers from a reverse-translated amino acid sequence; and (6) selection from multiple primer sets.
  • the method provides primers at a speed greater than 10 primers per minute. In a further or alternative embodiment, the method provides primers at a speed greater than 100 primers per minute. In a further or alternative embodiment, the method provides primers at a speed between 200 and 500 primers per minute. In another embodiment, the primers constitute overlapping amplicons for sequence assembly. In a further embodiment, the overlapping region of amplicons comprises at least 50 bp or minimal overlap. In a further embodiment, the overlapping region of amplicons comprises at least 100 bp. In a further embodiment, the overlapping region of amplicons comprises between 100 bp and 1000 bp.
  • the method further comprises verifying the primers using a BLAST verification module. In another embodiment, the method further comprises verifying secondary structure of primers using an adaptor verification module. In another embodiment, the method further comprises simulating the sequencing using a sequencing simulation module. In another embodiment, the method further comprises an output format module for outputting for visualization using WebGBrowse.
  • Figure 1 shows an exemplary flowchart of the Primer Designing Pipeline provided herein.
  • Figure 2 shows an exemplary embodiment for overlapping amplicon design for high-throughput sequencing.
  • Figure 3 shows an exemplary automated process for the systems and methods provided herein.
  • Figure 4 shows an exemplary process for the BLAST verification modules and methods provided herein.
  • Figure 5 shows exemplary FASTA sequences to be loaded into the Primer Designing Pipeline provided herein.
  • Figure 6 shows an exemplary screen shot when the Primer Designing Pipeline provided loads FASTA files for downstream analysis.
  • a publicly available "Primer3" program is incorporated by the systems and methods provided to process the overlapping primer designing task in targeted regions while also combining the utility of Batch-Primer3 using a customized and compiled program.
  • the "Primer3" program has been previously described in Steve Rozen and Helen J.
  • the Primer Designing Pipeline provided is programmed using .NET framework and allows multiple "Primer3" processes to be performed in parallel while validating for overlap.
  • the Primer Designing Pipeline described herein provides at least one of the advantages below: (1) automatically adding standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplex amplifications; (4) designing a tiling of amplicons across a sequence; and (5) picking primers from a reverse-translated amino acid sequence.
  • the systems and methods provided herein enable primer designing especially for "targeted re-sequencing" applications, for example, using high-throughput (HTP) next-generation sequencing (NGS) instruments.
  • the Primer Designing Pipeline provided can be modular to take small to large sequences as input and also allows changes of the amplicon lengths to suit various NGS platforms as requested by users.
  • primers designed using the systems and/or methods provided herein can be used with Fluidigm AccessArray system, a HTP multiplexed amplicon library generation system for efficient and cost-effective generation of sequencing data for further analysis.
  • the systems and methods described herein can provide HTP overlapping primer design for complementing the utility of Fluidigm AccessArray system for marker development, gene confirmation, transgene region validation for regulatory affairs, QTL mining and genotyping-by-sequencing.
  • An exemplary primer design workflow is illustrated in Figure 1.
  • the users can select sequence files for which they want to design primers. Files are typically not loaded into memory but instead analyzed for random access.
  • the FASTA file format is a common platform for displaying biological sequences.
  • the target sequence(s) can typically be provided in a FASTA format.
  • the FASTA file can contain one or multiple sequences.
  • Each sequence is always preceded by a header line which is prefixed with ">" followed by the ID, description, and/or other pertinent information about the sequence.
  • the sequence information is then listed on subsequent lines and usually wraps (carriage return/line feed) every 70 or 80 characters depending on the program that generated the file.
  • DNA sequences in FASTA format are shown in Figure 5.
  • FASTA files can become quite large when dealing with long sequences, a large number of sequences, or a combination of both.
  • most FASTA files which represent the entire genome of complex eukaryotes may easily exceed 2 gigabytes.
  • This size issue poses a problem for the primer design process because normal file handling methods involve starting at the beginning and reading the file into memory until the data of interest is found. For example, a sequential file access for a file larger than 2 gigabytes may take longer than 30 seconds to sequentially read to the spot in the file where the data of interest is. Consequently, this process has to take place for each set of primers which need to be designed for traditional primer designing programs.
  • a Load Sequence Module which runs the sequence loading processes in parallel. Random file access is combined with sequential file access to speed up the process. Each character in the file resides at a specific addressable location on a disk. In addition to starting at the beginning of the file and reading each character in order (sequential file access), the Load Sequence Module provides a means to access any location in the file at random as long as the address is known. Random file access can speed processes up considerably because the process does not have to read all the characters/data that precede the data it is interested in extracting. Rather than trying to make sequential file access faster, the Load Sequence Module determines the starting address in the file at which the data of interest is located.
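The difference between sequential and random file access described above can be illustrated with a minimal Python sketch. This is not the pipeline's own code (the source describes a .NET implementation); the function name `read_slice` and the demonstration file are assumptions made for illustration only.

```python
import os
import tempfile

def read_slice(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at byte `offset`, skipping everything before it."""
    with open(path, "rb") as handle:
        handle.seek(offset)          # jump straight to the known address
        return handle.read(length)

# Small demonstration on a temporary FASTA-like file (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b">seq1\r\nACGTACGTAC\r\n")
    demo_path = tmp.name

# The header ">seq1" is 5 bytes and the CR/LF pair adds 2, so the sequence starts at byte 7.
fragment = read_slice(demo_path, 7, 4)
os.remove(demo_path)
```

The cost of `read_slice` depends on the length requested rather than on how far into the file the data sits, which is the property the Load Sequence Module relies on.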
  • the Primer Designing Pipeline provided initially reads the entire file sequentially once, analyzes it to determine how it is formatted, and stores that analysis in a SQL Server Compact database. Then, when the Primer Designing Pipeline is designing the primers and needs to extract a sequence from one of the files, it uses the analysis results stored in the SQL Server Compact database to calculate the location within the file (or address) of the data of interest. That address is then used to extract/read the data using random file access.
  • within each block, every line has the same length, and the analysis stores the number of lines, the characters per line, and the file start and stop positions for each block. For example, as shown in Figure 5, the first block would start at file position 0, contain 1 line with 13 characters per line (including the carriage return and newline at the end of the line), and end at position 12. The block after that would start at file position 13, contain 2 lines of 72 characters each, and end at position 156. Additionally, since the Load Sequence Module assumes that each header block is unique, each header block can be loaded into a hash table (dictionary) in memory for quick access when working with the file. The blocks following the header, which contain the sequence, are then linked to the header block in the hash table.
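A minimal sketch of such a block index is given below, assuming that a block is a run of consecutive lines of identical byte length (line ending included) and that an in-memory dictionary stands in for the SQL Server Compact database mentioned in the source. The names `Block` and `index_blocks` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Block:
    start: int        # file position of the first byte in the block
    end: int          # file position of the last byte in the block
    lines: int        # number of lines in the block
    line_len: int     # bytes per line, line ending included
    is_header: bool

def index_blocks(data: bytes) -> Dict[str, List[Block]]:
    """Group consecutive equal-length lines into blocks, keyed by header text."""
    index: Dict[str, List[Block]] = {}
    current: List[Block] = []
    pos = 0
    for raw in data.splitlines(keepends=True):
        if raw.startswith(b">"):
            # A header line opens a new entry in the hash table.
            header = raw[1:].strip().decode("ascii")
            current = [Block(pos, pos + len(raw) - 1, 1, len(raw), True)]
            index[header] = current
        elif current and not current[-1].is_header and current[-1].line_len == len(raw):
            # Same line length as the previous sequence line: extend that block.
            current[-1].lines += 1
            current[-1].end += len(raw)
        else:
            # Line length changed (or first sequence line): start a new block.
            current.append(Block(pos, pos + len(raw) - 1, 1, len(raw), False))
        pos += len(raw)
    return index

# Demonstration data chosen to reproduce the positions quoted in the text.
demo = b">SEQUENCE_1\r\n" + (b"A" * 70 + b"\r\n") * 2 + b"ACGT\r\n"
blocks = index_blocks(demo)["SEQUENCE_1"]
```

With this demonstration input, the header block spans positions 0 to 12 and the first sequence block spans positions 13 to 156 with 2 lines of 72 bytes, matching the worked example in the text; the short final line forms a third block.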
  • the Primer Designing Pipeline provided would first look the header up in the hash table. It then iterates through each block sequentially to see whether the sequence starts in that block. If the file follows the normal FASTA format, the Primer Designing Pipeline should have to check at most two blocks, because there should be only two blocks for each sequence in the file. Once the block containing the starting position is determined, the position within that block can be calculated because each character takes up one position/byte in the file.
  • each block is also analyzed for the type of newline characters, as well as other whitespace characters, that occur at the end of each line in the block. For example, if the target "SEQUENCE_2 I Corn Sequence Gene A45: 136,8" (header : start, length) needs to be extracted, then the Primer Designing Pipeline provided would determine that it falls in the first block following the header. For example, the block starts at file position 222, contains 3 lines with 72 characters per line, and there are 2 ending whitespace characters in each line.
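The address calculation for this example can be sketched as follows. The sketch rests on two assumptions that the text does not state explicitly: the 72 characters per line include the 2 trailing whitespace characters (consistent with the 13-character header line in the earlier example), and the start coordinate is 0-based and counted from the first base in the block. The function name `file_offsets` is hypothetical.

```python
def file_offsets(block_start, line_len, eol_len, seq_start, length):
    """Return (file_offset, n_bytes) chunks covering `length` bases from `seq_start`.

    Assumes seq_start is 0-based and relative to the first base in the block,
    and that line_len counts the trailing end-of-line bytes.
    """
    bases_per_line = line_len - eol_len
    chunks = []
    remaining = length
    pos = seq_start
    while remaining > 0:
        line, col = divmod(pos, bases_per_line)
        take = min(remaining, bases_per_line - col)   # stop at the end of the line
        chunks.append((block_start + line * line_len + col, take))
        pos += take
        remaining -= take
    return chunks

# Values from the example: block at 222, 72 bytes per line, 2 end-of-line bytes,
# target start 136 and length 8.
chunks = file_offsets(222, 72, 2, 136, 8)
```

Under these assumptions the 8 bases straddle a line break, so two reads are needed: 4 bytes at offset 360 and 4 bytes at offset 366, skipping the 2 end-of-line bytes in between. A 1-based start coordinate would shift both offsets by one.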
  • Figure 6 shows an exemplary screenshot illustrating the part of the Load Sequence Module used to load the example FASTA file as shown in Figure 5, where the first block of each section is the header block.
  • In the Targets step, the segments for which primers are to be designed are defined by the user(s) (for example, all sequences or only masked regions greater than a specified length) or loaded from a file such as a GFF file.
  • the GFF format is useful as input files for programs like WebGBrowse, which was previously disclosed in Ram Podicheti, Rajesh Gollapudi, and Qunfeng Dong. (2009) "WebGBrowse - a web server for GBrowse." Bioinformatics, 25(12): 1550-1551, the content of which is incorporated by reference in its entirety.
  • Step 1 is the Load / Define Window step, where the area in which primers are to be placed is defined (for example, 100 bp upstream and downstream of the target) or loaded from a file.
  • the next step is Enter Primer Parameters, where parameters including primer length, melting temperature, GC content, 3' stability, and/or estimated secondary structure are entered.
  • Primer3 is used as the primary design engine and user(s) can enter parameters as required by the Primer3 program.
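As an illustration of how a target, a window, and primer parameters might be handed to Primer3, the sketch below builds one input record in Primer3's Boulder-IO text format (one `TAG=value` per line, terminated by a line containing only `=`). The tag names are those of the Primer3 2.x series; the Primer3 version used by the pipeline is not stated in the source and may use different tags. The function name, sequence, and parameter values are hypothetical.

```python
def boulder_record(seq_id, template, target_start, target_len, params):
    """Build one Primer3 Boulder-IO record as a string."""
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        f"SEQUENCE_TARGET={target_start},{target_len}",
    ]
    lines += [f"{tag}={value}" for tag, value in params.items()]
    lines.append("=")                      # record terminator
    return "\n".join(lines) + "\n"

# Hypothetical parameter values, for illustration only.
record = boulder_record(
    "example_target",
    "ACGT" * 50,
    80, 40,
    {
        "PRIMER_OPT_SIZE": 20,
        "PRIMER_MIN_TM": 57.0,
        "PRIMER_MAX_TM": 63.0,
        "PRIMER_MIN_GC": 40.0,
        "PRIMER_MAX_GC": 60.0,
        "PRIMER_PRODUCT_SIZE_RANGE": "100-200",
    },
)
```

A record like this would typically be passed to the Primer3 command-line program on standard input; that step is omitted here because it depends on the local installation.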
  • a Primer Design Module having two major functions: (1) selecting and processing the target and (2) saving the results.
  • the algorithm of the Primer Design Module may adjust settings internally. For example with HTP primer design, the Primer Design Module can start at the beginning of the target area and design overlapping primer sets until it reaches the end. In one embodiment, additional evaluation may be needed such as BLASTing or secondary structure prediction before moving to the next set of primers.
  • Basic Local Alignment Search Tool (BLAST) is a commonly used sequence alignment tool. See Altschul et al. (1990) J. Mol. Biol. 215: 403-410, the content of which is hereby incorporated by reference in its entirety.
  • the Primer Designing Pipeline provided automatically generates and adds specified adaptor sequences to the designed primers.
  • When performing targeted genome sequencing where only specific sub-sequences within a genome are desired, the user(s) must manually design primers to create overlapping amplicons where the targeted region is larger than the maximum read length of the sequencing equipment. To date, most high-throughput sequencing machines can only sequentially read a limited number of base pairs in one run. Thus, the source genetic material needs to be chopped up into segments that are shorter than the maximum read length. In order to assemble these segments back into one sequence, the segments need to have some overlapping sequence (usually at least 20 base pairs).
  • Figure 2 shows an exemplary embodiment of high-throughput sequencing where a sequencer is used to sequence the target region.
  • the target region is 5,000 base pairs (bp) long and the sequencer can only read segments of DNA up to 700 bp, i.e., the 5,000 bp sequence needs to be "chopped" into segments of 700 bp or less.
  • the source sequence isn't chopped but the 5,000 bp sequence is amplified into 700 bp segments for sequencing.
  • shorter overlapping copies are made instead.
  • the reads are stored in a data file and an assembly program assembles them back into one continuous sequence.
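The tiling arithmetic behind this example can be sketched as below. The 5,000 bp target and 700 bp maximum come from the text; the 100 bp overlap is one of the overlap values mentioned elsewhere in the document and is chosen here only for illustration. The function computes candidate amplicon windows; actual primer positions inside each window would still be chosen by the design engine. The name `tile_windows` is hypothetical.

```python
def tile_windows(target_len, max_amplicon, overlap):
    """Return 0-based, end-exclusive (start, end) windows tiling the target."""
    if overlap >= max_amplicon:
        raise ValueError("overlap must be smaller than the amplicon length")
    step = max_amplicon - overlap
    windows = []
    start = 0
    while True:
        end = min(start + max_amplicon, target_len)   # clip the last window
        windows.append((start, end))
        if end == target_len:
            break
        start += step
    return windows

windows = tile_windows(5000, 700, 100)
```

With these values the target is covered by 9 windows, each starting 600 bp after the previous one, and the final window (4800 to 5000) is shorter than the rest.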
  • In order to make the shorter copies or amplicons during the amplification stage of the sequencing, primers (short sequences that mark the beginning and end of an amplicon) have to be designed and created.
  • Traditional tools for designing primers can only design one set of primers (or one amplicon) at a time. These overlapping amplicons have to be designed serially (one after the other) and usually in a fairly manual process.
  • the automated systems and methods for designing primers for overlapping amplicons.
  • the automated systems and methods start from a traditional primer design program/software, for example Primer3, which can be downloaded locally and run from a command line interface from either a Linux or Windows machine.
  • the automated systems and methods provided are generated using Perl script (parallelized or non-parallelized).
  • the automated systems and methods provided are generated using Microsoft .NET 4.0, which contains functionality for parallelizing processes.
  • the systems and methods provided using Microsoft .NET 4.0 can design large batches of primers in a few minutes as compared to hours using non-parallelized Perl script or days with the traditional approaches.
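The parallel approach can be sketched in Python as an analogue of the .NET parallelization described above; this is not the pipeline's code. Here `design_one` is a hypothetical stand-in for a single design run on one window; in a real setting it would launch the design engine as an external process, which is why a thread pool is a reasonable fit.

```python
from concurrent.futures import ThreadPoolExecutor

def design_one(window):
    """Hypothetical stand-in for one design run on a (start, end) window."""
    start, end = window
    return {"window": window, "amplicon_len": end - start}

def design_all(windows, workers=4):
    """Run design_one on every window concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(design_one, windows))   # map preserves input order

results = design_all([(0, 700), (600, 1300), (1200, 1900)])
```

Because `pool.map` returns results in input order, neighbouring amplicons remain adjacent in the output, which makes a later overlap check straightforward.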
  • the automated systems and methods provided use a parallelized approach.
  • the automated systems and methods provided do not use a non-parallelized approach.
  • Figure 3 shows an exemplary system provided using Microsoft .NET 4.0.
  • the output from the .NET program is a tab-delimited text file containing the primer sequences and information about the quality of the primers.
  • the Primer Designing Pipeline provided also saves copies of the input sent to Primer3 and the output Primer3 generates. This can be useful in troubleshooting any issues that may arise or manually re-running one portion of the process if necessary.
  • a Format Output Module which reads the output from the automation program and generates a general feature format (GFF) file that can be used by other programs, including GBrowse, to visually overlay the primers and amplicons on the source sequence.
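A minimal sketch of writing a primer as a GFF feature line follows, assuming the nine-column, tab-separated GFF3 layout (seqid, source, type, start, end, score, strand, phase, attributes) with 1-based inclusive coordinates. The source does not say which GFF version the module emits, and the source label, feature type, and names used here are illustrative assumptions.

```python
def gff_line(seqid, feature_type, start, end, strand, name, source="PrimerPipeline"):
    """Format one GFF3 feature line (1-based, inclusive coordinates)."""
    columns = [
        seqid, source, feature_type,
        str(start), str(end),
        ".",            # score: not used here
        strand,
        ".",            # phase: only meaningful for CDS features
        f"Name={name}",
    ]
    return "\t".join(columns)

line = gff_line("SEQUENCE_1", "primer", 101, 120, "+", "amplicon1_F")
```

One such line per primer and per amplicon, preceded by a `##gff-version 3` header line, would give a file that a genome browser can overlay on the source sequence.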
  • the raw results from the Primer Designing Pipeline are compiled and formatted into a tab-delimited format as well as, optionally, a GFF format for feeding into GBrowse for visualization. For one example, 14 pairs of primers/amplicons are created within four minutes using the systems and methods provided. These 14 pairs of primers form overlapping amplicons over one broad targeted sequence. For another example, 9 pairs of primers/amplicons (i.e., 18 primers) are created within two minutes using the systems and methods provided. These 9 pairs of primers form overlapping amplicons over two separated targeted sequences, where the two targeted sequences are still within the same genome (i.e., skipping one region in between for sequencing).
  • the systems and/or methods provided also comprise a BLAST Verification Module.
  • An exemplary BLAST Verification Module is illustrated in Figure 4.
  • the BLAST Verification Module verifies target redundancy for amplicon primer design.
  • the BLAST Verification Module will take the outputs and BLAST them against the targeted genome or sequence library available in database.
  • the BLAST Verification Module allows a primer set (pair) or amplicon to be unique based on BLAST analysis.
  • the BLAST Verification Module provides BLAST analysis in parallel, thus saving time as compared to sequential analysis.
  • primers from the first set typically have to be BLASTed, then the next one, and then the results from both BLAST queries must be compared to see if both primers land within a pre-determined number of base pairs from each other (usually 1000) and are on opposite strands of the DNA, pointing in the correct direction for amplification to occur. If non-unique primers are found, then those specific sequences need to be re-run through the primer design process with different parameters.
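The comparison of BLAST results for a primer pair can be sketched as follows. The sketch assumes the BLAST hits have already been parsed into simple records; the `Hit` type and function names are hypothetical, and the 1000 bp limit is the "usual" value quoted in the text.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    seqid: str
    start: int    # leftmost coordinate of the hit on the subject sequence
    end: int      # rightmost coordinate of the hit on the subject sequence
    strand: str   # "+" or "-"

def productive_pairs(fwd_hits, rev_hits, max_span=1000):
    """Return hit pairs that could yield an amplicon of at most max_span bp."""
    pairs = []
    for f in fwd_hits:
        for r in rev_hits:
            if f.seqid != r.seqid or f.strand == r.strand:
                continue                       # must be same sequence, opposite strands
            plus, minus = (f, r) if f.strand == "+" else (r, f)
            span = minus.end - plus.start + 1
            # Primers face each other only if the plus-strand hit lies upstream.
            if plus.start <= minus.start and span <= max_span:
                pairs.append((f, r))
    return pairs

def is_unique(fwd_hits, rev_hits, max_span=1000):
    """A primer pair is treated as unique if exactly one productive pairing exists."""
    return len(productive_pairs(fwd_hits, rev_hits, max_span)) == 1

fwd = [Hit("chr1", 100, 119, "+"), Hit("chr2", 500, 519, "+")]
rev = [Hit("chr1", 600, 619, "-")]
unique = is_unique(fwd, rev)
ambiguous = is_unique(fwd, rev + [Hit("chr2", 900, 919, "-")])
```

In this example the stray forward hit on `chr2` is harmless on its own, but once a reverse hit appears within range on the same sequence a second amplicon becomes possible and the pair is no longer unique, which is the case where the design would need to be re-run with different parameters.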
  • the BLAST Verification Module specifies a BLAST database to use before running the primer design process. As the primer design returns individual results, the Primer Designing Pipeline disclosed can automatically check them for uniqueness and try to re-run that sequence if necessary.
  • a copy of the BLAST database is created locally on the user's workstation. In other embodiments, the BLAST database is located on a server and accessed remotely by the Primer Designing Pipeline.
  • the systems and/or methods provided also comprise an Adaptor Verification Module.
  • the Primer Designing Pipeline adds adapter/tag sequences to the primers in order for the sequencing machine to be able to sequence the amplicon. This adapter/tag sequence is often the same for all primers.
  • the secondary structure of a designed primer may change significantly after adding such adaptor/tag sequence.
  • A program called RNAstructure (for both DNA and RNA) has been developed by the University of Rochester that can be used to predict the most likely binding structure a sequence or pair of sequences can make.
  • the Adaptor Verification Module of the Primer Designing Pipeline can automate the RNAstructure program through the command line and predict whether, after an adapter sequence is added, a primer set is still a scientifically good choice.
  • the Adaptor Verification Module comprises an internal scoring system for classifying primers based on the predicted secondary structure.
  • Other programs can be used for the Adaptor Verification Module provided including Mfold (as disclosed in M. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31 (13), 3406-3415, 2003) and UNAFold (as disclosed in N. R. Markham & M. Zuker. UNAFold: Software for Nucleic Acid Folding and Hybridization. In Data, Sequence Analysis, and Evolution, J. Keith, ed., Bioinformatics: Volume 2, Chapter 1, pp 3-31, Humana Press Inc., 2008), the content of both are hereby incorporated by reference in their entireties.
  • simulation of amplification and/or sequencing can be performed.
  • Several programs have been disclosed to simulate the entire sequencing process, for example in silico PCR amplification for amplification, MetaSim for simulating sequencing, and CAP3 for assembly.
  • the Primer Designing Pipeline can integrate the ability to simulate the entire sequencing process using a series of simulators.

Abstract

Provided are systems and methods for customized primer/amplicon designing programs which enable users to design overlapping primers/amplicons in a target region or multiple targeted regions. The target region can be a small (<1 kb) to large contiguous or repeat-masked regions (even entire genomes provided sufficient hardware and memory to handle the processes). The systems and methods provided herein can be used to design multiple sets of overlapping or non-overlapping primers/amplicons for a target region.

Description

PRIMER DESIGNING PIPELINE FOR TARGETED SEQUENCING
FIELD OF THE INVENTION
[0001] This invention is generally related to the field of molecular biology, and more specifically the field of primer design for targeted/high-throughput sequencing.
BACKGROUND OF THE INVENTION
[0002] Primer designing for DNA sequencing is a vital part of modern biological research. Traditional primer designing programs, for example the publicly available "Primer3," use a single input DNA sequence to process and design optimal primers under specified parameters including primer length, GC content, melting temperature (Tm) and others. However, there is no dynamic primer designing program which can design overlapping primers simultaneously to facilitate coverage across an entire region or multiple targeted regions. In addition, these traditional primer designing programs are not suitable for handling large files with long or multiple sequences. Thus, there remains a need for an efficient primer designing pipeline for targeted sequencing spanning a broad region or multiple targeted regions.
SUMMARY OF THE INVENTION
[0003] Provided are systems and methods for customized primer/amplicon designing programs which enable users to design overlapping primers/amplicons in a target region or multiple targeted regions. The target region can be a small (<1 kilo bases or kb) to large contiguous or repeat-masked regions (even entire genomes provided sufficient hardware and memory to handle the processes). The systems and methods provided herein can be used to design multiple sets of overlapping or non-overlapping primers/amplicons for a target region. In some embodiments, the primers designed using the systems and methods provided herein can also be used for multiplex PCR analysis.
[0004] In one aspect, provided is a computerized system for primer/amplicon design for sequencing. The system comprises:
(a) an input device and an output device/interface;
(b) an analysis system interface coupled to memory of a computer;
(c) an operating system optionally comprising a database;
(d) a load sequence module for loading nucleic acid sequences; and
(e) a primer design module for designing primer pairs.
[0005] In one embodiment, the system further comprises at least one of a format output module and a BLAST verification module. In a further embodiment, the system further comprises at least one of a format output module, a BLAST verification module, and an adaptor verification module. In another embodiment, the input device is selected from the group consisting of an automated sequencer, a sequencing data input device, and a sequencing data storage device. In another embodiment, the output interface comprises an interface for WebGBrowse or GenomeBrowser.
[0006] In one embodiment, the database described herein contains information selected from the group consisting of genomic sequences, previously generated primers, and sequences for BLAST analysis. In another embodiment, the load sequence module processes sequences in FASTA format. In another embodiment, the load sequence module uses random file access. In another embodiment, the load sequence module does not use sequential file access.
[0007] In one embodiment, the primer design module performs at least one of (1) automatically adding standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplexed amplifications; (4) designing a tiling of amplicons across a sequence; (5) picking primers from a reverse-translated amino acid sequence; and (6) selection from multiple primer sets. In another embodiment, the primer design module processes primer design in parallel. In another embodiment, the primer design module does not design primers in a non-parallel or sequential manner. In another embodiment, the primer design module generates primers or processes primer design at a speed greater than 10 primers per minute. In another embodiment, the primer design module generates primers or processes primer design at a speed greater than 100 primers per minute. In a further or alternative embodiment, the primer design module generates primers or processes primer design at a speed between 200 and 500 primers per minute. In another embodiment, the primers constitute overlapping amplicons for sequence assembly. In another embodiment, the primers constitute overlapping amplicons for sequencing and assembly. In a further embodiment, the overlapping region of amplicons comprises at least 50 bp or minimal overlap. In a further embodiment, the overlapping region of amplicons comprises at least 100 bp. In a further embodiment, the overlapping region of amplicons comprises between 100 bp and 1000 bp.
[0008] In another aspect, provided is a method for use in a computerized system for primer/amplicon design for sequencing. The method comprises:
(a) uploading sequence data using a load sequence module;
(b) designing multiple primers in parallel using a primer design module; and
(c) outputting primer design through an output interface.
[0009] In one embodiment, the method further comprises pre-processing sequences by modifying sequences before primer design. In a further or alternative embodiment, the method further comprises defining target regions/sequences. In a further or alternative embodiment, the method further comprises defining windows for primer design.
[0010] In one embodiment, the computerized system of the method comprises a system described herein. In another embodiment, the sequence data is larger than 100 kilobases (kb). In a further or alternative embodiment, the sequence data is larger than 10 megabases (Mb). In a further or alternative embodiment, the sequence data is between 10 Mb and 1 gigabase (Gb).
[0011] In one embodiment, the load sequence module processes sequences in FASTA format. In another embodiment, the load sequence module uses random file access. In another embodiment, the method provided further comprises at least one of (1) automatically adding standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplexed amplifications; (4) designing a tiling of amplicons across a sequence; (5) picking primers from a reverse-translated amino acid sequence; and (6) selection from multiple primer sets.
[0012] In one embodiment, the method provides primers at a speed greater than 10 primers per minute. In a further or alternative embodiment, the method provides primers at a speed greater than 100 primers per minute. In a further or alternative embodiment, the method provides primers at a speed between 200 and 500 primers per minute. In another embodiment, the primers constitute overlapping amplicons for sequence assembly. In a further embodiment, the overlapping region of amplicons comprises at least 50 bp or minimal overlap. In a further embodiment, the overlapping region of amplicons comprises at least 100 bp. In a further embodiment, the overlapping region of amplicons comprises between 100 bp and 1000 bp.
[0013] In one embodiment, the method further comprises verifying the primers using a BLAST verification module. In another embodiment, the method further comprises verifying secondary structure of primers using an adaptor verification module. In another embodiment, the method further comprises simulating the sequencing using a sequencing simulation module. In another embodiment, the method further comprises an output format module for outputting for visualization using WebGBrowse.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Figure 1 shows an exemplary flowchart of the Primer Designing Pipeline provided herein.
[0015] Figure 2 shows an exemplary embodiment for overlapping amplicon design for high-throughput sequencing.
[0016] Figure 3 shows an exemplary automated process for the systems and methods provided herein.
[0017] Figure 4 shows an exemplary process for the BLAST verification modules and methods provided herein.
[0018] Figure 5 shows exemplary FASTA sequences to be loaded into the Primer Designing Pipeline provided herein.
[0019] Figure 6 shows an exemplary screen shot when the Primer Designing Pipeline provided loads FASTA files for downstream analysis.
DETAILED DESCRIPTION OF THE INVENTION
[0020] Various algorithms have been described previously and can be incorporated in the systems and methods provided to design multiple pairs of primers simultaneously. For example, primer design methods have been disclosed in U.S. Patent Nos. 5,512,458, 5,556,749, 6,928,368, 7,565,248, 7,698,069, and 8,014,955; patent applications US2003/0108919, US2003/0215834, US2004/0012633, US2005/0032074, US2006/0281105, US2007/0032963, US2010/0070452, US2010/0184067, JP2003079366, JP2005301532, JP2009268360, JP2011004621, JP2011062085, and EP1136932; and international patent applications WO2009/063270, WO2009/152336, WO2010/113789, and WO2011/053241, the contents of which are incorporated by reference in their entireties. In some embodiments, a publicly available "Primer3" program is incorporated by the systems and methods provided to process the overlapping primer designing task in targeted regions while also combining the utility of Batch-Primer3 using a customized and compiled program. The "Primer3" program has been previously described in Steve Rozen and Helen J. Skaletsky (2000) "Primer3 on the WWW for general users and for biologist programmers." In: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 365-386, the content of which is hereby incorporated by reference in its entirety. In some embodiments, the Primer Designing Pipeline provided is programmed using the .NET framework and allows multiple "Primer3" processes to be performed in parallel while validating for overlap. In some embodiments, the Primer Designing Pipeline described herein provides at least one of the advantages below: (1) automatically adding a standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplex amplifications; (4) designing a tiling of amplicons across a sequence; and (5) picking primers from a reverse-translated amino acid sequence.
[0021] In some embodiments, the systems and methods provided herein enable primer designing especially for "targeted re-sequencing" applications, for example, using high-throughput (HTP) next-generation sequencing (NGS) instruments. The Primer Designing Pipeline provided can be modular to take small to large sequences as input and also allows changes of the amplicon lengths to suit various NGS platforms as requested by users.
[0022] In some embodiments, primers designed using the systems and/or methods provided herein can be used with the Fluidigm AccessArray system, an HTP multiplexed amplicon library generation system for efficient and cost-effective generation of sequencing data for further analysis. For targeted re-sequencing projects, the systems and methods described herein can provide HTP overlapping primer design to complement the utility of the Fluidigm AccessArray system for marker development, gene confirmation, transgene region validation for regulatory affairs, QTL mining, and genotyping-by-sequencing.
[0023] An exemplary primer design workflow is illustrated in Figure 1. In the Load Sequence step, the users can select sequence files for which they want to design primers. Files are typically not loaded into memory but instead analyzed for random access. First introduced by Bill Pearson and David Lipman in 1988 for representing either nucleotide or amino acid sequences (see Pearson and Lipman, "Improved tools for biological sequence comparison" (1988) Proc. Natl. Acad. Sci. USA 85:2444-2448; the content of which is hereby incorporated by reference in its entirety), the FASTA file format is a common platform for displaying biological sequences. When designing primers, the target sequence(s) can typically be provided in a FASTA format. The FASTA file can contain one or multiple sequences. Each sequence is always preceded by a header line which is prefixed with ">" followed by the ID, description, and/or other pertinent information about the sequence. The sequence information is then listed on subsequent lines and usually wraps (carriage return/line feed) every 70 or 80 characters depending on the program that generated the file. Typical DNA sequences in FASTA format are shown in Figure 5.
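The FASTA layout described above can be illustrated with a minimal Python sketch. It is not the pipeline's loader (which avoids reading whole files into memory); the function name `parse_fasta` and the example records are hypothetical and serve only to show how header lines and wrapped sequence lines relate.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect sequences keyed by header text (without the leading '>')."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A header line starts a new record.
            header = line[1:].strip()
            records[header] = ""
        elif header is not None and line:
            # Wrapped sequence lines are joined back into one string.
            records[header] += line.strip()
    return records


# Hypothetical two-record file with carriage return/line feed endings.
example = [
    ">SEQUENCE_1\r\n",
    "ACGTACGTAC\r\n",
    "GGTTAACC\r\n",
    ">SEQUENCE_2 | hypothetical description\r\n",
    "TTGACA\r\n",
]
records = parse_fasta(example)
```

Reading everything into a dictionary like this is only practical for small files, which is the limitation the Load Sequence Module is meant to avoid.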
[0024] Especially for re-sequencing projects and/or high-throughput sequencing, FASTA files can become quite large when dealing with long sequences, a large number of sequences, or a combination of both. For example, most FASTA files which represent the entire genome of complex eukaryotes may easily exceed 2 gigabytes. This size issue poses a problem for the primer design process because normal file handling methods involve starting at the beginning and reading the file into memory until the data of interest is found. For example, sequential file access for a file larger than 2 gigabytes may take longer than 30 seconds to read to the spot in the file where the data of interest is. With traditional primer designing programs, this process has to take place for each set of primers that needs to be designed.
[0025] Accordingly, provided is a Load Sequence Module which runs the sequence loading processes in parallel. Random file access is combined with sequential file access to speed up the process. Each character in the file resides at a specific addressable location on a disk. In addition to starting at the beginning of the file and reading each character in order (sequential file access), the Load Sequence Module provides a means to access any location in the file at random as long as the address is known. Random file access can speed processes up considerably because the process does not have to read all the characters/data that come before the data in the file that it is interested in extracting. Rather than trying to perform sequential file access faster, the Load Sequence Module provided determines the starting address in the file at which the data of interest is located.
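The difference between the two access modes can be sketched in Python as follows. This is an illustration only: the function names and the temporary demo file are hypothetical, and the source describes a .NET implementation rather than this code.

```python
import os
import tempfile


def read_sequential(path: str, start: int, length: int) -> bytes:
    """Read from the beginning of the file and discard everything before `start`."""
    with open(path, "rb") as handle:
        data = handle.read(start + length)
    return data[start:start + length]


def read_random(path: str, start: int, length: int) -> bytes:
    """Jump straight to `start` with seek() and read only `length` bytes."""
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(length)


def demo() -> tuple:
    """Write a small hypothetical FASTA file and read the same span both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fa")
        with open(path, "wb") as handle:
            handle.write(b">demo\r\n" + b"ACGT" * 1000 + b"\r\n")
        return read_sequential(path, 100, 8), read_random(path, 100, 8)
```

Both functions return the same bytes; the random-access version simply avoids touching the data that precedes the requested address, which is where the time saving on multi-gigabyte files would come from.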
[0026] In some embodiments, the Primer Designing Pipeline provided initially reads the entire file sequentially through once, analyzes it to determine how it is formatted, and stores that analysis in a SQL Server Compact database. Then, when the Primer Designing Pipeline provided is designing the primers and needs to extract a sequence from one of the files, it uses the analysis results stored in the SQL Server Compact database to calculate the location within the file (or address) of the data of interest. That address is then used to extract/read the data using random file access.
[0027] In a further embodiment, when the program initially reads through a file sequentially, it breaks the file into blocks where each block has the same line length, and it stores information about how many lines, characters per line, and file start and stop position for each block. For example, as shown in Figure 5, the first block would start at file position 0, contain 1 line with 13 characters per line (a newline and carriage return exist at the end of the line), and end at position 12. The block after that would start at file position 13, contain 2 lines of 72 characters each, and end at position 156. Additionally, since the Load Sequence Module assumes that each header block is unique, each header block can be loaded into a hash table (dictionary) in memory for quick access when working with the file. The blocks following the header which contain the sequence are then linked to the header block in the hash table.
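A minimal Python sketch of this block analysis is given below. The `Block` class and `index_fasta` function are hypothetical names; the sketch keeps the index in an ordinary dictionary rather than a SQL Server Compact database and, for brevity, scans an in-memory byte string instead of streaming the file.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Block:
    start: int           # file position of the block's first byte
    line_count: int      # number of lines in the block
    bytes_per_line: int  # line width including line-ending bytes
    ending_bytes: int    # trailing whitespace/newline bytes per line
    is_header: bool

    @property
    def end(self) -> int:
        """File position of the block's last byte."""
        return self.start + self.line_count * self.bytes_per_line - 1


def index_fasta(data: bytes) -> Dict[str, List[Block]]:
    """Group consecutive lines of equal width into blocks, keyed by header."""
    index: Dict[str, List[Block]] = {}
    current: List[Block] = []
    pos = 0
    for line in data.splitlines(keepends=True):
        content = line.rstrip()
        ending = len(line) - len(content)
        if content.startswith(b">"):
            # Each header starts a new entry in the hash table.
            header = content[1:].decode("ascii").strip()
            current = [Block(pos, 1, len(line), ending, True)]
            index[header] = current
        elif content and current:
            last = current[-1]
            if not last.is_header and last.bytes_per_line == len(line):
                last.line_count += 1  # same width: extend the current block
            else:
                current.append(Block(pos, 1, len(line), ending, False))
        pos += len(line)
    return index


# Hypothetical file shaped like the example in the text:
# a 13-byte header line followed by two 72-byte sequence lines.
data = b">SEQUENCE_1\r\n" + (b"A" * 70 + b"\r\n") + (b"C" * 70 + b"\r\n")
blocks = index_fasta(data)["SEQUENCE_1"]
```

With this input the header block spans positions 0 to 12 and the sequence block spans positions 13 to 156, matching the figures quoted in the paragraph above.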
[0028] Later, when the Primer Designing Pipeline provided needs to access a sequence or sub-sequence within the file, it would look the header up in the hash table first. Then it iterates through each block sequentially to see if the sequence starts in that block. If the file follows the normal FASTA format, then the Primer Designing Pipeline provided should have to check at most two blocks because there should only be two blocks for each sequence in the file. Once the block containing the starting position is determined, the position within that block can be calculated because each character takes up one position/byte in the file.
[0029] The challenging part of calculating the starting position is taking into account the newline characters that occur at the end of each line. In some embodiments, the newline character can also be preceded by a carriage return character. In some embodiments, each block is also analyzed for what type of newline characters, as well as other whitespace characters, occur at the end of each line in the block. For example, if the target "SEQUENCE_2 | Corn Sequence Gene A45: 136,8" (header: start, length) needs to be extracted, then the Primer Designing Pipeline provided would determine that it falls in the first block following the header. For example, the block starts at file position 222 and contains 3 lines with 72 characters per line, of which 2 are ending whitespace characters. Using these statistics, the Primer Designing Pipeline provided can divide the sequence position by the number of sequence characters per line: 136/70 = 1 line plus remainder 66. It then takes the quotient and multiplies it by the whitespace character count: 1 x 2 = 2. It then adds that to the sequence position, so that the 70 sequence characters of the skipped line are counted as well: 136 + 2 = 138. That result is then added to the starting file position for the block to determine the actual starting file position for the sub-sequence: 222 + 138 = 360. Once the file position is determined, random file access can be used to read the sub-sequence more quickly than with sequential file access.
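The address calculation can be sketched in Python as below. The sketch assumes 0-based sequence positions and assumes that every full line preceding the target contributes both its sequence characters and its line-ending bytes to the offset; the function names are hypothetical.

```python
def file_offset(block_start: int, seq_chars_per_line: int,
                ending_bytes: int, seq_pos: int) -> int:
    """Map a 0-based sequence position inside a block to a file position."""
    # Number of complete lines to skip, and the column within the target line.
    full_lines, remainder = divmod(seq_pos, seq_chars_per_line)
    bytes_per_line = seq_chars_per_line + ending_bytes
    return block_start + full_lines * bytes_per_line + remainder


def read_subsequence(data: bytes, offset: int, length: int) -> bytes:
    """Read `length` sequence characters from `offset`, skipping line endings."""
    out = bytearray()
    i = offset
    while len(out) < length and i < len(data):
        if data[i:i + 1] not in (b"\r", b"\n"):
            out.append(data[i])
        i += 1
    return bytes(out)


# Block statistics quoted in the text: start 222, 70 sequence characters
# plus 2 line-ending characters per line, target starting at position 136.
offset = file_offset(222, 70, 2, 136)
```

Because a requested sub-sequence can run past the end of a line, the reader skips carriage return and newline bytes while collecting characters instead of reading a fixed number of bytes.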
[0030] Figure 6 shows an exemplary screenshot illustrating the part of the Load Sequence Module used to load the example FASTA file as shown in Figure 5, where the first block of each section is the header block.
[0031] Back to Figure 1, in the Pre-Process Sequences step, modifications may be added to the original sequences, including masking and/or converting bases for methylation. In the Load / Define Targets step, segments for which primers are to be designed are defined by the user(s) (for example, all sequences or only masked regions greater than a specified length) or loaded from a file such as a GFF file. The GFF format is useful as an input format for programs like WebGBrowse, which was previously disclosed in Ram Podicheti, Rajesh Gollapudi, and Qunfeng Dong (2009) "WebGBrowse - a web server for GBrowse." Bioinformatics, 25(12): 1550-1551, the content of which is incorporated by reference in its entirety.
[0032] Next in Figure 1 is the Load / Define Window step, where the area in which primers are to be placed is defined (for example, 100 bp upstream and downstream from the target) or loaded from a file. The next step is Enter Primer Parameters, where parameters including primer length, melting temperature, GC content, 3' stability, and/or estimated secondary structure are entered. In some embodiments, Primer3 is used as the primary design engine and user(s) can enter parameters as required by the Primer3 program.
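As an illustration of how such parameters might be handed to Primer3, the following Python sketch builds one input record in Primer3's "tag=value" (Boulder-IO) text format, terminated by a line containing only "=". The tag names follow the Primer3 2.x manual as best understood here and should be checked against the installed version (older 1.x releases used different tag names); the helper name `primer3_record` and all parameter values are hypothetical.

```python
from typing import Dict, Union


def primer3_record(seq_id: str, template: str, target_start: int,
                   target_len: int,
                   settings: Dict[str, Union[int, float, str]]) -> str:
    """Build one Boulder-IO record for a single design target."""
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        f"SEQUENCE_TARGET={target_start},{target_len}",
    ]
    lines += [f"{tag}={value}" for tag, value in settings.items()]
    lines.append("=")  # record terminator
    return "\n".join(lines) + "\n"


# Illustrative values only; real projects would tune these per platform.
settings = {
    "PRIMER_OPT_SIZE": 20,
    "PRIMER_MIN_SIZE": 18,
    "PRIMER_MAX_SIZE": 25,
    "PRIMER_OPT_TM": 60.0,
    "PRIMER_MIN_TM": 57.0,
    "PRIMER_MAX_TM": 63.0,
    "PRIMER_MIN_GC": 40.0,
    "PRIMER_MAX_GC": 60.0,
    "PRIMER_PRODUCT_SIZE_RANGE": "400-700",
}
record = primer3_record("example_target", "ACGT" * 50, 80, 40, settings)
```

A record of this kind could be written to the standard input of a locally installed Primer3 command-line program, one record per design window.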
[0033] Provided is a Primer Design Module having two major functions: (1) selecting and processing the target and (2) saving the results. In some embodiments, when processing a target, the algorithm of the Primer Design Module may adjust settings internally. For example, with HTP primer design, the Primer Design Module can start at the beginning of the target area and design overlapping primer sets until it reaches the end. In one embodiment, additional evaluation may be needed, such as BLASTing or secondary structure prediction, before moving to the next set of primers. Basic Local Alignment Search Tool (BLAST) is a commonly used sequence alignment tool. See Altschul et al. (1990) J. Mol. Biol. 215:403-410, the content of which is hereby incorporated by reference in its entirety. In some embodiments, the Primer Designing Pipeline provided automatically generates and adds specified adaptor sequences to the designed primers.
[0034] When performing targeted genome sequencing where only specific sub-sequences within a genome are desired, the user(s) must manually design primers to create overlapping amplicons where the targeted region is larger than the maximum read length of the sequencing equipment. To date, most high-throughput sequencing machines can only sequentially read a limited number of base pairs in one run. Thus, the source genetic material needs to be chopped up into segments that are less than the maximum read length. In order to assemble these segments back into one sequence, the segments need to have some overlapping sequence (usually at least 20 base pairs).
[0035] Figure 2 shows an exemplary embodiment of high-throughput sequencing where a sequencer is used to sequence the target region. The target region is 5,000 base pairs long and the sequencer can only read segments of DNA up to 700 base pairs, i.e., the 5,000 bp sequence needs to be "chopped" into segments of 700 bp (base pairs) or less. In reality, the source sequence is not chopped; instead, the 5,000 bp sequence is amplified as segments of up to 700 bp for sequencing. In the amplification step of the sequencing process, where multiple copies of the sequence are made to be sequenced, shorter overlapping copies are made instead. After the sequencing machine finishes reading/sequencing all the short copies, the reads are stored in a data file and an assembly program assembles them back into one continuous sequence.
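The tiling arithmetic for this example can be sketched in Python as follows. This is an idealized illustration: in practice the amplicon boundaries are set by where acceptable primers are found, so real windows would shift. The function name `tile_target` and the 100 bp overlap are assumptions chosen for the example.

```python
from typing import List, Tuple


def tile_target(target_start: int, target_len: int,
                max_amplicon: int, min_overlap: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) windows that cover the target with overlap."""
    if not 0 <= min_overlap < max_amplicon:
        raise ValueError("overlap must be smaller than the amplicon length")
    windows = []
    target_end = target_start + target_len
    step = max_amplicon - min_overlap  # distance between window starts
    start = target_start
    while True:
        end = min(start + max_amplicon, target_end)
        windows.append((start, end))
        if end == target_end:
            break
        start += step
    return windows


# 5,000 bp target, 700 bp maximum read length, assumed 100 bp overlap.
windows = tile_target(0, 5000, 700, 100)
```

Each window after the first begins 100 bp before the previous one ends, which gives the assembly program the shared sequence it needs to join neighbouring reads.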
[0036] In order to make the shorter copies or amplicons during the amplification stage of the sequencing, primers, short sequences that mark the beginning and end of an amplicon, have to be designed and created. Traditional tools for designing primers can only design one set of primers (or one amplicon) at a time. These overlapping amplicons have to be designed serially (one after the other) and usually in a fairly manual process. First the user designs primers for the first amplicon using some software, then sees where that amplicon ends and then designs primers for the next amplicon making sure the beginning of that amplicon overlaps the end of the previous. This is very tedious and requires considerable amounts of copying and pasting and calculating overlaps by hand. Additionally for targeted sequencing, there can be more than one target so this design process has to be performed for each target.
[0037] Provided are automated systems and methods for designing primers for overlapping amplicons. In some embodiments, the automated systems and methods provided start from a traditional primer design program/software, for example Primer3, which can be downloaded locally and run from a command line interface on either a Linux or Windows machine.
[0038] In some embodiments, the automated systems and methods provided are generated using Perl script (parallelized or non-parallelized). In other embodiments, the automated systems and methods provided are generated using Microsoft .NET 4.0, which contains functionality for parallelizing processes. In some embodiments, the systems and methods provided using Microsoft .NET 4.0 can design large batches of primers in a few minutes, as compared to hours using non-parallelized Perl script or days with the traditional approaches. In some embodiments, the automated systems and methods provided use a parallelized approach. In some embodiments, the automated systems and methods provided do not use a non-parallelized approach.
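The parallel dispatch idea can be sketched with Python's standard `concurrent.futures` module, used here in place of the .NET 4.0 parallel functionality named in the text. The function `design_one` is a hypothetical placeholder for a single design job (which in a real setup might launch one Primer3 process); it does not design primers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Window = Tuple[int, int]


def design_one(window: Window) -> Dict[str, object]:
    """Placeholder for one design job; returns a stub result for the window."""
    start, end = window
    return {"window": window, "amplicon_length": end - start}


def design_all(windows: List[Window], max_workers: int = 4) -> List[Dict[str, object]]:
    """Run the design jobs concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of the inputs, so downstream
        # overlap checks can still walk the amplicons from left to right.
        return list(pool.map(design_one, windows))


results = design_all([(0, 700), (600, 1300), (1200, 1900)])
```

Threads are sufficient when each job mostly waits on an external process; CPU-bound work done in Python itself would call for a process pool instead.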
[0039] Figure 3 shows an exemplary system provided using Microsoft .NET 4.0. The output from the .NET program is a tab-delimited text file containing the primer sequences and information about the quality of the primers. In some embodiments, in addition to the final file containing the aggregated results, the Primer Designing Pipeline provided also saves copies of the input sent to Primer3 and the output Primer3 generates. This can be useful in troubleshooting any issues that may arise or in manually re-running one portion of the process if necessary.
[0040] Back to Figure 1, provided is a Format Output Module which reads the output from the automation program and generates a general feature format (GFF) file that can be used by other programs, including GBrowse, to visually overlay the primers and amplicons on the source sequence. In some embodiments, the raw results from the Primer Designing Pipeline are compiled and formatted into a tab-delimited format as well as, optionally, a GFF format for feeding into GBrowse for visualization. For one example, 14 pairs of primers/amplicons (i.e., 28 primers) are created within four minutes using the systems and methods provided. These 14 pairs of primers form overlapping amplicons over one broad targeted sequence. For another example, 9 pairs of primers/amplicons (i.e., 18 primers) are created within two minutes using the systems and methods provided. These 9 pairs of primers form overlapping amplicons over two separated targeted sequences, where the two targeted sequences are still within the same genome (i.e., skipping one region in between for sequencing).
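A minimal Python sketch of GFF output for one amplicon is shown below. It assumes the nine tab-separated columns of GFF3 (sequence ID, source, type, start, end, score, strand, phase, attributes) with 1-based inclusive coordinates. The source label "pipeline", the feature type names, and the function names are assumptions made for illustration, not the module's actual output.

```python
from typing import Dict, List


def gff_line(seqid: str, source: str, ftype: str, start: int, end: int,
             strand: str, attrs: Dict[str, str]) -> str:
    """Format one nine-column GFF3 feature line (score and phase left as '.')."""
    attr_text = ";".join(f"{key}={value}" for key, value in attrs.items())
    return "\t".join([seqid, source, ftype, str(start), str(end),
                      ".", strand, ".", attr_text])


def amplicon_to_gff(seqid: str, name: str, fwd_start: int, fwd_len: int,
                    rev_end: int, rev_len: int) -> List[str]:
    """Describe an amplicon and its two primers (1-based, inclusive)."""
    return [
        gff_line(seqid, "pipeline", "PCR_product", fwd_start, rev_end, "+",
                 {"ID": name}),
        gff_line(seqid, "pipeline", "primer", fwd_start,
                 fwd_start + fwd_len - 1, "+",
                 {"ID": name + "_F", "Parent": name}),
        gff_line(seqid, "pipeline", "primer", rev_end - rev_len + 1,
                 rev_end, "-",
                 {"ID": name + "_R", "Parent": name}),
    ]


# Hypothetical amplicon spanning bases 101-700 with 20-base primers.
lines = ["##gff-version 3"] + amplicon_to_gff("SEQUENCE_2", "amp1", 101, 20, 700, 20)
```

Linking each primer to its amplicon through the Parent attribute lets a genome browser draw the pair as one grouped feature.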
[0041] In further embodiments, the systems and/or methods provided also comprise a BLAST Verification Module. An exemplary BLAST Verification Module is illustrated in Figure 4. In one embodiment, the BLAST Verification Module verifies target redundancy for amplicon primer design. In another embodiment, after the previous steps of the primer design process, the BLAST Verification Module will take the outputs and BLAST them against the targeted genome or sequence library available in a database. Typically, the BLAST Verification Module allows a primer set (pair) or amplicon to be verified as unique based on BLAST analysis. In some embodiments, the BLAST Verification Module provides BLAST analysis in parallel, thus saving time as compared to sequential analysis. Typically, one primer from the first set has to be BLASTed, then the next one, and then the results from both BLAST queries must be compared to see if both primers land within a pre-determined number of base pairs from each other (usually 1000) and are on opposite strands of the DNA pointing in the correct direction for amplification to occur. If non-unique primers are found, then those specific sequences need to be re-run through the primer design process with different parameters.
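The comparison of the two sets of BLAST results can be sketched in Python as follows. The sketch assumes the hits have already been parsed into subject name, coordinates, and strand; the `Hit` class and function names are hypothetical, and the check ignores complications such as partial matches or a primer pairing with itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hit:
    subject: str  # sequence the primer matched
    start: int    # leftmost matched base
    end: int      # rightmost matched base
    strand: str   # "+" or "-"


def amplifiable_pairs(fwd_hits: List[Hit], rev_hits: List[Hit],
                      max_product: int = 1000) -> List[Tuple[Hit, Hit]]:
    """Find hit pairs that face each other within `max_product` bases."""
    pairs = []
    for f in fwd_hits:
        for r in rev_hits:
            if f.subject != r.subject or f.strand == r.strand:
                continue  # must be on the same sequence, opposite strands
            plus, minus = (f, r) if f.strand == "+" else (r, f)
            product = minus.end - plus.start + 1
            # The plus-strand hit must lie to the left so the primers converge.
            if plus.start < minus.start and product <= max_product:
                pairs.append((f, r))
    return pairs


def is_unique(fwd_hits: List[Hit], rev_hits: List[Hit],
              max_product: int = 1000) -> bool:
    """A primer pair is treated as unique if exactly one product is predicted."""
    return len(amplifiable_pairs(fwd_hits, rev_hits, max_product)) == 1
```

A pair that yields more than one predicted product would be sent back through the design step with different parameters, as described above.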
[0042] In some embodiments, the BLAST Verification Module specifies a BLAST database to use before running the primer design process. As the primer design process returns individual results, the Primer Designing Pipeline disclosed can automatically check them for uniqueness and try to re-run that sequence if necessary. In some embodiments, a copy of the BLAST database is created locally on the user's workstation. In other embodiments, the BLAST database is located on a server and accessed remotely by the Primer Designing Pipeline.
[0043] In further embodiments, the systems and/or methods provided also comprise an Adaptor Verification Module. In some embodiments, the Primer Designing Pipeline adds adaptor/tag sequences to the primers in order for the sequencing machine to be able to sequence the amplicon. This adaptor/tag sequence is often the same for all primers. The secondary structure of a designed primer may change significantly after adding such an adaptor/tag sequence. Currently there are a few programs capable of analyzing secondary structures for nucleic acids. For example, RNAstructure (for both DNA and RNA), developed at the University of Rochester, can be used to predict the most likely binding structure a sequence or pair of sequences can make.
[0044] In some embodiments, the Adaptor Verification Module of the Primer Designing Pipeline can automate the RNAstructure program through the command line and enable prediction of whether, after adding an adaptor sequence, a primer set is still a scientifically good choice. In some embodiments, the Adaptor Verification Module comprises an internal scoring system for classifying primers based on the predicted secondary structure. Other programs can be used for the Adaptor Verification Module provided, including Mfold (as disclosed in M. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31 (13), 3406-3415, 2003) and UNAFold (as disclosed in N. R. Markham & M. Zuker. UNAFold: Software for Nucleic Acid Folding and Hybridization. In Data, Sequence Analysis, and Evolution, J. Keith, ed., Bioinformatics: Volume 2, Chapter 1, pp 3-31, Humana Press Inc., 2008), the contents of both of which are hereby incorporated by reference in their entireties.
[0045] In addition, simulation of amplification and/or sequencing can be performed. Several programs have been disclosed to simulate the entire sequencing process, for example in silico PCR for simulating amplification, MetaSim for simulating sequencing, and CAP3 for assembly. The Primer Designing Pipeline can integrate the ability to simulate the entire sequencing process using a series of simulators.
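The amplification part of such a simulation can be illustrated with a deliberately simple Python sketch. It is not one of the programs named above: it only finds exact primer matches on one orientation of the template and returns the predicted products, and the function names and example sequences are hypothetical.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd: str, rev: str,
                  max_product: int = 1000) -> List[str]:
    """Predict products from exact matches of a forward and reverse primer."""
    products = []
    # The reverse primer anneals to the opposite strand, so on the given
    # strand its binding site reads as its reverse complement.
    rev_site = reverse_complement(rev)
    f = template.find(fwd)
    while f != -1:
        r = template.find(rev_site, f + len(fwd))
        while r != -1:
            end = r + len(rev_site)
            if end - f <= max_product:
                products.append(template[f:end])
            r = template.find(rev_site, r + 1)
        f = template.find(fwd, f + 1)
    return products


# Hypothetical template with one forward site and one reverse site.
template = "TTTACGTACGGGGGCCATGATTT"
products = in_silico_pcr(template, "ACGTAC", reverse_complement("CCATGA"))
```

A realistic simulator would also tolerate mismatches and search both strands; the predicted products could then be passed to a read simulator and an assembler to complete the chain described above.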

Claims

1. A computerized system for primer/amplicon design for sequencing, comprising,
(a) an input device and an output device/interface;
(b) an analysis system interface coupled to memory of a computer;
(c) an operating system comprising a database;
(d) a load sequence module for loading nucleic acid sequences; and
(e) a primer design module for designing primer pairs.
2. The computerized system of claim 1, further comprising at least one of format output module, BLAST verification module, and adaptor verification module.
3. The computerized system of claim 1, wherein the input device is selected from the group consisting of automated sequencer, sequencing data input device, and sequencing data storage device.
4. The computerized system of claim 1, wherein the output interface comprises interface for WebGBrowse or GenomeBrowser.
5. The computerized system of claim 1, wherein the database contains information selected from the group consisting of genomic sequences, previously generated primers, and sequences for BLAST analysis.
6. The computerized system of claim 1, wherein the load sequence module processes sequences in FASTA format.
7. The computerized system of claim 1, wherein the load sequence module uses random file access.
8. The computerized system of claim 1, wherein the primer design module performs at least one of (1) automatically adding standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplexed amplifications; (4) designing a tiling of amplicons across a sequence; (5) picking primers from a reverse-translated amino acid sequence; and (6) selection from multiple primer sets.
9. The computerized system of claim 1, wherein the primer design module processes primer design in parallel.
10. The computerized system of claim 1, wherein the primer design module generates primers at a speed greater than 10 primers per minute.
11. The computerized system of claim 10, wherein the primers constitute overlapping amplicons for sequencing and assembly.
12. A method for use in a computerized system for primer/amplicon design for sequencing, comprising,
(a) uploading sequence data using a load sequence module;
(b) designing multiple primers in parallel using a primer design module; and
(c) outputting primer design through an output interface.
13. The method of claim 12, further comprising pre-processing sequences by modifying sequences before primer design.
14. The method of claim 12, further comprising defining target sequences.
15. The method of claim 12, further comprising defining windows for primer design.
16. The method of claim 12, wherein the computerized system comprises a system of claim 1.
17. The method of claim 12, wherein the sequence data is larger than 100 kb.
18. The method of claim 12, wherein the load sequence module processes sequences in FASTA format.
19. The method of claim 12, wherein the load sequence module uses random file access.
20. The method of claim 12, further comprising at least one of (1) automatically adding standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplexed amplifications; (4) designing a tiling of amplicons across a sequence; (5) picking primers from a reverse-translated amino acid sequence; and (6) selection from multiple primer sets.
21. The method of claim 12, wherein the method provides primers at a speed greater than 10 primers per minute.
22. The method of claim 21, wherein the primers constitute overlapping amplicons for sequence assembly.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261607630P 2012-03-07 2012-03-07
US61/607,630 2012-03-07

Publications (1)

Publication Number Publication Date
WO2013134341A1 (en) 2013-09-12

Family

ID=48045026

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/029268 WO2013134341A1 (en) 2012-03-07 2013-03-06 Primer designing pipeline for targeted sequencing

Country Status (1)

Country Link
WO (1) WO2013134341A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105087771A (en) * 2014-05-06 2015-11-25 金唯智生物科技有限责任公司 Methods and kits for identifying microorganisms in a sample
WO2017218938A1 (en) 2016-06-16 2017-12-21 Life Technologies Corporation Novel compositions, methods and kits for microorganism detection
WO2019094973A1 (en) 2017-11-13 2019-05-16 Life Technologies Corporation Compositions, methods and kits for urinary tract microorganism detection
CN110491448A (en) * 2019-07-15 2019-11-22 广州奇辉生物科技有限公司 A kind of method, system, platform and storage medium handling PCR primer
US10793897B2 (en) 2017-02-08 2020-10-06 Microsoft Technology Licensing, Llc Primer and payload design for retrieval of stored polynucleotides
US11783918B2 (en) 2016-11-30 2023-10-10 Microsoft Technology Licensing, Llc DNA random access storage system via ligation

Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5512458A (en) 1994-02-25 1996-04-30 W. R. Grace & Co.-Conn. Method of using mobile priming sites for DNA sequencing
US5556749A (en) 1992-11-12 1996-09-17 Hitachi Chemical Research Center, Inc. Oligoprobe designstation: a computerized method for designing optimal DNA probes
EP1136932A1 (en) 2000-03-20 2001-09-26 Hitachi, Ltd. Primer design system
JP2003079366A (en) 2001-09-11 2003-03-18 Hitachi Ltd Information processing system for assisting primer walking
US20030108919A1 (en) 2001-09-05 2003-06-12 Perlegen Sciences, Inc. Methods for amplification of nucleic acids
US20030215834A1 (en) 2002-05-15 2003-11-20 Fujitsu Limited Method of ordering synthesis of primer used for gene amplification, program therefor and recording medium of the program
US20040012633A1 (en) 2002-04-26 2004-01-22 Affymetrix, Inc., A Corporation Organized Under The Laws Of Delaware System, method, and computer program product for dynamic display, and analysis of biological sequence data
US20050032074A1 (en) 2002-09-09 2005-02-10 Affymetrix, Inc. Custom design method for resequencing arrays
US6928368B1 (en) 1999-10-26 2005-08-09 The Board Regents, The University Of Texas System Gene mining system and method
JP2005301532A (en) 2004-04-09 2005-10-27 Hitachi High-Technologies Corp Primer design apparatus and program
US20060281105A1 (en) 2002-10-07 2006-12-14 Honghua Li High throughput multiplex DNA sequence amplifications
WO2009063270A1 (en) 2007-11-12 2009-05-22 ISTITUTO TUMORI 'Giovanni Paolo II' IRCCS - Laboratorio di Oncologia Sperimentale Clinica Method for the design and engineering of oligonucleotides
US7565248B2 (en) 2000-10-04 2009-07-21 Celadon Laboratories, Inc. Computer system for designing oligonucleotides used in biochemical methods
JP2009268360A (en) 2008-04-30 2009-11-19 Yamaguchi Univ Primer for producing fused dna fragment and method for producing fused dna fragment using the same
WO2009152336A1 (en) 2008-06-13 2009-12-17 Codexis, Inc. Method of synthesizing polynucleotide variants
US20100070452A1 (en) 2006-07-04 2010-03-18 Yusuke Nakamura Device for designing nucleic acid amplification primer, program for designing primer and server device for designing primer
US7698069B2 (en) 2004-09-01 2010-04-13 Hitachi Software Engineering Co., Ltd. Method for designing primer for realtime PCR
US20100184067A1 (en) 2009-01-20 2010-07-22 Sony Corporation Primer evaluation method, primer evaluation program, and real-time polymerase chain reaction apparatus
WO2010113789A1 (en) 2009-04-01 2010-10-07 Necソフト株式会社 Method for designing primer for selex method, method for producing primer, method for producing aptamer, device for designing primer, and computer program and recording medium for designing primer
JP2011004621A (en) 2009-06-23 2011-01-13 Toyohashi Univ Of Technology Probe, probe design device, and probe design program
JP2011062085A (en) 2009-09-15 2011-03-31 National Institute Of Advanced Industrial Science & Technology Apparatus for searching primer set, method and program for searching primer set
WO2011053241A1 (en) 2009-10-29 2011-05-05 Jonas Blomberg Multiplex detection
US8014955B2 (en) 2005-06-27 2011-09-06 George Mason Intellectual Properties, Inc. Method of identifying unique target sequence

Patent Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5556749A (en) 1992-11-12 1996-09-17 Hitachi Chemical Research Center, Inc. Oligoprobe designstation: a computerized method for designing optimal DNA probes
US5512458A (en) 1994-02-25 1996-04-30 W. R. Grace & Co.-Conn. Method of using mobile priming sites for DNA sequencing
US6928368B1 (en) 1999-10-26 2005-08-09 The Board Regents, The University Of Texas System Gene mining system and method
EP1136932A1 (en) 2000-03-20 2001-09-26 Hitachi, Ltd. Primer design system
US7565248B2 (en) 2000-10-04 2009-07-21 Celadon Laboratories, Inc. Computer system for designing oligonucleotides used in biochemical methods
US20030108919A1 (en) 2001-09-05 2003-06-12 Perlegen Sciences, Inc. Methods for amplification of nucleic acids
JP2003079366A (en) 2001-09-11 2003-03-18 Hitachi Ltd Information processing system for assisting primer walking
US20040012633A1 (en) 2002-04-26 2004-01-22 Affymetrix, Inc., A Corporation Organized Under The Laws Of Delaware System, method, and computer program product for dynamic display, and analysis of biological sequence data
US20070032963A1 (en) 2002-05-15 2007-02-08 Fujitsu Limited Method of ordering synthesis of primer used for gene amplification, program therefor and recording medium of the program
US20030215834A1 (en) 2002-05-15 2003-11-20 Fujitsu Limited Method of ordering synthesis of primer used for gene amplification, program therefor and recording medium of the program
US20050032074A1 (en) 2002-09-09 2005-02-10 Affymetrix, Inc. Custom design method for resequencing arrays
US20060281105A1 (en) 2002-10-07 2006-12-14 Honghua Li High throughput multiplex DNA sequence amplifications
JP2005301532A (en) 2004-04-09 2005-10-27 Hitachi High-Technologies Corp Primer design apparatus and program
US7698069B2 (en) 2004-09-01 2010-04-13 Hitachi Software Engineering Co., Ltd. Method for designing primer for realtime PCR
US8014955B2 (en) 2005-06-27 2011-09-06 George Mason Intellectual Properties, Inc. Method of identifying unique target sequence
US20100070452A1 (en) 2006-07-04 2010-03-18 Yusuke Nakamura Device for designing nucleic acid amplification primer, program for designing primer and server device for designing primer
WO2009063270A1 (en) 2007-11-12 2009-05-22 ISTITUTO TUMORI 'Giovanni Paolo II' IRCCS - Laboratorio di Oncologia Sperimentale Clinica Method for the design and engineering of oligonucleotides
JP2009268360A (en) 2008-04-30 2009-11-19 Yamaguchi Univ Primer for producing fused dna fragment and method for producing fused dna fragment using the same
WO2009152336A1 (en) 2008-06-13 2009-12-17 Codexis, Inc. Method of synthesizing polynucleotide variants
US20100184067A1 (en) 2009-01-20 2010-07-22 Sony Corporation Primer evaluation method, primer evaluation program, and real-time polymerase chain reaction apparatus
WO2010113789A1 (en) 2009-04-01 2010-10-07 NEC Soft, Ltd. Method for designing primer for selex method, method for producing primer, method for producing aptamer, device for designing primer, and computer program and recording medium for designing primer
JP2011004621A (en) 2009-06-23 2011-01-13 Toyohashi Univ Of Technology Probe, probe design device, and probe design program
JP2011062085A (en) 2009-09-15 2011-03-31 National Institute Of Advanced Industrial Science & Technology Apparatus for searching primer set, method and program for searching primer set
WO2011053241A1 (en) 2009-10-29 2011-05-05 Jonas Blomberg Multiplex detection

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 410
BROWN ANDREW MK ET AL: "Optimus Primer: A PCR enrichment primer design program for next-generation sequencing of human exonic regions", BMC RESEARCH NOTES, BIOMED CENTRAL LTD, GB, vol. 3, no. 1, 7 July 2010 (2010-07-07), pages 185, XP021083073, ISSN: 1756-0500, DOI: 10.1186/1756-0500-3-185 *
GARIMA KUSHWAHA ET AL: "PRIMEGENSw3: A Web-Based Tool for High-Throughput Primer and Probe Design", BIOINFORMATICS AND BIOMEDICINE (BIBM), 2011 IEEE INTERNATIONAL CONFERENCE ON, IEEE, 12 November 2011 (2011-11-12), pages 345 - 351, XP032087106, ISBN: 978-1-4577-1799-4, DOI: 10.1109/BIBM.2011.43 *
KADERALI L ET AL: "Primer-design for multiplexed genotyping", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, SURREY, GB, vol. 31, no. 6, 15 March 2003 (2003-03-15), pages 1796 - 1802, XP002996256, ISSN: 0305-1048, DOI: 10.1093/NAR/GKG267 *
LI KELVIN ET AL: "Novel computational methods for increasing PCR primer design effectiveness in directed sequencing", BMC BIOINFORMATICS, BIOMED CENTRAL, LONDON, GB, vol. 9, no. 1, 11 April 2008 (2008-04-11), pages 191, XP021031763, ISSN: 1471-2105 *
M. ZUKER: "Mfold web server for nucleic acid folding and hybridization prediction", NUCLEIC ACIDS RES., vol. 31, no. 13, 2003, pages 3406 - 3415, XP002460708, DOI: 10.1093/nar/gkg595
N. R. MARKHAM; M. ZUKER: "Data, Sequence Analysis, and Evolution, J. Keith, ed., Bioinformatics", vol. 2, 2008, HUMANA PRESS INC., article "UNAFold: Software for Nucleic Acid Folding and Hybridization", pages: 3 - 31
PEARSON; LIPMAN: "Improved tools for biological sequence comparison", PROC. NATL. ACAD. SCI. USA, vol. 85, 1988, pages 2444 - 2448
RAM PODICHETI; RAJESH GOLLAPUDI; QUNFENG DONG: "WebGBrowse - a web server for GBrowse", BIOINFORMATICS, vol. 25, no. 12, 2009, pages 1550 - 1551
SIMMLER H ET AL: "Real-time primer design for DNA chips", PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2003. PROCEEDINGS. INTERNATIONAL APRIL 22-26, 2003, PISCATAWAY, NJ, USA, IEEE, 22 April 2003 (2003-04-22), pages 153 - 160, XP010645717, ISBN: 978-0-7695-1926-5 *
STEVE ROZEN; HELEN J. SKALETSKY: "Bioinformatics Methods and Protocols: Methods in Molecular Biology", 2000, HUMANA PRESS, article "Primer3 on the WWW for general users and for biologist programmers", pages: 365 - 386

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105087771A (en) * 2014-05-06 2015-11-25 金唯智生物科技有限责任公司 Methods and kits for identifying microorganisms in a sample
WO2017218938A1 (en) 2016-06-16 2017-12-21 Life Technologies Corporation Novel compositions, methods and kits for microorganism detection
US11783918B2 (en) 2016-11-30 2023-10-10 Microsoft Technology Licensing, Llc DNA random access storage system via ligation
US10793897B2 (en) 2017-02-08 2020-10-06 Microsoft Technology Licensing, Llc Primer and payload design for retrieval of stored polynucleotides
WO2019094973A1 (en) 2017-11-13 2019-05-16 Life Technologies Corporation Compositions, methods and kits for urinary tract microorganism detection
CN110491448A (en) * 2019-07-15 2019-11-22 广州奇辉生物科技有限公司 Method, system, platform and storage medium for processing PCR primers
CN110491448B (en) * 2019-07-15 2023-02-07 广州奇辉生物科技有限公司 Method, system, platform and storage medium for processing PCR primers

Similar Documents

Publication Publication Date Title
Campbell et al. MAKER-P: a tool kit for the rapid creation, management, and quality control of plant genome annotations
Venturini et al. Leveraging multiple transcriptome assembly methods for improved gene structure annotation
WO2013134341A1 (en) Primer designing pipeline for targeted sequencing
Griffin et al. Prediction of RNA secondary structure by energy minimization
Rother et al. ModeRNA: a tool for comparative modeling of RNA 3D structure
Sallet et al. EuGene: an automated integrative gene finder for eukaryotes and prokaryotes
Yue et al. Long-read sequencing data analysis for yeasts
Meysman et al. Use of structural DNA properties for the prediction of transcription-factor binding sites in Escherichia coli
CN103797486A (en) Method for assembly of nucleic acid sequence data
JP2015509623A (en) DNA sequence data analysis
KR100681795B1 (en) A protocol for genome sequence alignment on grid environment
EP3291114B1 (en) Genome analysis device and genome visualization method
Rother et al. RNA tertiary structure prediction with ModeRNA
Bi et al. Bipartite pattern discovery by entropy minimization-based multiple local alignment
Biswas et al. ISQuest: finding insertion sequences in prokaryotic sequence fragment data
EP1608786B1 (en) Genomic profiling of regulatory factor binding sites
Contreras-Moreira et al. RSAT::Plants: motif discovery within clusters of upstream sequences in plant genomes
US20080274558A1 (en) Method for identifying and selecting low copy nucleic segments
Lopes et al. ProGeRF: proteome and genome repeat finder utilizing a fast parallel hash function
Sweeney et al. R2DT: computational framework for template-based RNA secondary structure visualisation across non-coding RNA types
Długosz et al. Improvements in DNA reads correction
Sinha PhyME: a software tool for finding motifs in sets of orthologous sequences
Thangadurai et al. Bioinformatics tools for the multilocus phylogenetic analysis of fungi
Fortmann-Grote et al. RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes
Farrell smallrnaseq: short non coding RNA-seq analysis with Python

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application Ref document number: 13713599; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE
122 Ep: pct application non-entry in european phase Ref document number: 13713599; Country of ref document: EP; Kind code of ref document: A1