WO2013134341A1 - Primer designing pipeline for targeted sequencing - Google Patents
Primer designing pipeline for targeted sequencing
- Publication number
- WO2013134341A1 (PCT/US2013/029268)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- primer
- primers
- sequence
- module
- computerized system
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
Definitions
- the method further comprises verifying the primers using a BLAST verification module. In another embodiment, the method further comprises verifying the secondary structure of primers using an adaptor verification module. In another embodiment, the method further comprises simulating the sequencing using a sequencing simulation module. In another embodiment, the method further comprises outputting results, using an output format module, for visualization in WebGBrowse.
BRIEF DESCRIPTION OF THE DRAWINGS
- Figure 1 shows an exemplary flowchart of the Primer Designing Pipeline provided herein.
- Figure 2 shows an exemplary embodiment for overlapping amplicon design for high-throughput sequencing.
- Figure 3 shows an exemplary automated process for the systems and methods provided herein.
- Figure 4 shows an exemplary process for the BLAST verification modules and methods provided herein.
- Figure 5 shows exemplary FASTA sequences to be loaded into the Primer Designing Pipeline provided herein.
- Figure 6 shows an exemplary screen shot when the Primer Designing Pipeline provided loads FASTA files for downstream analysis.
- a publicly available "Primer3" program is incorporated into the systems and methods provided to process the overlapping primer design task in targeted regions, while also combining the utility of Batch-Primer3 using a customized and compiled program.
- the "Primer3" program has been previously described in Steve Rozen and Helen J.
- the Primer Designing Pipeline provided is programmed using .NET framework and allows multiple "Primer3" processes to be performed in parallel while validating for overlap.
- the Primer Designing Pipeline described herein provides at least one of the advantages below: (1) automatically adding standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplex amplifications; (4) designing a tiling of amplicons across a sequence; and (5) picking primers from a reverse-translated amino acid sequence.
- the systems and methods provided herein enable primer design especially for "targeted re-sequencing" applications, for example, using high-throughput (HTP) next-generation sequencing (NGS) instruments.
- the Primer Designing Pipeline provided can be modular, taking small to large sequences as input, and also allows the amplicon lengths to be changed to suit various NGS platforms as requested by users.
- primers designed using the systems and/or methods provided herein can be used with the Fluidigm AccessArray system, an HTP multiplexed amplicon library generation system for efficient and cost-effective generation of sequencing data for further analysis.
- the systems and methods described herein can provide HTP overlapping primer design for complementing the utility of Fluidigm AccessArray system for marker development, gene confirmation, transgene region validation for regulatory affairs, QTL mining and genotyping-by-sequencing.
- An exemplary primer design workflow is illustrated in Figure 1.
- the users can select sequence files for which they want to design primers. Files are typically not loaded into memory but instead analyzed for random access.
- the FASTA file format is a common text format for representing biological sequences.
- the target sequence(s) can typically be provided in a FASTA format.
- the FASTA file can contain one or multiple sequences.
- Each sequence is always preceded by a header line which is prefixed with ">" followed by the ID, description, and/or other pertinent information about the sequence.
- the sequence information is then listed on subsequent lines and usually wraps (carriage return/line feed) every 70 or 80 characters depending on the program that generated the file.
- DNA sequences in FASTA format are shown in Figure 5.
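For illustration only, the format can be parsed with a minimal sequential reader such as the Python sketch below (the function name `read_fasta` is hypothetical). It is not the pipeline's loader, which relies on random file access instead of re-reading whole files; it simply makes the header/wrapped-sequence layout concrete.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream.

    The header is the text after '>' on the header line; the sequence lines
    that follow (wrapped at any width) are joined with line endings removed.
    """
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:          # emit the previous record
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line.strip():
            chunks.append(line.strip())
    if header is not None:                  # emit the last record
        yield header, "".join(chunks)


example = ">SEQUENCE_1 demo\r\nACGTACGT\r\nGGCC\r\n>SEQUENCE_2\r\nTTTT\r\n"
for name, seq in read_fasta(io.StringIO(example)):
    print(name, len(seq))
```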
- FASTA files can become quite large when dealing with long sequences, a large number of sequences, or a combination of both.
- most FASTA files that represent the entire genome of complex eukaryotes may easily exceed 2 gigabytes.
- This size issue poses a problem for the primer design process because normal file handling methods involve starting at the beginning and reading the file into memory until the data of interest is found. For example, sequentially reading a file larger than 2 gigabytes up to the location of the data of interest may take longer than 30 seconds. With traditional primer designing programs, this process has to take place for each set of primers that needs to be designed.
- a Load Sequence Module runs the sequence loading processes in parallel. Random file access is combined with sequential file access to speed up the process. Each character in the file resides at a specific addressable location on a disk. In addition to starting at the beginning of the file and reading each character in order (sequential file access), the Load Sequence Module provides a means to access any location in the file at random, as long as the address is known. Random file access can speed processes up considerably because the process does not have to read all the characters/data that come before the data it is interested in extracting. Rather than being limited to sequential file access, the Load Sequence Module provided is able to determine the starting address in the file at which the data of interest is located.
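The difference between the two access modes can be sketched in Python (the function names are hypothetical). Sequential access reads and discards every byte before the address of interest, so its cost grows with the address; random access seeks straight to the address, so its cost is essentially independent of how far into the file the data lies.

```python
import os
import tempfile


def read_sequential(path: str, address: int, length: int) -> bytes:
    """Sequential access: every byte before `address` is read and discarded."""
    with open(path, "rb") as fh:
        remaining = address
        while remaining > 0:
            chunk = fh.read(min(1 << 16, remaining))
            if not chunk:                  # address lies past end of file
                return b""
            remaining -= len(chunk)
        return fh.read(length)


def read_random(path: str, address: int, length: int) -> bytes:
    """Random access: jump directly to a known address, then read."""
    with open(path, "rb") as fh:
        fh.seek(address)
        return fh.read(length)


# Demo on a small temporary FASTA file: the header ">chr1\r\n" is 7 bytes,
# so the first base of the sequence sits at address 7.
with tempfile.NamedTemporaryFile("wb", suffix=".fa", delete=False) as tmp:
    tmp.write(b">chr1\r\n" + b"ACGT" * 1000 + b"\r\n")
try:
    assert read_random(tmp.name, 7, 8) == read_sequential(tmp.name, 7, 8) == b"ACGTACGT"
finally:
    os.remove(tmp.name)
```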
- the Primer Designing Pipeline provided initially reads the entire file sequentially once, analyzes it to determine how it is formatted, and stores that analysis in a SQL Server Compact database. Later, when the Primer Designing Pipeline provided is designing primers and needs to extract a sequence from one of the files, it uses the analysis results stored in the SQL Server Compact database to calculate the location (address) within the file of the data of interest. That address is then used to read the data using random file access.
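A minimal sketch of this index-once, seek-later idea, using Python's built-in `sqlite3` as a stand-in for SQL Server Compact (a substitution, not the pipeline's actual storage). The table layout, which records only each sequence's byte range, and the function names are hypothetical; the pipeline's own analysis works at the finer, per-block level.

```python
import sqlite3


def index_fasta(path: str, db: sqlite3.Connection) -> None:
    """One sequential pass over the file, storing each record's byte range."""
    db.execute("CREATE TABLE IF NOT EXISTS records ("
               "header TEXT PRIMARY KEY, seq_start INTEGER, seq_end INTEGER)")
    rows, header, start, pos = [], None, 0, 0
    with open(path, "rb") as fh:
        for line in fh:                        # binary mode keeps exact byte counts
            if line.startswith(b">"):
                if header is not None:
                    rows.append((header, start, pos))
                header = line[1:].rstrip(b"\r\n").decode()
                start = pos + len(line)        # sequence starts after the header line
            pos += len(line)
    if header is not None:
        rows.append((header, start, pos))
    db.executemany("INSERT OR REPLACE INTO records VALUES (?, ?, ?)", rows)
    db.commit()


def fetch_sequence(path: str, db: sqlite3.Connection, header: str) -> str:
    """Look up the stored byte range, then read it with a single seek."""
    row = db.execute("SELECT seq_start, seq_end FROM records WHERE header = ?",
                     (header,)).fetchone()
    if row is None:
        raise KeyError(header)
    start, end = row
    with open(path, "rb") as fh:
        fh.seek(start)
        raw = fh.read(end - start)
    return b"".join(raw.split()).decode()      # drop CR/LF and other whitespace
```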
- the file is divided into blocks of consecutive lines that share the same line length, and the analysis stores, for each block, the number of lines, the number of characters per line, and the file start and stop positions. For example, as shown in Figure 5, the first block would start at file position 0 and contain 1 line of 13 characters (including the carriage return and newline at the end of the line), so the block would end at position 12. The next block would start at file position 13, contain 2 lines of 72 characters each, and end at position 156. Additionally, since the Load Sequence Module assumes that each header block is unique, each header block can be loaded into a hash table (dictionary) in memory for quick access when working with the file. The blocks following the header, which contain the sequence, are then linked to the header block in the hash table.
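The block analysis can be sketched as follows (all names are hypothetical). Consecutive lines with the same byte length are grouped into one block, header lines always get a block of their own, and a Python `dict` plays the role of the hash table linking each header to its sequence blocks. The usage lines rebuild the Figure 5 numbers under the assumption that the header is 11 characters plus CR/LF and each full sequence line is 70 bases plus CR/LF.

```python
from dataclasses import dataclass


@dataclass
class Block:
    start: int        # file position of the block's first byte
    lines: int        # number of lines in the block
    line_len: int     # bytes per line, including line-ending characters
    is_header: bool

    @property
    def end(self) -> int:
        """File position of the block's last byte (inclusive)."""
        return self.start + self.lines * self.line_len - 1


def build_blocks(data: bytes) -> list[Block]:
    """Group consecutive equal-length sequence lines into blocks."""
    blocks: list[Block] = []
    pos = 0
    for line in data.splitlines(keepends=True):
        header = line.startswith(b">")
        last = blocks[-1] if blocks else None
        if (last is not None and not header and not last.is_header
                and last.line_len == len(line)):
            last.lines += 1                    # same length: extend the block
        else:
            blocks.append(Block(pos, 1, len(line), header))
        pos += len(line)
    return blocks


def link_to_headers(data: bytes, blocks: list[Block]) -> dict[str, list[Block]]:
    """Hash table mapping each header text to the sequence blocks after it."""
    table: dict[str, list[Block]] = {}
    current = None
    for b in blocks:
        if b.is_header:
            current = data[b.start + 1 : b.end + 1].rstrip(b"\r\n").decode()
            table[current] = []
        elif current is not None:
            table[current].append(b)
    return table


data = b">SEQUENCE_1\r\n" + (b"A" * 70 + b"\r\n") * 2 + b"ACGTACGTAC\r\n"
blocks = build_blocks(data)
table = link_to_headers(data, blocks)
```

Here `blocks[0]` spans positions 0 to 12, and `blocks[1]` starts at 13, holds 2 lines of 72 bytes, and ends at 156, matching the example in the text; the shorter last line forms a third block.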
- to extract a target, the Primer Designing Pipeline provided first looks up the header in the hash table and then iterates through the blocks linked to it to see whether the sequence starts in that block. If the file follows the normal FASTA format, the Primer Designing Pipeline provided should have to check at most two blocks, because there should be only two sequence blocks for each sequence in the file (one for the full-length lines and one for a shorter final line). Once the block containing the starting position is determined, the position within that block can be calculated because each character takes up one position (byte) in the file.
- each block is also analyzed for the type of newline characters, as well as other whitespace characters, that occur at the end of each line in the block. For example, if the target "SEQUENCE_2 I Corn Sequence Gene A45: 136,8" (header: start, length) needs to be extracted, the Primer Designing Pipeline provided would determine that it falls in the first block following the header. In this example, that block starts at file position 222 and contains 3 lines of 72 characters each, with 2 trailing whitespace characters per line.
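Once the block is known, the address arithmetic is straightforward. The sketch below (hypothetical names) assumes 1-based sequence positions and that the whole target lies inside the given block; neither convention is stated in the text. Each line holds `line_len - trailing_ws` bases, and because the read range may cross a line ending, whitespace is stripped after reading.

```python
import io
from typing import BinaryIO


def base_address(block_start: int, line_len: int, trailing_ws: int,
                 block_first_base: int, pos: int) -> int:
    """File address of the base at 1-based sequence position `pos`.

    `block_first_base` is the 1-based sequence position of the block's first
    base; each line holds (line_len - trailing_ws) bases, one byte per base.
    """
    bases_per_line = line_len - trailing_ws
    line, col = divmod(pos - block_first_base, bases_per_line)
    return block_start + line * line_len + col


def extract(fh: BinaryIO, block_start: int, line_len: int, trailing_ws: int,
            block_first_base: int, start: int, length: int) -> str:
    """Read `length` bases from `start`, skipping any line endings in between."""
    first = base_address(block_start, line_len, trailing_ws, block_first_base, start)
    last = base_address(block_start, line_len, trailing_ws, block_first_base,
                        start + length - 1)
    fh.seek(first)
    return b"".join(fh.read(last - first + 1).split()).decode()
```

With the numbers from the example (block at 222, 72 bytes per line, 2 trailing whitespace characters, target 136,8, block beginning at base 1), `base_address(222, 72, 2, 1, 136)` gives 222 + 1 × 72 + 65 = 359, and the 8 bases are read from bytes 359 through 368, a range that includes one CR/LF pair.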
- Figure 6 shows an exemplary screenshot illustrating the part of the Load Sequence Module used to load the example FASTA file as shown in Figure 5, where the first block of each section is the header block.
- in the Targets step, the segments for which primers are to be designed are defined by the user(s) (for example, all sequences or only masked regions greater than a specified length) or loaded from a file such as a GFF file.
- the GFF format is useful as an input format for programs like WebGBrowse, which was previously disclosed in Ram Podicheti, Rajesh Gollapudi, and Qunfeng Dong. (2009) "WebGBrowse - a web server for GBrowse." Bioinformatics, 25(12): 1550-1551, the content of which is incorporated by reference in its entirety.
- Step 1 is the Load / Define Window step, where the area in which primers are to be placed is defined (for example, 100 bp up- and downstream from the target) or loaded from a file.
- the next step is Enter Primer Parameters, where parameters including primer length, melting temperature, GC content, 3' stability, and/or estimated secondary structure are specified.
- Primer3 is used as the primary design engine, and user(s) can enter parameters as required by the Primer3 program.
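The window definition amounts to padding the target and clipping to the sequence ends. A sketch with a hypothetical function name, using 0-based, end-exclusive coordinates and the 100 bp flank mentioned above:

```python
def primer_window(target_start: int, target_len: int, seq_len: int,
                  flank: int = 100) -> tuple[int, int]:
    """Return (start, length) of the region in which primers may be placed:
    the target plus `flank` bases on each side, clipped to the sequence ends.
    Coordinates are 0-based; the end is exclusive.
    """
    if not 0 <= target_start < seq_len:
        raise ValueError("target start lies outside the sequence")
    start = max(0, target_start - flank)
    end = min(seq_len, target_start + target_len + flank)
    return start, end - start
```

Near either end of the sequence the window simply shrinks, so a target starting at base 50 gets only 50 bases of upstream flank.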
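Two of these parameters are easy to show directly. The sketch below computes GC content and a rough Tm by the Wallace rule (2 °C per A/T and 4 °C per G/C). This is a simplification, not Primer3's nearest-neighbor thermodynamic model, and the limits in the hypothetical `passes` helper are illustrative placeholders rather than values from the text.

```python
def gc_content(primer: str) -> float:
    """Percentage of G + C bases."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> float:
    """Rough melting temperature (deg C) by the Wallace rule, 2(A+T) + 4(G+C).
    Only meaningful for short oligos; Primer3 uses a nearest-neighbor model."""
    p = primer.upper()
    return 2.0 * (p.count("A") + p.count("T")) + 4.0 * (p.count("G") + p.count("C"))


def passes(primer: str, size=(18, 27), gc=(20.0, 80.0), tm=(50.0, 65.0)) -> bool:
    """Check a candidate against (min, max) limits; the limits are illustrative."""
    return (size[0] <= len(primer) <= size[1]
            and gc[0] <= gc_content(primer) <= gc[1]
            and tm[0] <= wallace_tm(primer) <= tm[1])


print(gc_content("AGCGTACGTTAGCCATGCAT"), wallace_tm("AGCGTACGTTAGCCATGCAT"))
```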
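Primer3's command-line program `primer3_core` reads its input as Boulder-IO records: one `TAG=VALUE` per line, with a line containing only `=` ending each record. The sketch below assembles one such record per design window. The tags shown are standard Primer3 input tags; the function name and the example values are hypothetical.

```python
def boulder_record(seq_id: str, template: str, target: tuple[int, int],
                   included: tuple[int, int], params: dict[str, object]) -> str:
    """Build one Primer3 Boulder-IO input record: TAG=VALUE lines, ended by '='."""
    tags: dict[str, object] = {
        "SEQUENCE_ID": seq_id,
        "SEQUENCE_TEMPLATE": template,
        "SEQUENCE_INCLUDED_REGION": f"{included[0]},{included[1]}",  # start,length
        "SEQUENCE_TARGET": f"{target[0]},{target[1]}",               # start,length
        **params,
    }
    return "".join(f"{key}={value}\n" for key, value in tags.items()) + "=\n"


record = boulder_record(
    "SEQUENCE_1_tile1", "ACGT" * 250, target=(400, 50), included=(300, 250),
    params={
        "PRIMER_OPT_SIZE": 20, "PRIMER_MIN_SIZE": 18, "PRIMER_MAX_SIZE": 25,
        "PRIMER_MIN_TM": 57.0, "PRIMER_OPT_TM": 60.0, "PRIMER_MAX_TM": 63.0,
        "PRIMER_MIN_GC": 20.0, "PRIMER_MAX_GC": 80.0,
        "PRIMER_PRODUCT_SIZE_RANGE": "150-250",
    },
)
```

Each record could then be written to the standard input of `primer3_core`. The coordinates here follow Primer3's default 0-based indexing.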
- a Primer Design Module has two major functions: (1) selecting and processing the target and (2) saving the results.
- the algorithm of the Primer Design Module may adjust settings internally. For example with HTP primer design, the Primer Design Module can start at the beginning of the target area and design overlapping primer sets until it reaches the end. In one embodiment, additional evaluation may be needed such as BLASTing or secondary structure prediction before moving to the next set of primers.
- Basic Local Alignment Search Tool (BLAST) is a commonly used sequence alignment tool. See Altschul et al. (1990) J. Mol. Biol. 215: 403-410, the content of which is hereby incorporated by reference in its entirety.
- the Primer Designing Pipeline provided automatically generates and adds specified adaptor sequences to the designed primers.
- when performing targeted genome sequencing, where only specific sub-sequences within a genome are desired, the user(s) must manually design primers to create overlapping amplicons wherever the targeted region is larger than the maximum read length of the sequencing equipment. To date, most high-throughput sequencing machines can sequentially read only a limited number of base pairs in one run. Thus, the source genetic material needs to be chopped up into segments shorter than the maximum read length. In order to assemble these segments back into one sequence, the segments need to share some overlapping sequence (usually at least 20 base pairs).
- Figure 2 shows an exemplary embodiment of high-throughput sequencing where a sequencer is used to sequence the target region.
- the target region is 5,000 base pairs long and the sequencer can only read segments of DNA up to 700 base pairs long, i.e., the 5,000 bp sequence needs to be "chopped" into segments of 700 bp or less.
- the source sequence isn't chopped but the 5,000 bp sequence is amplified into 700 bp segments for sequencing.
- shorter overlapping copies are made instead.
- the reads are stored in a data file and an assembly program assembles them back into one continuous sequence.
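The tiling in Figure 2 can be planned with simple arithmetic. The sketch below (hypothetical function `tile`) steps a window of at most `max_amplicon` bases across the target so that consecutive windows overlap by `min_overlap` bases; each window would then be handed to the primer design engine. It plans windows only and ignores the fact that the primers themselves occupy the ends of each amplicon.

```python
def tile(target_len: int, max_amplicon: int, min_overlap: int) -> list[tuple[int, int]]:
    """Plan (start, end) windows, 0-based and end-exclusive, covering
    [0, target_len): each at most max_amplicon long and overlapping the
    previous window by min_overlap bases."""
    if not 0 <= min_overlap < max_amplicon:
        raise ValueError("overlap must be non-negative and shorter than the amplicon")
    step = max_amplicon - min_overlap
    windows = []
    start = 0
    while True:
        end = min(start + max_amplicon, target_len)
        windows.append((start, end))
        if end >= target_len:
            break
        start += step
    return windows


plan = tile(5000, 700, 100)
```

For the 5,000 bp / 700 bp example with a 100 bp overlap, the step is 600 bp and the plan has 9 windows, the last being (4800, 5000). An alternative design choice would anchor the final window at the target end so that every amplicon is full length, at the cost of a larger last overlap.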
- in order to make the shorter copies, or amplicons, during the amplification stage of sequencing, primers (short sequences that mark the beginning and end of an amplicon) have to be designed and created.
- Traditional tools for designing primers can only design one set of primers (or one amplicon) at a time. These overlapping amplicons have to be designed serially (one after the other) and usually in a fairly manual process.
- provided are automated systems and methods for designing primers for overlapping amplicons.
- the automated systems and methods start from a traditional primer design program/software, for example Primer3, which can be downloaded locally and run from a command line interface from either a Linux or Windows machine.
- the automated systems and methods provided are generated using Perl script (parallelized or non-parallelized).
- the automated systems and methods provided are generated using Microsoft .NET 4.0, which contains functionality for parallelizing processes.
- the systems and methods provided using Microsoft .NET 4.0 can design large batches of primers in a few minutes as compared to hours using non-parallelized Perl script or days with the traditional approaches.
- the automated systems and methods provided use a parallelized approach.
- the automated systems and methods provided do not use a non-parallelized approach.
- Figure 3 shows an exemplary system provided using Microsoft .NET 4.0.
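The pipeline itself relies on .NET 4.0 parallelism. As a language-neutral illustration, here is a Python sketch that runs independent Primer3 jobs concurrently with `concurrent.futures`. It assumes a `primer3_core` executable on the PATH that reads one Boulder-IO record on standard input; the helper names are hypothetical, and the runner is injectable so the scheduling logic can be exercised without Primer3 installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_primer3(record: str) -> str:
    """Run one primer3_core process on one Boulder-IO record (assumes
    primer3_core is installed and on the PATH) and return its output."""
    result = subprocess.run(["primer3_core"], input=record,
                            capture_output=True, text=True, check=True)
    return result.stdout


def design_all(records: list[str],
               runner: Callable[[str], str] = run_primer3,
               workers: int = 8) -> list[str]:
    """Run independent design jobs concurrently; results keep input order.

    Threads suffice here because each job's work happens in a separate
    process; the threads mostly wait on that process's output.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, records))
```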
- the output from the .NET program is a tab-delimited text file containing the primer sequences and information about the quality of the primers.
- the Primer Designing Pipeline provided also saves copies of the input sent to Primer3 and the output Primer3 generates. This can be useful for troubleshooting any issues that may arise or for manually re-running one portion of the process if necessary.
- a Format Output Module reads the output from the automation program and generates a General Feature Format (GFF) file that can be used by other programs, including GBrowse, to visually overlay the primers and amplicons on the source sequence.
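A minimal sketch of such a conversion to GFF3, which GBrowse can load. The column names of the tab-delimited input (`seqid`, `pair_id`, `left_start`, `left_seq`, `right_end`, `right_seq`) are hypothetical, since the layout of the pipeline's output file is not specified; coordinates are assumed to be 1-based, as GFF3 requires. `PCR_product` and `primer` are Sequence Ontology feature types.

```python
import csv
import io


def to_gff3(rows: list[dict[str, str]], source: str = "PrimerPipeline") -> str:
    """Convert primer-pair rows into GFF3 text: one PCR_product feature per pair
    plus its left (+ strand) and right (- strand) primer features."""
    out = ["##gff-version 3"]
    for r in rows:
        seqid, pid = r["seqid"], r["pair_id"]
        left_start, right_end = int(r["left_start"]), int(r["right_end"])
        left_end = left_start + len(r["left_seq"]) - 1
        right_start = right_end - len(r["right_seq"]) + 1
        features = [
            ("PCR_product", left_start, right_end, ".", f"ID={pid}"),
            ("primer", left_start, left_end, "+", f"ID={pid}_L;Parent={pid}"),
            ("primer", right_start, right_end, "-", f"ID={pid}_R;Parent={pid}"),
        ]
        for ftype, start, end, strand, attrs in features:
            # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
            out.append("\t".join([seqid, source, ftype, str(start), str(end),
                                  ".", strand, ".", attrs]))
    return "\n".join(out) + "\n"


tsv = ("seqid\tpair_id\tleft_start\tleft_seq\tright_end\tright_seq\n"
       "chr1\tamp1\t101\tACGTACGTACGTACGTACGT\t700\tTTGCAATTGCAATTGCAATT\n")
print(to_gff3(list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))), end="")
```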
- the raw results from the Primer Designing Pipeline are compiled and formatted into a tab-delimited format and, optionally, a GFF format for feeding into GBrowse for visualization. In one example, 14 pairs of primers/amplicons are created within four minutes using the systems and methods provided; these 14 pairs of primers form overlapping amplicons over one broad targeted sequence. In another example, 9 pairs of primers/amplicons (i.e., 18 primers) are created within two minutes using the systems and methods provided; these 9 pairs of primers form overlapping amplicons over two separate targeted sequences, where the two targeted sequences are still within the same genome (i.e., skipping one region in between for sequencing).
- the systems and/or methods provided also comprise a BLAST Verification Module.
- An exemplary BLAST Verification Module is illustrated in Figure 4.
- the BLAST Verification Module verifies target redundancy for amplicon primer design.
- the BLAST Verification Module takes the outputs and BLASTs them against the targeted genome or a sequence library available in the database.
- the BLAST Verification Module allows a primer set (pair) or amplicon to be confirmed as unique based on BLAST analysis.
- the BLAST Verification Module provides BLAST analysis in parallel, thus saving time as compared to sequential analysis.
- the primers of a set typically have to be BLASTed one after the other, and the results from both BLAST queries must then be compared to see whether both primers land within a predetermined number of base pairs of each other (usually 1000) and are on opposite strands of the DNA, pointing in the correct direction for amplification to occur. If non-unique primers are found, those specific sequences need to be re-run through the primer design process with different parameters.
- the BLAST Verification Module specifies a BLAST database to use before the primer design process is run. As the primer design process returns individual results, the Primer Designing Pipeline disclosed can automatically check them for uniqueness and re-run a sequence if necessary.
- a copy of the BLAST database is created locally on the user's workstation. In other embodiments, the BLAST database is located on a server and accessed remotely by the Primer Designing Pipeline.
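The pairing test described above can be sketched as follows (names hypothetical). Hits are assumed to be in subject coordinates as in BLAST's tabular `sstart`/`send` columns, where `sstart > send` marks a minus-strand hit, and each primer's whole length is assumed to align so that `sstart` is its 5' end. Hits from both primers are pooled, so a single primer that binds both strands nearby is also caught; the 1000 bp limit is the usual value quoted in the text.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One BLAST hit of a primer in 1-based subject coordinates.
    sstart > send marks a minus-strand hit."""
    subject: str
    sstart: int
    send: int

    @property
    def plus(self) -> bool:
        return self.sstart < self.send


def predicted_products(hits: Iterable[Hit], max_len: int = 1000) -> list[tuple[str, int, int]]:
    """Return (subject, start, end) for every pair of hits that could amplify:
    same subject, one plus-strand and one minus-strand hit facing each other,
    and a product no longer than max_len bases."""
    pool = list(hits)
    products = []
    for fwd in (h for h in pool if h.plus):
        for rev in (h for h in pool if not h.plus and h.subject == fwd.subject):
            # A plus hit extends rightward from its 5' end, a minus hit leftward,
            # so they face each other when the plus hit lies to the left.
            if fwd.sstart < rev.sstart and rev.sstart - fwd.sstart + 1 <= max_len:
                products.append((fwd.subject, fwd.sstart, rev.sstart))
    return products


def is_unique(left_hits: Iterable[Hit], right_hits: Iterable[Hit], max_len: int = 1000) -> bool:
    """Treat a primer pair as unique if exactly one product is predicted."""
    return len(predicted_products(list(left_hits) + list(right_hits), max_len)) == 1
```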
- the systems and/or methods provided also comprise an Adaptor Verification Module.
- the Primer Designing Pipeline adds adapter/tag sequences to the primers in order for the sequencing machine to be able to sequence the amplicon. This adapter/tag sequence is often the same for all primers.
- the secondary structure of a designed primer may change significantly after adding such adaptor/tag sequence.
- RNAstructure, a program developed by the University of Rochester, can be used to predict the most likely binding structure that a sequence or pair of sequences can form, for both DNA and RNA.
- the Adaptor Verification Module of the Primer Designing Pipeline can automate the RNAstructure program through the command line to predict whether, after an adapter sequence is added, a primer set is still a scientifically good choice.
- the Adaptor Verification Module comprises an internal scoring system for classifying primers based on the predicted secondary structure.
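The pipeline performs this check with RNAstructure (or Mfold/UNAFold), which compute free-energy-based structures and are not reproduced here. As a much cruder, purely illustrative stand-in, the sketch below flags an oligo whose 3'-terminal k-mer can pair perfectly with another stretch of the same oligo, and reports whether that risk appears only after a 5' tag is attached. The tag sequences, k = 5, and the three class labels are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_risk(oligo: str, k: int = 5) -> bool:
    """Crude self-complementarity screen: can the 3'-terminal k-mer
    base-pair with an upstream stretch of the same oligo?"""
    return revcomp(oligo[-k:]) in oligo[:-k].upper()


def classify(primer: str, tag: str, k: int = 5) -> str:
    """Compare the bare primer with the 5'-tagged primer (tag + primer)."""
    before = three_prime_risk(primer, k)
    after = three_prime_risk(tag + primer, k)
    if after and not before:
        return "tag-induced risk"   # the tag introduced a new pairing site
    return "risk" if after else "ok"
```

Because the tagged oligo contains the bare primer's upstream region, any risk present before tagging is still present afterwards, so the three labels cover every case.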
- Other programs can be used for the Adaptor Verification Module provided, including Mfold (as disclosed in M. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31 (13), 3406-3415, 2003) and UNAFold (as disclosed in N. R. Markham & M. Zuker. UNAFold: Software for Nucleic Acid Folding and Hybridization. In Data, Sequence Analysis, and Evolution, J. Keith, ed., Bioinformatics: Volume 2, Chapter 1, pp 3-31, Humana Press Inc., 2008), the contents of both of which are hereby incorporated by reference in their entireties.
- simulation of amplification and/or sequencing can be performed.
- Several programs have been disclosed that can be used to simulate the entire sequencing process, for example in silico PCR for amplification, MetaSim for simulating sequencing, and CAP3 for assembly.
- the Primer Designing Pipeline can integrate the ability to simulate the entire sequencing process using a series of simulators.
Abstract
Provided are systems and methods for customized primer/amplicon designing programs which enable users to design overlapping primers/amplicons in a target region or multiple targeted regions. The target region can be a small (<1 kb) to large contiguous or repeat-masked regions (even entire genomes provided sufficient hardware and memory to handle the processes). The systems and methods provided herein can be used to design multiple sets of overlapping or non-overlapping primers/amplicons for a target region.
Description
PRIMER DESIGNING PIPELINE FOR TARGETED SEQUENCING
FIELD OF THE INVENTION
[0001] This invention is generally related to the field of molecular biology, and more specifically the field of primer design for targeted/high-throughput sequencing.
BACKGROUND OF THE INVENTION
[0002] Primer design for DNA sequencing is a vital part of modern biological research. Traditional primer designing programs, for example the publicly available "Primer3," use a single input DNA sequence to design optimal primers under specified parameters including primer length, GC content, melting temperature (Tm) and others. However, there is no dynamic primer designing program which can design overlapping primers simultaneously to facilitate coverage across an entire region or multiple targeted regions. In addition, these traditional primer designing programs are not suitable for handling large files with long or multiple sequences. Thus, there remains a need for an efficient primer designing pipeline for targeted sequences spanning a broad region or multiple targeted regions.
SUMMARY OF THE INVENTION
[0003] Provided are systems and methods for customized primer/amplicon designing programs which enable users to design overlapping primers/amplicons in a target region or multiple targeted regions. The target region can be a small (<1 kilo bases or kb) to large contiguous or repeat-masked regions (even entire genomes provided sufficient hardware and memory to handle the processes). The systems and methods provided herein can be used to design multiple sets of overlapping or non-overlapping primers/amplicons for a target region. In some embodiments, the primers designed using the systems and methods provided herein can also be used for multiplex PCR analysis.
[0004] In one aspect, provided is a computerized system for primer/amplicon design for sequencing. The system comprises:
(a) an input device and an output device/interface;
(b) an analysis system interface coupled to memory of a computer;
(c) an operating system optionally comprising a database;
(d) a load sequence module for loading nucleic acid sequences; and
(e) a primer design module for designing primer pairs.
[0005] In one embodiment, the system further comprises at least one of a format output module and a BLAST verification module. In a further embodiment, the system further comprises at least one of a format output module, a BLAST verification module, and an adaptor verification module. In another embodiment, the input device is selected from the group consisting of an automated sequencer, a sequencing data input device, and a sequencing data storage device. In another embodiment, the output interface comprises an interface for WebGBrowse or GenomeBrowser.
[0006] In one embodiment, the database described herein contains information selected from the group consisting of genomic sequences, previously generated primers, and sequences for BLAST analysis. In another embodiment, the load sequence module processes sequences in FASTA format. In another embodiment, the load sequence module uses random file access. In another embodiment, the load sequence module does not use sequential file access.
[0007] In one embodiment, the primer design module performs at least one of
(1) automatically adding standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplexed amplifications; (4) designing a tiling of amplicons across a sequence; (5) picking primers from a reverse-translated amino acid sequence; and (6) selection from multiple primer sets. In another embodiment, the primer design module processes primer design in parallel. In another embodiment, the primer design module does not design primers in a non-parallel or sequential manner. In another embodiment, the primer design module generates primers or processes primer design at a speed greater than 10 primers per minute. In another embodiment, the primer design module generates primers or processes primer design at a speed greater than 100 primers per minute. In a further or alternative embodiment, the primer design module generates primers or processes primer design at a speed between 200 and 500 primers per minute. In another embodiment, the primers constitute overlapping amplicons for sequence assembly. In another embodiment, the primers constitute overlapping amplicons for sequencing and assembly. In a further embodiment, the overlapping region of amplicons comprises at least 50 bp or minimal overlap. In a further embodiment, the overlapping region of amplicons comprises at least 100 bp. In a further embodiment, the overlapping region of amplicons comprises between 100 bp and 1000 bp.
[0008] In another aspect, provided is a method for use in a computerized system for primer/amplicon design for sequencing. The method comprises:
(a) uploading sequence data using a load sequence module;
(b) designing multiple primers in parallel using a primer design module; and
(c) outputting primer design through an output interface.
[0009] In one embodiment, the method further comprises pre-processing sequences by modifying sequences before primer design. In a further or alternative embodiment, the method further comprises defining target regions/sequences. In a further or alternative embodiment, the method further comprises defining windows for primer design.
[0010] In one embodiment, the computerized system of the method comprises a system described herein. In another embodiment, the sequence data is larger than 100 kilobases (kb). In a further or alternative embodiment, the sequence data is larger than 10 megabases (Mb). In a further or alternative embodiment, the sequence data is between 10 Mb and 1 gigabase (Gb).
[0011] In one embodiment, the load sequence module processes sequences in FASTA format. In another embodiment, the load sequence module uses random file access. In another embodiment, the method provided further comprises at least one of (1) automatically adding standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplexed amplifications; (4) designing a tiling of amplicons across a sequence; (5) picking primers from a reverse-translated amino acid sequence; and (6) selection from multiple primer sets.
[0012] In one embodiment, the method provides primers at a speed greater than 10 primers per minute. In a further or alternative embodiment, the method provides primers at a speed greater than 100 primers per minute. In a further or alternative embodiment, the method provides primers at a speed between 200 and 500 primers per minute. In another embodiment, the primers constitute overlapping amplicons for sequence assembly. In a further embodiment, the overlapping region of amplicons comprises at least 50 bp or minimal overlap. In a further embodiment, the overlapping region of amplicons comprises at least 100 bp. In a further embodiment, the overlapping region of amplicons comprises between 100 bp and 1000 bp.
[0013] In one embodiment, the method further comprises verifying the primers using a BLAST verification module. In another embodiment, the method further comprises verifying secondary structure of primers using an adaptor verification module. In another embodiment, the method further comprises simulating the sequencing using a sequencing simulation module. In another embodiment, the method further comprises formatting the output for visualization using WebGBrowse by means of a format output module.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Figure 1 shows an exemplary flowchart of the Primer Designing Pipeline provided herein.
[0015] Figure 2 shows an exemplary embodiment for overlapping amplicon design for high-throughput sequencing.
[0016] Figure 3 shows an exemplary automated process for the systems and methods provided herein.
[0017] Figure 4 shows an exemplary process for the BLAST verification modules and methods provided herein.
[0018] Figure 5 shows exemplary FASTA sequences to be loaded into the Primer Designing Pipeline provided herein.
[0019] Figure 6 shows an exemplary screenshot of the Primer Designing Pipeline provided herein loading FASTA files for downstream analysis.
DETAILED DESCRIPTION OF THE INVENTION
[0020] Various algorithms have been described previously and can be incorporated in the systems and methods provided to design multiple pairs of primers simultaneously. For example, primer design methods have been disclosed in U.S. Patent Nos. 5,512,458, 5,556,749, 6,928,368, 7,565,248, 7,698,069, and 8,014,955; patent applications US2003/0108919, US2003/0215834, US2004/0012633, US2005/0032074, US2006/0281105, US2007/0032963, US2010/0070452, US2010/0184067, JP2003079366, JP2005301532, JP2009268360, JP2011004621, JP2011062085, and EP1136932; and international patent applications WO2009/063270, WO2009/152336, WO2010/113789, and WO2011/053241, the contents of which are incorporated by reference in their entireties. In some embodiments, the publicly available "Primer3" program is incorporated into the systems and methods provided to process the overlapping primer design task in targeted regions, while also combining the utility of Batch-Primer3 using a customized and compiled program. The "Primer3" program has been previously described in Steve Rozen and Helen J. Skaletsky (2000) "Primer3 on the WWW for general users and for biologist programmers." In: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 365-386, the content of which is hereby incorporated by reference in its entirety. In some embodiments, the Primer Designing Pipeline provided is programmed using the .NET framework and allows multiple "Primer3" processes to be performed in parallel while validating for overlap. In some embodiments, the Primer Designing Pipeline described herein provides at least one of the advantages below: (1) automatically adding standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplex amplifications; (4) designing a tiling of amplicons across a sequence; and (5) picking primers from a reverse-translated amino acid sequence.
[0021] In some embodiments, the systems and methods provided herein enable primer design especially for "targeted re-sequencing" applications, for example, using high-throughput (HTP) next-generation sequencing (NGS) instruments. The Primer Designing Pipeline provided can be modular, taking small to large sequences as input, and also allows the amplicon lengths to be changed to suit various NGS platforms as requested by users.
[0022] In some embodiments, primers designed using the systems and/or methods provided herein can be used with the Fluidigm AccessArray system, an HTP multiplexed amplicon library generation system for efficient and cost-effective generation of sequencing data for further analysis. For targeted re-sequencing projects, the systems and methods described herein can provide HTP overlapping primer design to complement the utility of the Fluidigm AccessArray system for marker development, gene confirmation, transgene region validation for regulatory affairs, QTL mining, and genotyping-by-sequencing.
[0023] An exemplary primer design workflow is illustrated in Figure 1. In the Load Sequence step, the user(s) select the sequence files for which they want to design primers. Files are typically not loaded into memory but are instead analyzed to enable random access. The FASTA file format, first introduced by Bill Pearson and David Lipman in 1988 for representing either nucleotide or amino acid sequences (see Pearson and Lipman, "Improved tools for biological sequence comparison" (1988) Proc. Natl. Acad. Sci. USA 85:2444-2448, the content of which is hereby incorporated by reference in its entirety), is a common format for representing biological sequences. When designing primers, the target sequence(s) can typically be provided in FASTA format. A FASTA file can contain one or multiple sequences. Each sequence is always preceded by a header line, which is prefixed with ">" followed by the ID, description, and/or other pertinent information about the sequence. The sequence itself is listed on the subsequent lines and usually wraps (carriage return/line feed) every 70 or 80 characters, depending on the program that generated the file. Typical DNA sequences in FASTA format are shown in Figure 5.
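The record layout described above (a ">" header line followed by wrapped sequence lines) can be illustrated with a minimal Python sketch. The function name `parse_fasta` is hypothetical, and unlike the Load Sequence Module this sketch reads the whole input into memory; it is shown only to make the format concrete.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header line (without the leading '>') to its unwrapped sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # accept both LF and CR LF line endings
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.strip())  # wrapped lines are simply concatenated
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = [">SEQUENCE_1 demo\n", "ACGTACGT\n", "ACG\n", ">SEQUENCE_2 demo\r\n", "TTTT\r\n"]
print(parse_fasta(example))
```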
[0024] Especially for re-sequencing projects and/or high-throughput sequencing, FASTA files can become quite large when dealing with long sequences, a large number of sequences, or a combination of both. For example, most FASTA files that represent the entire genome of complex eukaryotes can easily exceed 2 gigabytes. This size poses a problem for the primer design process, because normal file handling methods start at the beginning of the file and read it into memory until the data of interest is found. For example, sequential access to a file larger than 2 gigabytes may take longer than 30 seconds to reach the position in the file where the data of interest is located. With traditional primer design programs, this process has to be repeated for each set of primers that needs to be designed.
[0025] Accordingly, provided is a Load Sequence Module which runs the sequence loading processes in parallel. Random file access is combined with sequential file access to speed up the process. Each character in the file resides at a specific addressable location on a disk. In addition to starting at the beginning of the file and reading each character in order (sequential file access), the Load Sequence Module provides a means to access any location in the file directly, as long as its address is known (random file access). Random file access can speed processes up considerably, because the process does not have to read all the data that precedes the data it is interested in extracting. Instead of being obligated to perform sequential file access, the Load Sequence Module provided is able to determine the starting address in the file at which the data of interest is located.
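The difference between the two access modes can be sketched in Python, assuming the byte offset of the data of interest is already known. Both function names are hypothetical; `read_sequential` has to pass over every preceding byte, while `read_random` uses the standard `seek` call to jump directly to the address.

```python
import os
import tempfile


def read_sequential(path: str, offset: int, length: int) -> bytes:
    """Reach `offset` by reading every preceding byte (sequential access)."""
    with open(path, "rb") as fh:
        skipped = 0
        while skipped < offset:
            chunk = fh.read(min(65536, offset - skipped))
            if not chunk:
                break  # file shorter than offset
            skipped += len(chunk)
        return fh.read(length)


def read_random(path: str, offset: int, length: int) -> bytes:
    """Jump straight to `offset` (random access); nothing before it is read."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.read(length)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b">seq1\r\n" + b"ACGT" * 1000)  # 7-byte header, then sequence
    path = tmp.name

print(read_sequential(path, 7, 8), read_random(path, 7, 8))
os.remove(path)
```

Both calls return the same bytes; only the amount of data read before reaching the offset differs, which is what matters for multi-gigabyte genome files.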
[0026] In some embodiments, the Primer Designing Pipeline provided initially reads through the entire file sequentially once, analyzes it to determine how it is formatted, and stores that analysis in a SQL Server Compact database. Then, when the Primer Designing Pipeline provided is designing the primers and needs to extract a sequence from one of the files, it uses the analysis results stored in the SQL Server Compact database to calculate the location (or address) within the file of the data of interest. That address is then used to extract the data using random file access.
[0027] In a further embodiment, when the program initially reads through a file sequentially, it breaks the file into blocks in which every line has the same length, and it stores, for each block, the number of lines, the number of characters per line, and the start and stop positions in the file. For example, as shown in Figure 5, the first block would start at file position 0, contain 1 line of 13 characters (including the carriage return and newline at the end of the line), and end at position 12. The block after that would start at file position 13, contain 2 lines of 72 characters each, and end at position 156. Additionally, since the Load Sequence Module assumes that each header block is unique, each header block can be loaded into a hash table (dictionary) in memory for quick access when working with the file. The blocks following the header, which contain the sequence, are then linked to the header block in the hash table.
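A minimal sketch of this one-pass block analysis, assuming a binary file handle. Here an in-memory dictionary stands in for both the hash table and the SQL Server Compact store, and the names `Block`, `Record`, and `build_index` are hypothetical.

```python
import io
from dataclasses import dataclass, field
from typing import BinaryIO, Dict, List, Optional


@dataclass
class Block:
    start: int        # file offset of the block's first byte
    end: int          # file offset of the block's last byte (inclusive)
    lines: int        # number of lines in the block
    line_length: int  # bytes per line, including trailing whitespace
    trailing_ws: int  # trailing whitespace bytes per line (2 for CR LF)


@dataclass
class Record:
    header: Block
    sequence_blocks: List[Block] = field(default_factory=list)


def build_index(fh: BinaryIO) -> Dict[str, Record]:
    """Single sequential pass: group lines of equal length into blocks and
    link each run of sequence blocks to the header that precedes it."""
    index: Dict[str, Record] = {}
    current: Optional[Record] = None
    pos = 0
    for line in fh:  # binary iteration splits on b"\n" and keeps it
        length = len(line)
        ws = length - len(line.rstrip())
        block = Block(pos, pos + length - 1, 1, length, ws)
        if line.startswith(b">"):
            current = Record(block)
            index[line[1:].strip().decode()] = current  # headers assumed unique
        elif current is not None:
            blocks = current.sequence_blocks
            last = blocks[-1] if blocks else None
            if last is not None and (last.line_length, last.trailing_ws) == (length, ws):
                last.lines += 1  # same layout: extend the current block
                last.end += length
            else:
                blocks.append(block)  # layout changed: start a new block
        pos += length
    return index


data = b">SEQUENCE_1\r\n" + (b"A" * 70 + b"\r\n") * 2 + b"ACGT\r\n"
for name, rec in build_index(io.BytesIO(data)).items():
    print(name, rec.header, rec.sequence_blocks)
```

The sample data mirrors the example in the text: a 13-byte header block spanning positions 0 to 12, then a block of two 72-byte lines spanning 13 to 156. The shorter final line starts a second sequence block.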
[0028] Later, when the Primer Designing Pipeline provided needs to access a sequence or sub-sequence within the file, it first looks up the header in the hash table. It then iterates through the linked blocks in order to see whether the sequence starts in each block. If the file follows the normal FASTA format, the Primer Designing Pipeline provided should have to check at most two blocks, because there should be only two blocks for each sequence in the file (one for the full-length lines and one for the shorter final line). Once the block containing the starting position is determined, the position within that block can be calculated, because each character takes up one position (byte) in the file.
[0029] The challenging part of calculating the starting position is accounting for the newline characters that occur at the end of each line. In some embodiments, the newline character can also be preceded by a carriage return character. In some embodiments, each block is therefore also analyzed for the type of newline characters, as well as any other whitespace characters, that occur at the end of each line in the block. For example, if the target "SEQUENCE_2 | Corn Sequence Gene A45: 136,8" (header: start, length) needs to be extracted, the Primer Designing Pipeline provided would determine that it falls in the first block following the header. In this example, the block starts at file position 222 and contains 3 lines of 72 characters each, of which 2 are ending whitespace characters, leaving 70 sequence characters per line. Using these statistics, the Primer Designing Pipeline provided divides the sequence position by the number of sequence characters per line: 136 / 70 = 1 full line with remainder 66. It then multiplies the quotient by the whitespace character count: 1 x 2 = 2, and adds that to the sequence position: 136 + 2 = 138. That result is then added to the starting file position of the block to determine the actual starting file position of the sub-sequence: 222 + 138 = 360. As a result, the file position is determined directly, and random file access can be used to read the sub-sequence more quickly than with sequential file access.
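The offset arithmetic can be sketched as follows, assuming 0-based sequence positions and a block in which every line holds the same number of bases followed by the same trailing whitespace. The helper names are hypothetical. The whitespace of every complete line before the target must be counted, so the offset is `block_start + full_lines * (bases_per_line + whitespace) + column`; `read_subsequence` then seeks to that offset and strips any line breaks that fall inside the requested range.

```python
import io
from typing import BinaryIO


def file_offset(block_start: int, seq_chars_per_line: int,
                trailing_ws: int, position: int) -> int:
    """File offset of a 0-based sequence position inside a uniform block."""
    full_lines, column = divmod(position, seq_chars_per_line)
    # every complete line before the target also carries its trailing whitespace
    return block_start + full_lines * (seq_chars_per_line + trailing_ws) + column


def read_subsequence(fh: BinaryIO, block_start: int, seq_chars_per_line: int,
                     trailing_ws: int, position: int, length: int) -> bytes:
    """Seek straight to the sub-sequence and drop any line breaks inside it.
    Assumes the whole sub-sequence lies within one block."""
    args = (block_start, seq_chars_per_line, trailing_ws)
    start = file_offset(*args, position)
    stop = file_offset(*args, position + length - 1) + 1
    fh.seek(start)
    return b"".join(fh.read(stop - start).split())


print(file_offset(222, 70, 2, 136))  # 222 + 1 * 72 + 66 -> 360
```

With a block starting at 222 and lines of 70 bases plus 2 whitespace characters, position 136 lies on the block's second line, which begins at 222 + 72 = 294, so the sub-sequence begins at offset 294 + 66 = 360.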
[0030] Figure 6 shows an exemplary screenshot illustrating the part of the Load Sequence Module used to load the example FASTA file as shown in Figure 5, where the first block of each section is the header block.
[0031] Returning to Figure 1, in the Pre-Process Sequences step, modifications may be made to the original sequences, including masking and/or converting bases for methylation. In the Load / Define Targets step, the segments for which primers are to be designed are either defined by the user(s) (for example, all sequences or only masked regions longer than a specified length) or loaded from a file such as a GFF file. The GFF format is useful for input files for programs such as WebGBrowse, which was previously disclosed in Ram Podicheti, Rajesh Gollapudi, and Qunfeng Dong (2009) "WebGBrowse - a web server for GBrowse." Bioinformatics, 25(12): 1550-1551, the content of which is incorporated by reference in its entirety.
[0032] Next in Figure 1 is the Load / Define Window step, where the area in which primers are to be placed is defined (for example, 100 bp upstream and downstream of the target) or loaded from a file. The next step is Enter Primer Parameters, where parameters including primer length, melting temperature, GC content, 3' stability, and/or estimated secondary structure are entered. In some embodiments, Primer3 is used as the primary design engine, and the user(s) can enter parameters as required by the Primer3 program.
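One way to hand a target, its window, and the entered parameters to Primer3 is a Boulder-IO input record, sketched below. The tag names are Primer3 (version 2) input tags; the parameter values, the 100 bp flank, and the function name `primer3_record` are illustrative assumptions, not settings taken from the pipeline.

```python
from typing import Dict, Tuple


def primer3_record(seq_id: str, template: str, target: Tuple[int, int],
                   flank: int, params: Dict[str, str]) -> str:
    """Build one Primer3 Boulder-IO record: the target, a design window of
    `flank` bases on each side, and user-entered primer parameters."""
    start, length = target
    win_start = max(0, start - flank)
    win_end = min(len(template), start + length + flank)
    tags = {
        "SEQUENCE_ID": seq_id,
        "SEQUENCE_TEMPLATE": template,
        # window in which primers may be placed, as "start,length"
        "SEQUENCE_INCLUDED_REGION": f"{win_start},{win_end - win_start}",
        # region every returned product must span, as "start,length"
        "SEQUENCE_TARGET": f"{start},{length}",
    }
    tags.update(params)
    # Boulder-IO: one TAG=VALUE per line, record terminated by a lone "="
    return "".join(f"{tag}={value}\n" for tag, value in tags.items()) + "=\n"


params = {
    "PRIMER_OPT_SIZE": "20", "PRIMER_MIN_SIZE": "18", "PRIMER_MAX_SIZE": "25",
    "PRIMER_MIN_TM": "57.0", "PRIMER_OPT_TM": "60.0", "PRIMER_MAX_TM": "63.0",
    "PRIMER_MIN_GC": "30.0", "PRIMER_MAX_GC": "70.0",
}
print(primer3_record("SEQUENCE_2", "ACGT" * 100, (150, 50), 100, params))
```

The resulting text can be supplied to `primer3_core` on standard input. Positions assume Primer3's default 0-based indexing (`PRIMER_FIRST_BASE_INDEX=0`).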
[0033] Provided is a Primer Design Module having two major functions: (1) selecting and processing the target and (2) saving the results. In some embodiments, when processing a target, the algorithm of the Primer Design Module may adjust settings internally. For example, with HTP primer design, the Primer Design Module can start at the beginning of the target area and design overlapping primer sets until it reaches the end. In one embodiment, additional evaluation, such as BLAST searching or secondary structure prediction, may be needed before moving to the next set of primers. The Basic Local Alignment Search Tool (BLAST) is a commonly used sequence alignment tool. See Altschul et al. (1990) J. Mol. Biol. 215: 403-410, the content of which is hereby incorporated by reference in its entirety. In some embodiments, the Primer Designing Pipeline provided automatically generates and adds specified adaptor sequences to the designed primers.
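Adding a standard 5' tag to every designed primer amounts to string concatenation, because primers are written 5' to 3'. The following minimal sketch uses placeholder tag sequences that are hypothetical; the real adaptor sequences depend on the sequencing platform.

```python
from typing import List, Tuple

# Hypothetical placeholder tags; real adaptor sequences are platform-specific.
FORWARD_TAG = "ACGTTGCA"
REVERSE_TAG = "TGCAACGT"


def add_adaptors(pairs: List[Tuple[str, str]],
                 fwd_tag: str = FORWARD_TAG,
                 rev_tag: str = REVERSE_TAG) -> List[Tuple[str, str]]:
    """Prepend a 5' tag to every (left, right) primer pair.

    Primers are written 5'->3', so the tag goes at the start of each string.
    """
    return [(fwd_tag + left, rev_tag + right) for left, right in pairs]


print(add_adaptors([("GATTACAGATTACAGATTAC", "CCGGTTAACCGGTTAACCGG")]))
```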
[0034] When performing targeted genome sequencing, where only specific sub-sequences within a genome are of interest, the user(s) must manually design primers to create overlapping amplicons wherever the targeted region is larger than the maximum read length of the sequencing equipment. To date, most high-throughput sequencing machines can only read a limited number of consecutive base pairs in one run. Thus, the source genetic material needs to be divided into segments shorter than the maximum read length. In order to assemble these segments back into one sequence, adjacent segments need to share some overlapping sequence (usually at least 20 base pairs).
[0035] Figure 2 shows an exemplary embodiment of high-throughput sequencing where a sequencer is used to sequence the target region. The target region is 5,000 base pairs long and the sequencer can only read segments of DNA up to 700 base pairs; i.e., the 5,000 bp sequence needs to be "chopped" into segments of 700 bp or less. In reality, the source sequence is not chopped; instead, the 5,000 bp sequence is amplified as 700 bp segments for sequencing. In the amplification step of the sequencing process, where multiple copies of the sequence are made for sequencing, shorter overlapping copies are made instead. After the sequencing machine finishes reading all the short copies, the reads are stored in a data file and an assembly program assembles them back into one continuous sequence.
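The automated overlapping design described in [0033] through [0035] can be sketched as a left-to-right loop. Starting at the beginning of the target, each step asks a designer for an amplicon no longer than the maximum read length, and the next region starts a fixed overlap before the previous amplicon ends. The `design` callback is a hypothetical stand-in for a Primer3 run. The 50 bp overlap is taken from the embodiments in [0007], and the loop itself is an illustrative sketch, not the pipeline's code.

```python
from typing import Callable, List, Optional, Tuple

Amplicon = Tuple[int, int]  # 0-based, end-exclusive coordinates
# Hypothetical designer callback: given a region, return the amplicon actually
# picked inside it (e.g. parsed from a Primer3 run) or None if none was found.
Designer = Callable[[int, int], Optional[Amplicon]]


def tile_target(start: int, end: int, max_len: int, overlap: int,
                design: Designer) -> List[Amplicon]:
    """Design amplicons left to right until the whole target is covered."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and shorter than max_len")
    amplicons: List[Amplicon] = []
    cursor = start
    while True:
        picked = design(cursor, min(cursor + max_len, end))
        if picked is None:
            raise RuntimeError(f"no primer pair found starting near {cursor}")
        amplicons.append(picked)
        if picked[1] >= end:
            return amplicons
        nxt = picked[1] - overlap  # next region starts inside this amplicon
        if nxt <= cursor:
            raise RuntimeError("design made no progress along the target")
        cursor = nxt


def overlaps_ok(amplicons: List[Amplicon], overlap: int) -> bool:
    """Check that each amplicon overlaps the next by at least `overlap` bases."""
    return all(b[0] <= a[1] - overlap for a, b in zip(amplicons, amplicons[1:]))


# Figure 2 numbers: a 5,000 bp target and reads of at most 700 bp.
amps = tile_target(0, 5000, 700, 50, lambda s, e: (s, e))
print(len(amps), amps[:2], amps[-1])  # 8 [(0, 700), (650, 1350)] (4550, 5000)
```

Because a real designer may place primers anywhere inside the region it is given, `overlaps_ok` re-checks the achieved overlap after the loop.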
[0036] In order to make the shorter copies, or amplicons, during the amplification stage of sequencing, primers (short sequences that mark the beginning and end of an amplicon) have to be designed and created. Traditional tools for designing primers can only design one set of primers (or one amplicon) at a time, so overlapping amplicons have to be designed serially (one after the other), usually in a largely manual process. First the user designs primers for the first amplicon using some software, then checks where that amplicon ends, and then designs primers for the next amplicon, making sure the beginning of that amplicon overlaps the end of the previous one. This is very tedious and requires considerable copying and pasting and calculating of overlaps by hand. Additionally, for targeted sequencing there can be more than one target, so this design process has to be performed for each target.
[0037] Provided are automated systems and methods for designing primers for overlapping amplicons. In some embodiments, the automated systems and methods provided start from a traditional primer design program, for example Primer3, which can be downloaded and run locally from a command line interface on either a Linux or Windows machine.
[0038] In some embodiments, the automated systems and methods provided are implemented using Perl scripts (parallelized or non-parallelized). In other embodiments, the automated systems and methods provided are implemented using Microsoft .NET 4.0, which contains functionality for parallelizing processes. In some embodiments, the systems and methods provided using Microsoft .NET 4.0 can design large batches of primers in a few minutes, compared with hours using a non-parallelized Perl script or days with traditional approaches. In some embodiments, the automated systems and methods provided use a parallelized approach. In some embodiments, the automated systems and methods provided do not use a non-parallelized approach.
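The pipeline's parallelism relies on .NET 4.0. The same idea can be sketched in Python with `concurrent.futures`, assuming each target can be designed independently of the others. The names `design_all` and `fake_design` are hypothetical, and the `primer3_core` invocation appears only in a comment. Threads are sufficient here because each worker would mostly wait on an external Primer3 process. CPU-bound Python work would instead call for `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Target = Tuple[str, int, int]  # (sequence id, start, end)


def design_all(targets: List[Target],
               design_one: Callable[[Target], List[str]],
               workers: int = 8) -> Dict[Target, List[str]]:
    """Run independent per-target designs concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(design_one, targets))
    return dict(zip(targets, results))


def fake_design(target: Target) -> List[str]:
    # Hypothetical stand-in. A real version might run, e.g.,
    # subprocess.run(["primer3_core"], input=record, capture_output=True, text=True)
    # and parse the primers from its output.
    seq_id, start, end = target
    return [f"{seq_id}:{start}-{end}_F", f"{seq_id}:{start}-{end}_R"]


targets = [("chr1", 0, 700), ("chr1", 650, 1350), ("chr2", 10, 500)]
for target, primers in design_all(targets, fake_design, workers=2).items():
    print(target, primers)
```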
[0039] Figure 3 shows an exemplary system provided using Microsoft .NET 4.0. The output from the .NET program is a tab-delimited text file containing the primer sequences and information about the quality of the primers. In some embodiments, in addition to the final file containing the aggregated results, the Primer Designing Pipeline provided also saves copies of the input sent to Primer3 and the output Primer3 generates. This can be useful for troubleshooting any issues that may arise or for manually re-running one portion of the process if necessary.
[0040] Returning to Figure 1, provided is a Format Output Module, which reads the output from the automation program and generates a General Feature Format (GFF) file that can be used by other programs, including GBrowse, to visually overlay the primers and amplicons on the source sequence. In some embodiments, the raw results from the Primer Designing Pipeline are compiled and formatted into a tab-delimited format and, optionally, a GFF format for feeding into GBrowse for visualization. In one example, 14 pairs of primers/amplicons (i.e., 28 primers) are created within four minutes using the systems and methods provided. These 14 pairs of primers form overlapping amplicons over one broad targeted sequence. In another example, 9 pairs of primers/amplicons (i.e., 18 primers) are created within two minutes using the systems and methods provided. These 9 pairs of primers form overlapping amplicons over two separate targeted sequences, where the two targeted sequences are still within the same genome (i.e., one intervening region is skipped for sequencing).
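A minimal sketch of GFF output, assuming the GFF3 variant: nine tab-separated columns, 1-based inclusive coordinates, and "." for empty fields. The source name `PrimerPipeline`, the feature types `PCR_product` and `primer`, and the `sequence` attribute are illustrative choices. Whether GBrowse displays them depends on its track configuration.

```python
from typing import List, Tuple

# (seq id, amplicon start, amplicon end, left primer, right primer);
# coordinates are 0-based and end-exclusive, converted to GFF's 1-based inclusive
Design = Tuple[str, int, int, str, str]


def to_gff3(designs: List[Design], source: str = "PrimerPipeline") -> str:
    """Render each amplicon and its two primers as GFF3 feature lines."""
    rows = ["##gff-version 3"]
    for i, (seqid, start, end, left, right) in enumerate(designs, 1):
        amp = f"amplicon{i}"
        features = [
            ("PCR_product", start + 1, end, ".", f"ID={amp}"),
            ("primer", start + 1, start + len(left), "+",
             f"ID={amp}_F;Parent={amp};sequence={left}"),
            ("primer", end - len(right) + 1, end, "-",
             f"ID={amp}_R;Parent={amp};sequence={right}"),
        ]
        for ftype, fstart, fend, strand, attrs in features:
            # columns: seqid source type start end score strand phase attributes
            rows.append("\t".join([seqid, source, ftype, str(fstart), str(fend),
                                   ".", strand, ".", attrs]))
    return "\n".join(rows) + "\n"


print(to_gff3([("chr1", 100, 800, "GATTACAGATTACAGATTAC", "CCGGTTAACCGGTTAACCGG")]), end="")
```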
[0041] In further embodiments, the systems and/or methods provided also comprise a BLAST Verification Module. An exemplary BLAST Verification Module is illustrated in Figure 4. In one embodiment, the BLAST Verification Module verifies target redundancy for amplicon primer design. In another embodiment, after the previous steps of the primer design process, the BLAST Verification Module takes the outputs and BLASTs them against the targeted genome or a sequence library available in a database. Typically, the BLAST Verification Module determines whether a primer set (pair) or amplicon is unique based on BLAST analysis. In some embodiments, the BLAST Verification Module performs BLAST analysis in parallel, thus saving time compared with sequential analysis. Typically, one primer from the first set has to be BLASTed, then the next one, and then the results from both BLAST queries must be compared to see whether both primers land within a predetermined number of base pairs of each other (usually 1000) and are on opposite strands of the DNA, pointing in the correct direction for amplification to occur. If non-unique primers are found, those specific sequences need to be re-run through the primer design process with different parameters.
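The pair comparison described above can be sketched as follows. The sketch assumes that BLAST tabular hits for each primer have already been parsed into their subject and subject start/end coordinates (the `sstart` and `send` columns of `-outfmt 6`, where a minus-strand hit has `send < sstart`). `Hit`, `products`, and `is_unique` are hypothetical names. Only the 1000 bp default comes from the text.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    subject: str
    sstart: int  # subject start as reported by BLAST tabular output
    send: int    # subject end; send < sstart means a minus-strand hit

    @property
    def plus(self) -> bool:
        return self.send > self.sstart


def products(left: List[Hit], right: List[Hit],
             max_dist: int = 1000) -> List[Tuple[str, int, int]]:
    """Predicted amplicons: a plus-strand hit of one primer and a minus-strand
    hit of the other on the same subject, facing each other within max_dist."""
    found = []
    for a_hits, b_hits in ((left, right), (right, left)):
        for a in a_hits:
            for b in b_hits:
                if (a.subject == b.subject and a.plus and not b.plus
                        and 0 < b.sstart - a.sstart <= max_dist):
                    found.append((a.subject, a.sstart, b.sstart))
    return found


def is_unique(left: List[Hit], right: List[Hit], max_dist: int = 1000) -> bool:
    """A pair passes only if it is predicted to amplify exactly one product."""
    return len(products(left, right, max_dist)) == 1


L = [Hit("chr1", 100, 119), Hit("chr5", 9000, 9019)]
R = [Hit("chr1", 700, 681)]
print(products(L, R), is_unique(L, R))
```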
[0042] In some embodiments, the BLAST Verification Module specifies a BLAST database to use before running the primer design process. As the primer design process returns individual results, the Primer Designing Pipeline disclosed can automatically check them for uniqueness and re-run that sequence if necessary. In some embodiments, a copy of the BLAST database is created locally on the user's workstation. In other embodiments, the BLAST database is located on a server and accessed remotely by the Primer Designing Pipeline.
[0043] In further embodiments, the systems and/or methods provided also comprise an Adaptor Verification Module. In some embodiments, the Primer Designing Pipeline adds adapter/tag sequences to the primers so that the sequencing machine is able to sequence the amplicon. This adapter/tag sequence is often the same for all primers. The secondary structure of a designed primer may change significantly after such an adaptor/tag sequence is added. Currently, a few programs are capable of analyzing the secondary structure of nucleic acids. For example, RNAstructure (for both DNA and RNA), developed at the University of Rochester, can be used to predict the most likely binding structure that a sequence or a pair of sequences can form.
[0044] In some embodiments, the Adaptor Verification Module of the Primer Designing Pipeline can automate the RNAstructure program through its command line to predict whether a primer set is still a scientifically good choice after an adapter sequence has been added. In some embodiments, the Adaptor Verification Module comprises an internal scoring system for classifying primers based on the predicted secondary structure. Other programs that can be used for the Adaptor Verification Module provided include Mfold (as disclosed in M. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31 (13), 3406-3415, 2003) and UNAFold (as disclosed in N. R. Markham & M. Zuker. UNAFold: Software for Nucleic Acid Folding and Hybridization. In Data, Sequence Analysis, and Evolution, J. Keith, ed., Bioinformatics: Volume 2, Chapter 1, pp 3-31, Humana Press Inc., 2008), the contents of both of which are hereby incorporated by reference in their entireties.
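RNAstructure, Mfold, and UNAFold perform thermodynamic folding, and that is not reproduced here. The sketch below is a deliberately crude string-matching stand-in that only illustrates the before/after comparison an adaptor check makes. It measures how long a 3'-terminal stretch of a primer has its reverse complement somewhere in the same molecule, a rough signal that one copy's 3' end could anneal to another copy. The function names and the threshold of 4 bases are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_complementarity(primer: str, max_k: int = 12) -> int:
    """Longest 3'-terminal stretch (3..max_k bases) whose reverse complement
    occurs in the primer, i.e. where the 3' end of one copy could anneal to
    another copy and be extended (a crude self-dimer signal)."""
    best = 0
    for k in range(3, min(max_k, len(primer) // 2) + 1):
        if revcomp(primer[-k:]) in primer:
            best = k
    return best


def tag_introduces_risk(primer: str, tag: str, threshold: int = 4) -> bool:
    """Flag primers that were acceptable alone but not after adding the tag."""
    before = three_prime_complementarity(primer)
    after = three_prime_complementarity(tag + primer)
    return before < threshold <= after


primer = "CCCCCCCCCCCCCCCCTTTG"
print(three_prime_complementarity(primer), three_prime_complementarity("GGGCAAAGGG" + primer))
```

In practice the score would come from a real folding program's predicted structure or free energy rather than from substring matching.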
[0045] In addition, simulation of amplification and/or sequencing can be performed. Several programs have been disclosed that simulate steps of the sequencing process, for example in silico PCR for amplification, MetaSim for simulating sequencing, and CAP3 for assembly. The Primer Designing Pipeline can integrate the ability to simulate the entire sequencing process using a series of such simulators.
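A toy version of the amplification step can be sketched as exact-match in silico PCR. A product is reported wherever the left primer matches the top strand and the reverse complement of the right primer matches downstream within a maximum product length. This is a simplified illustration, not any of the named tools: it ignores mismatches, partial 3' binding, and products templated from the opposite strand. `in_silico_pcr` is a hypothetical name.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, left: str, right: str,
                  max_len: int = 1000) -> List[str]:
    """Return every product where `left` matches the top strand and the
    reverse complement of `right` matches downstream within max_len."""
    rc_right = revcomp(right)
    products = []
    i = template.find(left)
    while i != -1:
        j = template.find(rc_right, i + len(left))
        while j != -1 and j + len(rc_right) - i <= max_len:
            products.append(template[i:j + len(rc_right)])
            j = template.find(rc_right, j + 1)
        i = template.find(left, i + 1)
    return products


print(in_silico_pcr("TTTACGTACGGGGGGCATGCATTT", "ACGTAC", "TGCATG"))
```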
Claims
1. A computerized system for primer/amplicon design for sequencing, comprising:
(a) an input device and an output device/interface;
(b) an analysis system interface coupled to memory of a computer;
(c) an operating system comprising a database;
(d) a load sequence module for loading nucleic acid sequences; and
(e) a primer design module for designing primer pairs.
2. The computerized system of claim 1, further comprising at least one of format output module, BLAST verification module, and adaptor verification module.
3. The computerized system of claim 1, wherein the input device is selected from the group consisting of automated sequencer, sequencing data input device, and sequencing data storage device.
4. The computerized system of claim 1, wherein the output interface comprises an interface for WebGBrowse or GenomeBrowser.
5. The computerized system of claim 1, wherein the database contains information selected from the group consisting of genomic sequences, previously generated primers, and sequences for BLAST analysis.
6. The computerized system of claim 1, wherein the load sequence module processes sequences in FASTA format.
7. The computerized system of claim 1, wherein the load sequence module uses random file access.
8. The computerized system of claim 1, wherein the primer design module performs at least one of (1) automatically adding standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplexed amplifications; (4) designing a tiling of amplicons across a sequence; (5) picking primers from a reverse-translated amino acid sequence; and (6) selection from multiple primer sets.
9. The computerized system of claim 1, wherein the primer design module processes primer design in parallel.
10. The computerized system of claim 1, wherein the primer design module generates primers at a speed greater than 10 primers per minute.
11. The computerized system of claim 10, wherein the primers constitute overlapping amplicons for sequencing and assembly.
12. A method for use in a computerized system for primer/amplicon design for sequencing, comprising:
(a) uploading sequence data using a load sequence module;
(b) designing multiple primers in parallel using a primer design module; and
(c) outputting primer design through an output interface.
13. The method of claim 12, further comprising pre-processing sequences by modifying sequences before primer design.
14. The method of claim 12, further comprising defining target sequences.
15. The method of claim 12, further comprising defining windows for primer design.
16. The method of claim 12, wherein the computerized system comprises a system of claim 1.
17. The method of claim 12, wherein the sequence data is larger than 100 kb.
18. The method of claim 12, wherein the load sequence module processes sequences in FASTA format.
19. The method of claim 12, wherein the load sequence module uses random file access.
20. The method of claim 12, further comprising at least one of (1) automatically adding standard 5' tag or tail to each primer; (2) selecting nested primer pairs; (3) selecting primers for multiplexed amplifications; (4) designing a tiling of amplicons across a sequence; (5) picking primers from a reverse-translated amino acid sequence; and (6) selection from multiple primer sets.
21. The method of claim 12, wherein the method provides primers at a speed greater than 10 primers per minute.
22. The method of claim 21, wherein the primers constitute overlapping amplicons for sequence assembly.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261607630P | 2012-03-07 | 2012-03-07 | |
US61/607,630 | 2012-03-07 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013134341A1 | 2013-09-12 |
Family ID: 48045026
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2013/029268 (WO2013134341A1) | Primer designing pipeline for targeted sequencing | 2012-03-07 | 2013-03-06 |
Citations (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5512458A (en) | 1994-02-25 | 1996-04-30 | W. R. Grace & Co.-Conn. | Method of using mobile priming sites for DNA sequencing |
US5556749A (en) | 1992-11-12 | 1996-09-17 | Hitachi Chemical Research Center, Inc. | Oligoprobe designstation: a computerized method for designing optimal DNA probes |
EP1136932A1 (en) | 2000-03-20 | 2001-09-26 | Hitachi, Ltd. | Primer design system |
JP2003079366A (en) | 2001-09-11 | 2003-03-18 | Hitachi Ltd | Information processing system for assisting primer walking |
US20030108919A1 (en) | 2001-09-05 | 2003-06-12 | Perlegen Sciences, Inc. | Methods for amplification of nucleic acids |
US20030215834A1 (en) | 2002-05-15 | 2003-11-20 | Fujitsu Limited | Method of ordering synthesis of primer used for gene amplification, program therefor and recording medium of the program |
US20040012633A1 (en) | 2002-04-26 | 2004-01-22 | Affymetrix, Inc., A Corporation Organized Under The Laws Of Delaware | System, method, and computer program product for dynamic display, and analysis of biological sequence data |
US20050032074A1 (en) | 2002-09-09 | 2005-02-10 | Affymetrix, Inc. | Custom design method for resequencing arrays |
US6928368B1 (en) | 1999-10-26 | 2005-08-09 | The Board Regents, The University Of Texas System | Gene mining system and method |
JP2005301532A (en) | 2004-04-09 | 2005-10-27 | Hitachi High-Technologies Corp | Primer design apparatus and program |
US20060281105A1 (en) | 2002-10-07 | 2006-12-14 | Honghua Li | High throughput multiplex DNA sequence amplifications |
WO2009063270A1 (en) | 2007-11-12 | 2009-05-22 | ISTITUTO TUMORI 'Giovanni Paolo II' IRCCS - Laboratorio di Oncologia Sperimentale Clinica | Method for the design and engineering of oligonucleotides |
US7565248B2 (en) | 2000-10-04 | 2009-07-21 | Celadon Laboratories, Inc. | Computer system for designing oligonucleotides used in biochemical methods |
JP2009268360A (en) | 2008-04-30 | 2009-11-19 | Yamaguchi Univ | Primer for producing fused dna fragment and method for producing fused dna fragment using the same |
WO2009152336A1 (en) | 2008-06-13 | 2009-12-17 | Codexis, Inc. | Method of synthesizing polynucleotide variants |
US20100070452A1 (en) | 2006-07-04 | 2010-03-18 | Yusuke Nakamura | Device for designing nucleic acid amplification primer, program for designing primer and server device for designing primer |
US7698069B2 (en) | 2004-09-01 | 2010-04-13 | Hitachi Software Engineering Co., Ltd. | Method for designing primer for realtime PCR |
US20100184067A1 (en) | 2009-01-20 | 2010-07-22 | Sony Corporation | Primer evaluation method, primer evaluation program, and real-time polymerase chain reaction apparatus |
WO2010113789A1 (en) | 2009-04-01 | 2010-10-07 | Necソフト株式会社 | Method for designing primer for selex method, method for producing primer, method for producing aptamer, device for designing primer, and computer program and recording medium for designing primer |
JP2011004621A (en) | 2009-06-23 | 2011-01-13 | Toyohashi Univ Of Technology | Probe, probe design device, and probe design program |
JP2011062085A (en) | 2009-09-15 | 2011-03-31 | National Institute Of Advanced Industrial Science & Technology | Apparatus for searching primer set, method and program for searching primer set |
WO2011053241A1 (en) | 2009-10-29 | 2011-05-05 | Jonas Blomberg | Multiplex detection |
US8014955B2 (en) | 2005-06-27 | 2011-09-06 | George Mason Intellectual Properties, Inc. | Method of identifying unique target sequence |
2013
- 2013-03-06 WO PCT/US2013/029268 (WO2013134341A1), active Application Filing
Patent Citations (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5556749A (en) | 1992-11-12 | 1996-09-17 | Hitachi Chemical Research Center, Inc. | Oligoprobe designstation: a computerized method for designing optimal DNA probes |
US5512458A (en) | 1994-02-25 | 1996-04-30 | W. R. Grace & Co.-Conn. | Method of using mobile priming sites for DNA sequencing |
US6928368B1 (en) | 1999-10-26 | 2005-08-09 | The Board Regents, The University Of Texas System | Gene mining system and method |
EP1136932A1 (en) | 2000-03-20 | 2001-09-26 | Hitachi, Ltd. | Primer design system |
US7565248B2 (en) | 2000-10-04 | 2009-07-21 | Celadon Laboratories, Inc. | Computer system for designing oligonucleotides used in biochemical methods |
US20030108919A1 (en) | 2001-09-05 | 2003-06-12 | Perlegen Sciences, Inc. | Methods for amplification of nucleic acids |
JP2003079366A (en) | 2001-09-11 | 2003-03-18 | Hitachi Ltd | Information processing system for assisting primer walking |
US20040012633A1 (en) | 2002-04-26 | 2004-01-22 | Affymetrix, Inc., A Corporation Organized Under The Laws Of Delaware | System, method, and computer program product for dynamic display, and analysis of biological sequence data |
US20070032963A1 (en) | 2002-05-15 | 2007-02-08 | Fujitsu Limited | Method of ordering synthesis of primer used for gene amplification, program therefor and recording medium of the program |
US20030215834A1 (en) | 2002-05-15 | 2003-11-20 | Fujitsu Limited | Method of ordering synthesis of primer used for gene amplification, program therefor and recording medium of the program |
US20050032074A1 (en) | 2002-09-09 | 2005-02-10 | Affymetrix, Inc. | Custom design method for resequencing arrays |
US20060281105A1 (en) | 2002-10-07 | 2006-12-14 | Honghua Li | High throughput multiplex DNA sequence amplifications |
JP2005301532A (en) | 2004-04-09 | 2005-10-27 | Hitachi High-Technologies Corp | Primer design apparatus and program |
US7698069B2 (en) | 2004-09-01 | 2010-04-13 | Hitachi Software Engineering Co., Ltd. | Method for designing primer for realtime PCR |
US8014955B2 (en) | 2005-06-27 | 2011-09-06 | George Mason Intellectual Properties, Inc. | Method of identifying unique target sequence |
US20100070452A1 (en) | 2006-07-04 | 2010-03-18 | Yusuke Nakamura | Device for designing nucleic acid amplification primer, program for designing primer and server device for designing primer |
WO2009063270A1 (en) | 2007-11-12 | 2009-05-22 | ISTITUTO TUMORI 'Giovanni Paolo II' IRCCS - Laboratorio di Oncologia Sperimentale Clinica | Method for the design and engineering of oligonucleotides |
JP2009268360A (en) | 2008-04-30 | 2009-11-19 | Yamaguchi Univ | Primer for producing fused dna fragment and method for producing fused dna fragment using the same |
WO2009152336A1 (en) | 2008-06-13 | 2009-12-17 | Codexis, Inc. | Method of synthesizing polynucleotide variants |
US20100184067A1 (en) | 2009-01-20 | 2010-07-22 | Sony Corporation | Primer evaluation method, primer evaluation program, and real-time polymerase chain reaction apparatus |
WO2010113789A1 (en) | 2009-04-01 | 2010-10-07 | Necソフト株式会社 | Method for designing primer for selex method, method for producing primer, method for producing aptamer, device for designing primer, and computer program and recording medium for designing primer |
JP2011004621A (en) | 2009-06-23 | 2011-01-13 | Toyohashi Univ Of Technology | Probe, probe design device, and probe design program |
JP2011062085A (en) | 2009-09-15 | 2011-03-31 | National Institute Of Advanced Industrial Science & Technology | Apparatus for searching primer set, method and program for searching primer set |
WO2011053241A1 (en) | 2009-10-29 | 2011-05-05 | Jonas Blomberg | Multiplex detection |
Non-Patent Citations (11)
Title |
---|
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 410 |
BROWN ANDREW MK ET AL: "Optimus Primer: A PCR enrichment primer design program for next-generation sequencing of human exonic regions", BMC RESEARCH NOTES, BIOMED CENTRAL LTD, GB, vol. 3, no. 1, 7 July 2010 (2010-07-07), pages 185, XP021083073, ISSN: 1756-0500, DOI: 10.1186/1756-0500-3-185 * |
GARIMA KUSHWAHA ET AL: "PRIMEGENSw3: A Web-Based Tool for High-Throughput Primer and Probe Design", BIOINFORMATICS AND BIOMEDICINE (BIBM), 2011 IEEE INTERNATIONAL CONFERENCE ON, IEEE, 12 November 2011 (2011-11-12), pages 345 - 351, XP032087106, ISBN: 978-1-4577-1799-4, DOI: 10.1109/BIBM.2011.43 * |
KADERALI L ET AL: "Primer-design for multiplexed genotyping", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, SURREY, GB, vol. 31, no. 6, 15 March 2003 (2003-03-15), pages 1796 - 1802, XP002996256, ISSN: 0305-1048, DOI: 10.1093/NAR/GKG267 * |
LI KELVIN ET AL: "Novel computational methods for increasing PCR primer design effectiveness in directed sequencing", BMC BIOINFORMATICS, BIOMED CENTRAL, LONDON, GB, vol. 9, no. 1, 11 April 2008 (2008-04-11), pages 191, XP021031763, ISSN: 1471-2105 * |
M. ZUKER: "Mfold web server for nucleic acid folding and hybridization prediction", NUCLEIC ACIDS RES., vol. 31, no. 13, 2003, pages 3406 - 3415, XP002460708, DOI: 10.1093/nar/gkg595 |
N. R. MARKHAM; M. ZUKER: "Data, Sequence Analysis, and Evolution, J. Keith, ed., Bioinformatics", vol. 2, 2008, HUMANA PRESS INC., article "UNAFold: Software for Nucleic Acid Folding and Hybridization", pages: 3 - 31 |
PEARSON; LIPMAN: "Improved tools for biological sequence comparison", PROC. NATL. ACAD. SCI. USA, vol. 85, 1988, pages 2444 - 2448 |
RAM PODICHETI; RAJESH GOLLAPUDI; QUNFENG DONG: "WebGBrowse - a web server for GBrowse", BIOINFORMATICS, vol. 25, no. 12, 2009, pages 1550 - 1551 |
SIMMLER H ET AL: "Real-time primer design for DNA chips", PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2003. PROCEEDINGS. INTERNATIONAL APRIL 22-26, 2003, PISCATAWAY, NJ, USA, IEEE, 22 April 2003 (2003-04-22), pages 153 - 160, XP010645717, ISBN: 978-0-7695-1926-5 * |
STEVE ROZEN; HELEN J. SKALETSKY: "Bioinformatics Methods and Protocols: Methods in Molecular Biology", 2000, HUMANA PRESS, article "Primer3 on the WWW for general users and for biologist programmers", pages: 365 - 386 |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105087771A (en) * | 2014-05-06 | 2015-11-25 | 金唯智生物科技有限责任公司 | Methods and kits for identifying microorganisms in a sample |
WO2017218938A1 (en) | 2016-06-16 | 2017-12-21 | Life Technologies Corporation | Novel compositions, methods and kits for microorganism detection |
US11783918B2 (en) | 2016-11-30 | 2023-10-10 | Microsoft Technology Licensing, Llc | DNA random access storage system via ligation |
US10793897B2 (en) | 2017-02-08 | 2020-10-06 | Microsoft Technology Licensing, Llc | Primer and payload design for retrieval of stored polynucleotides |
WO2019094973A1 (en) | 2017-11-13 | 2019-05-16 | Life Technologies Corporation | Compositions, methods and kits for urinary tract microorganism detection |
CN110491448A (en) * | 2019-07-15 | 2019-11-22 | 广州奇辉生物科技有限公司 | A kind of method, system, platform and storage medium handling PCR primer |
CN110491448B (en) * | 2019-07-15 | 2023-02-07 | 广州奇辉生物科技有限公司 | Method, system, platform and storage medium for processing PCR primers |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Campbell et al. | | MAKER-P: a tool kit for the rapid creation, management, and quality control of plant genome annotations |
Venturini et al. | | Leveraging multiple transcriptome assembly methods for improved gene structure annotation |
WO2013134341A1 (en) | | Primer designing pipeline for targeted sequencing |
Griffin et al. | | Prediction of RNA secondary structure by energy minimization |
Rother et al. | | ModeRNA: a tool for comparative modeling of RNA 3D structure |
Sallet et al. | | EuGene: an automated integrative gene finder for eukaryotes and prokaryotes |
Yue et al. | | Long-read sequencing data analysis for yeasts |
Meysman et al. | | Use of structural DNA properties for the prediction of transcription-factor binding sites in Escherichia coli |
CN103797486A (en) | | Method for assembly of nucleic acid sequence data |
JP2015509623A (en) | | DNA sequence data analysis |
KR100681795B1 (en) | | A protocol for genome sequence alignment on grid environment |
EP3291114B1 (en) | | Genome analysis device and genome visualization method |
Rother et al. | | RNA tertiary structure prediction with ModeRNA |
Bi et al. | | Bipartite pattern discovery by entropy minimization-based multiple local alignment |
Biswas et al. | | ISQuest: finding insertion sequences in prokaryotic sequence fragment data |
EP1608786B1 (en) | | Genomic profiling of regulatory factor binding sites |
Contreras-Moreira et al. | | RSAT::Plants: motif discovery within clusters of upstream sequences in plant genomes |
US20080274558A1 (en) | | Method for identifying and selecting low copy nucleic segments |
Lopes et al. | | ProGeRF: proteome and genome repeat finder utilizing a fast parallel hash function |
Sweeney et al. | | R2DT: computational framework for template-based RNA secondary structure visualisation across non-coding RNA types |
Długosz et al. | | Improvements in DNA reads correction |
Sinha | | PhyME: a software tool for finding motifs in sets of orthologous sequences |
Thangadurai et al. | | Bioinformatics tools for the multilocus phylogenetic analysis of fungi |
Fortmann-Grote et al. | | RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes |
Farrell | | smallrnaseq: short non coding RNA-seq analysis with Python |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13713599; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 13713599; Country of ref document: EP; Kind code of ref document: A1 |