WO2002081490A2 - Computer-directed assembly of a polynucleotide encoding a target polypeptide - Google Patents

Publication number
WO2002081490A2
Authority
WO
WIPO (PCT)
Prior art keywords
polynucleotide
initiating
sequence
overhang
oligonucleotide
Prior art date
Application number
PCT/US2002/001649
Other languages
French (fr)
Other versions
WO2002081490A3 (en)
WO2002081490A8 (en)
Inventor
Glen A. Evans
Original Assignee
Egea Biosciences, Inc.
Priority date
Filing date
Publication date
Application filed by Egea Biosciences, Inc.
Priority to US10/250,894 (US20040241650A1)
Priority to JP2002579476A (JP2004533228A)
Priority to KR1020037009589A (KR100860291B1)
Priority to EP02739079A (EP1385950B1)
Priority to DK02739079T (DK1385950T3)
Priority to CA002433463A (CA2433463A1)
Priority to MXPA03006344A (MXPA03006344A)
Priority to DE60227361T (DE60227361D1)
Publication of WO2002081490A2
Publication of WO2002081490A3
Publication of WO2002081490A8

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K1/00: General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/66: General methods for inserting a gene into a vector to form a recombinant vector using cleavage and ligation; Use of non-functional linkers or adaptors, e.g. linkers containing the sequence for a restriction endonuclease
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20: Sequence assembly
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B01: PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
    • B01J: CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
    • B01J2219/00: Chemical, physical or physico-chemical processes in general; Their relevant apparatus
    • B01J2219/00274: Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
    • B01J2219/00583: Features relative to the processes being carried out
    • B01J2219/00603: Making arrays on substantially continuous surfaces
    • B01J2219/00659: Two-dimensional arrays
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B01: PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
    • B01J: CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
    • B01J2219/00: Chemical, physical or physico-chemical processes in general; Their relevant apparatus
    • B01J2219/00274: Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
    • B01J2219/0068: Means for controlling the apparatus of the process
    • B01J2219/00686: Automatic
    • B01J2219/00689: Automatic using computers
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B01: PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
    • B01J: CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
    • B01J2219/00: Chemical, physical or physico-chemical processes in general; Their relevant apparatus
    • B01J2219/00274: Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
    • B01J2219/0068: Means for controlling the apparatus of the process
    • B01J2219/00695: Synthesis control routines, e.g. using computer programs
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B01: PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
    • B01J: CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
    • B01J2219/00: Chemical, physical or physico-chemical processes in general; Their relevant apparatus
    • B01J2219/00274: Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
    • B01J2219/0068: Means for controlling the apparatus of the process
    • B01J2219/007: Simulation or virtual synthesis
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B01: PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
    • B01J: CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
    • B01J2219/00: Chemical, physical or physico-chemical processes in general; Their relevant apparatus
    • B01J2219/00274: Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
    • B01J2219/00718: Type of compounds synthesised
    • B01J2219/0072: Organic compounds
    • B01J2219/00722: Nucleotides
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07B: GENERAL METHODS OF ORGANIC CHEMISTRY; APPARATUS THEREFOR
    • C07B2200/00: Indexing scheme relating to specific properties of organic compounds
    • C07B2200/11: Compounds covalently bound to a solid support
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00: Libraries per se, e.g. arrays, mixtures
    • C40B40/04: Libraries containing only organic compounds
    • C40B40/06: Libraries containing nucleotides or polynucleotides, or derivatives thereof

Definitions

  • the present invention relates generally to the area of bioinformatics and more specifically to methods, algorithms and apparatus for computer directed polynucleotide assembly.
  • the invention further relates to the production of polypeptides encoded by polynucleotides assembled by the invention.
  • Enzymes, antibodies, receptors and ligands are polypeptides that have evolved by selective pressure to perform very specific biological functions within the milieu of a living organism.
  • the use of a polypeptide for specific technological applications may require the polypeptide to function in environments or on substrates for which it was not evolutionarily selected.
  • Polypeptides isolated from microorganisms that thrive in extreme environments provide ample evidence that these molecules are, in general, malleable with regard to structure and function.
  • the process for isolating a polypeptide from its native environment is expensive and time consuming.
  • new methods for synthetically evolving genetic material encoding a polypeptide possessing a desired activity are needed.
  • oligonucleotides covering the entire sequence to be synthesized are first allowed to anneal, and then the nicks are repaired with ligase. The fragment is then cloned directly, or cloned after amplification by the polymerase chain reaction (PCR). The polynucleotide is subsequently used for in vitro assembly into longer sequences.
  • the second general method for gene synthesis utilizes polymerase to fill in single-stranded gaps in the annealed pairs of oligonucleotides.
  • the present invention addresses the limitations in present recombinant nucleic acid manipulations by providing a fast, efficient means for generating a nucleic acid sequence, including entire genes, chromosomal segments, chromosomes and genomes. Because the approach is completely synthetic, there are no limitations, such as the availability of existing nucleic acids, to hinder the construction of even very large segments of nucleic acid.
  • the invention provides a method of synthesizing a target polynucleotide sequence including: a) providing a target polynucleotide sequence; b) identifying at least one initiating polynucleotide present in the target polynucleotide which includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang and a 3' overhang; c) identifying a second polynucleotide present in the target polynucleotide which is contiguous with the initiating polynucleotide and includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang, a 3' overhang
  • the invention further provides a method of synthesizing a target polynucleotide including: a) providing a target polynucleotide sequence derived from a model sequence; b) identifying at least one initiating polynucleotide sequence present in the target polynucleotide sequence of a), wherein the initiating polynucleotide includes: 1) a first plus strand oligonucleotide; 2) a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide; and 3) a minus strand oligonucleotide including a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and a second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide; c) annealing the first plus strand oligonucleotide and the second plus strand oligonucleotide to the
  • the invention provides a method for synthesizing a target polynucleotide, including: a) providing a target polynucleotide sequence derived from a model sequence; b) identifying at least one initiating polynucleotide present in the target polynucleotide which includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide; c) contacting the initiating polynucleotide under conditions suitable for primer annealing with a first oligonucleotide having partial complementarity to the 3' portion of the plus strand of the initiating polynucleotide, and a second oligonucleotide having partial complementarity to the 3' portion of the minus strand of the initiating polynucleotide; d) catalyzing under conditions suitable for primer extension: 1) polynucleotide synthesis from the 3'-hydroxyl
  • the invention further provides a method for isolating a target polypeptide encoded by a target polynucleotide generated by a method of the invention by; a) incorporating the target polynucleotide in an expression vector; b) introducing the expression vector into a suitable host cell; c) culturing the cell under conditions and for such time as to promote the expression of the target polypeptide encoded by the target polynucleotide; and d) isolating the target polypeptide.
  • the invention further provides a method of synthesizing a target polynucleotide including: a) providing a target polynucleotide sequence derived from a model sequence; b) chemically synthesizing a plurality of single-stranded oligonucleotides each of which is partially complementary to at least one oligonucleotide present in the plurality, where the sequence of the plurality of oligonucleotides is a contiguous sequence of the target polynucleotide; c) contacting the partially complementary oligonucleotides under conditions and for such time suitable for annealing, the contacting resulting in a plurality of partially double-stranded polynucleotides, where each double-stranded polynucleotide includes a 5' overhang and a 3' overhang; d) identifying at least one initiating polynucleotide derived from the model sequence present in the plurality of double-stranded polynu
  • the invention further provides a computer program, stored on a computer-readable medium, for generating a target polynucleotide sequence derived from a model sequence, the computer program comprising instructions for causing a computer system to: a) identify an initiating polynucleotide sequence contained in the target polynucleotide sequence; b) parse the target polynucleotide sequence into multiply distinct, partially complementary, oligonucleotides; c) control assembly of the target polynucleotide sequence by controlling the bi-directional extension of the initiating polynucleotide sequence by the sequential addition of partially complementary oligonucleotides resulting in a contiguous double-stranded polynucleotide.
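  • The parsing step named in b) can be illustrated with a minimal sketch. It is not the patented algorithm: the fixed oligonucleotide length, the half-length stagger between strands, and the names `parse_target` and `pick_initiating` are all assumptions made for illustration. Plus-strand (F) oligonucleotides tile the target end to end, and minus-strand (R) oligonucleotides are offset so that each one bridges the junction between two neighbouring F oligonucleotides.

```python
# Illustrative sketch only; parameter choices are assumptions, not the patent's values.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_target(target: str, oligo_len: int = 50) -> dict:
    """Split a plus-strand target into F (plus) and R (minus) oligonucleotides.

    F oligos tile the plus strand end to end. R oligos tile the minus strand
    with a half-oligo offset, so each R oligo is partially complementary to
    two adjacent F oligos. Every oligo is written 5'->3'.
    """
    if oligo_len < 2 or oligo_len % 2:
        raise ValueError("oligo_len must be an even number >= 2")
    target = target.upper()
    half = oligo_len // 2
    plus = [target[i:i + oligo_len] for i in range(0, len(target), oligo_len)]
    minus = [reverse_complement(target[i:i + oligo_len])
             for i in range(half, len(target), oligo_len)]
    return {"F": plus, "R": minus}


def pick_initiating(n_plus: int) -> int:
    """Hypothetical rule: start from the middle F oligo so that extension
    can proceed in both directions."""
    return n_plus // 2
```

  • For a 16-base target `"AAAAAAAACCCCCCCC"` with `oligo_len=8`, the sketch yields two F oligonucleotides and two R oligonucleotides, the first R spanning the junction between the two F oligonucleotides.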
  • the invention further provides a method for automated synthesis of a target polynucleotide sequence, including: a) providing the user with an opportunity to communicate a desired target polynucleotide sequence; b) allowing the user to transmit the desired target polynucleotide sequence to a server; c) providing the user with a unique designation; d) obtaining the transmitted target polynucleotide sequence provided by the user.
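  • The submission steps a) to d) can be sketched as a small in-memory service. Everything here is hypothetical: the class name `SequenceServer`, the use of a random UUID as the "unique designation", and the restriction to the four DNA bases are assumptions for illustration, not details given in the source.

```python
import uuid

VALID_BASES = set("ACGT")


class SequenceServer:
    """Toy stand-in for the server that receives target sequences."""

    def __init__(self):
        self._orders = {}  # maps designation -> submitted sequence

    def submit(self, sequence: str) -> str:
        """Store a user's target sequence and return a unique designation."""
        cleaned = "".join(sequence.split()).upper()
        if not cleaned or set(cleaned) - VALID_BASES:
            raise ValueError("sequence must contain only A, C, G and T")
        designation = uuid.uuid4().hex  # assumed form of the designation
        self._orders[designation] = cleaned
        return designation

    def retrieve(self, designation: str) -> str:
        """Obtain the transmitted sequence for a given designation."""
        return self._orders[designation]
```

  • A caller would keep the returned designation and use it later to look up the stored sequence, for example `server.retrieve(designation)`.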
  • the invention further provides a method for automated synthesis of a polynucleotide sequence, including: a) providing a user with a mechanism for communicating a model polynucleotide sequence; b) optionally providing the user with an opportunity to communicate at least one desired modification to the model sequence if desired; c) allowing the user to transmit the model sequence and desired modification to a server; d) providing user with a unique designation; e) obtaining the transmitted model sequence and optional desired modification provided by the user; f) inputting into a programmed computer, through an input device, data including at least a portion of the model polynucleotide sequence; g) determining, using the processor, the sequence of the model polynucleotide sequence containing the desired modification; h) further determining, using the processor, at least one initiating polynucleotide sequence present in the model polynucleotide sequence; i) selecting, using the processor, a model for synthesizing the modified model polynucleotide
  • Figure 1 depicts 96 well plates for of F (i.e.,
  • Figure 2 depicts the oligonucleotide pooling plan where F oligonucleotides and R oligonucleotides are annealed to form a contiguous polynucleotide.
  • Figure 3 depicts the schematic of assembly of a target polynucleotide sequence defining a gene, genome, set of genes or polypeptide sequence.
  • the sequence is designed by computer and used to generate a set of parsed oligonucleotide fragments covering the + and - strand of a target polynucleotide sequence encoding a target polypeptide.
  • Figure 4 depicts a schematic of the polynucleotide synthesis modules.
  • a nanodispensing head with a plurality of valves will deposit synthesis chemicals in assembly vessels. Chemical distribution from the reagent reservoir can be controlled using a syringe pump. Underlying the reaction chambers is a set of assembly vessels linked to microchannels that will move fluids by microfluidics.
  • Figure 5 depicts that oligonucleotide synthesis, oligonucleotide assembly by pooling and annealing, and ligation can be accomplished using microfluidic mixing.
  • Figure 6 depicts the sequential pooling of oligonucleotides synthesized in arrays.
  • Figure 7 depicts the pooling stage of the oligonucleotide components through the manifold assemblies resulting in the complete assembly of all oligonucleotides from the array.
  • Figure 8 depicts an example of an assembly module comprising a complete set of pooling manifolds produced using microfabrication in a single unit.
  • Various configurations of the pooling manifold will allow assembly of increased numbers of well arrays of parsed component oligonucleotides.
  • Figure 9 depicts the configuration for the assembly of oligonucleotides synthesized in a pre-defined array. Passage through the assembly device in the presence of DNA ligase and other appropriate buffer and chemical components will facilitate double stranded polynucleotide assembly.
  • Figure 10 depicts an example of the pooling device design. Microgrooves or microfluidic channels are etched into the surface of the pooling device. The device provides a microreaction vessel at the junction of two channels for 1) mixing of the two streams, 2) controlled temperature maintenance or cycling at the site of the junction and 3) expulsion of the ligated mixture from the exit channel into the next set of pooling and ligation chambers.
  • Figure 11 depicts the design of a polynucleotide synthesis platform comprising microwell plates addressed with a plurality of channels for microdispensing.
  • Figure 12 depicts an example of a high capacity polynucleotide synthesis platform using high density microwell microplates capable of synthesizing in excess of 1536 component oligonucleotides per plate.
  • Figure 13 depicts a polynucleotide assembly format using surface-bound oligonucleotide synthesis rather than soluble synthesis.
  • oligonucleotides are synthesized with a linker that allows attachment to a solid support.
  • Figure 14 depicts a diagram of systematic polynucleotide assembly on a solid support.
  • a set of parsed component oligonucleotides is arranged in an array with a stabilizer oligonucleotide attached.
  • a set of ligation substrate oligonucleotides are placed in the solution and systematic assembly is carried out in the solid phase by sequential annealing, ligation and melting.
  • Figure 15 depicts polynucleotide assembly using component oligonucleotides bound to a set of metal electrodes on a microelectronic chip. Each electrode can be controlled independently with respect to current and voltage.
  • Figure 16 depicts generally a primer extension assembly method of the invention.
  • Figure 17 provides a system diagram of the invention.
  • Figure 18 depicts a perspective view of an instrument of the invention.
  • the complete sequences of complex genomes make large scale functional approaches to genetics possible.
  • the present invention outlines a novel approach to utilizing the results of genomic sequence information by computer-directed polynucleotide assembly based upon information available in databases such as the human genome database.
  • the present invention may be used to synthesize, assemble and select a novel, synthetic target polynucleotide sequence encoding a target polypeptide.
  • the target polynucleotide may encode a target polypeptide that exhibits enhanced or altered biological activity as compared to a model polypeptide encoded by a natural (wild-type) or model polynucleotide sequence.
  • standard assays may be used to survey the activity of an expressed target polypeptide.
  • the expressed target polypeptide can be assayed to determine its ability to carry out the function of the corresponding model polypeptide or to determine whether a target polypeptide exhibiting a new function has been produced.
  • the present invention provides a means for synthetically evolving a model polypeptide by synthesizing, in a computer-directed fashion, polynucleotides encoding a target polypeptide derived from a model polypeptide.
  • the invention provides a method of synthesizing a target polynucleotide by providing a target polynucleotide sequence and identifying at least one initiating polynucleotide present in the target polynucleotide which includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang and a 3' overhang.
  • a "target polynucleotide sequence" includes any nucleic acid sequence suitable for encoding a target polypeptide that can be synthesized by a method of the invention.
  • a target polynucleotide sequence can be used to generate a target polynucleotide using an apparatus capable of assembling nucleic sequences.
  • a target polynucleotide sequence is a linear segment of DNA having a double-stranded region; the segment may be of any length sufficiently long to be created by the hybridization of at least two oligonucleotides having complementary regions. It is contemplated that a target polynucleotide can be 100, 200, 300, 400, 800, 1000, 1500, 2000, 4000, 8000, 10,000, 12,000, 18,000, 20,000, 40,000, 80,000 or more base pairs in length.
  • the methods of the present invention will be able to create entire artificial genomes of lengths comparable to known bacterial, yeast, viral, mammalian, amphibian, reptilian, or avian genomes.
  • the target polynucleotide is a gene encoding a polypeptide of interest.
  • the target polynucleotide may further include non-coding elements such as origins of replication, telomeres, promoters, enhancers, transcription and translation start and stop signals, introns, exon splice sites, chromatin scaffold components and other regulatory sequences.
  • the target polynucleotide may comprise multiple genes, chromosomal segments, chromosomes and even entire genomes.
  • a polynucleotide of the invention may be derived from prokaryotic or eukaryotic sequences including bacterial, yeast, viral, mammalian, amphibian, reptilian, avian, plants, archaebacteria and other
  • oligonucleotide is defined as a molecule comprised of two or more deoxyribonucleotides or ribonucleotides, preferably more than three. Its exact size will depend on many factors, such as the reaction temperature, salt concentration, the presence of denaturants such as formamide, and the degree of complementarity with the sequence to which the oligonucleotide is intended to hybridize.
  • nucleotide can refer to nucleotides present in either DNA or RNA and thus includes nucleotides which incorporate adenine, cytosine, guanine, thymine and uracil as base, the sugar moiety being deoxyribose or ribose.
  • modified bases capable of base pairing with one of the conventional bases, adenine, cytosine, guanine, thymine and uracil, may be used in an oligonucleotide employed in the present invention.
  • modified bases include, for example, 8-azaguanine and hypoxanthine.
  • the nucleotides may carry a label or marker so that on incorporation into a primer extension product, they augment the signal associated with the primer extension product, for example for capture on to solid phase.
  • a "plus strand" oligonucleotide, by convention, includes a short, single-stranded DNA segment that starts with the 5' end to the left as one reads the sequence.
  • a "minus strand" oligonucleotide includes a short, single-stranded DNA segment that starts with the 3' end to the left as one reads the sequence.
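  • The reading convention for the two strands can be made concrete with a short sketch. The function names are illustrative assumptions; the only rule applied is standard Watson-Crick pairing for DNA (A with T, G with C). Complementing without reversing gives the minus strand as it is displayed under the plus strand, 3' end on the left; reversing that string gives the same oligonucleotide written 5'->3'.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def minus_strand_3_to_5(plus_5_to_3: str) -> str:
    """Minus strand written with its 3' end on the left, aligned base by
    base under the plus strand."""
    return plus_5_to_3.upper().translate(_COMPLEMENT)


def minus_strand_5_to_3(plus_5_to_3: str) -> str:
    """The same minus strand written in the conventional 5'->3' direction."""
    return minus_strand_3_to_5(plus_5_to_3)[::-1]
```

  • For the plus strand `ATGC`, the minus strand reads `TACG` when displayed 3' end first and `GCAT` when written 5'->3'.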
  • Solid-phase synthesis techniques have been provided for the synthesis of several peptide sequences on, for example, a number of "pins" (See e.g., Geysen et al., J. Immun. Meth. (1987) 102:259-274, incorporated herein by reference in its entirety).
  • an “initiating polynucleotide sequence,” as used herein, is a sequence contained in a target polynucleotide sequence and identified by an algorithm of the invention.
  • An “initiating polynucleotide” is the physical embodiment of an initiating polynucleotide sequence. For ligation assembly of a target polynucleotide, an initiating polynucleotide begins assembly by providing an anchor for hybridization of subsequent polynucleotides contiguous with the initiating polynucleotide.
  • an initiating polynucleotide is a partially double-stranded nucleic acid, thereby providing single-stranded overhang(s) for annealing of a contiguous, double-stranded nucleic acid molecule.
  • an initiating polynucleotide begins assembly by providing a template for hybridization of subsequent oligonucleotides contiguous with the initiating polynucleotide.
  • an initiating polynucleotide can be partially double-stranded or fully double-stranded.
  • an initiating polynucleotide of the invention can be bound to a solid support for improved efficiency.
  • the solid phase allows for the efficient separation of the assembled target polynucleotide from other components of the reaction.
  • Different supports can be applied in the method.
  • supports can be magnetic latex beads or magnetic controlled pore glass beads that allow the desired product to be magnetically separated from the reaction mixture. Binding the initiating polynucleotide to such beads can be accomplished by a variety of known methods, for example carbodiimide treatment (Gilham, Biochemistry 7:2809-2813 (1968); Mizutani and Tachibana, J. Chromatography 356:202-205 (1986); Wolf et al., Nucleic Acids Res. 15:2911-2926 (1987); Musso, Nucleic Acids Res. 15:5353-5372 (1987); Lund et al., Nucleic Acids Res. 16:10861-10880 (1988)).
  • the initiating polynucleotide attached to the solid phase can act as an anchor for the continued synthesis of the target polynucleotide.
  • Assembly can be accomplished by addition of contiguous polynucleotides together with ligase for ligation assembly or by addition of oligonucleotides together with polymerase for primer extension assembly. After the appropriate incubation time, unbound components of the method can be washed out and the reaction can be repeated again to improve the efficiency of template utilization. Alternatively, another set of polynucleotides or oligonucleotides can be added to continue the assembly.
  • To be used efficiently for the synthesis, the solid phase can contain pores with sufficient room for synthesis of the long nucleic acid molecules.
  • the solid phase can be composed of material that cannot non-specifically bind any undesired components of the reaction.
  • One way to solve the problem is to use controlled pore glass beads appropriate for long DNA molecules.
  • the initiating polynucleotide can be attached to the beads through a long connector. The role of the connector is to position the initiating polynucleotide from the surface of the solid support at a desirable distance.
  • the method of the invention further includes identifying a second polynucleotide sequence present in the target polynucleotide which is contiguous with the initiating polynucleotide and includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang, where at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating polynucleotide.
  • oligonucleotides having complementary regions will "anneal" (i.e., base pair) under the appropriate conditions, thereby producing a double-stranded region.
  • In order to anneal (i.e., hybridize), oligonucleotides must be at least partially complementary.
  • complementary to is used herein in relation to nucleotides to mean a nucleotide that will base pair with another specific nucleotide.
  • adenosine triphosphate is complementary to uridine triphosphate or thymidine triphosphate and guanosine triphosphate is complementary to cytidine triphosphate.
  • a 5' or 3' "overhang" means a region on the 5' or 3', or 5' and 3', end of a polynucleotide that is single-stranded, i.e. not base paired.
  • An overhang provides a means for the subsequent annealing of a contiguous polynucleotide containing an overhang that is complementary to the overhang of the contiguous polynucleotide.
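  • The overhang definition can be sketched as a small calculation. The representation is an assumption made for illustration: each strand of a partially double-stranded polynucleotide is described by the half-open interval it covers in plus-strand coordinates of the target, and the names `Duplex` and `overhangs` are hypothetical. On the left side the plus strand presents its 5' end and the minus strand its 3' end; on the right side the roles are reversed.

```python
from typing import List, NamedTuple, Tuple


class Duplex(NamedTuple):
    """Intervals [start, end) covered by each strand, in plus-strand coordinates."""
    plus_start: int
    plus_end: int
    minus_start: int
    minus_end: int


def overhangs(d: Duplex) -> List[Tuple[str, str, int]]:
    """Return (side, kind, length) for each single-stranded end of the duplex."""
    if min(d.plus_end, d.minus_end) <= max(d.plus_start, d.minus_start):
        raise ValueError("strands do not overlap, so nothing is base paired")
    found = []
    if d.plus_start < d.minus_start:      # plus strand's 5' end is unpaired
        found.append(("left", "5'", d.minus_start - d.plus_start))
    elif d.minus_start < d.plus_start:    # minus strand's 3' end is unpaired
        found.append(("left", "3'", d.plus_start - d.minus_start))
    if d.plus_end > d.minus_end:          # plus strand's 3' end is unpaired
        found.append(("right", "3'", d.plus_end - d.minus_end))
    elif d.minus_end > d.plus_end:        # minus strand's 5' end is unpaired
        found.append(("right", "5'", d.minus_end - d.plus_end))
    return found
```

  • Under these assumptions, a plus strand covering positions 0 to 50 annealed to a minus strand covering 10 to 40 has a 5' overhang on the left and a 3' overhang on the right, each 10 bases long, while two strands covering the same interval form a blunt duplex with no overhangs.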
  • for applications requiring high selectivity, one typically will desire to employ relatively stringent conditions to form the hybrids, e.g., one will select relatively low salt and/or high temperature conditions, such as provided by about 0.02 M to about 0.10 M NaCl at temperatures of about 50°C to about 70°C. Such high stringency conditions tolerate little, if any, mismatch between the oligonucleotide and the template or target strand. It generally is appreciated that conditions can be rendered more stringent by the addition of increasing amounts of formamide.
  • lower stringency conditions may be used. Under these conditions, hybridization may occur even though the sequences of probe and target strand are not perfectly complementary, but are mismatched at one or more positions. Conditions may be rendered less stringent by increasing salt concentration and decreasing temperature. For example, a medium stringency condition could be provided by about 0.1 to 0.25 M NaCl at temperatures of about 37°C to about 55°C, while a low stringency condition could be provided by about 0.15 M to about 0.9 M salt, at temperatures ranging from about 20°C to about 55°C. Thus, hybridization conditions can be readily manipulated depending on the desired results.
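  • The salt and temperature ranges quoted above can be collected into a simple lookup. This is only a sketch: the text gives approximate ("about") ranges that overlap, so the hypothetical function `matching_stringency` returns every level whose range contains the given conditions instead of a single answer, and it treats the "salt" of the low stringency range as NaCl for simplicity.

```python
# (NaCl molarity range, temperature range in deg C), taken from the text above.
# The ranges are approximate and overlap, so more than one level may match.
STRINGENCY_RANGES = {
    "high": ((0.02, 0.10), (50.0, 70.0)),
    "medium": ((0.10, 0.25), (37.0, 55.0)),
    "low": ((0.15, 0.90), (20.0, 55.0)),  # text says "salt"; NaCl assumed here
}


def matching_stringency(nacl_molar: float, temp_c: float) -> list:
    """Return the stringency levels whose quoted ranges include the conditions."""
    return [
        name
        for name, ((salt_lo, salt_hi), (temp_lo, temp_hi)) in STRINGENCY_RANGES.items()
        if salt_lo <= nacl_molar <= salt_hi and temp_lo <= temp_c <= temp_hi
    ]
```

  • For example, 0.05 M NaCl at 65°C falls only in the high stringency range, whereas 0.2 M at 40°C falls in both the medium and low ranges, which reflects the overlap in the quoted figures.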
  • it will be advantageous to determine the hybridization of oligonucleotides by employing a label.
  • appropriate labels are known in the art, including fluorescent, radioactive, enzymatic or other ligands, such as avidin/biotin, which are capable of being detected.
  • for enzyme tags, colorimetric indicator substrates are known that can be employed to provide a means for detection, visible to the human eye or spectrophotometrically, to identify whether specific hybridization with a complementary oligonucleotide has occurred.
  • At least one oligonucleotide of an initiating polynucleotide is adsorbed or otherwise affixed to a selected matrix or surface.
  • This fixed, single-stranded nucleic acid is then subjected to hybridization with the complementary oligonucleotides under desired conditions.
  • the selected conditions will also depend on the particular circumstances based on the particular criteria required (depending, for example, on the G+C content, type of target nucleic acid, source of nucleic acid, size of hybridization probe, etc.).
  • the hybridization may be detected, or even quantified, by means of the label.
  • the method of the invention further provides a third polynucleotide present in the target polynucleotide which is contiguous with the initiating sequence and provides a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang, where at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating polynucleotide which is not complementary to an overhang of the second polynucleotide.
  • the method further provides contacting the initiating polynucleotide with the second polynucleotide and the third polynucleotide under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, resulting in the bidirectional extension of the initiating polynucleotide.
  • the annealed polynucleotides are optionally contacted with a ligase under conditions suitable for ligation.
  • the method discussed above is optionally repeated to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation.
  • a target polynucleotide sequence can be designed de novo or derived from a "model polynucleotide sequence".
  • a "model polynucleotide sequence" includes any nucleic acid sequence that encodes a model polypeptide sequence.
  • a model polypeptide sequence provides a basis for designing a modified polynucleotide such that a target polynucleotide incorporating the desired modification is synthesized.
  • the present invention also provides methods that can be used to synthesize, de novo, polynucleotides that encode sets of genes, either naturally occurring genes expressed from natural or artificial promoter constructs or artificial genes derived from synthetic DNA sequences, which encode elements of biological systems that perform a specified function or attribute of an artificial organism, as well as entire genomes.
  • the present invention provides the synthesis of a replication-competent, double-stranded polynucleotide, wherein the polynucleotide has an origin of replication, a first coding region and a first regulatory element directing the expression of the first coding region.
  • by "replication competent" it is meant that the polynucleotide is capable of directing its own replication.
  • the polynucleotide will possess all the cis-acting signals required to facilitate its own synthesis.
  • the polynucleotide will be similar to a plasmid or a virus, such that once placed within a cell, it is capable of replication by a combination of the polynucleotide's and cellular functions.
  • a polynucleotide sequence defining a gene, genome, set of genes or protein sequence can be designed in a computer-assisted manner (discussed below) and used to generate a set of parsed oligonucleotides covering the plus (+) and minus (-) strand of the sequence.
  • "parsed" means that a target polynucleotide sequence has been delineated in a computer-assisted manner such that a series of contiguous oligonucleotide sequences are identified.
  • the oligonucleotide sequences are individually synthesized and used in a method of the invention to generate a target polynucleotide.
  • the length of an oligonucleotide is quite variable.
  • oligonucleotides used in the methods of the invention are between about 15 and 100 bases and more preferably between about 20 and 50 bases. Specific lengths include, but are not limited to 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64.
  • the overlap between the oligonucleotides having partial complementarity may be designed to be between 5 and 75 bases per oligonucleotide pair.
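  • The parsing step can be sketched as follows, using the 50-base oligonucleotides overlapping by 25 bases that the text gives as an example. This is a minimal sketch under stated assumptions, not the computer program of the invention: the function name `parse_target`, the tiling scheme (plus-strand oligonucleotides tiled end to end, minus-strand oligonucleotides offset so that each one straddles a plus-strand junction) and the handling of a short final oligonucleotide are all hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def parse_target(target: str, oligo_len: int = 50, overlap: int = 25):
    """Split a plus-strand target (5'->3') into plus and minus oligonucleotides.

    Assumption: plus oligonucleotides tile the target without gaps, and
    minus oligonucleotides start `oligo_len - overlap` bases in, so each
    one overlaps the 3' end of one plus oligonucleotide by `overlap` bases.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be between 1 and oligo_len - 1")
    target = target.upper()
    offset = oligo_len - overlap
    plus = [target[i:i + oligo_len] for i in range(0, len(target), oligo_len)]
    minus = [reverse_complement(target[i:i + oligo_len])
             for i in range(offset, len(target), oligo_len)]
    return plus, minus


plus_set, minus_set = parse_target("ACGT" * 50)  # illustrative 200-base target
print(len(plus_set), len(minus_set))
```

Each list could then be assigned to its own synthesis plate, one for the plus strand and one for the minus strand.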
  • the oligonucleotides preferably are treated with polynucleotide kinase, for example, T4 polynucleotide kinase.
  • the kinasing can be performed either before or after the oligonucleotide set is mixed, but in either case before annealing.
  • the oligonucleotides are treated with an enzyme having a ligating function.
  • for example, a DNA ligase typically will be employed for this function.
  • topoisomerase, which does not require 5' phosphorylation, is rapid, and operates at room temperature, may be used instead of ligase.
  • for example, 50-base oligonucleotides overlapping by 25 bases can be synthesized by an oligonucleotide array synthesizer (OAS).
  • a 5' (+) strand set of oligonucleotides is synthesized in one 96-well plate and the second 3' or (-) strand set is synthesized in a second 96-well microtiter plate.
  • Synthesis can be carried out using phosphoramidite chemistry modified to miniaturize the reaction size and generate small reaction volumes and yields in the range of 2 to 5 nmole.
  • Synthesis is done on controlled pore glass beads (CPGs), then the completed oligonucleotides are deblocked, deprotected and removed from the beads.
  • the oligonucleotides are lyophilized, re-suspended in water and 5' phosphorylated using polynucleotide kinase and
  • the set of arrayed oligonucleotide sequences in the plate can be assembled using a mixed pooling strategy.
  • for example, systematic pooling of component oligonucleotides can be performed using a modified Beckman Biomek automated pipetting robot, or another automated lab workstation.
  • the fragments can be combined with buffer and enzyme (Taq I DNA ligase or Egea Assemblase™, for example). Pooling can be performed in microwell plates. After each step of pooling, the temperature is ramped to enable annealing and ligation, then additional pooling is carried out.
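  • One way to picture the systematic pooling described above is as a schedule of rounds in which neighbouring pools are merged, with a temperature ramp after each round. The sketch below assumes a simple pairwise (binary) merging scheme, which the text does not specify; `pooling_rounds` is a hypothetical helper and the pools are identified only by the range of fragment indices they contain.

```python
def pooling_rounds(n_fragments: int):
    """Return a pairwise pooling schedule for fragments 0..n_fragments-1.

    Each round is a list of (left_pool, right_pool) merges, where a pool
    is a (first_index, last_index) tuple. Assumption: adjacent pools are
    merged two at a time; an unpaired last pool is carried forward.
    """
    pools = [(i, i) for i in range(n_fragments)]
    rounds = []
    while len(pools) > 1:
        merges, merged = [], []
        for k in range(0, len(pools) - 1, 2):
            left, right = pools[k], pools[k + 1]
            merges.append((left, right))
            merged.append((left[0], right[1]))
        if len(pools) % 2:  # odd pool out waits for the next round
            merged.append(pools[-1])
        rounds.append(merges)
        pools = merged
    return rounds


schedule = pooling_rounds(96)  # one 96-well plate of intermediates
print(len(schedule))  # 7
```

Under this assumed scheme a full 96-well plate is reduced to a single pool in seven rounds of pooling, annealing and ligation.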
  • Target polynucleotide assembly involves forming a set of intermediates.
  • a set of intermediates can include a plus strand oligonucleotide annealed to a minus strand oligonucleotide, as described above.
  • the annealed intermediate can be formed by providing a single plus strand oligonucleotide annealed to a single minus strand oligonucleotide.
  • two or more oligonucleotides may comprise the plus strand or the minus strand.
  • to form a polynucleotide (e.g., an initiating polynucleotide), three or more oligonucleotides can be annealed.
  • a first plus strand oligonucleotide, a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide, and a minus strand oligonucleotide having a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide can be annealed to form a partially double-stranded polynucleotide.
  • the polynucleotide can include a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang.
  • the first plus strand oligonucleotide and second plus strand oligonucleotide are contiguous sequences such that they are ligatable.
  • the minus strand oligonucleotide is partially complementary to both plus strand oligonucleotides and acts as a "bridge" or "stabilizer" sequence by annealing to both oligonucleotides.
  • Subsequent polynucleotides comprised of more than two oligonucleotides annealed as previously described, can be used to assemble a target polynucleotide in a manner resulting in a contiguous double-stranded polynucleotide.
  • An example of using two or more plus strand oligonucleotides to assemble a polynucleotide is shown in Figure 3.
  • Two of these oligonucleotides provide a ligation substrate joined by ligase and the third oligonucleotide is a stabilizer that brings together two specific sequences by annealing resulting in the formation of a part of the final polynucleotide construct.
  • This intermediate provides a substrate for DNA ligase which, through its nick-sealing activity, joins the two 50-base oligonucleotides into a single 100-base single-stranded polynucleotide.
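  • The three-oligonucleotide intermediate can be modelled in a few lines: two contiguous plus-strand oligonucleotides are joined only if a minus-strand "bridge" anneals across their junction. This is an illustrative sketch, not the patented procedure; the names `design_bridge` and `ligate_with_bridge` and the fixed 25-base arm on each side of the junction are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_bridge(plus1: str, plus2: str, arm: int = 25) -> str:
    """Minus-strand oligonucleotide spanning the plus1/plus2 junction."""
    junction = plus1.upper()[-arm:] + plus2.upper()[:arm]
    return reverse_complement(junction)


def ligate_with_bridge(plus1: str, plus2: str, bridge: str, arm: int = 25):
    """Return the joined plus strand if the bridge holds the nick closed.

    Returns None when the bridge is not complementary to the last `arm`
    bases of plus1 and the first `arm` bases of plus2.
    """
    plus1, plus2 = plus1.upper(), plus2.upper()
    footprint = reverse_complement(bridge)  # plus-sense region the bridge covers
    if len(footprint) != 2 * arm:
        return None
    if plus1.endswith(footprint[:arm]) and plus2.startswith(footprint[arm:]):
        return plus1 + plus2
    return None
```

With two 50-base plus-strand oligonucleotides and a matching 50-base bridge, the function returns a single 100-base strand, mirroring the ligation product described above.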
  • the products are assembled into increasingly larger polynucleotides.
  • sets of triplexes are systematically joined, ligated, and assembled.
  • Each step can be mediated by robotic pooling, ligation and thermal cycling to achieve annealing and denaturation.
  • the final step joins assembled pieces into a complete sequence representing all of the fragments in the array. Since the efficiency of yield at each step is less than 100%, the mass amount of completed product in the final mixture may be very small.
  • additional specific oligonucleotide primers, usually 15 to 20 bases long and complementary to the extreme ends of the assembly, can be annealed and PCR amplification carried out, thereby amplifying and purifying the final full-length product.
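  • The effect of a per-step efficiency below 100% compounds over the assembly rounds: if each round retains a fraction e of the material, n rounds retain e to the power n. The sketch below works this through; the function name `final_yield` and the example figures (80% per step, seven steps) are illustrative assumptions, not values reported in this specification.

```python
def final_yield(step_efficiency: float, n_steps: int) -> float:
    """Fraction of starting material that survives n_steps assembly rounds.

    Assumption: every round has the same efficiency and losses are
    independent, so the overall fraction is step_efficiency ** n_steps.
    """
    if not 0.0 <= step_efficiency <= 1.0:
        raise ValueError("step_efficiency must be between 0 and 1")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return step_efficiency ** n_steps


# Hypothetical figures: 80% efficiency per round, seven rounds.
print(round(final_yield(0.8, 7), 4))  # 0.2097
```

Even a fairly efficient step leaves only about a fifth of the material after seven rounds in this example, which is why a final amplification of the full-length product is useful.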
  • synthesis can utilize microdispensing piezoelectric or microsolenoid nanodispensers, allowing very fast synthesis, much smaller reaction volumes and higher density plates as synthesis vessels.
  • the instrument will use up to 1536 well plates giving a very high capacity.
  • controlled pooling can be performed by a microfluidic manifold that will move individual oligonucleotides through microchannels and mix/ligate in a controlled way. This will obviate the need for robotic pipetting and increase speed and efficiency.
  • an apparatus that accomplishes a method of the invention will have a greater capability for simultaneous reactions giving an overall larger capacity for gene length.
  • once target polynucleotides have been synthesized using a method of the present invention, it may be necessary to screen the sequences for analysis of function.
  • Specifically contemplated by the present inventor are chip-based DNA technologies. Briefly, these techniques involve quantitative methods for analyzing large numbers of genes rapidly and accurately. By tagging genes with oligonucleotides or using fixed probe arrays, one can employ chip technology to segregate target molecules as high-density arrays and screen these molecules on the basis of hybridization.
  • the invention provides a method of synthesizing a target polynucleotide by providing a target polynucleotide sequence and identifying at least one initiating polynucleotide sequence present in the target polynucleotide sequence that includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a double-stranded polynucleotide.
  • the initiating polynucleotide is contacted under conditions suitable for primer annealing with a first oligonucleotide having partial complementarity to the 3' portion of the plus strand of the initiating polynucleotide, and a second oligonucleotide having partial complementarity to the 3' portion of the minus strand of the initiating polynucleotide.
  • Primer extension is subsequently performed using polynucleotide synthesis from the 3'-hydroxyl of: 1) the plus strand of the initiating polynucleotide; 2) the annealed first oligonucleotide; 3) the minus strand of the initiating polynucleotide; and 4) the annealed second oligonucleotide.
  • the synthesis results in the initiating sequence being extended bi-directionally thereby forming a nascent extended initiating polynucleotide.
  • the extended initiating sequence can be further extended by repeated cycles of annealing and primer extension.
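  • A single cycle of the bidirectional primer extension described above can be sketched at the sequence level. The model below is a simplification under stated assumptions: annealing is treated as an exact match over a known overlap length, polymerase extension is treated as copying the annealed oligonucleotide to its end, and the function names `extend_3prime` and `extend_bidirectionally` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extend_3prime(strand: str, oligo: str, overlap: int) -> str:
    """Extend `strand` (5'->3') from its 3'-hydroxyl, copying `oligo`.

    `oligo` must be complementary to the last `overlap` bases of `strand`;
    its remaining bases act as the template for the extension.
    """
    strand = strand.upper()
    template = reverse_complement(oligo)  # written in the same sense as strand
    if overlap <= 0 or strand[-overlap:] != template[:overlap]:
        raise ValueError("oligo does not anneal to the 3' end of the strand")
    return strand + template[overlap:]


def extend_bidirectionally(plus: str, first_oligo: str, second_oligo: str,
                           overlap: int) -> str:
    """Plus-strand sequence of the duplex after one extension cycle."""
    minus = reverse_complement(plus)
    new_plus = extend_3prime(plus, first_oligo, overlap)      # grows at the right end
    new_minus = extend_3prime(minus, second_oligo, overlap)   # grows at the left end
    gained = len(new_minus) - len(minus)
    left_gain = reverse_complement(new_minus)[:gained]
    return left_gain + new_plus
```

Calling `extend_bidirectionally` repeatedly, each time with the next pair of flanking oligonucleotides, corresponds to the repeated cycles of annealing and primer extension in the text.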
  • oligonucleotides can be used as building blocks to assemble polynucleotides through annealing and ligation reactions.
  • oligonucleotides can be used as primers to manufacture polynucleotides through annealing and primer extension reactions.
  • the term "primer" is used herein to refer to a binding element which comprises an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced, i.e., in the presence of appropriate nucleotides and an agent for polymerization such as a DNA polymerase in an appropriate buffer ("buffer" includes pH, ionic strength, cofactors, etc.) and at a suitable temperature.
  • the primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products.
  • the primer is an oligodeoxyribonucleotide.
  • the primer must be sufficiently long to prime the synthesis of extension products in the presence of the agent for polymerization. The exact lengths of the primers will depend on many factors, including temperature and source of primer and use of the method. Primers having only short sequences capable of hybridization to the target nucleotide sequence generally require lower temperatures to form sufficiently stable hybrid complexes with the template.
  • the primers herein are selected to be “substantially” complementary to the different strands of each specific sequence to be amplified. This means that the primers must be sufficiently complementary to hybridize with their respective strands. Therefore, the primer sequence need not reflect the exact sequence of the template. Commonly, however, the primers have exact complementarity except with respect to analyses effected according to the method described in Nucleic Acids Research 17 (7) 2503-2516 (1989) or a corresponding method employing linear amplification or an amplification technique other than the polymerase chain reaction.
  • the agent for primer extension of an oligonucleotide may be any compound or system that will function to accomplish the synthesis of primer extension products, including enzymes.
  • Suitable enzymes for this purpose include, for example, E. coli DNA Polymerase I, Klenow fragment of E. coli DNA polymerase I, T4 DNA polymerase, other available DNA polymerases, reverse transcriptase, and other enzymes, including thermostable enzymes.
  • thermostable enzyme refers to any enzyme that is stable to heat and is heat resistant and catalyses (facilitates) combination of the nucleotides in the proper manner to form the primer extension products which are complementary to each nucleic acid strand.
  • one thermostable enzyme that may be employed in the process of the present invention is that which can be extracted and purified from Thermus aquaticus. Such an enzyme has a molecular weight of about 86,000-90,000 daltons.
  • Thermus aquaticus strain YT1 is available without restriction from the American Type Culture
  • K. Kleppe et al. in J. Mol. Biol. (1971), 56, 341-361 disclose a method for the amplification of a desired DNA sequence.
  • the method involves denaturation of a DNA duplex to form single strands.
  • the denaturation step is carried out in the presence of a sufficiently large excess of two nucleic acid primers that hybridize to regions adjacent to the desired DNA sequence.
  • DNA polymerase and a sufficient amount of each required nucleoside triphosphate are added whereby two molecules of the original duplex are obtained.
  • the above cycle of denaturation, primer addition and extension is repeated until the appropriate number of copies of the desired target polynucleotide is obtained.
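  • Because each cycle ideally yields two duplexes from one, the number of copies grows geometrically with the number of cycles. The following sketch estimates copies after a given number of cycles and the cycles needed to reach a desired copy number; the `efficiency` parameter (1.0 means perfect doubling) and the function names are assumptions added for illustration.

```python
import math


def copies_after(cycles: int, initial_copies: float = 1, efficiency: float = 1.0) -> float:
    """Expected copies after `cycles` rounds of denaturation and extension.

    Assumption: every cycle multiplies the copy number by (1 + efficiency).
    """
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(target_copies: float, initial_copies: float = 1,
                  efficiency: float = 1.0) -> int:
    """Smallest number of cycles giving at least `target_copies`."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    if target_copies <= initial_copies:
        return 0
    ratio = target_copies / initial_copies
    return math.ceil(math.log(ratio) / math.log(1.0 + efficiency))


print(copies_after(30))          # 1073741824.0
print(cycles_needed(1_000_000))  # 20
```

In practice efficiency falls below 1.0, so the second function is best read as a lower bound on the number of cycles.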
  • the present invention further provides a method for the expression and isolation of a target polypeptide encoded by a target polynucleotide.
  • the method includes incorporating a target polynucleotide synthesized by a method of the invention into an expression vector; introducing the expression vector into a suitable host cell; culturing the host cell under conditions and for such time as to promote the expression of the target polypeptide encoded by the target polynucleotide; and isolating the target polypeptide.
  • the invention can be used to modify certain functional, structural, or phylogenic features of a model polynucleotide encoding a model polypeptide resulting in an altered target polypeptide.
  • An input or model polynucleotide sequence encoding a model polypeptide can be electronically manipulated to determine a potential for an effect of an amino acid change (or variance) at a particular site or multiple sites in the model polypeptide.
  • a novel target polynucleotide sequence is assembled by a method of the invention such that the target polynucleotide encodes a target polypeptide possessing a characteristic different from that of the model polypeptide.
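  • The electronic manipulation of a model polynucleotide can be illustrated with a single amino acid change. The sketch below uses the standard genetic code to translate a model coding sequence and to replace one codon so that the resulting target sequence encodes the variant residue. The function names, the choice of the first matching codon (no codon-usage optimisation) and the toy sequence are all assumptions for illustration.

```python
import itertools

_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in T, C, A, G order; "*" is a stop.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(itertools.product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a coding sequence, reading complete codons from base 1."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def substitute_residue(dna: str, position: int, new_aa: str) -> str:
    """Return `dna` with the codon at 1-based residue `position` replaced.

    Assumption: the first codon found for `new_aa` is used.
    """
    codon = next((c for c, aa in CODON_TABLE.items() if aa == new_aa), None)
    if codon is None:
        raise ValueError(f"unknown amino acid: {new_aa!r}")
    start = 3 * (position - 1)
    if not 0 <= start <= len(dna) - 3:
        raise IndexError("residue position outside the coding sequence")
    return dna[:start].upper() + codon + dna[start + 3:].upper()


model = "ATGGCTAAATAA"                        # toy model sequence: M A K stop
target = substitute_residue(model, 2, "W")    # Ala2 -> Trp
print(translate(model), translate(target))    # MAK* MWK*
```

The modified sequence could then be parsed into oligonucleotides and assembled as a target polynucleotide.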
  • the methods of the invention may rely on the use of public sequence and structure databases. These databases become more robust as more and more sequences and structures are added.
  • Information regarding the amino acid sequence of a target polypeptide and the tertiary structure of the polypeptide can be used to synthesize oligonucleotides that can be assembled into a target polynucleotide encoding a target polypeptide.
  • a model polypeptide should have sufficient structural information to analyze the amino acids involved in the function of the polypeptide.
  • the structural information can be derived from x-ray crystallography, NMR, or some other technique for determining the structure of a protein at the amino acid or atomic level.
  • the sequence and structural information obtained from the model polypeptide can be used to generate a plurality of polynucleotides encoding a plurality of variant amino acid sequences that comprise a target polypeptide.
  • a model polypeptide can be selected based on overall sequence similarity to the target protein or based on the presence of a portion having sequence similarity to a portion of the target polypeptide.
  • a "polypeptide", as used herein, is a polymer in which the monomers are alpha amino acids and are joined together through amide bonds.
  • Amino acids may be the L-optical isomer or the D-optical isomer.
  • Polypeptides are two or more amino acid monomers long and are often more than 20 amino acid monomers long. Standard abbreviations for amino acids are used (e.g., P for proline). These abbreviations are included in Stryer, Biochemistry, Third Ed., 1988, which is incorporated herein by reference for all purposes.
  • isolated refers to a polypeptide that constitutes the major component in a mixture of components, e.g., 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, or 95% or more by weight.
  • Isolated polypeptides typically are obtained by purification from an organism in which the polypeptide has been produced, although chemical synthesis is also possible. Methods of polypeptide purification include, for example, chromatography or immunoaffinity techniques.
  • Polypeptides of the invention may be detected by sodium dodecyl sulphate (SDS)-polyacrylamide gel electrophoresis followed by Coomassie Blue staining or Western blot analysis using monoclonal or polyclonal antibodies that have binding affinity for the polypeptide to be detected.
  • a "chimeric polypeptide," as used herein, is a polypeptide containing portions of amino acid sequence derived from two or more different proteins, or two or more regions of the same protein that are not normally contiguous.
  • a "ligand", as used herein, is a molecule that is recognized by a receptor.
  • ligands that can be investigated by this invention include, but are not restricted to, agonists and antagonists for cell membrane receptors, toxins and venoms, viral epitopes, hormones, opiates, steroids, peptides, enzyme substrates, cofactors, drugs, lectins, sugars, oligonucleotides, nucleic acids, oligosaccharides, and proteins.
  • a "receptor", as used herein, is a molecule that has an affinity for a ligand. Receptors may be naturally-occurring or manmade molecules. They can be employed in their unaltered state or as aggregates with other species.
  • Receptors may be attached, covalently or noncovalently, to a binding member, either directly or via a specific binding substance.
  • receptors which can be employed by this invention include, but are not restricted to, antibodies, cell membrane receptors, monoclonal antibodies and antisera reactive with specific antigenic determinants, viruses, cells, drugs, polynucleotides, nucleic acids, peptides, cofactors, lectins, sugars, polysaccharides, cellular membranes, and organelles.
  • a "ligand receptor pair" is formed when two molecules have combined through molecular recognition to form a complex.
  • polypeptides which can be synthesized by this invention include but are not restricted to: a) Microorganism receptors: Determination of ligands that bind to microorganism receptors such as specific transport proteins or enzymes essential to survival of microorganisms would be a useful tool for discovering new classes of antibiotics. Of particular value would be antibiotics against opportunistic fungi, protozoa, and bacteria resistant to antibiotics in current use.
  • a receptor can comprise a binding site of an enzyme such as an enzyme responsible for cleaving a neurotransmitter; determination of ligands for this type of receptor to modulate the action of an enzyme that cleaves a neurotransmitter is useful in developing drugs that can be used in the treatment of disorders of neurotransmission.
  • the invention may be useful in investigating a receptor that comprises a ligand-binding site on an antibody molecule which combines with an epitope of an antigen of interest; determining a sequence that mimics an antigenic epitope may lead to the development of vaccines in which the immunogen is based on one or more of such sequences or lead to the development of related diagnostic agents or compounds useful in therapeutic treatments such as for autoimmune diseases (e.g., by blocking the binding of the "self" antibodies).
  • Polynucleotides: Sequences of polynucleotides may be synthesized to establish DNA or RNA binding sequences that act as receptors for synthesized sequence.
  • Catalytic Polypeptides: Polymers, preferably antibodies, which are capable of promoting a chemical reaction involving the conversion of one or more reactants to one or more products. Such polypeptides generally include a binding site specific for at least one reactant or reaction intermediate and an active functionality proximate to the binding site, which functionality is capable of chemically modifying the bound reactant. Catalytic polypeptides and others are described in, for example, PCT
  • Hormone receptors: Identification of the ligands that bind with high affinity to a receptor such as the receptors for insulin and growth hormone is useful in the development of, for example, an oral replacement of the daily injections which diabetics must take to relieve the symptoms of diabetes or a replacement for growth hormone.
  • hormone receptors include the vasoconstrictive hormone receptors; determination of ligands for these receptors may lead to the development of drugs to control blood pressure.
  • Opiate receptors: Determination of ligands which bind to the opiate receptors in the brain is useful in the development of less-addictive replacements for morphine and related drugs.
  • structure refers to the three dimensional arrangement of atoms in the protein.
  • Function refers to any measurable property of a protein. Examples of protein function include, but are not limited to, catalysis, binding to other proteins, binding to non-protein molecules (e.g., drugs), and isomerization between two or more structural forms.
  • Biologically relevant protein refers to any protein playing a role in the life of an organism. To identify significant structural motifs, the sequence of the model polypeptide is examined for matches to the entries in one or more databases of recognized domains, e.g., the PROSITE database domains (Bairoch, Nucl. Acids.
  • the PROSITE database is a compilation of two types of sequence signatures: profiles, typically representing whole protein domains, and patterns, typically representing just the most highly conserved functional or structural aspects of protein domains.
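  • Scanning a model polypeptide for a PROSITE-style pattern can be sketched by converting the pattern to a regular expression. The converter below handles only the basic pattern syntax (single residues, `x` wildcards, `[...]` allowed sets, `{...}` excluded sets, `(n)` or `(n,m)` repeats, and `<` / `>` terminal anchors); it is a simplified illustration with a hypothetical function name, not a complete implementation of the PROSITE syntax. The example pattern is the N-glycosylation site pattern, and the test peptide is invented.

```python
import re

_ELEMENT = re.compile(r"(?P<core>[^()]+)(?:\((?P<lo>\d+)(?:,(?P<hi>\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern to a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core = match["core"]
        if core == "x":                                   # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if match["lo"]:
            upper = "," + match["hi"] if match["hi"] else ""
            repeat = "{" + match["lo"] + upper + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")   # N-glycosylation site pattern
print(regex)                                   # N[^P][ST][^P]
print(bool(re.search(regex, "AANGSAA")))       # True
```

Residues that match such a pattern mark conserved positions that might be preserved, or deliberately varied, when a target polypeptide is designed.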
  • the methods of the invention can be used to generate polypeptides containing polymorphisms that have an effect on a catalytic activity of a target polypeptide or a non-catalytic activity of the target polypeptide (e.g., structure, stability, binding to a second protein or polypeptide chain, binding to a nucleic acid molecule, binding to a small molecule, and binding to a macromolecule that is neither a protein nor a nucleic acid).
  • the invention provides a means for assembling any polynucleotide sequence encoding a target polypeptide such that the encoded polypeptide can be expressed and screened for a particular activity.
  • the methods of the invention can be used to identify amino acid substitutions that can be made to engineer the structure or function of a polypeptide of interest (e.g., to increase or decrease a selected activity or to add or remove a selective activity).
  • the methods of the invention can be used in the identification and analysis of candidate polymorphisms for polymorphism-specific targeting by pharmaceutical or diagnostic agents, for the identification and analysis of candidate polymorphisms for pharmacogenomic applications, and for experimental biochemical and structural analysis of pharmaceutical targets that exhibit amino acid polymorphism.
  • a library of target polynucleotides encoding a plurality of target polypeptides can be prepared by the present invention.
  • Host cells are transformed by artificial introduction of the vectors containing the target polynucleotide by inoculation under conditions conducive for such transformation.
  • the resultant libraries of transformed clones are then screened for clones which display activity for the polypeptide of interest in a phenotypic assay for activity.
  • a target polynucleotide of the invention can be incorporated (i.e., cloned) into an appropriate vector.
  • the target sequences encoding a target polypeptide of the invention may be inserted into a recombinant expression vector.
  • the term "recombinant expression vector" refers to a plasmid, virus, or other vehicle known in the art that has been manipulated by insertion or incorporation of the polynucleotide sequence encoding a target polypeptide of the invention.
  • the expression vector typically contains an origin of replication, a promoter, as well as specific genes that allow phenotypic selection of the transformed cells.
  • Vectors suitable for use in the present invention include, but are not limited to, the T7-based expression vector for expression in bacteria (Rosenberg et al., Gene, 56:125, 1987), the pMSXND expression vector for expression in mammalian cells (Lee and Nathans, J. Biol. Chem., 263:3521, 1988), baculovirus-derived vectors for expression in insect cells, cauliflower mosaic virus (CaMV), and tobacco mosaic virus (TMV).
  • any of a number of suitable transcription and translation elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc., may be used in the expression vector (see, e.g., Bitter et al., Methods in Enzymology, 153:516-544, 1987). These elements are well known to one of skill in the art.
  • operably linked refers to functional linkage between the regulatory sequence and the polynucleotide sequence regulated by the regulatory sequence.
  • the operably linked regulatory sequence controls the expression of the product expressed by the polynucleotide sequence.
  • the functional linkage also includes an enhancer element.
  • Promoter means a nucleic acid regulatory sequence sufficient to direct transcription. Also included in the invention are those promoter elements that are sufficient to render promoter-dependent polynucleotide sequence expression controllable for cell-type specific, tissue specific, or inducible by external signals or agents; such elements may be located in the 5' or 3' regions of the native gene, or in the introns.
  • "Gene expression" or "polynucleotide sequence expression" means the process by which a nucleotide sequence undergoes successful transcription and translation such that detectable levels of the delivered nucleotide sequence are expressed in an amount and over a time period so that a functional biological effect is achieved.
  • in yeast, a number of vectors containing constitutive or inducible promoters may be used.
  • Current Protocols in Molecular Biology, Vol. 2, Ed. Ausubel et al., Greene Publish. Assoc. & Wiley Interscience, Ch. 13, 1988; Grant et al., "Expression and Secretion Vectors for Yeast," in Methods in Enzymology, Eds. Wu & Grossman, Acad. Press,
  • a constitutive yeast promoter such as ADH or LEU2
  • an inducible promoter such as GAL
  • vectors may be used which promote integration of foreign DNA sequences into the yeast chromosome.
  • telomeres are repeated sequences found at chromosome ends and it has long been known that chromosomes with truncated ends are unstable, tend to fuse with other chromosomes and are otherwise lost during cell division. Some data suggest that telomeres interact with the nucleoprotein complex and the nuclear matrix. One putative role for telomeres includes stabilizing chromosomes and shielding the ends from degradative enzymes.
  • Another possible role for telomeres is in replication.
  • telomeres may provide a buffer against this effect, at least until they are themselves eliminated by this effect.
  • a further structure that may be included in a target polynucleotide is a centromere.
  • the delivery of a nucleic acid in a cell may be identified in vitro or in vivo by including a marker in the expression construct.
  • the marker would result in an identifiable change to the transfected cell permitting easy identification of expression.
  • An expression vector of the invention can be used to transform a target cell.
  • by "transformation" is meant a genetic change induced in a cell following incorporation of new DNA (i.e., DNA exogenous to the cell).
  • the genetic change is generally achieved by introduction of the DNA into the genome of the cell.
  • by "transformed cell" is meant a cell into which (or into an ancestor of which) exogenous DNA has been introduced by means of recombinant DNA techniques. Transformation of a host cell with recombinant DNA may be carried out by conventional techniques as are well known to those skilled in the art.
  • competent cells that are capable of DNA uptake can be prepared from cells harvested after exponential growth phase and subsequently treated by the CaCl₂ method by procedures well known in the art.
  • MgCl₂ or RbCl can be used.
  • Transformation can also be performed after forming a protoplast of the host cell or by electroporation.
  • a target polypeptide of the invention can be produced in prokaryotes by expression of nucleic acid encoding the polypeptide.
  • These include, but are not limited to, microorganisms, such as bacteria transformed with recombinant bacteriophage DNA, plasmid DNA, or cosmid DNA expression vectors encoding a polypeptide of the invention.
  • the constructs can be expressed in E. coli in large scale for in vitro assays. Purification from bacteria is simplified when the sequences include tags for one-step purification by nickel-chelate chromatography. The construct can also contain a tag to simplify isolation of the polypeptide.
  • a polyhistidine tag of, e.g., six histidine residues, can be incorporated at the amino terminal end, or carboxy terminal end, of the protein.
  • the polyhistidine tag allows convenient isolation of the protein in a single step by nickel-chelate chromatography.
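  • At the sequence-design stage, adding a polyhistidine tag amounts to inserting histidine codons into the coding sequence before assembly. The sketch below inserts six histidine codons either directly after the start codon or directly before the stop codon. The function name `add_his_tag`, the use of the codon CAT (CAC also encodes histidine) and the toy coding sequence are assumptions; a real construct would typically also need a linker or cleavage site, which is not modelled here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def add_his_tag(cds: str, n_his: int = 6, terminus: str = "C",
                his_codon: str = "CAT") -> str:
    """Insert `n_his` histidine codons at the N- or C-terminal end of a CDS.

    The CDS must start with ATG, end with a stop codon, and have a length
    that is a multiple of three.
    """
    cds = cds.upper()
    if len(cds) % 3 or not cds.startswith("ATG") or cds[-3:] not in STOP_CODONS:
        raise ValueError("expected an in-frame CDS with start and stop codons")
    tag = his_codon * n_his
    if terminus == "N":
        return cds[:3] + tag + cds[3:]    # tag follows the initiator Met
    if terminus == "C":
        return cds[:-3] + tag + cds[-3:]  # tag precedes the stop codon
    raise ValueError("terminus must be 'N' or 'C'")


print(add_his_tag("ATGGCTTAA"))  # ATGGCTCATCATCATCATCATCATTAA
```

The tagged sequence is 18 bases longer than the input and can be parsed into oligonucleotides in the same way as any other target sequence.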
  • the target polypeptide of the invention can also be engineered to contain a cleavage site to aid in protein recovery.
  • the polypeptides of the invention can be expressed directly in a desired host cell for assays in situ.
  • Eukaryotic cells can also be cotransfected with DNA sequences encoding a polypeptide of the invention, and a second foreign DNA molecule encoding a selectable phenotype, such as the herpes simplex thymidine kinase gene.
  • Another method is to use a eukaryotic viral vector, such as simian virus 40 (SV40) or bovine papilloma virus, to transiently infect or transform eukaryotic cells and express the protein.
  • a eukaryotic host is utilized as the host cell, as described herein.
  • Eukaryotic systems and preferably mammalian expression systems, allow for proper post-translational modifications of expressed mammalian proteins to occur.
  • Eukaryotic cells that possess the cellular machinery for proper processing of the primary transcript, glycosylation, phosphorylation, and advantageously secretion of the gene product should be used as host cells for the expression of the polypeptide of the invention.
  • host cell lines may include, but are not limited to, CHO, VERO, BHK, HeLa, COS, MDCK, Jurkat, HEK-293, and WI38.
  • host cells can be transformed with the cDNA encoding a target polypeptide of the invention controlled by appropriate expression control elements (e.g., promoter and enhancer sequences, transcription terminators, polyadenylation sites, etc.), and a selectable marker.
  • selectable marker in the recombinant plasmid confers resistance to the selection and allows cells to stably integrate the plasmid into their chromosomes and grow to form foci that, in turn, can be cloned and expanded into cell lines.
  • engineered cells may be allowed to grow for 1-2 days in an enriched media, and then are switched to a selective media.
  • a number of selection systems may be used, including, but not limited to, the herpes simplex virus thymidine kinase (Wigler et al., Cell, 11:223, 1977), hypoxanthine-guanine phosphoribosyltransferase (Szybalska & Szybalski, Proc. Natl. Acad. Sci. USA, 48:2026, 1962), and adenine phosphoribosyltransferase (Lowy et al.
  • genes can be employed in tk-, hgprt- or aprt- cells, respectively.
  • antimetabolite resistance can be used as the basis of selection for dhfr, which confers resistance to methotrexate (Wigler et al., Proc. Natl. Acad. Sci. USA, 77:3567, 1980; O'Hare et al., Proc. Natl. Acad. Sci. USA, 78:1527, 1981); gpt, which confers resistance to mycophenolic acid (Mulligan & Berg, Proc. Natl. Acad. Sci.
  • a target polynucleotide, or expression construct containing a target polynucleotide, may be entrapped in a liposome.
  • Liposomes are vesicular structures characterized by a phospholipid bilayer membrane and an inner aqueous medium. Multilamellar liposomes have multiple lipid layers separated by aqueous medium and form spontaneously when phospholipids are suspended in an excess of aqueous solution. The lipid components undergo self-rearrangement before the formation of closed structures and entrap water and dissolved solutes between the lipid bilayers.
  • the liposome may be complexed with a hemagglutinating virus (HVJ).
  • the liposome may be complexed or employed in conjunction with nuclear non-histone chromosomal proteins (HMG-1).
  • the liposome may be complexed or employed in conjunction with both HVJ and HMG-1.
  • the present invention describes methods for enabling the creation of a target polynucleotide based upon information only, i.e., without the requirement for existing genes, DNA molecules or genomes.
  • a virtual polynucleotide in the computer.
  • This polynucleotide consists of a string of DNA bases, G, A, T or C, comprising for example an entire artificial polynucleotide sequence in a linear string.
  • computer software is then used to parse the target sequence breaking it down into a set of overlapping oligonucleotides of specified length. This results in a set of shorter DNA sequences that overlap to cover the entire length of the target polynucleotide in overlapping sets.
  • a gene of 1000 base pairs would be broken down into 20 100-mers, where 10 of these comprise one strand and 10 comprise the other strand. They would be selected to overlap on each strand by 25 to 50 base pairs.
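The oligonucleotide count in this example can be checked with a short sketch. It is written in Python rather than the Perl of the original listings, and it assumes each strand is tiled end to end by equal-length oligonucleotides; the function name `oligo_counts` is hypothetical.

```python
def oligo_counts(gene_bp, oligo_len):
    """Count the oligos needed to tile both strands of a gene end to end."""
    if gene_bp % oligo_len:
        raise ValueError("gene length must be a multiple of the oligo length")
    per_strand = gene_bp // oligo_len      # oligos covering one strand
    return {"per_strand": per_strand, "total": 2 * per_strand}


# The 1000 bp gene parsed into 100-mers, as in the text.
print(oligo_counts(1000, 100))
```

The same arithmetic applies to the other formats mentioned in the specification, for example 2400 bp parsed into 50-mers.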
  • the degeneracy of the genetic code permits substantial freedom in the choice of codons for any particular amino acid sequence.
  • Transgenic organisms such as plants frequently prefer particular codons that, though they encode the same protein, may differ from the codons in the organism from which the gene was derived.
  • U.S. Pat. No. 5,380,831 to Adang et al describes the creation of insect resistant transgenic plants that express the Bacillus thuringiensis (Bt) toxin gene.
  • the Bt crystal protein, an insect toxin, is encoded by a full-length gene that is poorly expressed in transgenic plants.
  • a synthetic gene encoding the protein containing codons preferred in plants was substituted for the natural sequence.
  • the invention disclosed therein comprised a chemically synthesized gene encoding an insecticidal protein which is functionally equivalent to a native insecticidal protein of Bt.
  • the synthetic gene was designed to be expressed in plants at a level higher than a native Bt gene.
  • the hydropathic index of amino acids may be considered.
  • the importance of the hydropathic amino acid index in conferring interactive biologic function on a protein is generally understood in the art.
  • Each amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and charge characteristics; these are: isoleucine (+4.5); valine (+4.2); leucine (+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine (-0.4); threonine (-0.7); serine (-0.8); tryptophan (-0.9); tyrosine (-1.3); proline (-1.6); histidine (-3.2); glutamate (-3.5); glutamine (-3.5); aspartate (-3.5); asparagine (-3.5); lysine (-3.9); and arginine (-4.5).
  • amino acids may be substituted by other amino acids having a similar hydropathic index or score and still result in a protein with similar biological activity, i.e., still obtain a biological functionally equivalent protein.
  • substitution of amino acids whose hydropathic indices are within ±2 is preferred, those which are within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
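The substitution windows can be illustrated with a small lookup. This is a minimal Python sketch, assuming the standard Kyte-Doolittle hydropathic values and one-letter amino acid codes; the function name `substitutes` and its `window` parameter are hypothetical and not part of the specification.

```python
# Kyte-Doolittle hydropathic indices, keyed by one-letter amino acid code.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def substitutes(residue, window=2.0):
    """Return residues whose index lies within +/- window of `residue`."""
    ref = HYDROPATHY[residue]
    return sorted(
        aa for aa, value in HYDROPATHY.items()
        if aa != residue and abs(value - ref) <= window
    )


# Candidates for lysine under the tightest (+/- 0.5) window.
print(substitutes("K", 0.5))
```

Widening `window` to 1 or 2 reproduces the less stringent preference tiers described above.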
  • an amino acid can be substituted for another having a similar hydrophilicity value and still obtain a biologically equivalent and immunologically equivalent polypeptide.
  • substitution of amino acids whose hydrophilicity values are within ±2 is preferred, those that are within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
  • amino acid substitutions are generally based on the relative similarity of the amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the like.
  • Exemplary substitutions that take various of the foregoing characteristics into consideration are well known to those of skill in the art and include: arginine and lysine; glutamate and aspartate; serine and threonine; glutamine and asparagine; and valine, leucine and isoleucine.
  • aspects of the invention may be implemented in hardware or software, or a combination of both.
  • the algorithms and processes of the invention are implemented in one or more computer programs executing on programmable computers each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
  • Program code is applied to input data to perform the functions described herein and generate output information.
  • the output information is applied to one or more output devices, in known fashion.
  • Each program may be implemented in any desired computer language (including machine, assembly, high level procedural, or object oriented programming languages) to communicate with a computer system.
  • the language may be a compiled or interpreted language.
  • Each such computer program is preferably stored on a storage medium or device (e.g., ROM, CD-ROM, tape, or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein.
  • the inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
  • the invention provides a computer program, stored on a computer-readable medium, for generating a target polynucleotide sequence.
  • the computer program includes instructions for causing a computer system to: 1) identify an initiating polynucleotide sequence contained in the target polynucleotide sequence; 2) parse the target polynucleotide sequence into multiple distinct, partially complementary oligonucleotides; and 3) control assembly of the target polynucleotide sequence by controlling the bi-directional extension of the initiating polynucleotide sequence by the sequential addition of partially complementary oligonucleotides, resulting in a contiguous double-stranded polynucleotide.
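Step 3 can be pictured as growing the assembly outward from the initiating oligonucleotide. The following is a minimal Python sketch under stated assumptions: the parsed oligonucleotides are indexed by position along the target, and additions alternate between the two ends. The function name `extension_order` and the alternation rule are illustrative and are not taken from the specification.

```python
def extension_order(n_oligos, start):
    """List oligo indices in the order they join a bi-directional assembly.

    `start` is the index of the initiating oligonucleotide; each round
    adds one neighbour downstream and one upstream while any remain.
    """
    if not 0 <= start < n_oligos:
        raise ValueError("start must index one of the oligos")
    order = [start]
    left, right = start - 1, start + 1
    while left >= 0 or right < n_oligos:
        if right < n_oligos:
            order.append(right)   # extend toward the downstream end
            right += 1
        if left >= 0:
            order.append(left)    # extend toward the upstream end
            left -= 1
    return order


print(extension_order(5, 2))
```

Choosing an initiating oligonucleotide near the middle of the target keeps the two growing arms roughly equal in length.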
  • the computer program will contain an algorithm for parsing the sequence of the target polynucleotide by generating a set of oligonucleotides corresponding to a polypeptide sequence.
  • the algorithm utilizes a polypeptide sequence to generate a DNA sequence using a specified codon table.
  • the algorithm then generates a set of parsed oligonucleotides corresponding to the (+) and (-) strands of the DNA sequence in the following manner:
  • the DNA sequence GENE[], an array of bases, is generated from the protein sequence AA[], an array of amino acids, using a specified codon table.
  • oligonucleotide set assembly is established by the following algorithm:
  • Step 2
  • a. Do the following until only a single reaction remains:
  • i. I = 1 to W/3
  • ii. Ligate T[I], T[I+1]
  • iii. I = I + 2
  • iv. Go to ii
  • CODON TABLE (E. coli Class II preferred usage)
  • Algorithms of the invention useful for assembly of a target polynucleotide can further be described as Perl script as set forth below.
  • ALGORITHM 1 provides a method for converting a protein sequence into a polynucleotide sequence using E. Coli codons: #$sequence is the protein sequence in single letter amino acid code
  • #$seqlen is the length of the protein sequence
  • #$aminoacid is the individual amino acid in the sequence
  • #$codon is the individual DNA triplet codon in the Gene sequence
  • #$DNAsquence is the gene sequence in DNA bases
  • #$baselen is the length of the DNA sequence in bases
  • $aminoacid = substr($sequence, $n, 1);
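Only fragments of ALGORITHM 1 survive in this text, so the following is a minimal Python sketch of the same idea rather than a reconstruction of the original Perl. The codon table is a placeholder with one valid codon per amino acid. It is not the E. coli Class II table of the specification, and the names `CODONS` and `back_translate` are hypothetical.

```python
# Placeholder codon table: one standard-genetic-code codon per amino acid.
# This is an illustrative assumption, not the Class II table of the patent.
CODONS = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def back_translate(protein, codons=CODONS):
    """Convert a single-letter protein sequence into a DNA sequence."""
    dna = []
    for n in range(len(protein)):          # mirrors the substr() loop
        aminoacid = protein[n].upper()
        dna.append(codons[aminoacid])      # KeyError on unknown residues
    return "".join(dna)


print(back_translate("MKH*"))
```

Swapping in a different dictionary is enough to target another host organism's codon preference.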
  • ALGORITHM 2 provides a method for parsing a polynucleotide sequence into component forward and reverse oligonucleotides that can be reassembled into a complete target polynucleotide encoding a target polypeptide: #$oligoname is the identifier name for the list and for each component #oligonucleotide
  • #$OL is the length of each component oligonucleotide #$Overlap is the length of the overlap in bases between each forward and each #reverse oligonucleotide #$sequence is the DNA sequence in bases #$seqlen is the length of the DNA sequence in bases #$bas is the individual base in a sequence #$forseq is the sequence of a forward oligonucleotide #$revseq is the sequence of a reverse oligonucleotide #$revcomp is the reverse complemented sequence of the gene #$oligonameF- [] is the list of parsed forward oligos #$oligonameR- [] is the list of parsed reverse oligos
  • $seqlen = length($sequence); #convert forward sequence to upper case if lower case
  • $oligo = substr($revcomp, $i, $OL); print OUT "$oligoname R- $r $oligo"; print "$oligoname R- $r $oligo";
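The surviving fragments of ALGORITHM 2 show the reverse oligonucleotides being cut from the reverse-complemented gene. A minimal Python sketch of that parsing step follows. It assumes the forward strand is tiled from position 0 and the reverse strand is tiled with an offset equal to the overlap, so that each reverse oligonucleotide bridges two forward ones. The offset rule and the function names are assumptions, since the full listing is not available here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def parse_oligos(sequence, oligo_len=50, overlap=25):
    """Split a gene into forward and reverse component oligonucleotides.

    Forward oligos tile the sequence from position 0. Reverse oligos
    cover windows shifted by `overlap` bases and are returned as their
    reverse complements. The last reverse window may be shorter than
    `oligo_len` on a linear sequence.
    """
    sequence = sequence.upper()
    seqlen = len(sequence)
    forward = [sequence[i:i + oligo_len]
               for i in range(0, seqlen, oligo_len)]
    reverse = [reverse_complement(sequence[i:i + oligo_len])
               for i in range(overlap, seqlen, oligo_len)]
    return forward, reverse


fwd, rev = parse_oligos("ACGT" * 50)
for r, oligo in enumerate(rev, start=1):
    print(f"demo R-{r} {oligo}")
```

With a 200-base input, 50-mers and a 25-base overlap, this yields four forward oligonucleotides and four reverse ones, the last of which is a 25-base end piece.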
  • the invention further provides a computer-assisted method for synthesizing a target polynucleotide encoding a target polypeptide derived from a model sequence using a programmed computer including a processor, an input device, and an output device, by inputting into the programmed computer, through the input device, data including at least a portion of the target polynucleotide sequence encoding a target polypeptide. Subsequently, the sequence of at least one initiating polynucleotide present in the target polynucleotide sequence is determined and a model for synthesizing the target polynucleotide sequence is derived.
  • the model is based on the position of the initiating sequence in the target polynucleotide sequence using overall sequence parameters necessary for expression of the target polypeptide in a biological system.
  • the information is outputted to an output device which provides the means for synthesizing and assembling the target polynucleotide.
  • any apparatus suitable for polynucleotide synthesis can be used in the present invention.
  • a nanodispensing head with up to 16 valves can be used to deposit synthesis chemicals in assembly vessels (Figure 4).
  • Chemicals can be controlled using a syringe pump from the reagent reservoir. Because of the speed and capability of the ink-jet dispensing system, synthesis can be made very small and very rapid.
  • Underlying the reaction chambers is a set of assembly vessels linked to microchannels that will move fluids by microfluidics. The configuration of the channels will pool pairs and triplexes of oligonucleotides systematically using, for example, a robotic device. However, pooling can be accomplished using fluidics and without moving parts.
  • oligonucleotide synthesis, oligonucleotide assembly by pooling and annealing, and ligation can be done using microfluidic mixing, resulting in the same set of critical triplex intermediates that serves as the substrate for annealing, ligation and oligonucleotide joining.
  • DNA ligase and other components can be placed in the buffer fluid moving through the instrument microchambers.
  • synthesis and assembly can be carried out in a highly controlled way in the same instrument.
  • the pooling manifold can be produced from non-porous plastic and designed to control sequential pooling of oligonucleotides synthesized in arrays.
  • Oligonucleotide parsing from a gene sequence designed in the computer can be programmed for synthesis where (+) and (-) strands are placed in alternating wells of the array.
  • the 12 row sequences of the gene are directed into the pooling manifold that systematically pools three wells into reaction vessels forming the critical triplex structure.
  • four sets of triplexes are pooled into 2 sets of 6 oligonucleotide products, then 1 set of 12 oligonucleotide products.
  • Each row of the synthetic array is associated with a similar manifold resulting in the first stage of assembly of 8 sets of assembled oligonucleotides representing 12 oligonucleotides each.
  • the second manifold pooling stage is controlled by a single manifold that pools the 8 row assemblies into a single complete assembly. Passage of the oligonucleotide components through the two manifold assemblies (the first 8 and the second single) results in the complete assembly of all 96 oligonucleotides from the array.
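The two-stage pooling hierarchy can be traced with a short simulation. This Python sketch assumes wells are pooled in order along each row and labels them A1 to H12. The helper names `pool` and `assemble_row` are hypothetical, and no chemistry is modelled.

```python
def pool(groups, size):
    """Merge consecutive groups `size` at a time, preserving order."""
    return [sum(groups[i:i + size], [])
            for i in range(0, len(groups), size)]


def assemble_row(wells):
    """First-stage manifold: pool one row of 12 wells into one product."""
    stage = [[well] for well in wells]   # 12 single oligonucleotides
    stage = pool(stage, 3)               # 4 triplexes
    stage = pool(stage, 2)               # 2 sets of 6
    stage = pool(stage, 2)               # 1 set of 12
    return stage[0]


# A 96-well array: rows A-H, columns 1-12.
plate = [[f"{row}{col}" for col in range(1, 13)] for row in "ABCDEFGH"]
row_products = [assemble_row(row) for row in plate]   # 8 row manifolds
final = pool(row_products, 8)[0]                      # second-stage manifold
print(len(row_products), len(final))
```

The first number printed is the count of row assemblies and the second is the number of oligonucleotides represented in the final pooled product.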
  • the assembly module (Figure 8) of Genewriter™ can include a complete set of 7 pooling manifolds produced using microfabrication in a single plastic block that sits below the synthesis vessels.
  • Various configurations of the pooling manifold will allow assembly of 96-, 384- or 1536-well arrays of parsed component oligonucleotides.
  • the initial configuration is designed for the assembly of 96 oligonucleotides synthesized in a pre-defined array, composed of 48 pairs of overlapping 50-mers. Passage through the assembly device in the presence of DNA ligase and other appropriate buffer and chemical components, and with appropriate temperature controls on the device, will assemble these into a single 2400-base double-stranded gene assembly (Figure 9).
  • the basic pooling device design can be made of
  • a temperature control element such as a Peltier circuit underlying the junction of the channels.
  • the assembly platform design can consist of 8 synthesis microwell plates in a 96 well configuration, addressed with 16 channels of microdispensing.
  • below each plate is: 1) an evacuation manifold for removing synthesis components; and 2) an assembly manifold based on the schematic in Figure 9 for assembling component oligonucleotides from each 96-well array.
  • Figure 12 shows a higher capacity assembly format using 1536-well microplates and capable of synthesis of 1536 component oligonucleotides per plate.
  • Below each plate is: 1) an evacuation manifold for removing synthesis components; and 2) an assembly manifold for assembling 1536 component oligonucleotides from each 1536-well array. Pooling and assembly strategies can be based on the concepts used for 96-well plates.
  • An alternative assembly format includes using surface-bound oligonucleotide synthesis rather than soluble synthesis on CPG glass beads (Figure 13).
  • oligonucleotides are synthesized with a hydrocarbon linker that allows attachment to a solid support.
  • the synthesized oligonucleotides are covalently attached to a solid support such that the stabilizer is attached and the two ligation substrates added to the overlying solution. Ligation occurs as mediated by DNA ligase in the solution and increasing temperature above the Tm removes the linked oligonucleotides by thermal melting.
  • the systematic assembly on a solid support of a set of parsed component oligonucleotides can be arranged in an array with the set of stabilizer oligonucleotides attached.
  • the set of ligation substrate oligonucleotides is placed in the solution, and systematic assembly is carried out in the solid phase by sequential annealing, ligation and melting, which moves the growing DNA molecules across the membrane surface.
  • Figure 15 shows an additional alternative means for oligonucleotide assembly, by binding the component oligonucleotides to a set of metal electrodes on a microelectronic chip, where each electrode can be controlled independently with respect to current and voltage.
  • the array contains the set of minus strand oligonucleotides. Placing a positive charge on the electrode will move the component ligase substrate oligonucleotide by electrophoresis onto the surface, where annealing takes place. The presence of DNA ligase mediates covalent joining or ligation of the components. The electrode is then turned off or a negative charge is applied and the DNA molecule is expelled from the electrode.
  • next array element containing the next stabilizer oligonucleotide from the parsed set is turned on with a positive charge and a second annealing, joining and ligation with the next oligonucleotide in the set carried out.
  • Systematic and repetitive application of voltage control, annealing, ligation and denaturation will result in the movement of the growing chain across the surface as well as assembly of the components into a complete DNA molecule .
  • a desired sequence can be ordered by any means of communication available to a user wishing to order such a sequence.
  • a "user", as used herein, is any entity capable of communicating a desired polynucleotide sequence to a server.
  • the sequence may be transmitted by any means of communication available to the user and receivable by a server.
  • the user can be provided with a unique designation such that the user can obtain information regarding the synthesis of the polynucleotide during synthesis.
  • the transmitted target polynucleotide sequence can be synthesized by any method set forth in the present invention.
  • the invention further provides a method for automated synthesis of a polynucleotide, by providing a user with a mechanism for communicating a model polynucleotide sequence and optionally providing the user with an opportunity to communicate at least one desired modification to the model sequence.
  • the invention envisions a user providing a model sequence and a desired modification to that sequence which results in the alteration of the model sequence. Any modification that alters the expression, function or activity of a target polynucleotide or encoded target polypeptide can be communicated by the user such that a modified polynucleotide or polypeptide is synthesized or expressed according to a method of the invention.
  • a model polynucleotide encoding a polypeptide normally expressed in a eukaryotic system can be altered such that the codons of the resulting target polynucleotide are conducive for expression of the polypeptide in a prokaryotic system.
  • the user can indicate a desired modified activity of a polypeptide encoded by a model polynucleotide.
  • the algorithms and methods of the present invention can be used to synthesize a target polynucleotide encoding a target polypeptide believed to have the desired modified activity.
  • the methods of the invention can be further utilized to express the target polypeptide and to screen for the desired activity. It is understood that the methods of the invention provide a means for synthetic evolution whereby any parameter of polynucleotide expression and/or polypeptide activity can be altered as desired.
  • the data including at least a portion of the model polynucleotide sequence is inputted into a programmed computer, through an input device.
  • the algorithms of the invention are used to determine the sequence of the model polynucleotide sequence containing the desired modification and resulting in a target polynucleotide containing the modification.
  • the processor and algorithms of the invention are used to identify at least one initiating polynucleotide sequence present in the polynucleotide sequence.
  • a target polynucleotide i.e., a modified model polynucleotide
  • a model polypeptide sequence or nucleic acid sequence is obtained and analyzed using a suitable DNA analysis package, such as, for example, MacVector or DNA Star. If the target protein will be expressed in a bacterial system, for example, the model sequence can be converted to a sequence encoding a polypeptide utilizing E. coli preferred codons (i.e., Type I, Type II or Type III codon preference).
  • the present invention provides the conversion programs Codon I, Codon II or Codon III.
  • a nucleic acid sequence of the invention can be designed to accommodate any codon preference of any prokaryotic or eucaryotic organism.
  • specific promoter, enhancer, replication or drug resistance sequences can be included in a synthetic nucleic acid sequence of the invention.
  • the length of the construction can be adjusted by padding to give a round number of bases based on about 25 to 100 bp synthesis.
  • the synthesis of sequences of about 25 to 100 bp in length can be manufactured and assembled using the array synthesizer system and may be used without further purification. For example, two 96-well plates containing 100-mers could give a 9600 bp construction of a target sequence.
  • the oligonucleotides are parsed using ParseOligo™, a proprietary computer program that optimizes nucleic acid sequence assembly. Optional steps in sequence assembly include identifying and eliminating sequences that may give rise to hairpins, repeats or other difficult sequences.
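The specification does not say how hairpin-prone sequences are identified. One simple possibility, sketched here in Python, is to flag any stretch whose reverse complement reappears downstream within the same oligonucleotide, since such inverted repeats can fold back on themselves. The function `hairpin_stems`, the stem length of 6 and the minimum loop of 3 bases are illustrative assumptions and do not describe ParseOligo.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hairpin_stems(oligo, stem=6, min_loop=3):
    """Find inverted repeats that could form a hairpin stem.

    Returns (arm_start, partner_start, arm_sequence) tuples where the
    reverse complement of the arm occurs at least `min_loop` bases
    downstream of the arm.
    """
    oligo = oligo.upper()
    hits = []
    for i in range(len(oligo) - 2 * stem - min_loop + 1):
        arm = oligo[i:i + stem]
        partner = reverse_complement(arm)
        j = oligo.find(partner, i + stem + min_loop)
        if j != -1:
            hits.append((i, j, arm))
    return hits


print(hairpin_stems("GATCCGTTTTCGGATC"))
```

An oligonucleotide that returns any hits could be re-parsed with shifted boundaries or, where the encoded protein allows, recoded with synonymous codons.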
  • the parsed oligonucleotide list is transferred to the Synthesizer driver software. The individual oligonucleotides are pasted into the wells and oligonucleotide synthesis is accomplished.
  • oligonucleotide concentration is 250 µM (250 nmol/ml).
  • 50 base oligos give Tm values from 75 to 85 degrees C, 6 to 10 OD260, 11 to 15 nanomoles, 150 to 300 µg.
  • the PCR reaction includes:
  • oligonucleotide concentration is 250 µM (250 nmol/ml).
  • 50 base oligos give Tm values from 75 to 85 degrees C, 6 to 10 OD260, 11 to 15 nanomoles, 150 to 300 µg. Resuspend in 50 to 100 µl of H2O to make 250 nmol/ml.
  • Steps 2-7 of Table 1: Carry out pooling Step 2 by mixing each successive well with the next. Add 1 µl of Taq ligase to each mixed well. Cycle once at 94 degrees for 30 sec, 52 degrees for 30 sec, then 72 degrees for 10 minutes.
  • Carry out step 3 (Table 1) of pooling scheme and cycle according to the temperature scheme above.
  • Carry out pooling scheme step 6 and take 10 ⁇ l of each mix into a fresh microwell.
  • Carry out step 7 of the pooling scheme by pooling the remaining three wells. Reaction volumes will be: Initial plate has 20 µl per well.
  • a final PCR amplification was then performed by taking 2 µl of the final ligation mix and adding it to 20 µl of PCR mix containing 10 mM Tris-HCl, pH 9.0, 2.2 mM MgCl2, 50 mM KCl, 0.2 mM each dNTP and 0.1% Triton X-100
  • oligonucleotide concentration is 250 µM (250 nmol/ml).
  • 50 base oligos give Tm values from 75 to 85 degrees C, 6 to 10 OD260, 11 to 15 nanomoles, 150 to 300 µg. Resuspend in 50 to 100 µl of H2O to make 250 nmol/ml.
  • the invention envisions using a robotic workstation to accomplish nucleic acid assembly.
  • two working plates containing forward and reverse oligonucleotides in a PCR mix at 2.5 µM are prepared, and 1 µl of each oligo is added to 100 µl of PCR mix in a fresh microwell, providing one plate of forward and one of reverse oligos in an array.
  • Cycling assembly is then initiated as follows according to the pooling scheme outlined in Table 1. In the present example, 96 cycles of assembly can be accomplished according to this scheme.
  • a PCR amplification is then performed by taking 2 µl of final reaction mix and adding it to 20 µl of a PCR mix comprising: 10 mM Tris-HCl, pH 9.0
  • Outside primers are prepared by taking 1 µl of F1 and 1 µl of R96 at 250 µM (250 nmol/ml, i.e., 0.250 nmol/µl) and adding them to the 100 µl PCR reaction. This gives a final concentration of 2.5 µM for each oligo. 1 U of Taq polymerase is subsequently added and the reaction is cycled for about 23 to 35 cycles under the following conditions:
  • reaction is subsequently extracted with phenol/chloroform, precipitated with ethanol and resuspended in 10 µl of dH2O for analysis on an agarose gel.
  • Equal amounts of forward and reverse oligos are combined pairwise by taking 10 µl of forward and 10 µl of reverse oligo and mixing them in a new 96-well v-bottom plate.
  • This provides one array with sets of duplex oligonucleotides at 250 µM, according to pooling scheme Step 1 in Table 1.
  • An assembly plate was prepared by taking 2 µl of each oligomer pair and adding them to the plate containing 100 µl of ligation mix in each well. This gives an effective concentration of 2.5 µM or 2.5 nmol/ml. About 20 µl of each well is transferred to a fresh microwell plate in addition to 1 µl of T4 polynucleotide kinase and 1 µl of 1 mM ATP. Each reaction will have 50 pmoles of oligonucleotide and 1 nmole ATP. Incubate at 37 degrees for 30 minutes.
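The quantities quoted in this step can be checked with simple dilution arithmetic. The Python sketch below assumes a 250 µM stock (the reading of the stock concentration that makes the stated 2.5 µM, 50 pmol and 1 nmol figures consistent) and ignores the small volume contributed by the added aliquot. The helper names are hypothetical.

```python
def dilute(stock_uM, stock_ul, final_ul):
    """Concentration (µM) after adding stock_ul of stock to final_ul."""
    return stock_uM * stock_ul / final_ul


def picomoles(conc_uM, volume_ul):
    """Amount in pmol; 1 µM is 1 pmol per µl."""
    return conc_uM * volume_ul


working_uM = dilute(250, 1, 100)           # 1 µl of stock into 100 µl mix
oligo_pmol = picomoles(working_uM, 20)     # 20 µl transferred per reaction
atp_nmol = picomoles(1000, 1) / 1000       # 1 µl of 1 mM ATP

print(working_uM, oligo_pmol, atp_nmol)
```

The three values correspond to the working concentration in µM, the oligonucleotide amount per reaction in pmol, and the ATP amount in nmol.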
  • Step 2 pooling is carried out by mixing each well with the next well in succession. 1 µl of Taq ligase is added to each mixed well and cycled once as follows: 94 degrees 30 sec
  • Step 3 of pooling scheme is carried out and cycled according to the temperature scheme above.
  • Steps 4 and 5 of the pooling scheme are carried out and cycled according to the temperature scheme above.
  • Step 7 pooling scheme is carried out by pooling the remaining three wells.
  • the reaction volumes will be (initial plate has 20 ⁇ l per well) :
  • a final PCR amplification is performed by taking 2 ⁇ l of the final ligation mix and adding it to 20 ⁇ l of PCR mix comprising:
  • Triton X-100. Outside primers are prepared by taking 1 µl of F1 and 1 µl of R96 at 250 µM (250 nmol/ml, i.e., 0.250 nmol/µl) and adding them to the 100 µl PCR reaction, giving a final concentration of 2.5 µM for each oligo. Subsequently, 1 U of Taq polymerase is added and cycled for about 23 to 35 cycles under the following conditions: 94 degrees 30s
  • the product is extracted with phenol/chloroform, precipitated with ethanol, resuspended in 10 µl of dH2O and analyzed on an agarose gel.
  • F-C7 - R-F7: denature, anneal, polymerase extension
  • F-C8 - R-F6: denature, anneal, polymerase extension
  • F-D11 - R-E3: denature, anneal, polymerase extension
  • F-F4 - R-C10: denature, anneal, polymerase extension
  • F-F5 - R-C9: denature, anneal, polymerase extension
  • F-F10 - R-C4: denature, anneal, polymerase extension
  • nucleic acid molecules listed in Table 4 have been produced using the methods described herein. The features and characteristics of each nucleic acid molecule are also described in Table 4.
  • the plasmid comprises 192 oligonucleotides (two sets of 96 overlapping 50 mers; 25 bp overlap) .
  • the plasmid is essentially pUC containing kanamycin resistance instead of ampicillin resistance.
  • the synthetic plasmid also contains lux A and B genes from the Vibrio fischeri bacterial luciferase gene.
  • the SynPuc19 plasmid is 2700 bp in length, comprising a sequence essentially identical to pUC19 only shortened to precisely 2700 bp. Two sets of 96 50-mers were used to assemble the plasmid.
  • the Synlux4 pUC19 plasmid was shortened and luxA gene was added. 54 100-mer oligonucleotides comprising two sets of 27 oligonucleotides were used to assemble the plasmid.
  • the miniQE10 plasmid comprising 2400 bp was assembled using 48 50-mer oligonucleotides.
  • MiniQE10 is an expression plasmid containing a 6X His tag and bacterial promoter for high-level polypeptide expression. MiniQE10 was assembled and synthesized using the Taq polymerase amplification method of the invention.
  • microQE plasmid is a minimal plasmid containing only an ampicillin gene, an origin of replication and a linker of pQE plasmids .
  • MicroQE was assembled using either combinatoric ligation with 24 50-mers or with one tube PCR amplification.
  • The SynFibl, SynFibB and SynFibG nucleic acid sequences are synthetic human fibrinogens manufactured using E. coli codons to optimize expression in a prokaryotic expression system.
  • Table 4 Synthetic nucleic acid molecules produced using the methods of the invention.

Abstract

The present invention outlines a novel approach to utilizing the results of genomic sequence information by computer-directed polynucleotide assembly based upon information available in databases such as the human genome database. Specifically, the present invention may be used to select, synthesize and assemble a novel, synthetic target polynucleotide sequence encoding a target polypeptide. The target polynucleotide may encode a target polypeptide that exhibits enhanced or altered biological activity as compared to a model polypeptide encoded by a natural (wild-type) or model polynucleotide sequence.

Description

COMPUTER-DIRECTED ASSEMBLY OF A POLYNUCLEOTIDE ENCODING
A TARGET POLYPEPTIDE
TECHNICAL FIELD
The present invention relates generally to the area of bioinformatics and more specifically to methods, algorithms and apparatus for computer directed polynucleotide assembly. The invention further relates to the production of polypeptides encoded by polynucleotides assembled by the invention.
BACKGROUND
Enzymes, antibodies, receptors and ligands are polypeptides that have evolved by selective pressure to perform very specific biological functions within the milieu of a living organism. The use of a polypeptide for specific technological applications may require the polypeptide to function in environments or on substrates for which it was not evolutionarily selected. Polypeptides isolated from microorganisms that thrive in extreme environments provide ample evidence that these molecules are, in general, malleable with regard to structure and function. However, the process for isolating a polypeptide from its native environment is expensive and time consuming. Thus, new methods for synthetically evolving genetic material encoding a polypeptide possessing a desired activity are needed.
There are two ways to obtain genetic material for genetic engineering manipulations: (1) isolation and purification of a polynucleotide in the form of DNA or RNA from natural sources or (2) the synthesis of a polynucleotide using various chemical-enzymatic approaches. The former approach is limited to naturally-occurring sequences that do not easily lend themselves to specific modification. The latter approach is much more complicated and labor-intensive. However, the chemical-enzymatic approach has many attractive features including the possibility of preparing, without any significant limitations, any desirable polynucleotide sequence.
Two general methods currently exist for the synthetic assembly of oligonucleotides into long polynucleotide fragments. In the first method, oligonucleotides covering the entire sequence to be synthesized are allowed to anneal, and the nicks are then repaired with ligase. The fragment is then cloned directly, or cloned after amplification by the polymerase chain reaction (PCR). The polynucleotide is subsequently used for in vitro assembly into longer sequences. The second general method for gene synthesis utilizes polymerase to fill in single-stranded gaps in the annealed pairs of oligonucleotides. After the polymerase reaction, single-stranded regions of oligonucleotides become double-stranded and, after digestion with restriction endonuclease, can be cloned directly or used for further assembly of longer sequences by ligating different double-stranded fragments. Typically, subsequent to the polymerase reaction, each segment must be cloned, which significantly delays the synthesis of long DNA fragments and greatly decreases the efficiency of this approach.
The creation of entirely novel polynucleotides, or the substantial modification of existing polynucleotides, is extremely time consuming and expensive, requires multiple complex steps, and in some cases is impossible. Therefore, there exists a great need for an efficient means to assemble synthetic polynucleotides of any desired sequence. Such a method could be universally applied. For example, the method could be used to efficiently make an array of polynucleotides having specific substitutions in a known sequence that is expressed and screened for improved function. The present invention satisfies these needs by providing efficient and powerful methods and compositions for the synthesis of a target polynucleotide encoding a target polypeptide.
SUMMARY
The present invention addresses the limitations in present recombinant nucleic acid manipulations by providing a fast, efficient means for generating a nucleic acid sequence, including entire genes, chromosomal segments, chromosomes and genomes. Because this approach is completely synthetic, there are no limitations, such as the availability of existing nucleic acids, to hinder the construction of even very large segments of nucleic acid.
In one embodiment, the invention provides a method of synthesizing a target polynucleotide sequence including: a) providing a target polynucleotide sequence; b) identifying at least one initiating polynucleotide present in the target polynucleotide which includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang and a 3' overhang; c) identifying a second polynucleotide present in the target polynucleotide which is contiguous with the initiating polynucleotide and includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang, where at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating polynucleotide; d) identifying a third polynucleotide present in the target polynucleotide which is contiguous with the initiating sequence and includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang, where at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating polynucleotide which is not complementary to an overhang of the second polynucleotide; e) contacting the initiating polynucleotide with the second polynucleotide and the third polynucleotide under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide and in the bi-directional extension of the initiating polynucleotide; f) in the absence of primer extension, optionally contacting the mixture of e) with a ligase under conditions suitable for ligation; and g) optionally repeating b) through f) to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized.
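The annealing scheme of steps b) through g) can be sketched in code. The following Python fragment is a minimal illustration under stated assumptions, not the claimed method: it assumes fixed-length oligonucleotides, a fixed stagger between the two strands, and the hypothetical helper names `make_segments`, `overhangs_match` and `bidirectional_rounds`. It cuts a plus-strand sequence into partially double-stranded segments, checks that the overhangs of neighboring segments are complementary, and lists which segments would be added to the initiating segment in each round. The overhangs are labeled "left" and "right" relative to the plus strand rather than 5' or 3'.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def make_segments(target, oligo_len, offset):
    """Cut `target` (plus strand, 5'->3') into staggered duplex segments.

    Assumption: each segment pairs one plus-strand oligo with one
    minus-strand oligo shifted `offset` bases to the right, which leaves
    a single-stranded overhang on each side of the segment.
    """
    segments = []
    for start in range(0, len(target), oligo_len):
        stop = start + oligo_len
        segments.append({
            "start": start,
            "plus": target[start:stop],
            "minus": revcomp(target[start + offset:stop + offset]),
            # unpaired plus-strand bases at the left end of the segment
            "left_overhang": target[start:start + offset],
            # unpaired minus-strand bases at the right end (5'->3')
            "right_overhang": revcomp(target[stop:stop + offset]),
        })
    return segments


def overhangs_match(left_seg, right_seg):
    """True if the right overhang of one segment pairs with the left
    overhang of the next segment."""
    return revcomp(left_seg["right_overhang"]) == right_seg["left_overhang"]


def bidirectional_rounds(n_segments, init):
    """Indices of the segments added on each side of segment `init`,
    one (left, right) tuple per annealing round; None marks an arm
    that has already reached the end of the target."""
    rounds = []
    step = 1
    while init - step >= 0 or init + step < n_segments:
        left = init - step if init - step >= 0 else None
        right = init + step if init + step < n_segments else None
        rounds.append((left, right))
        step += 1
    return rounds
```

For a 32-base sequence cut with `oligo_len=8` and `offset=4`, four segments result; choosing segment 1 as the initiating segment gives two rounds, the first adding segments 0 and 2 and the second adding segment 3 only.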
The invention further provides a method of synthesizing a target polynucleotide including: a) providing a target polynucleotide sequence derived from a model sequence; b) identifying at least one initiating polynucleotide sequence present in the target polynucleotide sequence of a), wherein the initiating polynucleotide includes: 1) a first plus strand oligonucleotide; 2) a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide; and 3) a minus strand oligonucleotide including a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and a second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide; c) annealing the first plus strand oligonucleotide and the second plus strand oligonucleotide to the minus strand oligonucleotide of b) resulting in a partially double-stranded initiating polynucleotide including a 5' overhang and a 3' overhang; d) identifying a second polynucleotide sequence present in the target polynucleotide sequence of a), wherein the second polynucleotide sequence is contiguous with the initiating polynucleotide sequence and includes: 1) a first plus strand oligonucleotide; 2) a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide; and 3) a minus strand oligonucleotide comprising a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and a second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide; e) annealing the first plus strand oligonucleotide and the second plus strand oligonucleotide to the minus strand oligonucleotide of d) resulting in a partially double-stranded second polynucleotide, wherein at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating polynucleotide; f) identifying a third polynucleotide present in the target polynucleotide of a), wherein the third polynucleotide is contiguous with the initiating sequence and comprises: 1) a first plus strand oligonucleotide; 2) a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide; and 3) a minus strand oligonucleotide comprising a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and a second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide; g) annealing the first plus strand oligonucleotide and the second plus strand oligonucleotide to the minus strand oligonucleotide of f) resulting in a partially double-stranded third polynucleotide, wherein at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating polynucleotide and not complementary to an overhang of the second polynucleotide; h) contacting the initiating polynucleotide of c) with the second polynucleotide of e) and the third polynucleotide of g) under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, wherein the initiating sequence is extended bi-directionally; i) in the absence of primer extension, optionally contacting the mixture of h) with a ligase under conditions suitable for ligation; and j) optionally repeating b) through i) to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized.
In another embodiment, the invention provides a method for synthesizing a target polynucleotide, including: a) providing a target polynucleotide sequence derived from a model sequence; b) identifying at least one initiating polynucleotide present in the target polynucleotide which includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide; c) contacting the initiating polynucleotide under conditions suitable for primer annealing with a first oligonucleotide having partial complementarity to the 3' portion of the plus strand of the initiating polynucleotide, and a second oligonucleotide having partial complementarity to the 3' portion of the minus strand of the initiating polynucleotide; d) catalyzing under conditions suitable for primer extension: 1) polynucleotide synthesis from the 3'-hydroxyl of the plus strand of the initiating polynucleotide; 2) polynucleotide synthesis from the 3'-hydroxyl of the annealed first oligonucleotide; 3) polynucleotide synthesis from the 3'-hydroxyl of the minus strand of the initiating polynucleotide; and 4) polynucleotide synthesis from the 3'-hydroxyl of the annealed second oligonucleotide, resulting in the bi-directional extension of the initiating sequence thereby forming a nascent extended initiating polynucleotide; e) contacting the extended initiating polynucleotide of d) under conditions suitable for primer annealing with a third oligonucleotide having partial complementarity to the 3' portion of the plus strand of the extended initiating polynucleotide, and a fourth oligonucleotide having partial complementarity to the 3' portion of the minus strand of the extended initiating polynucleotide; f) catalyzing under conditions suitable for primer extension: 1) polynucleotide synthesis from the 3'-hydroxyl of the plus strand of the extended initiating polynucleotide; 2) polynucleotide synthesis from the 3'-hydroxyl of the annealed third oligonucleotide; 3) polynucleotide synthesis from the 3'-hydroxyl of the minus strand of the extended initiating polynucleotide; and 4) polynucleotide synthesis from the 3'-hydroxyl of the annealed fourth oligonucleotide, resulting in the bi-directional extension of the initiating sequence thereby forming a nascent extended initiating polynucleotide; and g) optionally repeating e) through f) as desired, resulting in formation of the target polynucleotide sequence.
The invention further provides a method for isolating a target polypeptide encoded by a target polynucleotide generated by a method of the invention by: a) incorporating the target polynucleotide in an expression vector; b) introducing the expression vector into a suitable host cell; c) culturing the cell under conditions and for such time as to promote the expression of the target polypeptide encoded by the target polynucleotide; and d) isolating the target polypeptide.
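The primer extension cycles described above can be pictured with a small coordinate model. The Python sketch below is illustrative only and rests on assumptions not found in the text: every added oligonucleotide has the same length, each overlaps the current duplex by a fixed number of bases, and the function names `cycle_oligos` and `extension_history` are hypothetical. In each cycle one minus-sense oligonucleotide anneals to the 3' portion of the plus strand and one plus-sense oligonucleotide anneals to the 3' portion of the minus strand, so the double-stranded region grows in both directions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def cycle_oligos(target, start, end, oligo_len, overlap):
    """Oligos for one extension cycle on the duplex target[start:end].

    Returns (left_oligo, right_oligo, new_span). An oligo is None when
    that end of the target has already been reached.
    """
    gain = oligo_len - overlap           # bases added per side per cycle
    new_start = max(0, start - gain)
    new_end = min(len(target), end + gain)
    # plus-sense oligo; pairs with the 3' portion of the minus strand
    left = target[new_start:start + overlap] if start > 0 else None
    # minus-sense oligo; pairs with the 3' portion of the plus strand
    right = revcomp(target[end - overlap:new_end]) if end < len(target) else None
    return left, right, (new_start, new_end)


def extension_history(target, start, end, oligo_len, overlap):
    """Span of the duplex after each cycle until the target is covered."""
    if oligo_len <= overlap:
        raise ValueError("oligo_len must exceed overlap")
    history = [(start, end)]
    while start > 0 or end < len(target):
        _, _, (start, end) = cycle_oligos(target, start, end, oligo_len, overlap)
        history.append((start, end))
    return history
```

With a 200-base target, an initiating duplex spanning positions 80 to 120, 50-base oligonucleotides and a 20-base overlap, each cycle adds 30 bases per side and the target is covered after three cycles.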
The invention further provides a method of synthesizing a target polynucleotide including: a) providing a target polynucleotide sequence derived from a model sequence; b) chemically synthesizing a plurality of single-stranded oligonucleotides each of which is partially complementary to at least one oligonucleotide present in the plurality, where the sequence of the plurality of oligonucleotides is a contiguous sequence of the target polynucleotide; c) contacting the partially complementary oligonucleotides under conditions and for such time suitable for annealing, the contacting resulting in a plurality of partially double-stranded polynucleotides, where each double-stranded polynucleotide includes a 5' overhang and a 3' overhang; d) identifying at least one initiating polynucleotide derived from the model sequence present in the plurality of double-stranded polynucleotides; e) in the absence of primer extension, subjecting a mixture including the initiating polynucleotide and 1) a double-stranded polynucleotide that will anneal to the 5' portion of the initiating polynucleotide; 2) a double-stranded polynucleotide that will anneal to the 3' portion of the initiating polynucleotide; and 3) a DNA ligase under conditions suitable for annealing and ligation, wherein the initiating polynucleotide is extended bi-directionally; f) sequentially annealing double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing, whereby the target polynucleotide is produced.
The invention further provides a computer program, stored on a computer-readable medium, for generating a target polynucleotide sequence derived from a model sequence, the computer program comprising instructions for causing a computer system to: a) identify an initiating polynucleotide sequence contained in the target polynucleotide sequence; b) parse the target polynucleotide sequence into multiple distinct, partially complementary oligonucleotides; c) control assembly of the target polynucleotide sequence by controlling the bi-directional extension of the initiating polynucleotide sequence by the sequential addition of partially complementary oligonucleotides resulting in a contiguous double-stranded polynucleotide.
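Steps a) and b) of such a program might look roughly like the following Python sketch. It is not the program of the invention. The half-length stagger between strands, the rule of choosing the plus strand oligonucleotide nearest the midpoint as the initiating sequence, and the names `Oligo`, `parse_target`, `partners` and `pick_initiating` are all assumptions made for illustration. Control of the physical assembly in step c) is not modeled.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Oligo:
    name: str    # "F1", "F2", ... for plus strand; "R1", ... for minus
    strand: str  # "+" or "-"
    start: int   # 0-based start on the plus strand of the target
    end: int     # exclusive end on the plus strand of the target
    seq: str     # oligo sequence written 5'->3'


def parse_target(target, oligo_len):
    """Parse `target` into F oligos and R oligos staggered by half a length."""
    half = oligo_len // 2
    oligos = []
    for i, s in enumerate(range(0, len(target), oligo_len), start=1):
        e = min(s + oligo_len, len(target))
        oligos.append(Oligo(f"F{i}", "+", s, e, target[s:e]))
    for i, s in enumerate(range(half, len(target), oligo_len), start=1):
        e = min(s + oligo_len, len(target))
        oligos.append(Oligo(f"R{i}", "-", s, e, revcomp(target[s:e])))
    return oligos


def partners(oligo, oligos):
    """Names of opposite-strand oligos that overlap `oligo`, i.e. the
    oligos to which it is partially complementary."""
    return [o.name for o in oligos
            if o.strand != oligo.strand
            and o.start < oligo.end and oligo.start < o.end]


def pick_initiating(oligos, target_len):
    """Assumed rule: the plus-strand oligo centered closest to the middle
    of the target, so both arms of the assembly are of similar length."""
    mid = target_len / 2
    plus = [o for o in oligos if o.strand == "+"]
    return min(plus, key=lambda o: abs((o.start + o.end) / 2 - mid))
```

For a 50-base target parsed with `oligo_len=10`, this yields five F and five R oligonucleotides, each overlapping at least one oligonucleotide of the opposite strand, with F3 selected as the initiating sequence.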
The invention further provides a method for automated synthesis of a target polynucleotide sequence, including: a) providing the user with an opportunity to communicate a desired target polynucleotide sequence; b) allowing the user to transmit the desired target polynucleotide sequence to a server; c) providing the user with a unique designation; d) obtaining the transmitted target polynucleotide sequence provided by the user.
The invention further provides a method for automated synthesis of a polynucleotide sequence, including: a) providing a user with a mechanism for communicating a model polynucleotide sequence; b) optionally providing the user with an opportunity to communicate at least one desired modification to the model sequence; c) allowing the user to transmit the model sequence and desired modification to a server; d) providing the user with a unique designation; e) obtaining the transmitted model sequence and optional desired modification provided by the user; f) inputting into a programmed computer, through an input device, data including at least a portion of the model polynucleotide sequence; g) determining, using the processor, the sequence of the model polynucleotide sequence containing the desired modification; h) further determining, using the processor, at least one initiating polynucleotide sequence present in the model polynucleotide sequence; i) selecting, using the processor, a model for synthesizing the modified model polynucleotide sequence based on the position of the initiating sequence in the model polynucleotide sequence; and j) outputting, to the output device, the results of the at least one determination.
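The server-side part of this workflow, receiving a model sequence with optional modifications, assigning a unique designation and determining the modified sequence, can be sketched as follows. This Python fragment is a hypothetical illustration: the function names `apply_modifications` and `accept_order`, the restriction of modifications to single-base substitutions, and the use of a random UUID as the unique designation are assumptions, not details given in the text.

```python
import uuid

VALID_BASES = set("ACGT")


def apply_modifications(model, substitutions):
    """Return the model sequence with single-base substitutions applied.

    `substitutions` is an iterable of (0-based position, new base).
    """
    bases = list(model.upper())
    for pos, base in substitutions:
        if base not in VALID_BASES:
            raise ValueError(f"invalid base: {base!r}")
        if not 0 <= pos < len(bases):
            raise IndexError(f"position out of range: {pos}")
        bases[pos] = base
    return "".join(bases)


def accept_order(model, substitutions=()):
    """Validate a submitted model sequence and register it under a
    unique designation (here a random UUID, an assumed choice)."""
    if not model or set(model.upper()) - VALID_BASES:
        raise ValueError("model sequence must contain only A, C, G, T")
    return {
        "designation": uuid.uuid4().hex,
        "model": model.upper(),
        "target": apply_modifications(model, substitutions),
    }
```

A call such as `accept_order("atgc", [(1, "C")])` would return a record whose `target` field is `ACGC` together with a designation the user can quote later.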
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. For example, the one letter and three letter abbreviations for amino acids and the one-letter abbreviations for nucleotides are commonly understood. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. In addition, the materials, methods and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.
DESCRIPTION OF DRAWINGS
Like reference symbols in the various drawings indicate like elements.
Figure 1 depicts 96-well plates for F (i.e., "forward" or "plus strand") oligonucleotide synthesis, R (i.e., "reverse" or "minus strand") oligonucleotide synthesis, and a T (i.e., "temperature") plate for the annealing of F and R oligonucleotides.
Figure 2 depicts the oligonucleotide pooling plan where F oligonucleotides and R oligonucleotides are annealed to form a contiguous polynucleotide.
Figure 3 depicts the schematic of assembly of a target polynucleotide sequence defining a gene, genome, set of genes or polypeptide sequence. The sequence is designed by computer and used to generate a set of parsed oligonucleotide fragments covering the + and - strand of a target polynucleotide sequence encoding a target polypeptide.
Figure 4 depicts a schematic of the polynucleotide synthesis modules. A nanodispensing head with a plurality of valves will deposit synthesis chemicals in assembly vessels. Chemical distribution from the reagent reservoir can be controlled using a syringe pump. Underlying the reaction chambers is a set of assembly vessels linked to microchannels that will move fluids by microfluidics.
Figure 5 depicts that oligonucleotide synthesis, oligonucleotide assembly by pooling and annealing, and ligation can be accomplished using microfluidic mixing.
Figure 6 depicts the sequential pooling of oligonucleotides synthesized in arrays.
Figure 7 depicts the pooling stage of the oligonucleotide components through the manifold assemblies resulting in the complete assembly of all oligonucleotides from the array.
Figure 8 depicts an example of an assembly module comprising a complete set of pooling manifolds produced using microfabrication in a single unit. Various configurations of the pooling manifold will allow assembly of increased numbers of well arrays of parsed component oligonucleotides.
Figure 9 depicts the configuration for the assembly of oligonucleotides synthesized in a pre-defined array. Passage through the assembly device in the presence of DNA ligase and other appropriate buffer and chemical components will facilitate double stranded polynucleotide assembly.
Figure 10 depicts an example of the pooling device design. Microgrooves or microfluidic channels are etched into the surface of the pooling device. The device provides a microreaction vessel at the junction of two channels for 1) mixing of the two streams, 2) controlled temperature maintenance or cycling at the site of the junction and 3) expulsion of the ligated mixture from the exit channel into the next set of pooling and ligation chambers.
Figure 11 depicts the design of a polynucleotide synthesis platform comprising microwell plates addressed with a plurality of channels for microdispensing.
Figure 12 depicts an example of a high capacity polynucleotide synthesis platform using high density microwell microplates capable of synthesizing in excess of 1536 component oligonucleotides per plate.
Figure 13 depicts a polynucleotide assembly format using surface-bound oligonucleotide synthesis rather than soluble synthesis. In this configuration, oligonucleotides are synthesized with a linker that allows attachment to a solid support.
Figure 14 depicts a diagram of systematic polynucleotide assembly on a solid support. A set of parsed component oligonucleotides is arranged in an array with a stabilizer oligonucleotide attached. A set of ligation substrate oligonucleotides is placed in the solution and systematic assembly is carried out in the solid phase by sequential annealing, ligation and melting.
Figure 15 depicts polynucleotide assembly using component oligonucleotides bound to a set of metal electrodes on a microelectronic chip. Each electrode can be controlled independently with respect to current and voltage.
Figure 16 depicts generally a primer extension assembly method of the invention.
Figure 17 provides a system diagram of the invention.
Figure 18 depicts a perspective view of an instrument of the invention.
DETAILED DESCRIPTION
The complete sequence of complex genomes, including the human genome, makes large scale functional approaches to genetics possible. The present invention outlines a novel approach to utilizing the results of genomic sequence information by computer-directed polynucleotide assembly based upon information available in databases such as the human genome database. Specifically, the present invention may be used to synthesize, assemble and select a novel, synthetic target polynucleotide sequence encoding a target polypeptide. The target polynucleotide may encode a target polypeptide that exhibits enhanced or altered biological activity as compared to a model polypeptide encoded by a natural (wild-type) or model polynucleotide sequence. Subsequently, standard assays may be used to survey the activity of an expressed target polypeptide. For example, the expressed target polypeptide can be assayed to determine its ability to carry out the function of the corresponding model polypeptide or to determine whether a target polypeptide exhibiting a new function has been produced. Thus, the present invention provides a means for synthetically evolving a model polypeptide by synthesizing, in a computer-directed fashion, polynucleotides encoding a target polypeptide derived from a model polypeptide.
In one embodiment, the invention provides a method of synthesizing a target polynucleotide by providing a target polynucleotide sequence and identifying at least one initiating polynucleotide present in the target polynucleotide which includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang and a 3' overhang. As used herein, a "target polynucleotide sequence" includes any nucleic acid sequence suitable for encoding a target polypeptide that can be synthesized by a method of the invention. A target polynucleotide sequence can be used to generate a target polynucleotide using an apparatus capable of assembling nucleic acid sequences. Generally, a target polynucleotide sequence is a linear segment of DNA having a double-stranded region; the segment may be of any length sufficiently long to be created by the hybridization of at least two oligonucleotides having complementary regions. It is contemplated that a target polynucleotide can be 100, 200, 300, 400, 800, 1000, 1500, 2000, 4000, 8000, 10,000, 12,000, 18,000, 20,000, 40,000, 80,000 or more base pairs in length. Indeed, it is contemplated that the methods of the present invention will be able to create entire artificial genomes of lengths comparable to known bacterial, yeast, viral, mammalian, amphibian, reptilian, or avian genomes. In more particular embodiments, the target polynucleotide is a gene encoding a polypeptide of interest. The target polynucleotide may further include non-coding elements such as origins of replication, telomeres, promoters, enhancers, transcription and translation start and stop signals, introns, exon splice sites, chromatin scaffold components and other regulatory sequences. The target polynucleotide may comprise multiple genes, chromosomal segments, chromosomes and even entire genomes.
A polynucleotide of the invention may be derived from prokaryotic or eukaryotic sequences including bacterial, yeast, viral, mammalian, amphibian, reptilian, avian, plant, archaebacterial and other DNA-containing living organisms.
An "oligonucleotide", as used herein, is defined as a molecule comprised of two or more deoxyribonucleotides or ribonucleotides, preferably more than three. Its exact size will depend on many factors, such as the reaction temperature, salt concentration, the presence of denaturants such as formamide, and the degree of complementarity with the sequence to which the oligonucleotide is intended to hybridize.
The term "nucleotide" as used herein can refer to nucleotides present in either DNA or RNA and thus includes nucleotides which incorporate adenine, cytosine, guanine, thymine and uracil as base, the sugar moiety being deoxyribose or ribose. It will be appreciated, however, that other modified bases capable of base pairing with one of the conventional bases, adenine, cytosine, guanine, thymine and uracil, may be used in an oligonucleotide employed in the present invention. Such modified bases include, for example, 8-azaguanine and hypoxanthine. If desired, the nucleotides may carry a label or marker so that on incorporation into a primer extension product, they augment the signal associated with the primer extension product, for example for capture on to solid phase.
A "plus strand" oligonucleotide, by convention, includes a short, single-stranded DNA segment that starts with the 5' end to the left as one reads the sequence. A "minus strand" oligonucleotide includes a short, single-stranded DNA segment that starts with the 3' end to the left as one reads the sequence. Methods of synthesizing oligonucleotides are found in, for example, Oligonucleotide Synthesis: A Practical Approach, Gait, ed., IRL Press, Oxford (1984), incorporated herein by reference in its entirety. Solid-phase synthesis techniques have been provided for the synthesis of several peptide sequences on, for example, a number of "pins" (See e.g., Geysen et al., J. Immun. Meth. (1987) 102:259-274, incorporated herein by reference in its entirety).
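The strand-writing convention above can be made concrete with a short Python sketch. The function names are hypothetical and the sequences are limited to the four DNA bases. Given a plus strand written 5' to 3', the code produces the minus strand both in the aligned orientation (3' end to the left) and in the usual 5' to 3' orientation used when ordering an oligonucleotide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def minus_strand_3_to_5(plus_5_to_3):
    """Minus strand written with its 3' end on the left, so that each
    base sits directly under its partner on the plus strand."""
    return plus_5_to_3.translate(COMPLEMENT)


def minus_strand_5_to_3(plus_5_to_3):
    """The same minus strand written 5'->3' (the reverse complement)."""
    return minus_strand_3_to_5(plus_5_to_3)[::-1]


def show_duplex(plus_5_to_3):
    """Two-line text rendering of the duplex in the convention above."""
    return (f"5'-{plus_5_to_3}-3'\n"
            f"3'-{minus_strand_3_to_5(plus_5_to_3)}-5'")
```

For the plus strand `ATGC`, the aligned minus strand reads `TACG` from its 3' end and `GCAT` when written 5' to 3'.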
Additional methods of forming large arrays of oligonucleotides and other polymer sequences in a short period of time have been devised. Of particular note, Pirrung et al., U.S. Pat. No. 5,143,854 (see also PCT Application No. WO 90/15070), Fodor et al., PCT Publication No. WO 92/10092 and Winkler et al., U.S. Pat. No. 6,136,269, all incorporated herein by reference, disclose methods of forming vast arrays of polymer sequences using, for example, light-directed synthesis techniques. See also, Fodor et al., Science (1991) 251:767-777, also incorporated herein by reference in its entirety. Some work has been done to automate synthesis of polymer arrays. For example, Southern, PCT Application No. WO 89/10977, describes the use of a conventional pen plotter to deposit three different monomers at twelve distinct locations on a substrate.
An "initiating polynucleotide sequence," as used herein, is a sequence contained in a target polynucleotide sequence and identified by an algorithm of the invention. An "initiating polynucleotide" is the physical embodiment of an initiating polynucleotide sequence. For ligation assembly of a target polynucleotide, an initiating polynucleotide begins assembly by providing an anchor for hybridization of subsequent polynucleotides contiguous with the initiating polynucleotide. Thus, for ligation assembly, an initiating polynucleotide is a partially double-stranded nucleic acid, thereby providing single-stranded overhang(s) for annealing of a contiguous, double-stranded nucleic acid molecule. For primer extension assembly of a target polynucleotide, an initiating polynucleotide begins assembly by providing a template for hybridization of subsequent oligonucleotides contiguous with the initiating polynucleotide. Thus, for primer extension assembly, an initiating polynucleotide can be partially double-stranded or fully double-stranded.
In one embodiment, an initiating polynucleotide of the invention can be bound to a solid support for improved efficiency. The solid phase allows for the efficient separation of the assembled target polynucleotide from other components of the reaction. Different supports can be applied in the method. For example, supports can be magnetic latex beads or magnetic controlled pore glass beads that allow the desired product to be magnetically separated from the reaction mixture. Binding the initiating polynucleotide to such beads can be accomplished by a variety of known methods, for example carbodiimide treatment (Gilham, Biochemistry 7:2809-2813 (1968); Mizutani and Tachibana, J. Chromatography 356:202-205 (1986); Wolf et al., Nucleic Acids Res. 15:2911-2926 (1987); Musso, Nucleic Acids Res. 15:5353-5372 (1987); Lund et al., Nucleic Acids Res. 16:10861-10880 (1988)).
The initiating polynucleotide attached to the solid phase can act as an anchor for the continued synthesis of the target polynucleotide. Assembly can be accomplished by addition of contiguous polynucleotides together with ligase for ligation assembly or by addition of oligonucleotides together with polymerase for primer extension assembly. After the appropriate incubation time, unbound components of the method can be washed out and the reaction can be repeated again to improve the efficiency of template utilization. Alternatively, another set of polynucleotides or oligonucleotides can be added to continue the assembly.
To be used efficiently for the synthesis, the solid phase can contain pores with sufficient room for synthesis of long nucleic acid molecules. The solid phase can be composed of material that does not non-specifically bind any undesired components of the reaction. One way to meet these requirements is to use controlled pore glass beads appropriate for long DNA molecules. The initiating polynucleotide can be attached to the beads through a long connector. The role of the connector is to hold the initiating polynucleotide at a desirable distance from the surface of the solid support.
The method of the invention further includes identifying a second polynucleotide sequence present in the target polynucleotide which is contiguous with the initiating polynucleotide and includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang, where at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating polynucleotide. Two or more oligonucleotides having complementary regions will, where permitted, "anneal" (i.e., base pair) under the appropriate conditions, thereby producing a double-stranded region. In order to anneal (i.e., hybridize), oligonucleotides must be at least partially complementary. The term "complementary to" is used herein in relation to nucleotides to mean a nucleotide that will base pair with another specific nucleotide. Thus adenosine triphosphate is complementary to uridine triphosphate or thymidine triphosphate and guanosine triphosphate is complementary to cytidine triphosphate.
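Partial complementarity between two oligonucleotides can be checked computationally. The Python sketch below is an illustration only: it uses the base-pairing rules just stated (A with T or U, G with C), and it assumes that annealing requires a minimum contiguous run of paired bases, a simplification that ignores mismatches, temperature and salt. The names `longest_paired_run` and `can_anneal` are hypothetical.

```python
# Base pairs named in the text: A with T or U, and G with C.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
         ("G", "C"), ("C", "G")}


def longest_paired_run(a, b):
    """Length of the longest contiguous stretch over which oligo `a`
    and oligo `b` (both written 5'->3') can base pair antiparallel."""
    rb = b[::-1]                      # read b 3'->5' so it aligns under a
    best = 0
    prev = [0] * (len(rb) + 1)
    for x in a:
        cur = [0] * (len(rb) + 1)
        for j, y in enumerate(rb, start=1):
            if (x, y) in PAIRS:
                # extend the run that ended at the previous base pair
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def can_anneal(a, b, min_run):
    """Assumed criterion: at least `min_run` contiguous base pairs."""
    return longest_paired_run(a, b) >= min_run
```

As an example, the plus strand oligonucleotide `ATGGCTAGCA` and the minus strand oligonucleotide `ACCTTGCTAG` share a six-base paired region (`CTAGCA` on the plus strand), leaving single-stranded overhangs on either side.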
As used herein, a 5' or 3' "overhang" means a region on the 5' or 3', or 5' and 3', end of a polynucleotide that is single-stranded, i.e. not base paired. An overhang provides a means for the subsequent annealing of a contiguous polynucleotide containing an overhang that is complementary to the overhang of the contiguous polynucleotide. Depending on the application envisioned, one will desire to employ varying conditions of annealing to achieve varying degrees of annealing selectivity.
For applications requiring high selectivity, one typically will desire to employ relatively stringent conditions to form the hybrids, e.g., one will select relatively low salt and/or high temperature conditions, such as provided by about 0.02 M to about 0. 10 M NaCl at temperatures of about 50°C to about 70°C. Such high stringency conditions tolerate little, if any, mismatch between the oligonucleotide and the template or target strand. It generally is appreciated that conditions can be rendered more stringent by the addition of increasing amounts of formamide .
For certain applications, for example, by analogy to substitution of nucleotides by site-directed mutagenesis, it is appreciated that lower stringency conditions may be used. Under these conditions, hybridization may occur even though the sequences of probe and target strand are not perfectly complementary, but are mismatched at one or more positions. Conditions may be rendered less stringent by increasing salt concentration and decreasing temperature. For example, a medium stringency condition could be provided by about 0.1 to 0.25 M NaCl at temperatures of about 37°C to about 55°C, while a low stringency condition could be provided by about 0.15 M to about 0.9 M salt, at temperatures ranging from about 20°C to about 55°C. Thus, hybridization conditions can be readily manipulated depending on the desired results.
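The approximate salt and temperature ranges given above for high, medium and low stringency can be collected in a small lookup. The following Python sketch is illustrative only: the names `Stringency`, `CONDITIONS` and `matching_stringency` are hypothetical, the "about" ranges are treated as hard bounds for simplicity, and because the ranges overlap a given condition may fall into more than one category.

```python
from typing import NamedTuple

class Stringency(NamedTuple):
    name: str
    salt_molar: tuple  # (min, max) approximate salt concentration in M
    temp_c: tuple      # (min, max) approximate temperature in deg C

# Approximate ranges quoted in the text. They overlap and are guides only.
CONDITIONS = [
    Stringency("high", (0.02, 0.10), (50.0, 70.0)),
    Stringency("medium", (0.10, 0.25), (37.0, 55.0)),
    Stringency("low", (0.15, 0.90), (20.0, 55.0)),
]

def matching_stringency(salt_molar, temp_c):
    """Return the names of all categories whose ranges contain the condition."""
    return [
        c.name
        for c in CONDITIONS
        if c.salt_molar[0] <= salt_molar <= c.salt_molar[1]
        and c.temp_c[0] <= temp_c <= c.temp_c[1]
    ]

print(matching_stringency(0.05, 65.0))
print(matching_stringency(0.20, 40.0))
```

A condition of 0.20 M salt at 40°C falls inside both the medium and the low ranges, which reflects the overlap in the ranges as stated.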
In certain embodiments, it will be advantageous to determine the hybridization of oligonucleotides by employing a label. A wide variety of appropriate labels are known in the art, including fluorescent, radioactive, enzymatic or other ligands, such as avidin/biotin, which are capable of being detected. In preferred embodiments, one may desire to employ a fluorescent label or an enzyme tag such as urease, alkaline phosphatase or peroxidase, instead of radioactive or other environmentally undesirable reagents. In the case of enzyme tags, colorimetric indicator substrates are known that can be employed to provide a means for detection visible to the human eye or spectrophotometrically to identify whether specific hybridization with complementary oligonucleotide has occurred.
In embodiments involving a solid phase, for example, at least one oligonucleotide of an initiating polynucleotide is adsorbed or otherwise affixed to a selected matrix or surface. This fixed, single-stranded nucleic acid is then subjected to hybridization with the complementary oligonucleotides under desired conditions. The selected conditions will also depend on the particular circumstances based on the particular criteria required (depending, for example, on the G+C content, type of target nucleic acid, source of nucleic acid, size of hybridization probe, etc.). Following washing of the hybridized surface to remove non-specifically bound oligonucleotides, the hybridization may be detected, or even quantified, by means of the label.
The method of the invention further provides a third polynucleotide present in the target polynucleotide which is contiguous with the initiating sequence and provides a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang, where at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating polynucleotide which is not complementary to an overhang of the second polynucleotide.
The method further provides contacting the initiating polynucleotide with the second polynucleotide and the third polynucleotide under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide and in the bidirectional extension of the initiating polynucleotide. The annealed polynucleotides are optionally contacted with a ligase under conditions suitable for ligation. The method discussed above is optionally repeated to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation.
A target polynucleotide sequence can be designed de novo or derived from a "model polynucleotide sequence". As used herein, a "model polynucleotide sequence" includes any nucleic acid sequence that encodes a model polypeptide sequence. A model polypeptide sequence provides a basis for designing a modified polynucleotide such that a target polynucleotide incorporating the desired modification is synthesized. The present invention also provides methods that can be used to synthesize, de novo, polynucleotides that encode sets of genes, either naturally occurring genes expressed from natural or artificial promoter constructs or artificial genes derived from synthetic DNA sequences, which encode elements of biological systems that perform a specified function or attribution of an artificial organism as well as entire genomes. In producing such systems and genomes, the present invention provides the synthesis of a replication-competent, double-stranded polynucleotide, wherein the polynucleotide has an origin of replication, a first coding region and a first regulatory element directing the expression of the first coding region. By replication competent, it is meant that the polynucleotide is capable of directing its own replication. Thus, it is envisioned that the polynucleotide will possess all the cis-acting signals required to facilitate its own synthesis. In this respect, the polynucleotide will be similar to a plasmid or a virus, such that once placed within a cell, it is capable of replication by a combination of the polynucleotide's and cellular functions.
A polynucleotide sequence defining a gene, genome, set of genes or protein sequence can be designed in a computer-assisted manner (discussed below) and used to generate a set of parsed oligonucleotides covering the plus (+) and minus (-) strand of the sequence. As used herein, "parsed" means that a target polynucleotide sequence has been delineated in a computer-assisted manner such that a series of contiguous oligonucleotide sequences are identified. The oligonucleotide sequences are individually synthesized and used in a method of the invention to generate a target polynucleotide. The length of an oligonucleotide is quite variable. Preferably, oligonucleotides used in the methods of the invention are between about 15 and 100 bases and more preferably between about 20 and 50 bases. Specific lengths include, but are not limited to 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 and 100 bases. Depending on the size, the overlap between the oligonucleotides having partial complementarity may be designed to be between 5 and 75 bases per oligonucleotide pair.
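One possible way to picture the parsing step is the Python sketch below. It is a minimal illustration under stated assumptions and is not the parsing algorithm of the invention: the function `parse_target` and its parameters `oligo_len` and `overlap` are hypothetical, plus strand oligonucleotides simply tile the target end to end, and minus strand oligonucleotides are reverse complements of windows shifted by the overlap.

```python
# Watson-Crick pairing for DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq))

def parse_target(target, oligo_len=50, overlap=25):
    """Split a plus strand target into plus and minus strand oligonucleotides.

    Plus strand oligos tile the target end to end. Minus strand oligos cover
    windows shifted by `overlap`, so a full-length minus oligo spans the
    junction between two adjacent plus strand oligos. Oligos at the ends of
    the target may be shorter than oligo_len.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be greater than 0 and less than oligo_len")
    target = target.upper()
    plus = [target[i:i + oligo_len] for i in range(0, len(target), oligo_len)]
    minus = [
        reverse_complement(target[i:i + oligo_len])
        for i in range(overlap, len(target), oligo_len)
    ]
    return plus, minus

# Short demonstration with 8-base oligos overlapping by 4 bases.
plus, minus = parse_target("AAAACCCCGGGGTTTTACGT", oligo_len=8, overlap=4)
print(plus)
print(minus)
```

With the default values the sketch produces 50-base oligonucleotides offset by 25 bases, matching the example sizes mentioned in the text. A practical design tool would also have to consider melting temperature, secondary structure and repeated sequences, which this sketch ignores.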
The oligonucleotides preferably are treated with polynucleotide kinase, for example, T4 polynucleotide kinase. The kinasing can be performed before or after mixing of the oligonucleotide set, but before annealing. After annealing, the oligonucleotides are treated with an enzyme having a ligating function. For example, a DNA ligase typically will be employed for this function. However, topoisomerase, which does not require 5' phosphorylation, is rapid and operates at room temperature, and may be used instead of ligase. For example, 50 base pair oligonucleotides overlapping by 25 bases can be synthesized by an oligonucleotide array synthesizer (OAS). A 5' (+) strand set of oligonucleotides is synthesized in one 96-well plate and the second 3' or (-) strand set is synthesized in a second 96-well microtiter plate. Synthesis can be carried out using phosphoramidite chemistry modified to miniaturize the reaction size and generate small reaction volumes and yields in the range of 2 to 5 nmole. Synthesis is done on controlled pore glass beads (CPGs), then the completed oligonucleotides are deblocked, deprotected and removed from the beads. The oligonucleotides are lyophilized, re-suspended in water and 5' phosphorylated using polynucleotide kinase and ATP to enable ligation.
The set of arrayed oligonucleotide sequences in the plate can be assembled using a mixed pooling strategy. For example, systematic pooling of component oligonucleotides can be performed using a modified Beckman Biomek automated pipetting robot, or another automated lab workstation. The fragments can be combined with buffer and enzyme (Taq I DNA ligase or Egea Assemblase™, for example). Pooling can be performed in microwell plates. After each step of pooling, the temperature is ramped to enable annealing and ligation, then additional pooling carried out.
Target polynucleotide assembly involves forming a set of intermediates. A set of intermediates can include a plus strand oligonucleotide annealed to a minus strand oligonucleotide, as described above. The annealed intermediate can be formed by providing a single plus strand oligonucleotide annealed to a single minus strand oligonucleotide.
Alternatively, two or more oligonucleotides may comprise the plus strand or the minus strand. For example, in order to construct a polynucleotide (e.g., an initiating polynucleotide) which can be used to assemble a target polynucleotide of the invention, three or more oligonucleotides can be annealed. Thus, a first plus strand oligonucleotide, a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide, and a minus strand oligonucleotide having a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide can be annealed to form a partially double-stranded polynucleotide. The polynucleotide can include a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang. The first plus strand oligonucleotide and second plus strand oligonucleotide are contiguous sequences such that they are ligatable. The minus strand oligonucleotide is partially complementary to both plus strand oligonucleotides and acts as a "bridge" or "stabilizer" sequence by annealing to both oligonucleotides. Subsequent polynucleotides comprised of more than two oligonucleotides annealed as previously described, can be used to assemble a target polynucleotide in a manner resulting in a contiguous double-stranded polynucleotide.
An example of using two or more plus strand oligonucleotides to assemble a polynucleotide is shown in Figure 3. A triplex of three oligonucleotides of about 50 bp each, which overlap by about 25 bp, form a "nicked" intermediate. Two of these oligonucleotides provide a ligation substrate joined by ligase and the third oligonucleotide is a stabilizer that brings together two specific sequences by annealing resulting in the formation of a part of the final polynucleotide construct. This intermediate provides a substrate for DNA ligase which, through its nick sealing activity, joins the two 50-base pair oligonucleotides into a single 100 base single-stranded polynucleotide.
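The triplex intermediate can be modeled as a simple string check. The Python sketch below is illustrative only and rests on assumptions: the names `is_stabilizer` and `seal_nick` are hypothetical, the stabilizer is required to be fully complementary to equal-length arms on either side of the nick (the text only requires partial complementarity), and ligation is represented as plain concatenation of the two plus strand oligonucleotides.

```python
# Watson-Crick pairing for DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq))

def is_stabilizer(plus_a, plus_b, minus, arm):
    """True if `minus` pairs with the last `arm` bases of plus_a and the
    first `arm` bases of plus_b, so that it bridges the nick between them."""
    junction = plus_a[-arm:] + plus_b[:arm]
    return minus == reverse_complement(junction)

def seal_nick(plus_a, plus_b):
    """Model ligation of two contiguous plus strand oligos as concatenation."""
    return plus_a + plus_b

plus_a, plus_b = "AAAACCCC", "GGGGTTTT"
stabilizer = "CCCCGGGG"
if is_stabilizer(plus_a, plus_b, stabilizer, arm=4):
    print(seal_nick(plus_a, plus_b))
```

For the sizes in the example above, `arm` would be about 25, and sealing the nick between two 50-base oligonucleotides would give a single 100-base strand.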
Following initial pooling and formation of annealed products, the products are assembled into increasingly larger polynucleotides. For example, following triplex formation of oligonucleotides, sets of triplexes are systematically joined, ligated, and assembled. Each step can be mediated by robotic pooling, ligation and thermal cycling to achieve annealing and denaturation. The final step joins assembled pieces into a complete sequence representing all of the fragments in the array. Since the efficiency of yield at each step is less than 100%, the mass amount of completed product in the final mixture may be very small. Optionally, additional specific oligonucleotide primers, usually 15 to 20 bases and complementary to the extreme ends of the assembly, can be annealed and PCR amplification carried out, thereby amplifying and purifying the final full-length product.
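The effect of imperfect step yields can be made concrete with a short calculation. The Python sketch below assumes, purely for illustration, that every assembly step has the same independent efficiency; the function name `overall_yield` and the example figures are hypothetical and are not taken from the text.

```python
def overall_yield(step_efficiency, steps):
    """Fraction of starting material that ends up as full-length product,
    assuming each of `steps` sequential steps has the same efficiency."""
    if not 0.0 <= step_efficiency <= 1.0:
        raise ValueError("step_efficiency must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must not be negative")
    return step_efficiency ** steps

# Illustrative figures only: yield falls off quickly as steps are added.
for n in (1, 5, 10):
    print(n, round(overall_yield(0.9, n), 4))
```

Under this assumption, even a 90% efficient step leaves well under half of the starting material as full-length product after ten steps, which is why a final PCR amplification of the full-length product can be useful.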
The methods of the invention provide several improvements over existing polynucleotide synthesis technology. For example, synthesis can utilize microdispensing piezoelectric or microsolenoid nanodispensers allowing very fast synthesis, much smaller reaction volumes and higher density plates as synthesis vessels. The instrument will use up to 1536 well plates giving a very high capacity. Additionally, controlled pooling can be performed by a microfluidic manifold that will move individual oligonucleotides through microchannels and mix/ligate in a controlled way. This will obviate the need for robotic pipetting and increase speed and efficiency. Thus, an apparatus that accomplishes a method of the invention will have a greater capability for simultaneous reactions giving an overall larger capacity for gene length.
Once target polynucleotides have been synthesized using a method of the present invention, it may be necessary to screen the sequences for analysis of function. Specifically contemplated by the present inventor are chip-based DNA technologies. Briefly, these techniques involve quantitative methods for analyzing large numbers of genes rapidly and accurately. By tagging genes with oligonucleotides or using fixed probe arrays, one can employ chip technology to segregate target molecules as high-density arrays and screen these molecules on the basis of hybridization.
The use of combinatorial synthesis and high throughput screening assays is well known to those of skill in the art. For example, U.S. Patent Nos. 5,807,754; 5,807,683; 5,804,563; 5,789,162; 5,783,384; 5,770,358; 5,759,779; 5,747,334; 5,686,242; 5,198,346; 5,738,996; 5,733,743; 5,714,320; and 5,663,046 (each specifically incorporated herein by reference) describe screening systems useful for determining the activity of a target polypeptide. These patents teach various aspects of the methods and compositions involved in the assembly and activity analyses of high-density arrays of different polysubunits (polynucleotides or polypeptides). As such it is contemplated that the methods and compositions described in the patents listed above may be useful in assaying the activity profiles of the target polypeptides of the present invention.
In another embodiment, the invention provides a method of synthesizing a target polynucleotide by providing a target polynucleotide sequence and identifying at least one initiating polynucleotide sequence present in the target polynucleotide sequence that includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a double-stranded polynucleotide. The initiating polynucleotide is contacted under conditions suitable for primer annealing with a first oligonucleotide having partial complementarity to the 3' portion of the plus strand of the initiating polynucleotide, and a second oligonucleotide having partial complementarity to the 3' portion of the minus strand of the initiating polynucleotide. Primer extension is subsequently performed using polynucleotide synthesis from the 3'-hydroxyl of: 1) the plus strand of the initiating polynucleotide; 2) the annealed first oligonucleotide; 3) the minus strand of the initiating polynucleotide; and 4) the annealed second oligonucleotide. The synthesis results in the initiating sequence being extended bi-directionally thereby forming a nascent extended initiating polynucleotide. The extended initiating sequence can be further extended by repeated cycles of annealing and primer extension.
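The bidirectional growth described in this embodiment can be estimated with simple arithmetic. The Python sketch below is a rough model under stated assumptions, not a description of the claimed method: the function `extended_length` is hypothetical, every added oligonucleotide is assumed to have the same length and the same overlap with the growing duplex, and every extension is assumed to run to completion, so each cycle adds the non-overlapping part of one oligonucleotide at each end.

```python
def extended_length(initial_len, oligo_len, overlap, cycles):
    """Length in base pairs of the duplex after a number of
    annealing/extension cycles, with one oligo added at each end per cycle."""
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be greater than 0 and less than oligo_len")
    if cycles < 0:
        raise ValueError("cycles must not be negative")
    gain_per_end = oligo_len - overlap  # new sequence contributed by one oligo
    return initial_len + 2 * cycles * gain_per_end

# Illustrative sizes: a 50 bp initiating duplex, 50-base oligos, 25-base overlap.
print(extended_length(50, 50, 25, 3))
```

With these illustrative sizes the duplex gains 50 bp per cycle, 25 bp at each end.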
As previously noted, oligonucleotides can be used as building blocks to assemble polynucleotides through annealing and ligation reactions. Alternatively, oligonucleotides can be used as primers to manufacture polynucleotides through annealing and primer extension reactions. The term "primer" is used herein to refer to a binding element which comprises an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced, i.e., in the presence of appropriate nucleotides and an agent for polymerization such as a DNA polymerase in an appropriate buffer ("buffer" includes pH, ionic strength, cofactors, etc.) and at a suitable temperature.
The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the agent for polymerization. The exact lengths of the primers will depend on many factors, including temperature and source of primer and use of the method. Primers having only short sequences capable of hybridization to the target nucleotide sequence generally require lower temperatures to form sufficiently stable hybrid complexes with the template.
The primers herein are selected to be "substantially" complementary to the different strands of each specific sequence to be amplified. This means that the primers must be sufficiently complementary to hybridize with their respective strands. Therefore, the primer sequence need not reflect the exact sequence of the template. Commonly, however, the primers have exact complementarity except with respect to analyses effected according to the method described in Nucleic Acids Research 17 (7) 2503-2516 (1989) or a corresponding method employing linear amplification or an amplification technique other than the polymerase chain reaction.
The agent for primer extension of an oligonucleotide may be any compound or system that will function to accomplish the synthesis of primer extension products, including enzymes. Suitable enzymes for this purpose include, for example, E. coli DNA Polymerase I, Klenow fragment of E. coli DNA polymerase I, T4 DNA polymerase, other available DNA polymerases, reverse transcriptase, and other enzymes, including thermostable enzymes. The term "thermostable enzyme" as used herein refers to any enzyme that is stable to heat and is heat resistant and catalyses (facilitates) combination of the nucleotides in the proper manner to form the primer extension products which are complementary to each nucleic acid strand. Generally, the synthesis will be initiated at the 3' end of each primer and will proceed in the 5' direction along the template strand, until synthesis terminates. A preferred thermostable enzyme that may be employed in the process of the present invention is that which can be extracted and purified from Thermus aquaticus. Such an enzyme has a molecular weight of about 86,000-90,000 daltons. Thermus aquaticus strain YT1 is available without restriction from the American Type Culture Collection, 12301 Parklawn Drive, Rockville, Md., U.S.A. as ATCC 25,104.
Processes for amplifying a desired target polynucleotide are known and have been described in the literature. K. Kleppe et al. in J. Mol. Biol., (1971), 56, 341-361 disclose a method for the amplification of a desired DNA sequence. The method involves denaturation of a DNA duplex to form single strands. The denaturation step is carried out in the presence of a sufficiently large excess of two nucleic acid primers that hybridize to regions adjacent to the desired DNA sequence. Upon cooling two structures are obtained each containing the full length of the template strand appropriately complexed with primer. DNA polymerase and a sufficient amount of each required nucleoside triphosphate are added whereby two molecules of the original duplex are obtained. The above cycle of denaturation, primer addition and extension is repeated until the appropriate number of copies of the desired target polynucleotide is obtained.
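Because each cycle described above turns one duplex into two, the copy number in an idealized amplification doubles per cycle. The Python sketch below illustrates that arithmetic only; the names `copies_after` and `cycles_needed` are hypothetical, and perfect doubling in every cycle is an assumption that real reactions do not meet.

```python
def copies_after(initial_copies, cycles):
    """Copies present after a number of cycles, assuming perfect doubling."""
    if initial_copies < 0 or cycles < 0:
        raise ValueError("arguments must not be negative")
    return initial_copies * 2 ** cycles

def cycles_needed(initial_copies, desired_copies):
    """Smallest number of ideal doubling cycles that reaches desired_copies."""
    if initial_copies <= 0:
        raise ValueError("initial_copies must be positive")
    cycles = 0
    copies = initial_copies
    while copies < desired_copies:
        copies *= 2
        cycles += 1
    return cycles

print(copies_after(1, 10))
print(cycles_needed(1, 1000))
```

Starting from a single duplex, ten ideal cycles give 1024 copies, so ten cycles are the minimum needed to exceed 1000 copies.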
The present invention further provides a method for the expression and isolation of a target polypeptide encoded by a target polynucleotide. The method includes incorporating a target polynucleotide synthesized by a method of the invention into an expression vector; introducing the expression vector into a suitable host cell; culturing the host cell under conditions and for such time as to promote the expression of the target polypeptide encoded by the target polynucleotide; and isolating the target polypeptide.
The invention can be used to modify certain functional, structural, or phylogenic features of a model polynucleotide encoding a model polypeptide resulting in an altered target polypeptide. An input or model polynucleotide sequence encoding a model polypeptide can be electronically manipulated to determine a potential for an effect of an amino acid change (or variance) at a particular site or multiple sites in the model polypeptide. Once identified, a novel target polynucleotide sequence is assembled by a method of the invention such that the target polynucleotide encodes a target polypeptide possessing a characteristic different from that of the model polypeptide.
The methods of the invention may rely on the use of public sequence and structure databases. These databases become more robust as more and more sequences and structures are added. Information regarding the amino acid sequence of a target polypeptide and the tertiary structure of the polypeptide can be used to synthesize oligonucleotides that can be assembled into a target polynucleotide encoding a target polypeptide. A model polypeptide should have sufficient structural information to analyze the amino acids involved in the function of the polypeptide. The structural information can be derived from x-ray crystallography, NMR, or some other technique for determining the structure of a protein at the amino acid or atomic level. Once selected, the sequence and structural information obtained from the model polypeptide can be used to generate a plurality of polynucleotides encoding a plurality of variant amino acid sequences that comprise a target polypeptide. Thus, a model polypeptide can be selected based on overall sequence similarity to the target protein or based on the presence of a portion having sequence similarity to a portion of the target polypeptide.
A "polypeptide", as used herein, is a polymer in which the monomers are alpha amino acids and are joined together through amide bonds. Amino acids may be the L-optical isomer or the D-optical isomer. Polypeptides are two or more amino acid monomers long and are often more than 20 amino acid monomers long. Standard abbreviations for amino acids are used (e.g., P for proline). These abbreviations are included in Stryer, Biochemistry, Third Ed., 1988, which is incorporated herein by reference for all purposes. With respect to polypeptides, "isolated" refers to a polypeptide that constitutes the major component in a mixture of components, e.g., 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, or 95% or more by weight. Isolated polypeptides typically are obtained by purification from an organism in which the polypeptide has been produced, although chemical synthesis is also possible. Methods of polypeptide purification include, for example, chromatography or immunoaffinity techniques.
Polypeptides of the invention may be detected by sodium dodecyl sulphate (SDS)-polyacrylamide gel electrophoresis followed by Coomassie Blue-staining or Western blot analysis using monoclonal or polyclonal antibodies that have binding affinity for the polypeptide to be detected. A "chimeric polypeptide," as used herein, is a polypeptide containing portions of amino acid sequence derived from two or more different proteins, or two or more regions of the same protein that are not normally contiguous.
A "ligand", as used herein, is a molecule that is recognized by a receptor. Examples of ligands that can be investigated by this invention include, but are not restricted to, agonists and antagonists for cell membrane receptors, toxins and venoms, viral epitopes, hormones, opiates, steroids, peptides, enzyme substrates, cofactors, drugs, lectins, sugars, oligonucleotides, nucleic acids, oligosaccharides, and proteins.
A "receptor", as used herein, is a molecule that has an affinity for a ligand. Receptors may be naturally-occurring or manmade molecules. They can be employed in their unaltered state or as aggregates with other species. Receptors may be attached, covalently or noncovalently, to a binding member, either directly or via a specific binding substance. Examples of receptors which can be employed by this invention include, but are not restricted to, antibodies, cell membrane receptors, monoclonal antibodies and antisera reactive with specific antigenic determinants, viruses, cells, drugs, polynucleotides, nucleic acids, peptides, cofactors, lectins, sugars, polysaccharides, cellular membranes, and organelles. A "ligand receptor pair" is formed when two molecules have combined through molecular recognition to form a complex.
Specific examples of polypeptides which can be synthesized by this invention include but are not restricted to:
a) Microorganism receptors: Determination of ligands that bind to microorganism receptors such as specific transport proteins or enzymes essential to survival of microorganisms would be a useful tool for discovering new classes of antibiotics. Of particular value would be antibiotics against opportunistic fungi, protozoa, and bacteria resistant to antibiotics in current use.
b) Enzymes: For instance, a receptor can comprise a binding site of an enzyme such as an enzyme responsible for cleaving a neurotransmitter; determination of ligands for this type of receptor to modulate the action of an enzyme that cleaves a neurotransmitter is useful in developing drugs that can be used in the treatment of disorders of neurotransmission.
c) Antibodies: For instance, the invention may be useful in investigating a receptor that comprises a ligand-binding site on an antibody molecule which combines with an epitope of an antigen of interest; determining a sequence that mimics an antigenic epitope may lead to the development of vaccines in which the immunogen is based on one or more of such sequences or lead to the development of related diagnostic agents or compounds useful in therapeutic treatments such as for autoimmune diseases (e.g., by blocking the binding of the "self" antibodies).
d) Polynucleotides: Sequences of polynucleotides may be synthesized to establish DNA or RNA binding sequences that act as receptors for synthesized sequence.
e) Catalytic Polypeptides: Polymers, preferably antibodies, which are capable of promoting a chemical reaction involving the conversion of one or more reactants to one or more products. Such polypeptides generally include a binding site specific for at least one reactant or reaction intermediate and an active functionality proximate to the binding site, which functionality is capable of chemically modifying the bound reactant. Catalytic polypeptides and others are described in, for example, PCT Publication Nos. WO 90/05746, WO 90/05749, and WO 90/05785, which are incorporated herein by reference for all purposes.
f) Hormone receptors: Identification of the ligands that bind with high affinity to a receptor such as the receptors for insulin and growth hormone is useful in the development of, for example, an oral replacement of the daily injections which diabetics must take to relieve the symptoms of diabetes or a replacement for growth hormone. Other examples of hormone receptors include the vasoconstrictive hormone receptors; determination of ligands for these receptors may lead to the development of drugs to control blood pressure.
g) Opiate receptors: Determination of ligands which bind to the opiate receptors in the brain is useful in the development of less-addictive replacements for morphine and related drugs.
In the context of a polypeptide, the term "structure" refers to the three dimensional arrangement of atoms in the protein. "Function" refers to any measurable property of a protein. Examples of protein function include, but are not limited to, catalysis, binding to other proteins, binding to non-protein molecules (e.g., drugs), and isomerization between two or more structural forms. "Biologically relevant protein" refers to any protein playing a role in the life of an organism. To identify significant structural motifs, the sequence of the model polypeptide is examined for matches to the entries in one or more databases of recognized domains, e.g., the PROSITE database domains (Bairoch, Nucl. Acids. Res. 24:217, 1997) or the Pfam HMM database (Bateman et al., (2000) Nucl. Acids. Res. 28:263). The PROSITE database is a compilation of two types of sequence signatures: profiles, typically representing whole protein domains, and patterns, typically representing just the most highly conserved functional or structural aspects of protein domains.
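Matching a sequence against a PROSITE-style pattern can be illustrated by translating the pattern into a regular expression. The Python sketch below is illustrative only and covers a subset of the PROSITE pattern syntax (single residues, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, and `(n)` or `(n,m)` repeat counts); terminus anchors and profiles are not handled. The function names `prosite_to_regex` and `find_motif` and the test sequence are hypothetical, while `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element such as "x(2,4)" into its core and optional repeat.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core_re = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            core_re = "[^" + core[1:-1] + "]"    # any residue except these
        else:
            core_re = core                       # one residue or an [..] set
        if low:
            core_re += "{" + low + ("," + high if high else "") + "}"
        parts.append(core_re)
    return "".join(parts)

def find_motif(pattern, sequence):
    """Return 0-based start positions of non-overlapping pattern matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in regex.finditer(sequence)]

print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("N-{P}-[ST]-{P}", "MKNGSAAANPTW"))
```

In the hypothetical test sequence, the asparagine followed by glycine and serine matches the pattern, while the asparagine followed by proline does not.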
The methods of the invention can be used to generate polypeptides containing polymorphisms that have an effect on a catalytic activity of a target polypeptide or a non-catalytic activity of the target polypeptide (e.g., structure, stability, binding to a second protein or polypeptide chain, binding to a nucleic acid molecule, binding to a small molecule, and binding to a macromolecule that is neither a protein nor a nucleic acid). For example, the invention provides a means for assembling any polynucleotide sequence encoding a target polypeptide such that the encoded polypeptide can be expressed and screened for a particular activity. By altering particular amino acids at specific points in the target polypeptide, the operating temperature, operating pH, or any other characteristic of a polypeptide can be manipulated resulting in a polypeptide with a unique activity. Thus, the methods of the invention can be used to identify amino acid substitutions that can be made to engineer the structure or function of a polypeptide of interest (e.g., to increase or decrease a selected activity or to add or remove a selective activity). In addition, the methods of the invention can be used in the identification and analysis of candidate polymorphisms for polymorphism-specific targeting by pharmaceutical or diagnostic agents, for the identification and analysis of candidate polymorphisms for pharmacogenomic applications, and for experimental biochemical and structural analysis of pharmaceutical targets that exhibit amino acid polymorphism.
A library of target polynucleotides encoding a plurality of target polypeptides can be prepared by the present invention. Host cells are transformed by artificial introduction of the vectors containing the target polynucleotide by inoculation under conditions conducive for such transformation. The resultant libraries of transformed clones are then screened for clones which display activity for the polypeptide of interest in a phenotypic assay for activity.
A target polynucleotide of the invention can be incorporated (i.e., cloned) into an appropriate vector. For purposes of expression, the target sequences encoding a target polypeptide of the invention may be inserted into a recombinant expression vector. The term "recombinant expression vector" refers to a plasmid, virus, or other vehicle known in the art that has been manipulated by insertion or incorporation of the polynucleotide sequence encoding a target polypeptide of the invention. The expression vector typically contains an origin of replication, a promoter, as well as specific genes that allow phenotypic selection of the transformed cells. Vectors suitable for use in the present invention include, but are not limited to, the T7-based expression vector for expression in bacteria (Rosenberg et al., Gene, 56:125, 1987), the pMSXND expression vector for expression in mammalian cells (Lee and Nathans, J. Biol. Chem., 263:3521, 1988), baculovirus-derived vectors for expression in insect cells, cauliflower mosaic virus (CaMV), and tobacco mosaic virus (TMV).
Depending on the vector utilized, any of a number of suitable transcription and translation elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc. may be used in the expression vector (see, e.g., Bitter et al., Methods in Enzymology, 153:516-544, 1987). These elements are well known to one of skill in the art.
The term "operably linked" or "operably associated" refers to functional linkage between the regulatory sequence and the polynucleotide sequence regulated by the regulatory sequence. The operably linked regulatory sequence controls the expression of the product expressed by the polynucleotide sequence. Alternatively, the functional linkage also includes an enhancer element.
"Promoter" means a nucleic acid regulatory sequence sufficient to direct transcription. Also included in the invention are those promoter elements that are sufficient to render promoter-dependent polynucleotide sequence expression cell-type specific, tissue specific, or inducible by external signals or agents; such elements may be located in the 5' or 3' regions of the native gene, or in the introns.
"Gene expression" or "polynucleotide sequence expression" means the process by which a nucleotide sequence undergoes successful transcription and translation such that detectable levels of the delivered nucleotide sequence are expressed in an amount and over a time period so that a functional biological effect is achieved.
In yeast, a number of vectors containing constitutive or inducible promoters may be used (Current Protocols in Molecular Biology, Vol. 2, Ed. Ausubel et al., Greene Publish. Assoc. & Wiley Interscience, Ch. 13, 1988; Grant et al., "Expression and Secretion Vectors for Yeast," in Methods in Enzymology, Eds. Wu & Grossman, Acad. Press, N.Y., Vol. 153, pp. 516-544, 1987; Glover, DNA Cloning, Vol. II, IRL Press, Wash., D.C., Ch. 3, 1986; Bitter, "Heterologous Gene Expression in Yeast," Methods in Enzymology, Eds. Berger & Kimmel, Acad. Press, N.Y., Vol. 152, pp. 673-684, 1987; and The Molecular Biology of the Yeast Saccharomyces, Eds. Strathern et al., Cold Spring Harbor Press, Vols. I and II, 1982). A constitutive yeast promoter, such as ADH or LEU2, or an inducible promoter, such as GAL, may be used ("Cloning in Yeast," Ch. 3, R. Rothstein In: DNA Cloning Vol. II, A Practical Approach, Ed. DM Glover, IRL Press, Wash., D.C., 1986). Alternatively, vectors may be used which promote integration of foreign DNA sequences into the yeast chromosome.
In certain embodiments, it may be desirable to include specialized regions known as telomeres at the end of a target polynucleotide sequence. Telomeres are repeated sequences found at chromosome ends, and it has long been known that chromosomes with truncated ends are unstable, tend to fuse with other chromosomes and are otherwise lost during cell division. Some data suggest that telomeres interact with the nucleoprotein complex and the nuclear matrix. One putative role for telomeres includes stabilizing chromosomes and shielding the ends from degradative enzymes.
Another possible role for telomeres is in replication. According to present doctrine, replication of DNA starts from short RNA primers annealed to the 3'-end of the template. The result of this mechanism is an "end replication problem" in which the region corresponding to the RNA primer is not replicated. Over many cell divisions, this will result in the progressive truncation of the chromosome. It is thought that telomeres may provide a buffer against this effect, at least until they are themselves eliminated by this effect. A further structure that may be included in a target polynucleotide is a centromere.
In certain embodiments of the invention, the delivery of a nucleic acid in a cell may be identified in vitro or in vivo by including a marker in the expression construct. The marker would result in an identifiable change to the transfected cell permitting easy identification of expression.
An expression vector of the invention can be used to transform a target cell. By "transformation" is meant a genetic change induced in a cell following incorporation of new DNA (i.e., DNA exogenous to the cell). Where the cell is a mammalian cell, the genetic change is generally achieved by introduction of the DNA into the genome of the cell. By "transformed cell" is meant a cell into which (or into an ancestor of which) has been introduced, by means of recombinant DNA techniques, a DNA molecule encoding a target polypeptide of the invention. Transformation of a host cell with recombinant DNA may be carried out by conventional techniques as are well known to those skilled in the art.
Where the host is prokaryotic, such as E. coli, competent cells that are capable of DNA uptake can be prepared from cells harvested after exponential growth phase and subsequently treated by the CaCl2 method by procedures well known in the art. Alternatively, MgCl2 or RbCl can be used.
Transformation can also be performed after forming a protoplast of the host cell or by electroporation.
A target polypeptide of the invention can be produced in prokaryotes by expression of nucleic acid encoding the polypeptide. These include, but are not limited to, microorganisms, such as bacteria transformed with recombinant bacteriophage DNA, plasmid DNA, or cosmid DNA expression vectors encoding a polypeptide of the invention. The constructs can be expressed in E. coli in large scale for in vitro assays. Purification from bacteria is simplified when the sequences include tags for one-step purification by nickel-chelate chromatography. The construct can also contain a tag to simplify isolation of the polypeptide. For example, a polyhistidine tag of, e.g., six histidine residues, can be incorporated at the amino terminal end, or carboxy terminal end, of the protein. The polyhistidine tag allows convenient isolation of the protein in a single step by nickel-chelate chromatography. The target polypeptide of the invention can also be engineered to contain a cleavage site to aid in protein recovery. Alternatively, the polypeptides of the invention can be expressed directly in a desired host cell for assays in situ.
When the host is a eukaryote, methods of DNA transfection such as calcium phosphate co-precipitation, conventional mechanical procedures (such as microinjection, electroporation or biolistic techniques), insertion of a plasmid encased in liposomes, or virus vectors may be used.
Eukaryotic cells can also be cotransfected with DNA sequences encoding a polypeptide of the invention, and a second foreign DNA molecule encoding a selectable phenotype, such as the herpes simplex thymidine kinase gene. Another method is to use a eukaryotic viral vector, such as simian virus 40 (SV40) or bovine papilloma virus, to transiently infect or transform eukaryotic cells and express the protein (Eukaryotic Viral Vectors, Cold Spring Harbor Laboratory, Gluzman ed., 1982). Preferably, a eukaryotic host is utilized as the host cell, as described herein.
Eukaryotic systems, and preferably mammalian expression systems, allow for proper post-translational modifications of expressed mammalian proteins to occur. Eukaryotic cells that possess the cellular machinery for proper processing of the primary transcript, glycosylation, phosphorylation, and advantageously secretion of the gene product should be used as host cells for the expression of the polypeptide of the invention. Such host cell lines may include, but are not limited to, CHO, VERO, BHK, HeLa, COS, MDCK, Jurkat, HEK-293, and WI38.
For long-term, high-yield production of recombinant proteins, stable expression is preferred. Rather than using expression vectors that contain viral origins of replication, host cells can be transformed with the cDNA encoding a target polypeptide of the invention controlled by appropriate expression control elements (e.g., promoter, enhancer sequences, transcription terminators, polyadenylation sites, etc.), and a selectable marker. The selectable marker in the recombinant plasmid confers resistance to the selection and allows cells to stably integrate the plasmid into their chromosomes and grow to form foci that, in turn, can be cloned and expanded into cell lines. For example, following the introduction of foreign DNA, engineered cells may be allowed to grow for 1-2 days in an enriched medium, and then are switched to a selective medium. A number of selection systems may be used, including, but not limited to, the herpes simplex virus thymidine kinase (Wigler et al., Cell, 11:223, 1977), hypoxanthine-guanine phosphoribosyltransferase (Szybalska & Szybalski, Proc. Natl. Acad. Sci. USA, 48:2026, 1962), and adenine phosphoribosyltransferase (Lowy et al., Cell, 22:817, 1980) genes, which can be employed in tk-, hgprt- or aprt- cells, respectively. Also, antimetabolite resistance can be used as the basis of selection for dhfr, which confers resistance to methotrexate (Wigler et al., Proc. Natl. Acad. Sci. USA, 77:3567, 1980; O'Hare et al., Proc. Natl. Acad. Sci. USA, 78:1527, 1981); gpt, which confers resistance to mycophenolic acid (Mulligan & Berg, Proc. Natl. Acad. Sci. USA, 78:2072, 1981); neo, which confers resistance to the aminoglycoside G-418 (Colberre-Garapin et al., J. Mol. Biol., 150:1, 1981); and hygro, which confers resistance to hygromycin (Santerre et al., Gene, 30:147, 1984).
Recently, additional selectable genes have been described, namely trpB, which allows cells to utilize indole in place of tryptophan; hisD, which allows cells to utilize histidinol in place of histidine (Hartman & Mulligan, Proc. Natl. Acad. Sci. USA, 85:8047, 1988); and ODC (ornithine decarboxylase), which confers resistance to the ornithine decarboxylase inhibitor, 2-(difluoromethyl)-DL-ornithine, DFMO (McConlogue L., In: Current Communications in Molecular Biology, Cold Spring Harbor Laboratory, ed., 1987). Techniques for the isolation and purification of either microbially or eukaryotically expressed polypeptides of the invention may be by any conventional means, such as, for example, preparative chromatographic separations and immunological separations, such as those involving the use of monoclonal or polyclonal antibodies or antigen.
A target polynucleotide, or expression construct containing a target polynucleotide, may be entrapped in a liposome. Liposomes are vesicular structures characterized by a phospholipid bilayer membrane and an inner aqueous medium. Multilamellar liposomes have multiple lipid layers separated by aqueous medium and form spontaneously when phospholipids are suspended in an excess of aqueous solution. The lipid components undergo self-rearrangement before the formation of closed structures and entrap water and dissolved solutes between the lipid bilayers. The liposome may be complexed with a hemagglutinating virus (HVJ). This has been shown to facilitate fusion with the cell membrane and promote cell entry of liposome-encapsulated DNA. In other embodiments, the liposome may be complexed or employed in conjunction with nuclear non-histone chromosomal proteins (HMG-1). In yet further embodiments, the liposome may be complexed or employed in conjunction with both HVJ and HMG-1. Because such expression constructs have been successfully employed in transfer and expression of nucleic acid in vitro and in vivo, they are applicable for the present invention. Where a bacterial promoter is employed in the DNA construct, it also will be desirable to include within the liposome an appropriate bacterial polymerase.
The present invention describes methods for enabling the creation of a target polynucleotide based upon information only, i.e., without the requirement for existing genes, DNA molecules or genomes. Generally, using computer software, it is possible to construct a virtual polynucleotide in the computer. This polynucleotide consists of a string of DNA bases, G, A, T or C, comprising for example an entire artificial polynucleotide sequence in a linear string. Following construction of a sequence, computer software is then used to parse the target sequence breaking it down into a set of overlapping oligonucleotides of specified length. This results in a set of shorter DNA sequences that overlap to cover the entire length of the target polynucleotide in overlapping sets.
Typically, a gene of 1000 base pairs would be broken down into twenty 100-mers, where 10 of these comprise one strand and 10 of these comprise the other strand. They would be selected to overlap on each strand by 25 to 50 base pairs.
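The tiling described above can be illustrated with a short Python sketch. It is a minimal illustration only, not the patent's own listing: the function name `tile` and the choice of a 50-base offset between strands (the upper end of the 25 to 50 base range) are assumptions made for the example. The sketch works on coordinates rather than sequences, so it shows only where each oligonucleotide would start and end.

```python
def tile(gene_length, oligo_length, offset):
    """Return (start, end) windows of width oligo_length, beginning at offset.

    Coordinates are 0-based and end-exclusive; the last window is
    truncated at the end of the gene.
    """
    return [(start, min(start + oligo_length, gene_length))
            for start in range(offset, gene_length, oligo_length)]


# Hypothetical example: a 1000 bp gene cut into 100-mers.
plus_strand = tile(1000, 100, 0)    # (+) strand windows, end to end
minus_strand = tile(1000, 100, 50)  # (-) strand windows, shifted by half an oligo

# Each (-) window straddles the junction between two adjacent (+) windows,
# so it overlaps the (+) window to its left by 50 bases.
overlaps = [plus_end - minus_start
            for (_, plus_end), (minus_start, _) in zip(plus_strand, minus_strand)]
```

With these assumed parameters each strand is covered by 10 windows, matching the twenty oligonucleotides in the text; the final (-) window is only 50 bases long because it runs off the end of the gene.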
The degeneracy of the genetic code permits substantial freedom in the choice of codons for any particular amino acid sequence. Transgenic organisms such as plants frequently prefer particular codons that, though they encode the same protein, may differ from the codons in the organism from which the gene was derived. For example, U.S. Pat. No. 5,380,831 to Adang et al. describes the creation of insect resistant transgenic plants that express the Bacillus thuringiensis (Bt) toxin gene. The Bt crystal protein, an insect toxin, is encoded by a full-length gene that is poorly expressed in transgenic plants. In order to improve expression in plants, a synthetic gene encoding the protein containing codons preferred in plants was substituted for the natural sequence. The invention disclosed therein comprised a chemically synthesized gene encoding an insecticidal protein which is functionally equivalent to a native insecticidal protein of Bt. The synthetic gene was designed to be expressed in plants at a level higher than a native Bt gene.
In designing a target polynucleotide that encodes a particular polypeptide, the hydropathic index of amino acids may be considered. The importance of the hydropathic amino acid index in conferring interactive biologic function on a protein is generally understood in the art. Each amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and charge characteristics; these are: isoleucine (+4.5); valine (+4.2); leucine (+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine (-0.4); threonine (-0.7); serine (-0.8); tryptophan (-0.9); tyrosine (-1.3); proline (-1.6); histidine (-3.2); glutamate (-3.5); glutamine (-3.5); aspartate (-3.5); asparagine (-3.5); lysine (-3.9); and arginine (-4.5).
It is known in the art that certain amino acids may be substituted by other amino acids having a similar hydropathic index or score and still result in a protein with similar biological activity, i.e., still obtain a biologically functionally equivalent protein. In making such changes, the substitution of amino acids whose hydropathic indices are within ±2 is preferred, those which are within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
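The tolerance rule above lends itself to a small lookup. The following Python sketch is an illustration under stated assumptions rather than part of the disclosed method: the dictionary name `HYDROPATHY` and the function `substitution_candidates` are hypothetical, and the values are the hydropathic indices listed above, keyed by single-letter amino acid code.

```python
# Hydropathic indices as listed in the text, keyed by single-letter code.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def substitution_candidates(residue, tolerance=2.0):
    """List residues whose hydropathic index lies within +/- tolerance."""
    reference = HYDROPATHY[residue]
    return sorted(other for other, index in HYDROPATHY.items()
                  if other != residue and abs(index - reference) <= tolerance)


# Tightest window named in the text (within 0.5 of the original residue).
close_to_isoleucine = substitution_candidates("I", tolerance=0.5)
close_to_lysine = substitution_candidates("K", tolerance=0.5)
```

A hydropathy match is only one criterion; the sketch does not consider side-chain size or charge, which the text treats separately.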
It is also understood in the art that the substitution of like amino acids can be made effectively on the basis of hydrophilicity. U.S. Patent 4,554,101, incorporated herein by reference, states that the greatest local average hydrophilicity of a protein, as governed by the hydrophilicity of its adjacent amino acids, correlates with a biological property of the protein.
As detailed in U.S. Patent 4,554,101, the following hydrophilicity values have been assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0 ± 1); glutamate (+3.0 ± 1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (-0.4); proline (-0.5 ± 1); alanine (-0.5); histidine (-0.5); cysteine (-1.0); methionine (-1.3); valine (-1.5); leucine (-1.8); isoleucine (-1.8); tyrosine (-2.3); phenylalanine (-2.5); tryptophan (-3.4).
It is understood that an amino acid can be substituted for another having a similar hydrophilicity value and still obtain a biologically equivalent and immunologically equivalent polypeptide. In such changes, the substitution of amino acids whose hydrophilicity values are within ±2 is preferred, those that are within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
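The "greatest local average hydrophilicity" mentioned above can be computed with a sliding window. The Python sketch below is a hedged illustration: the function name `greatest_local_average` is hypothetical, the six-residue window is an assumed default rather than a value given in this text, and for aspartate, glutamate and proline the central value of the stated range (for example +3.0 for "+3.0 ± 1") is used.

```python
# Hydrophilicity values as listed in the text; for entries given as a
# range (D, E, P) the central value is used, which is an assumption.
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def greatest_local_average(protein, window=6):
    """Return (start index, mean) of the most hydrophilic window.

    If the protein is shorter than the window, (0, -inf) is returned.
    """
    best_start, best_mean = 0, float("-inf")
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        mean = sum(HYDROPHILICITY[residue] for residue in segment) / window
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


# Hypothetical peptide: a charged stretch flanked by leucines.
start, mean = greatest_local_average("LLLLLLKKDEKRLLLL")
```

In this made-up peptide the charged stretch KKDEKR is the window with the highest mean, so the function reports its start position.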
As outlined above, amino acid substitutions are generally based on the relative similarity of the amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the like. Exemplary substitutions that take various of the foregoing characteristics into consideration are well known to those of skill in the art and include: arginine and lysine; glutamate and aspartate; serine and threonine; glutamine and asparagine; and valine, leucine and isoleucine.
Aspects of the invention may be implemented in hardware or software, or a combination of both. However, preferably, the algorithms and processes of the invention are implemented in one or more computer programs executing on programmable computers each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements) , at least one input device, and at least one output device. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.
Each program may be implemented in any desired computer language (including machine, assembly, high level procedural, or object oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.
Each such computer program is preferably stored on a storage medium or device (e.g., ROM, CD-ROM, tape, or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
Thus, in another embodiment, the invention provides a computer program, stored on a computer-readable medium, for generating a target polynucleotide sequence. The computer program includes instructions for causing a computer system to: 1) identify an initiating polynucleotide sequence contained in the target polynucleotide sequence; 2) parse the target polynucleotide sequence into multiple distinct, partially complementary, oligonucleotides; and 3) control assembly of the target polynucleotide sequence by controlling the bi-directional extension of the initiating polynucleotide sequence by the sequential addition of partially complementary oligonucleotides resulting in a contiguous double-stranded polynucleotide. The computer program will contain an algorithm for parsing the sequence of the target polynucleotide by generating a set of oligonucleotides corresponding to a polypeptide sequence. The algorithm utilizes a polypeptide sequence to generate a DNA sequence using a specified codon table. The algorithm then generates a set of parsed oligonucleotides corresponding to the (+) and (-) strands of the DNA sequence in the following manner:
1. The DNA sequence GENE[], an array of bases, is generated from the protein sequence AA[], an array of amino acids, using a specified codon table. An example of the codon table for E. coli type II codons is listed below.
   a. Parameters
      i. N: length of protein in amino acid residues
      ii. L = 3N: length of gene in DNA bases
      iii. Q: length of each component oligonucleotide
      iv. X = Q/2: length of overlap between oligonucleotides
      v. W = 3N/Q: number of oligonucleotides in the F set
      vi. Z = 3N/Q + 1: number of oligonucleotides in the R set
      vii. F[1:W]: set of (+) strand oligonucleotides
      viii. R[1:Z]: set of (-) strand oligonucleotides
      ix. AA[1:N]: array of amino acid residues
      x. GENE[1:L]: array of bases comprising the gene
   b. Obtain or design a protein sequence AA[] consisting of a list of amino acid residues.
   c. Generate the DNA sequence, GENE[], from the protein sequence, AA[]
      i. For I = 1 to N
      ii. Translate AA[J] from codon table generating GENE[I:I+2]
      iii. I = I + 3
      iv. J = J + 1
      v. Go to ii
2. Two sets of overlapping oligonucleotides are generated from GENE[]; F[] covers the (+) strand and R[] is a complementary, partially overlapping set covering the (-) strand.
   a. Generate the F[] set of oligos
      i. For I = 1 to W
      ii. F[I] ← GENE[I:I+Q-1]
      iii. I = I + Q
      iv. Go to ii
   b. Generate the R[] set of oligos
      i. J = W
      ii. For I = 1 to W
      iii. R[I] ← GENE[W:W-Q]
      iv. J = J - Q
      v. Go to iii
   c. Result is two sets of oligos F[] and R[] of Q length
   d. Generate the final two finishing oligos
      i. [Figure imgf000053_0001; image not reproduced]
      ii. S[2] = GENE[L-Q/2:L]
Subsequently, oligonucleotide set assembly is established by the following algorithm:
Two sets of oligonucleotides F[1:W] R[1:Z] S[1:2]
3. Step 1
   a. For I = 1 to W
   b. Ligate F[I], F[I+1], R[I]; place in T[I]
   c. Ligate F[I+2], R[I+1], R[I+2]; place in T[I+1]
   d. I = I + 3
   e. Go to b
4. Step 2
   a. Do the following until only a single reaction remains
      i. For I = 1 to W/3
      ii. Ligate T[I], T[I+1]
      iii. I = I + 2
      iv. Go to ii
CODON TABLE (E. coli Class II preferred usage)
[Figure imgf000054_0001: codon table image not reproduced]
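The two assembly steps above (ligation of three-oligonucleotide intermediates, then repeated pairwise ligation until one reaction remains) can be pictured as a pooling schedule. The Python sketch below is a simplification under stated assumptions: it groups position-ordered oligonucleotide labels three at a time rather than reproducing the exact F[]/R[] indexing of the pseudocode, and the function name `pooling_rounds` and the labels `O1` to `O12` are hypothetical.

```python
def pooling_rounds(oligos, first_group=3):
    """Return the pools formed in each round of a hierarchical assembly.

    Round 0 groups the oligos into sets of `first_group` (the triplex
    intermediates); each later round merges adjacent pools in pairs
    until a single pool remains.
    """
    pools = [tuple(oligos[i:i + first_group])
             for i in range(0, len(oligos), first_group)]
    rounds = [pools]
    while len(pools) > 1:
        # Concatenate each adjacent pair of pools into one larger pool.
        pools = [sum(pools[i:i + 2], ()) for i in range(0, len(pools), 2)]
        rounds.append(pools)
    return rounds


# Hypothetical labels for 12 position-ordered oligonucleotides.
labels = [f"O{n}" for n in range(1, 13)]
rounds = pooling_rounds(labels)
pools_per_round = [len(pools) for pools in rounds]
```

For 12 oligonucleotides this gives four triplex pools, then two pools of six, then one pool of twelve. An odd number of pools in any round simply carries the unpaired pool forward to the next round.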
Algorithms of the invention useful for assembly of a target polynucleotide can further be described as Perl script as set forth below. ALGORITHM 1 provides a method for converting a protein sequence into a polynucleotide sequence using E. coli codons:
#$sequence is the protein sequence in single letter amino acid code
#$seqlen is the length of the protein sequence
#$aminoacid is the individual amino acid in the sequence
#$codon is the individual DNA triplet codon in the Gene sequence
#$DNAsequence is the gene sequence in DNA bases
#$baselen is the length of the DNA sequence in bases
$seqlen = length($sequence);
$baselen = $seqlen * 3;
$DNAsequence = "";
for ($n = 0; $n < $seqlen; $n++) {
    $aminoacid = substr($sequence, $n, 1);
    #The following list provides the class II codon preference in Perl for E. coli
    #(single-letter codes follow the standard genetic code for each codon)
    if ($aminoacid eq "m") {$codon = "ATG";}
    elsif ($aminoacid eq "f") {$codon = "TTC";}
    elsif ($aminoacid eq "l") {$codon = "CTG";}
    elsif ($aminoacid eq "s") {$codon = "TCT";}
    elsif ($aminoacid eq "y") {$codon = "TAC";}
    elsif ($aminoacid eq "c") {$codon = "TGC";}
    elsif ($aminoacid eq "w") {$codon = "TGG";}
    elsif ($aminoacid eq "i") {$codon = "ATC";}
    elsif ($aminoacid eq "t") {$codon = "ACC";}
    elsif ($aminoacid eq "p") {$codon = "CCG";}
    elsif ($aminoacid eq "q") {$codon = "CAG";}
    #the arginine codon is not legible in the source listing; CGT is assumed here
    elsif ($aminoacid eq "r") {$codon = "CGT";}
    elsif ($aminoacid eq "v") {$codon = "GTT";}
    elsif ($aminoacid eq "a") {$codon = "GCG";}
    elsif ($aminoacid eq "n") {$codon = "AAC";}
    elsif ($aminoacid eq "k") {$codon = "AAA";}
    elsif ($aminoacid eq "d") {$codon = "GAC";}
    elsif ($aminoacid eq "e") {$codon = "GAA";}
    elsif ($aminoacid eq "g") {$codon = "GGT";}
    elsif ($aminoacid eq "h") {$codon = "CAC";}
    else {$codon = "";}
    #append the codon to the growing gene sequence
    $DNAsequence = $DNAsequence . $codon;
}
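For readers who prefer to check the relationship L = 3N directly, the back-translation of ALGORITHM 1 can be sketched in Python together with a round-trip check. This is an illustrative sketch, not the patent's listing: the names `CLASS_II_CODONS`, `back_translate` and `translate` are hypothetical, and the arginine entry (CGT) is an assumption because that codon is not legible in the source listing. Unlike the Perl version, unknown residues raise a `KeyError` instead of silently contributing an empty codon.

```python
# One preferred codon per amino acid, taken from ALGORITHM 1.
# "R": "CGT" is an assumption; the source listing omits the arginine codon.
CLASS_II_CODONS = {
    "M": "ATG", "F": "TTC", "L": "CTG", "S": "TCT", "Y": "TAC",
    "C": "TGC", "W": "TGG", "I": "ATC", "T": "ACC", "P": "CCG",
    "Q": "CAG", "R": "CGT", "V": "GTT", "A": "GCG", "N": "AAC",
    "K": "AAA", "D": "GAC", "E": "GAA", "G": "GGT", "H": "CAC",
}

# Inverse table, valid because each amino acid maps to a distinct codon.
CODON_TO_RESIDUE = {codon: residue for residue, codon in CLASS_II_CODONS.items()}


def back_translate(protein):
    """Convert a single-letter protein sequence into a DNA sequence."""
    return "".join(CLASS_II_CODONS[residue] for residue in protein.upper())


def translate(dna):
    """Read a DNA sequence produced by back_translate, codon by codon."""
    return "".join(CODON_TO_RESIDUE[dna[i:i + 3]] for i in range(0, len(dna), 3))


gene = back_translate("mkh")
```

Because each amino acid is assigned exactly one codon, translating the generated gene recovers the input protein, and the gene is always three times as long as the protein.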
ALGORITHM 2 provides a method for parsing a polynucleotide sequence into component forward and reverse oligonucleotides that can be reassembled into a complete target polynucleotide encoding a target polypeptide:
#$oligoname is the identifier name for the list and for each component oligonucleotide
#$OL is the length of each component oligonucleotide
#$Overlap is the length of the overlap in bases between each forward and each reverse oligonucleotide
#$sequence is the DNA sequence in bases
#$seqlen is the length of the DNA sequence in bases
#$bas is the individual base in a sequence
#$forseq is the sequence of a forward oligonucleotide
#$revseq is the sequence of a reverse oligonucleotide
#$revcomp is the reverse complemented sequence of the gene
#$oligonameF-[] is the list of parsed forward oligos
#$oligonameR-[] is the list of parsed reverse oligos
$Overlap = <STDIN>;
chomp($Overlap);
$seqlen = length($sequence);
#convert forward sequence to upper case if lower case
$forseq = "";
for ($j = 0; $j <= $seqlen-1; $j++) {
    $bas = substr($sequence, $j, 1);
    if ($bas eq "a") {$cfor = "A";}
    elsif ($bas eq "t") {$cfor = "T";}
    elsif ($bas eq "c") {$cfor = "C";}
    elsif ($bas eq "g") {$cfor = "G";}
    elsif ($bas eq "A") {$cfor = "A";}
    elsif ($bas eq "T") {$cfor = "T";}
    elsif ($bas eq "C") {$cfor = "C";}
    elsif ($bas eq "G") {$cfor = "G";}
    #any other character is marked as unknown
    else {$cfor = "X";}
    $forseq = $forseq . $cfor;
    print OUT "$j \n";
}
The reverse complement of the sequence generated above is identified by:
$revcomp = "";
#walk the sequence from its last base to its first, complementing each base
for ($i = $seqlen-1; $i >= 0; $i--) {
    $base = substr($sequence, $i, 1);
    if ($base eq "a") {$comp = "T";}
    elsif ($base eq "t") {$comp = "A";}
    elsif ($base eq "g") {$comp = "C";}
    elsif ($base eq "c") {$comp = "G";}
    elsif ($base eq "A") {$comp = "T";}
    elsif ($base eq "T") {$comp = "A";}
    elsif ($base eq "G") {$comp = "C";}
    elsif ($base eq "C") {$comp = "G";}
    else {$comp = "X";}
    $revcomp = $revcomp . $comp;
}
#now do the parsing
#generate the forward oligo list
print OUT "Forward oligos\n";
print "Forward oligos\n";
$r = 1;
for ($i = 0; $i <= $seqlen-1; $i += $OL) {
    $oligo = substr($sequence, $i, $OL);
    print OUT "$oligoname F- $r $oligo\n";
    print "$oligoname F- $r $oligo\n";
    $r = $r + 1;
}
#generate the reverse oligo list
$r = 1;
for ($i = $seqlen - $Overlap - $OL; $i >= 0; $i -= $OL) {
    print OUT "\n";
    print "\n";
    $oligo = substr($revcomp, $i, $OL);
    print OUT "$oligoname R- $r $oligo";
    print "$oligoname R- $r $oligo";
    $r = $r + 1;
}
#Rectify and print out the last reverse oligo consisting of 1/2 from the beginning
#of the reverse complement (Perl substr offsets start at 0).
$oligo = substr($revcomp, 0, $Overlap);
print OUT "$oligo\n";
print "$oligo\n";
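The case normalization and reverse-complement steps of ALGORITHM 2 can be expressed compactly in Python. The sketch below is an illustration only: the function names `normalize` and `reverse_complement` and the table name `COMPLEMENT` are hypothetical, and, as in the Perl listing, any character other than A, C, G or T is replaced by "X".

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def normalize(sequence):
    """Upper-case a DNA sequence, marking unrecognized characters as X."""
    return "".join(base if base in COMPLEMENT else "X"
                   for base in sequence.upper())


def reverse_complement(sequence):
    """Return the reverse complement; unknown bases stay marked as X."""
    return "".join(COMPLEMENT.get(base, "X")
                   for base in reversed(normalize(sequence)))


example = reverse_complement("aaccgt")
```

A useful sanity check on any reverse-complement routine is that applying it twice to a clean A/C/G/T sequence returns the normalized input.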
The invention further provides a computer-assisted method for synthesizing a target polynucleotide encoding a target polypeptide derived from a model sequence using a programmed computer including a processor, an input device, and an output device, by inputting into the programmed computer, through the input device, data including at least a portion of the target polynucleotide sequence encoding a target polypeptide. Subsequently, the sequence of at least one initiating polynucleotide present in the target polynucleotide sequence is determined and a model for synthesizing the target polynucleotide sequence is derived.
The model is based on the position of the initiating sequence in the target polynucleotide sequence using overall sequence parameters necessary for expression of the target polypeptide in a biological system. The information is outputted to an output device which provides the means for synthesizing and assembling the target polynucleotide.
It is understood that any apparatus suitable for polynucleotide synthesis can be used in the present invention. Various non-limiting examples of apparatus, components, assemblies and methods are described below. For example, in one embodiment, it is contemplated that a nanodispensing head with up to 16 valves can be used to deposit synthesis chemicals in assembly vessels (Figure 4). Chemicals can be controlled using a syringe pump from the reagent reservoir. Because of the speed and capability of the ink-jet dispensing system, synthesis can be made very small and very rapid. Underlying the reaction chambers is a set of assembly vessels linked to microchannels that will move fluids by microfluidics. The configuration of the channels will pool pairs and triplexes of oligonucleotides systematically using, for example, a robotic device. However, pooling can be accomplished using fluidics and without moving parts.
As shown in Figure 5, oligonucleotide synthesis, oligonucleotide assembly by pooling and annealing, and ligation can be done using microfluidic mixing, resulting in the same set of critical triplex intermediates that serves as the substrate for annealing, ligation and oligonucleotide joining. DNA ligase and other components can be placed in the buffer fluid moving through the instrument microchambers. Thus, synthesis and assembly can be carried out in a highly controlled way in the same instrument.
As shown in Figure 6, the pooling manifold can be produced from non-porous plastic and designed to control sequential pooling of oligonucleotides synthesized in arrays. Oligonucleotide parsing from a gene sequence designed in the computer can be programmed for synthesis where (+) and (-) strands are placed in alternating wells of the array. Following synthesis in this format, the 12 row sequences of the gene are directed into the pooling manifold that systematically pools three wells into reaction vessels forming the critical triplex structure. Following temperature cycling for annealing and ligation, four sets of triplexes are pooled into 2 sets of 6 oligonucleotide products, then 1 set of 12 oligonucleotide products. Each row of the synthetic array is associated with a similar manifold resulting in the first stage of assembly of 8 sets of assembled oligonucleotides representing 12 oligonucleotides each. As shown in Figure 7, the second manifold pooling stage is controlled by a single manifold that pools the 8 row assemblies into a single complete assembly. Passage of the oligonucleotide components through the two manifold assemblies (the first 8 and the second single) results in the complete assembly of all 96 oligonucleotides from the array. The assembly module (Figure 8) of Genewriter™ can include a complete set of 7 pooling manifolds produced using microfabrication in a single plastic block that sits below the synthesis vessels. Various configurations of the pooling manifold will allow assembly of 96-, 384- or 1536-well arrays of parsed component oligonucleotides. The initial configuration is designed for the assembly of 96 oligonucleotides synthesized in a pre-defined array, composed of 48 pairs of overlapping 50-mers.
Passage through the assembly device in the presence of DNA ligase and other appropriate buffer and chemical components, and with appropriate temperature controls on the device, will assemble these into a single 2400-base double-stranded gene assembly (Figure 9).
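The numbers in this description (96 wells, rows of 12, 48 pairs of 50-mers, a 2400-base product) follow from simple arithmetic, which the Python sketch below reproduces. It is a back-of-the-envelope illustration under stated assumptions: the function name `assembly_summary` and its dictionary keys are hypothetical, and the calculation assumes each row is first pooled into triplexes and then halved by pairwise pooling, as described for the 96-well format.

```python
def assembly_summary(wells=96, row_length=12, oligo_length=50):
    """Summarize the pooling stages and product length for one array."""
    rows = wells // row_length
    # First stage within a row: three wells are pooled into each triplex.
    stages = [row_length // 3]
    # Later stages: adjacent pools are merged pairwise until one remains.
    while stages[-1] > 1:
        stages.append(stages[-1] // 2)
    return {
        "rows": rows,
        "pools_per_row_by_stage": stages,
        # Half of the oligos lie on each strand, so the duplex length is
        # (number of pairs) x (oligo length).
        "duplex_length_bp": (wells // 2) * oligo_length,
    }


summary = assembly_summary()
```

For the default 96-well case this yields 8 rows, pool counts of 4, 2 and 1 within each row, and a 2400-base duplex, in agreement with the text. The same function can be called with other well counts, but the row lengths and pooling pattern for 384- and 1536-well formats are not specified here and would need to be supplied.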
The basic pooling device design can be made of Plexiglas™ or other type of co-polymer with microgrooves or microfluidic channels etched into the surface and with a temperature control element such as a Peltier circuit underlying the junction of the channels. This results in a microreaction vessel at the junction of two channels for 1) mixing of the two streams, 2) controlled temperature maintenance or cycling at the site of the junction and 3) expulsion of the ligated mixture from the exit channel into the next set of pooling and ligation chambers.
As shown in Figure 11, the assembly platform design can consist of 8 synthesis microwell plates in a 96-well configuration, addressed with 16 channels of microdispensing. Below each plate is: 1) an evacuation manifold for removing synthesis components; and 2) an assembly manifold based on the schematic in Figure 9 for assembling component oligonucleotides from each 96-well array. Figure 12 shows a higher capacity assembly format using 1536-well microplates and capable of synthesis of 1536 component oligonucleotides per plate. Below each plate is: 1) an evacuation manifold for removing synthesis components; and 2) an assembly manifold for assembling 1536 component oligonucleotides from each 1536-well array. Pooling and assembly strategies can be based on the concepts used for 96-well plates.
An alternative assembly format includes using surface-bound oligonucleotide synthesis rather than soluble synthesis on CPG glass beads (Figure 13). In this configuration, oligonucleotides are synthesized with a hydrocarbon linker that allows attachment to a solid support. Following parsing of component sequences and synthesis, the synthesized oligonucleotides are covalently attached to a solid support such that the stabilizer is attached and the two ligation substrates are added to the overlying solution. Ligation occurs as mediated by DNA ligase in the solution, and increasing temperature above the Tm removes the linked oligonucleotides by thermal melting. As shown in Figure 14, the systematic assembly on a solid support of a set of parsed component oligonucleotides can be arranged in an array with the set of stabilizer oligonucleotides attached. The set of ligation substrate oligonucleotides is placed in the solution, and systematic assembly is carried out in the solid phase by sequential annealing, ligation and melting, which moves the growing DNA molecules across the membrane surface.
Figure 15 shows an additional alternative means for oligonucleotide assembly, by binding the component oligonucleotides to a set of metal electrodes on a microelectronic chip, where each electrode can be controlled independently with respect to current and voltage. The array contains the set of minus strand oligonucleotides. Placing a positive charge on the electrode will move the component ligase substrate oligonucleotide by electrophoresis onto the surface, where annealing takes place. The presence of DNA ligase mediates covalent joining or ligation of the components. The electrode is then turned off or a negative charge is applied, and the DNA molecule is expelled from the electrode. The next array element containing the next stabilizer oligonucleotide from the parsed set is turned on with a positive charge, and a second annealing, joining and ligation with the next oligonucleotide in the set is carried out. Systematic and repetitive application of voltage control, annealing, ligation and denaturation will result in the movement of the growing chain across the surface as well as assembly of the components into a complete DNA molecule.
The invention further provides methods for the automated synthesis of target polynucleotides. For example, a desired sequence can be ordered by any means of communication available to a user wishing to order such a sequence. A "user", as used herein, is any entity capable of communicating a desired polynucleotide sequence to a server. The sequence may be transmitted by any means of communication available to the user and receivable by a server. The user can be provided with a unique designation such that the user can obtain information regarding the synthesis of the polynucleotide during synthesis. Once obtained, the transmitted target polynucleotide sequence can be synthesized by any method set forth in the present invention.
The invention further provides a method for automated synthesis of a polynucleotide, by providing a user with a mechanism for communicating a model polynucleotide sequence and optionally providing the user with an opportunity to communicate at least one desired modification to the model sequence. The invention envisions a user providing a model sequence and a desired modification to that sequence which results in the alteration of the model sequence. Any modification that alters the expression, function or activity of a target polynucleotide or encoded target polypeptide can be communicated by the user such that a modified polynucleotide or polypeptide is synthesized or expressed according to a method of the invention. For example, a model polynucleotide encoding a polypeptide normally expressed in a eukaryotic system can be altered such that the codons of the resulting target polynucleotide are conducive for expression of the polypeptide in a prokaryotic system. In addition, the user can indicate a desired modified activity of a polypeptide encoded by a model polynucleotide. Once provided, the algorithms and methods of the present invention can be used to synthesize a target polynucleotide encoding a target polypeptide believed to have the desired modified activity. The methods of the invention can be further utilized to express the target polypeptide and to screen for the desired activity. It is understood that the methods of the invention provide a means for synthetic evolution whereby any parameter of polynucleotide expression and/or polypeptide activity can be altered as desired.
Once the transmitted model sequence and desired modification are provided by the user, the data including at least a portion of the model polynucleotide sequence is input into a programmed computer through an input device. Once input, the algorithms of the invention are used to determine the sequence of the model polynucleotide containing the desired modification, resulting in a target polynucleotide containing the modification. Subsequently, the processor and algorithms of the invention are used to identify at least one initiating polynucleotide sequence present in the polynucleotide sequence. A target polynucleotide (i.e., a modified model polynucleotide) is identified and synthesized.
EXAMPLES
Nucleic Acid Synthesis Design Protocol
For the purposes of assembling a synthetic nucleic acid sequence encoding a target polypeptide, a model polypeptide sequence or nucleic acid sequence is obtained and analyzed using a suitable DNA analysis package, such as, for example, MacVector or DNA Star. If the target protein will be expressed in a bacterial system, for example, the model sequence can be converted to a sequence encoding a polypeptide utilizing E. coli preferred codons (i.e., Type I, Type II or Type III codon preference). The present invention provides the conversion programs Codon I, Codon II or Codon III. A nucleic acid sequence of the invention can be designed to accommodate any codon preference of any prokaryotic or eukaryotic organism.
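The codon conversion step can be illustrated with a minimal Python sketch. It is not the Codon I, Codon II or Codon III programs, whose internals are not given here. The function name `recode` and the three-entry preference table are assumptions made purely for illustration; a working table would cover every amino acid for the chosen host.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in T, C, A, G order line up with AMINO_ACIDS.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical, partial preference table (amino acid -> preferred codon).
EXAMPLE_PREFERENCE = {"L": "CTG", "K": "AAA", "E": "GAA"}


def recode(dna, preference):
    """Swap each codon for the listed preferred synonymous codon, if any."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3].upper()
        amino_acid = CODON_TABLE[codon]
        # Keep the original codon when no preference is given for this residue.
        out.append(preference.get(amino_acid, codon))
    return "".join(out)
```

Because only synonymous codons are exchanged, the encoded polypeptide is unchanged; residues without an entry in the table keep their original codon.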
In addition to the above codon preferences, specific promoter, enhancer, replication or drug resistance sequences can be included in a synthetic nucleic acid sequence of the invention. The length of the construction can be adjusted by padding to give a round number of bases based on about 25 to 100 bp synthesis. Sequences of about 25 to 100 bp in length can be manufactured and assembled using the array synthesizer system and may be used without further purification. For example, two 96-well plates containing 100-mers could give a 9600 bp construction of a target sequence.
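The 9600 bp figure follows from simple arithmetic, sketched below in Python. The sketch assumes that the forward and reverse oligonucleotide sets together cover both strands exactly once, so every base pair of the construct consumes two synthesized bases; the function name is hypothetical.

```python
def construct_length_bp(n_oligos, oligo_len):
    """Length of the duplex built from n_oligos oligonucleotides of oligo_len bases.

    Assumes full coverage of both strands with no gaps, so the total number
    of synthesized bases is twice the number of base pairs.
    """
    total_bases = n_oligos * oligo_len
    return total_bases // 2


two_plates_of_100mers = construct_length_bp(2 * 96, 100)
two_plates_of_50mers = construct_length_bp(2 * 96, 50)
```

Under the same assumption, two plates of 50-mers correspond to a 4800 bp construct.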
Subsequent to the design of the oligonucleotides needed for assembly of the target sequence, the oligonucleotides are parsed using ParseOligo™, a proprietary computer program that optimizes nucleic acid sequence assembly. Optional steps in sequence assembly include identifying and eliminating sequences that may give rise to hairpins, repeats or other difficult sequences. The parsed oligonucleotide list is transferred to the Synthesizer driver software. The individual oligonucleotides are pasted into the wells and oligonucleotide synthesis is accomplished.
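The parsing step can be pictured with the following Python sketch. It is not ParseOligo™, whose algorithm is not disclosed here. It assumes a circular target whose length is a multiple of the oligonucleotide length, forward oligonucleotides that tile the plus strand end to end, and reverse oligonucleotides offset by half an oligonucleotide so that each one bridges two neighbouring forward oligonucleotides. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def parse_circular(seq, oligo_len=50):
    """Split a circular sequence into forward and reverse overlapping oligos."""
    if len(seq) % oligo_len != 0:
        raise ValueError("pad the sequence to a multiple of the oligo length")
    offset = oligo_len // 2
    doubled = seq + seq  # lets windows wrap around the circular origin
    starts = range(0, len(seq), oligo_len)
    forward = [seq[s:s + oligo_len] for s in starts]
    reverse = [
        reverse_complement(doubled[s + offset:s + offset + oligo_len])
        for s in starts
    ]
    return forward, reverse


example = "ACGT" * 25  # 100-base toy sequence
fwd, rev = parse_circular(example, oligo_len=50)
```

With 50-base oligonucleotides the offset is 25 bases, which gives the 25 bp overlap used in the protocols that follow. Screening for hairpins or repeats is not attempted in this sketch.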
Assembly of Parsed Oligonucleotides Using a Two-Step PCR Reaction:
Obtain arrayed sets of parsed overlapping oligonucleotides, 50 bases each, with an overlap of about 25 base pairs (bp). The oligonucleotide concentration is 250 μM (250 nmol/ml). 50-base oligos give Tms from 75 to 85 degrees C, 6 to 10 OD260, 11 to 15 nanomoles, 150 to 300 μg. Resuspend in 50 to 100 μl of H2O to make 250 nmol/ml. Combine equal amounts of each oligonucleotide to a final concentration of 250 μM (250 nmol/ml): add 1 μl of each to give 192 μl, then add 8 μl of dH2O to bring the volume up to 200 μl. The final concentration is 250 μM mixed oligos. Dilute about 100-fold by adding 10 μl of mixed oligos to 1 ml of water (1/100; 2.5 μM), then take 1 μl of this and add it to 24 μl of 1X PCR mix. The PCR reaction includes:
10 mM Tris-HCl, pH 9.0
2.2 mM MgCl2
50 mM KCl
0.2 mM each dNTP
0.1% Triton X-100
One U of Taq I polymerase is added to the reaction. The reaction is thermocycled under the following conditions:
a. Assembly
i. 55 cycles of
1. 94 degrees 30 s
2. 52 degrees 30 s
3. 72 degrees 30 s
Following assembly amplification, take 2.5 μl of this assembly mix and add it to 100 μl of PCR mix (40X dilution). Prepare outside primers by taking 1 μl of F1 (forward primer) and 1 μl of R96 (reverse primer) at 250 μM (250 nmol/ml = 0.250 nmol/μl) and add them to the 100 μl PCR reaction. This gives a final concentration of 2.5 μM each oligo. Add 1 U Taq I polymerase and thermocycle under the following conditions:
35 cycles (or original protocol 23 cycles)
94 degrees 30 s
50 degrees 30 s
72 degrees 60 s
Extract with phenol/chloroform. Precipitate with ethanol. Resuspend in 10 μl of dH2O and analyze on an agarose gel.
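The dilutions in this protocol can be checked with a short Python sketch. It assumes a 250 μM stock, as implied by 11 to 15 nanomoles resuspended in 50 to 100 μl, and it accounts for the added volume, so the results are slightly below the nominal round figures. The function and variable names are hypothetical.

```python
def diluted_conc(stock_conc, stock_vol, diluent_vol):
    """Concentration after adding stock_vol of stock to diluent_vol of diluent.

    Volumes share one unit (here microlitres); the result has the unit of
    stock_conc (here micromolar).
    """
    return stock_conc * stock_vol / (stock_vol + diluent_vol)


mixed_oligos_uM = 250.0
# 10 ul of mixed oligos into 1 ml of water: nominally 1/100, about 2.5 uM.
working_uM = diluted_conc(mixed_oligos_uM, 10.0, 1000.0)
# 1 ul of the working dilution into 24 ul of 1X PCR mix: about 0.1 uM total oligo.
assembly_uM = diluted_conc(working_uM, 1.0, 24.0)
# 1 ul of a 250 uM outside primer into the 100 ul PCR reaction: about 2.5 uM.
primer_uM = diluted_conc(250.0, 1.0, 100.0)
```

The assembly reaction therefore runs at roughly 0.1 μM total oligonucleotide, while the outside primers in the second reaction are present at about 2.5 μM each.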
Assembly of Parsed Oligonucleotides Using Taq I Ligation
Arrayed sets of parsed overlapping oligonucleotides of about 25 to 150 bases in length each, with an overlap of about 12 to 75 base pairs (bp), are obtained. The oligonucleotide concentration is 250 μM (250 nmol/ml). For example, 50-base oligos give Tms from 75 to 85 degrees C, 6 to 10 OD260, 11 to 15 nanomoles, 150 to 300 μg. Resuspend in 50 to 100 μl of H2O to make 250 nmol/ml.
Using a robotic workstation, equal amounts of forward and reverse oligos are combined pairwise. Take 10 μl of forward and 10 μl of reverse oligo and mix in a new 96-well v-bottom plate. This gives one array with sets of duplex oligonucleotides at 250 μM, according to pooling scheme Step 1 in Table 1. Prepare an assembly plate by taking 2 μl of each oligomer pair and adding it to a fresh plate containing 100 μl of ligation mix in each well. This gives an effective concentration of 2.5 μM or 2.5 nmol/ml. Transfer 20 μl of each well to a fresh microwell plate and add 1 μl of T4 polynucleotide kinase and 1 μl of 1 mM ATP to each well. Each reaction will have 50 pmoles of oligonucleotide and 1 nmole ATP. Incubate at 37 degrees C for 30 minutes.
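The stated amounts per kinase reaction follow from concentration times volume, as the Python sketch below shows. It relies only on the identity 1 μM = 1 pmol/μl; the function name is hypothetical.

```python
def amount_pmol(conc_uM, vol_ul):
    """Amount of solute in picomoles, using 1 uM = 1 pmol per microlitre."""
    return conc_uM * vol_ul


# 20 ul of oligonucleotide at an effective 2.5 uM.
oligo_pmol = amount_pmol(2.5, 20.0)
# 1 ul of 1 mM ATP; 1 mM is 1000 uM.
atp_pmol = amount_pmol(1000.0, 1.0)
atp_nmol = atp_pmol / 1000.0
```

ATP is thus in roughly 20-fold molar excess over oligonucleotide in each well.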
Initiate assembly according to Steps 2-7 of Table 1. Carry out pooling Step 2, mixing each successive well with the next. Add 1 μl of Taq I ligase to each mixed well. Cycle once at 94 degrees for 30 s; 52 degrees for 30 s; then 72 degrees for 10 minutes.
Carry out step 3 (Table 1) of the pooling scheme and cycle according to the temperature scheme above. Carry out steps 4 and 5 of the pooling scheme and cycle according to the temperature scheme above. Carry out pooling scheme step 6 and take 10 μl of each mix into a fresh microwell. Carry out step 7 of the pooling scheme by pooling the remaining three wells. Reaction volumes will be as follows (the initial plate has 20 μl per well):
Step 2 20 μl + 20 μl = 40 μl
Step 3 80 μl
Step 4 160 μl
Step 5 230 μl
Step 6 10 μl + 10 μl = 20 μl
Step 7 20 + 20 + 20 = 60 μl final reaction volume
A final PCR amplification is then performed by taking 2 μl of the final ligation mix and adding it to 20 μl of PCR mix containing 10 mM Tris-HCl, pH 9.0, 2.2 mM MgCl2, 50 mM KCl, 0.2 mM each dNTP and 0.1% Triton X-100.
Prepare outside primers by taking 1 μl of F1 (forward primer) and 1 μl of R96 (reverse primer) at 250 μM (250 nmol/ml = 0.250 nmol/μl) and add them to the 100 μl PCR reaction, giving a final concentration of 2.5 μM each oligo. Add 1 U Taq I polymerase and cycle for 35 cycles under the following conditions: 94 degrees for 30 s; 50 degrees for 30 s; and 72 degrees for 60 s. Extract the mixture with phenol/chloroform. Precipitate with ethanol. Resuspend in 10 μl of dH2O and analyze on an agarose gel.
Table 1. Pooling scheme for ligation assembly.
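Ligation method - Well pooling scheme
STEP  FROM -> TO
1     All F -> All R
2     A1 -> A2, A3 -> A4, A5 -> A6, A7 -> A8, A9 -> A10, A11 -> A12,
      B1 -> B2, B3 -> B4, B5 -> B6, B7 -> B8, B9 -> B10, B11 -> B12,
      C1 -> C2, C3 -> C4, C5 -> C6, C7 -> C8, C9 -> C10, C11 -> C12,
      D1 -> D2, D3 -> D4, D5 -> D6, D7 -> D8, D9 -> D10, D11 -> D12,
      E1 -> E2, E3 -> E4, E5 -> E6, E7 -> E8, E9 -> E10, E11 -> E12,
      F1 -> F2, F3 -> F4, F5 -> F6, F7 -> F8, F9 -> F10, F11 -> F12,
      G1 -> G2, G3 -> G4, G5 -> G6, G7 -> G8, G9 -> G10, G11 -> G12,
      H1 -> H2, H3 -> H4, H5 -> H6, H7 -> H8, H9 -> H10, H11 -> H12
3     A2 -> A4, A6 -> A8, A10 -> A12, B2 -> B4, B6 -> B8, B10 -> B12,
      C2 -> C4, C6 -> C8, C10 -> C12, D2 -> D4, D6 -> D8, D10 -> D12,
      E2 -> E4, E6 -> E8, E10 -> E12, F2 -> F4, F6 -> F8, F10 -> F12,
      G2 -> G4, G6 -> G8, G10 -> G12, H2 -> H4, H6 -> H8, H10 -> H12
4     A4 -> A8, A12 -> B4, B8 -> B12, C4 -> C8, C12 -> D4, D8 -> D12,
      E4 -> E8, E12 -> F4, F8 -> F12, G4 -> G8, G12 -> H4, H8 -> H12
5     A8 -> B4, B12 -> C8, D4 -> D12, E8 -> F4, F12 -> G8, H4 -> H12
6     B4 -> C8, D12 -> F4, G8 -> H12
7     C8, F4, H12 (pooled)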
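The pairwise pooling pattern of Table 1 can be generated programmatically. The Python sketch below assumes a standard 96-well plate read row by row (A1 to H12) and repeatedly pools each well into its successor among the wells still in play; the function names are hypothetical and the sketch covers only the well-to-well transfers of Steps 2 through 6, not volumes or enzyme additions.

```python
ROWS = "ABCDEFGH"


def plate_wells():
    """Well names of a 96-well plate in row-major order: A1 ... A12, B1 ... H12."""
    return [f"{row}{col}" for row in ROWS for col in range(1, 13)]


def pairwise_pooling(wells):
    """Pool adjacent wells pairwise until an odd number of wells remains.

    Returns (steps, remaining): steps is a list of rounds, each a list of
    (from_well, to_well) transfers; remaining holds the wells left to combine.
    """
    steps = []
    current = list(wells)
    while len(current) > 1 and len(current) % 2 == 0:
        pairs = [(current[i], current[i + 1]) for i in range(0, len(current), 2)]
        steps.append(pairs)
        current = [to_well for _, to_well in pairs]  # products sit in the "to" wells
    return steps, current


steps, remaining = pairwise_pooling(plate_wells())
```

For 96 wells the loop yields five rounds of 48, 24, 12, 6 and 3 transfers, corresponding to Steps 2 through 6, and leaves three wells to be combined in Step 7.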
Assembly of Parsed Oligonucleotides Using Taq I Synthesis and Assembly
Arrayed sets of parsed overlapping oligonucleotides of about 25 to 150 bases in length each, with an overlap of about 12 to 75 base pairs (bp), are obtained. The oligonucleotide concentration is 250 μM (250 nmol/ml). 50-base oligos give Tms from 75 to 85 degrees C, 6 to 10 OD260, 11 to 15 nanomoles, 150 to 300 μg. Resuspend in 50 to 100 μl of H2O to make 250 nmol/ml.
The invention envisions using a robotic workstation to accomplish nucleic acid assembly. In the present example, two working plates containing forward and reverse oligonucleotides in a PCR mix at 2.5 μM are prepared: 1 μl of each oligo is added to 100 μl of PCR mix in a fresh microwell, providing one plate of forward and one plate of reverse oligos in an array. Cycling assembly is then initiated as follows according to the pooling scheme outlined in Table 1. In the present example, 96 cycles of assembly can be accomplished according to this scheme.
Remove 2 μl of well F-E1 to a fresh well; remove 2 μl of R-E1 to a fresh well; add 18 μl of 1X PCR mix; add 1 U of Taq I polymerase.
Cycle once:
94 degrees 30 s
52 degrees 30 s
72 degrees 30 s
Subsequently, remove 2 μl of well F-E2 to the reaction vessel; remove 2 μl of well R-D12 to the reaction vessel. Cycle once according to the temperatures above. Repeat the pooling and cycling according to the scheme outlined in Table 1 for about 96 cycles.
A PCR amplification is then performed by taking 2 μl of the final reaction mix and adding it to 20 μl of a PCR mix comprising:
10 mM Tris-HCl, pH 9.0
2.2 mM MgCl2
50 mM KCl
0.2 mM each dNTP
0.1% Triton X-100
Outside primers are prepared by taking 1 μl of F1 and 1 μl of R96 at 250 μM (250 nmol/ml = 0.250 nmol/μl) and adding them to the 100 μl PCR reaction. This gives a final concentration of 2.5 μM each oligo. 1 U Taq I polymerase is subsequently added and the reaction is cycled for about 23 to 35 cycles under the following conditions:
94 degrees 30 s
50 degrees 30 s
72 degrees 60 s
The reaction is subsequently extracted with phenol/chloroform, precipitated with ethanol and resuspended in 10 μl of dH2O for analysis on an agarose gel.
Equal amounts of forward and reverse oligos are combined pairwise by taking 10 μl of forward and 10 μl of reverse oligo and mixing them in a new 96-well v-bottom plate. This provides one array with sets of duplex oligonucleotides at 250 μM, according to pooling scheme Step 1 in Table 1. An assembly plate is prepared by taking 2 μl of each oligomer pair and adding it to a plate containing 100 μl of ligation mix in each well. This gives an effective concentration of 2.5 μM or 2.5 nmol/ml. About 20 μl of each well is transferred to a fresh microwell plate, and 1 μl of T4 polynucleotide kinase and 1 μl of 1 mM ATP are added. Each reaction will have 50 pmoles of oligonucleotide and 1 nmole ATP. Incubate at 37 degrees for 30 minutes.
Nucleic acid assembly is initiated according to Steps 2-7 of Table 1. Step 2 pooling is carried out by mixing each well with the next well in succession. 1 μl of Taq I ligase is added to each mixed well, which is cycled once as follows:
94 degrees 30 s
52 degrees 30 s
72 degrees 10 minutes
Step 3 of the pooling scheme is carried out and cycled according to the temperature scheme above. Steps 4 and 5 of the pooling scheme are carried out and cycled according to the temperature scheme above. Step 6 of the pooling scheme is carried out and 10 μl of each mix is taken into a fresh microwell. Step 7 of the pooling scheme is carried out by pooling the remaining three wells. The reaction volumes will be (initial plate has 20 μl per well):
Step 2 20 μl + 20 μl = 40 μl
Step 3 80 μl
Step 4 160 μl
Step 5 230 μl
Step 6 10 μl + 10 μl = 20 μl
Step 7 20 + 20 + 20 = 60 μl final reaction volume
A final PCR amplification is performed by taking 2 μl of the final ligation mix and adding it to 20 μl of PCR mix comprising:
10 mM Tris-HCl, pH 9.0
2.2 mM MgCl2
50 mM KCl
0.2 mM each dNTP
0.1% Triton X-100
Outside primers are prepared by taking 1 μl of F1 and 1 μl of R96 at 250 μM (250 nmol/ml = 0.250 nmol/μl) and adding them to the 100 μl PCR reaction, giving a final concentration of 2.5 μM for each oligo. Subsequently, 1 U of Taq I polymerase is added and the reaction is cycled for about 23 to 35 cycles under the following conditions:
94 degrees 30 s
50 degrees 30 s
72 degrees 60 s
The product is extracted with phenol/chloroform, precipitated with ethanol, resuspended in 10 μl of dH2O and analyzed on an agarose gel.
Table 2. Pooling scheme for assembly using Taq I polymerase (also topoisomerase II).
Step  Forward oligo + Reverse oligo
1   F-E1 + R-E1   Pause
2   F-E2 + R-D12  Pause
3   F-E3 + R-D11  Pause
4   F-E4 + R-D10  Pause
5   F-E5 + R-D9   Pause
6   F-E6 + R-D8   Pause
7   F-E7 + R-D7   Pause
8   F-E8 + R-D6   Pause
9   F-E9 + R-D5   Pause
10  F-E10 + R-D4  Pause
11  F-E11 + R-D3  Pause
12  F-E12 + R-D2  Pause
13  F-F1 + R-D1   Pause
14  F-F2 + R-C12  Pause
15  F-F3 + R-C11  Pause
16  F-F4 + R-C10  Pause
17  F-F5 + R-C9   Pause
18  F-F6 + R-C8   Pause
19  F-F7 + R-C7   Pause
20  F-F8 + R-C6   Pause
21  F-F9 + R-C5   Pause
22  F-F10 + R-C4  Pause
23  F-F11 + R-C3  Pause
24  F-F12 + R-C2  Pause
25  F-G1 + R-C1   Pause
26  F-G2 + R-B12  Pause
27  F-G3 + R-B11  Pause
28  F-G4 + R-B10  Pause
29  F-G5 + R-B9   Pause
30  F-G6 + R-B8   Pause
31  F-G7 + R-B7   Pause
32  F-G8 + R-B6   Pause
33  F-G9 + R-B5   Pause
34  F-G10 + R-B4  Pause
35  F-G11 + R-B3  Pause
36  F-G12 + R-B2  Pause
37  F-H1 + R-B1   Pause
38  F-H2 + R-A12  Pause
39  F-H3 + R-A11  Pause
40  F-H4 + R-A10  Pause
41  F-H5 + R-A9   Pause
42  F-H6 + R-A8   Pause
43  F-H7 + R-A7   Pause
44  F-H8 + R-A6   Pause
45  F-H9 + R-A5   Pause
46  F-H10 + R-A4  Pause
47  F-H11 + R-A3  Pause
48  F-H12 + R-A2  Pause
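The order of additions in Table 2 follows a regular pattern: the forward plate is walked forward from the starting well while the reverse plate is walked backward from the same position. The Python sketch below reproduces that pattern under the assumption of row-major well numbering on a 96-well plate, with wrap-around at the plate ends; the function names are hypothetical.

```python
ROWS = "ABCDEFGH"


def plate_wells():
    """Well names of a 96-well plate in row-major order: A1 ... A12, B1 ... H12."""
    return [f"{row}{col}" for row in ROWS for col in range(1, 13)]


def assembly_order(start_well, n_steps):
    """Return (forward_well, reverse_well) for each sequential addition step.

    Step 1 uses start_well on both plates; each later step moves one well
    forward on the forward plate and one well backward on the reverse plate.
    """
    wells = plate_wells()
    start = wells.index(start_well)
    n = len(wells)
    return [
        (wells[(start + k) % n], wells[(start - k) % n])
        for k in range(n_steps)
    ]


from_middle = assembly_order("E1", 48)   # pattern of Table 2
from_end = assembly_order("A1", 96)      # pattern of Table 3
```

Starting at E1 for 48 steps gives the scheme of Table 2; starting at A1 for 96 steps gives the alternate scheme of Table 3, in which the reverse plate wraps from A1 to H12.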
Table 3. Alternate pooling scheme (initiating assembly from the 5' or 3' end)
1. F-A1 -> R-A1 denature, anneal, polymerase extension
2. F-A2 -> R-H12 denature, anneal, polymerase extension
3. F-A3 -> R-H11 denature, anneal, polymerase extension
4. F-A4 -> R-H10 denature, anneal, polymerase extension
5. F-A5 -> R-H9 denature, anneal, polymerase extension
6. F-A6 -> R-H8 denature, anneal, polymerase extension
7. F-A7 -> R-H7 denature, anneal, polymerase extension
8. F-A8 -> R-H6 denature, anneal, polymerase extension
9. F-A9 -> R-H5 denature, anneal, polymerase extension
10. F-A10 -> R-H4 denature, anneal, polymerase extension
11. F-A11 -> R-H3 denature, anneal, polymerase extension
12. F-A12 -> R-H2 denature, anneal, polymerase extension
13. F-B1 -> R-H1 denature, anneal, polymerase extension
14. F-B2 -> R-G12 denature, anneal, polymerase extension
15. F-B3 -> R-G11 denature, anneal, polymerase extension
16. F-B4 -> R-G10 denature, anneal, polymerase extension
17. F-B5 -> R-G9 denature, anneal, polymerase extension
18. F-B6 -> R-G8 denature, anneal, polymerase extension
19. F-B7 -> R-G7 denature, anneal, polymerase extension
20. F-B8 -> R-G6 denature, anneal, polymerase extension
21. F-B9 -> R-G5 denature, anneal, polymerase extension
22. F-B10 -> R-G4 denature, anneal, polymerase extension
23. F-B11 -> R-G3 denature, anneal, polymerase extension
24. F-B12 -> R-G2 denature, anneal, polymerase extension
25. F-C1 -> R-G1 denature, anneal, polymerase extension
26. F-C2 -> R-F12 denature, anneal, polymerase extension
27. F-C3 -> R-F11 denature, anneal, polymerase extension
28. F-C4 -> R-F10 denature, anneal, polymerase extension
29. F-C5 -> R-F9 denature, anneal, polymerase extension
30. F-C6 -> R-F8 denature, anneal, polymerase extension
31. F-C7 -> R-F7 denature, anneal, polymerase extension
32. F-C8 -> R-F6 denature, anneal, polymerase extension
33. F-C9 -> R-F5 denature, anneal, polymerase extension
34. F-C10 -> R-F4 denature, anneal, polymerase extension
35. F-C11 -> R-F3 denature, anneal, polymerase extension
36. F-C12 -> R-F2 denature, anneal, polymerase extension
37. F-D1 -> R-F1 denature, anneal, polymerase extension
38. F-D2 -> R-E12 denature, anneal, polymerase extension
39. F-D3 -> R-E11 denature, anneal, polymerase extension
40. F-D4 -> R-E10 denature, anneal, polymerase extension
41. F-D5 -> R-E9 denature, anneal, polymerase extension
42. F-D6 -> R-E8 denature, anneal, polymerase extension
43. F-D7 -> R-E7 denature, anneal, polymerase extension
44. F-D8 -> R-E6 denature, anneal, polymerase extension
45. F-D9 -> R-E5 denature, anneal, polymerase extension
46. F-D10 -> R-E4 denature, anneal, polymerase extension
47. F-D11 -> R-E3 denature, anneal, polymerase extension
48. F-D12 -> R-E2 denature, anneal, polymerase extension
49. F-E1 -> R-E1 denature, anneal, polymerase extension
50. F-E2 -> R-D12 denature, anneal, polymerase extension
51. F-E3 -> R-D11 denature, anneal, polymerase extension
52. F-E4 -> R-D10 denature, anneal, polymerase extension
53. F-E5 -> R-D9 denature, anneal, polymerase extension
54. F-E6 -> R-D8 denature, anneal, polymerase extension
55. F-E7 -> R-D7 denature, anneal, polymerase extension
56. F-E8 -> R-D6 denature, anneal, polymerase extension
57. F-E9 -> R-D5 denature, anneal, polymerase extension
58. F-E10 -> R-D4 denature, anneal, polymerase extension
59. F-E11 -> R-D3 denature, anneal, polymerase extension
60. F-E12 -> R-D2 denature, anneal, polymerase extension
61. F-F1 -> R-D1 denature, anneal, polymerase extension
62. F-F2 -> R-C12 denature, anneal, polymerase extension
63. F-F3 -> R-C11 denature, anneal, polymerase extension
64. F-F4 -> R-C10 denature, anneal, polymerase extension
65. F-F5 -> R-C9 denature, anneal, polymerase extension
66. F-F6 -> R-C8 denature, anneal, polymerase extension
67. F-F7 -> R-C7 denature, anneal, polymerase extension
68. F-F8 -> R-C6 denature, anneal, polymerase extension
69. F-F9 -> R-C5 denature, anneal, polymerase extension
70. F-F10 -> R-C4 denature, anneal, polymerase extension
71. F-F11 -> R-C3 denature, anneal, polymerase extension
72. F-F12 -> R-C2 denature, anneal, polymerase extension
73. F-G1 -> R-C1 denature, anneal, polymerase extension
74. F-G2 -> R-B12 denature, anneal, polymerase extension
75. F-G3 -> R-B11 denature, anneal, polymerase extension
76. F-G4 -> R-B10 denature, anneal, polymerase extension
77. F-G5 -> R-B9 denature, anneal, polymerase extension
78. F-G6 -> R-B8 denature, anneal, polymerase extension
79. F-G7 -> R-B7 denature, anneal, polymerase extension
80. F-G8 -> R-B6 denature, anneal, polymerase extension
81. F-G9 -> R-B5 denature, anneal, polymerase extension
82. F-G10 -> R-B4 denature, anneal, polymerase extension
83. F-G11 -> R-B3 denature, anneal, polymerase extension
84. F-G12 -> R-B2 denature, anneal, polymerase extension
85. F-H1 -> R-B1 denature, anneal, polymerase extension
86. F-H2 -> R-A12 denature, anneal, polymerase extension
87. F-H3 -> R-A11 denature, anneal, polymerase extension
88. F-H4 -> R-A10 denature, anneal, polymerase extension
89. F-H5 -> R-A9 denature, anneal, polymerase extension
90. F-H6 -> R-A8 denature, anneal, polymerase extension
91. F-H7 -> R-A7 denature, anneal, polymerase extension
92. F-H8 -> R-A6 denature, anneal, polymerase extension
93. F-H9 -> R-A5 denature, anneal, polymerase extension
94. F-H10 -> R-A4 denature, anneal, polymerase extension
95. F-H11 -> R-A3 denature, anneal, polymerase extension
96. F-H12 -> R-A2 denature, anneal, polymerase extension
Assembly of Nucleic Acid Molecules
The nucleic acid molecules listed in Table 4 have been produced using the methods described herein. The features and characteristics of each nucleic acid molecule are also described in Table 4.
As described in Table 4, a synthetic plasmid of 4800 bp in length was assembled. The plasmid comprises 192 oligonucleotides (two sets of 96 overlapping 50-mers; 25 bp overlap). The plasmid is essentially pUC containing kanamycin resistance instead of ampicillin resistance. The synthetic plasmid also contains the lux A and B genes from the Vibrio fischeri bacterial luciferase gene. The SynPUC19 plasmid is 2700 bp in length, comprising a sequence essentially identical to pUC19 only shortened to precisely 2700 bp. Two sets of 96 50-mers were used to assemble the plasmid. In the SynLux4 plasmid, pUC19 was shortened and the luxA gene was added. 54 100-mer oligonucleotides comprising two sets of 27 oligonucleotides were used to assemble the plasmid. The miniQE10 plasmid comprising 2400 bp was assembled using 48 50-mer oligonucleotides. MiniQE10 is an expression plasmid containing a 6X His tag and bacterial promoter for high-level polypeptide expression. MiniQE10 was assembled and synthesized using the Taq I polymerase amplification method of the invention. The microQE plasmid is a minimal plasmid containing only an ampicillin gene, an origin of replication and a linker of pQE plasmids. MicroQE was assembled using either combinatoric ligation with 24 50-mers or with one-tube PCR amplification. The SynFib1, SynFibB and SynFibG nucleic acid sequences are synthetic human fibrinogens manufactured using E. coli codons to optimize expression in a prokaryotic expression system.
Table 4. Synthetic nucleic acid molecules produced using the methods of the invention.
Name               Length (bp)  Oligos  Oligo length        Form      Oligo set
Synthetic Plasmid  4800         192     50                  circular  F1-F96
SynPUC/19          2700         192     50                  circular  F01-F96
SynLux/4           2700         54      100                 circular  F1-27
MiniQE10           2400         48      50                  circular
MicroQE            1200         24      50                  circular  MQEF-1,24
SynFib1            1850         75      50                  linear    SFAF1-37
PQE25              2400         96      25                  circular  F1-F48
SynFibB            1500         60      59 50mers, 1 25mer  linear    FibbF1-30
SynFibG            1350         54      53 50mers, 1 25mer  linear    FibgF1-27
It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.

Claims

WHAT IS CLAIMED IS:
1. A method of synthesizing a target polynucleotide comprising: a) providing a target polynucleotide sequence; b) identifying at least one initiating polynucleotide present in the target polynucleotide of a), wherein the initiating polynucleotide comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang and a 3' overhang; c) identifying a second polynucleotide present in the target polynucleotide of a), wherein the second polynucleotide is contiguous with the initiating polynucleotide and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang, wherein at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating polynucleotide; d) identifying a third polynucleotide present in the target polynucleotide of a), wherein the third polynucleotide is contiguous with the initiating sequence and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang, wherein at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating polynucleotide which is not complementary to an overhang of the second polynucleotide; e) contacting the initiating polynucleotide of b) with the second polynucleotide of c) and the third polynucleotide of d) under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, wherein the initiating sequence is extended bi-directionally; f) in the absence of primer extension, optionally contacting the mixture of e) with a ligase under conditions suitable for ligation; and g) optionally repeating b) through f) to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized.
2. The method of claim 1, wherein the target polynucleotide sequence encodes a target polypeptide.
3. The method of claim 2, wherein the target polypeptide is a protein.
4. The method of claim 3, wherein the protein is an enzyme.
5. The method of claim 1, wherein the initiating polynucleotide sequence is identified by a computer program.
6. The method of claim 5, wherein the computer program comprises the following algorithm:
7. The method of claim 1, wherein the plus strand of the initiating, second or third polynucleotide is about 15 to 1000 nucleotides in length.
8. The method of claim 1, wherein the plus strand of the initiating, second or third polynucleotide is about 20 to 500 nucleotides in length.
9. The method of claim 1, wherein the plus strand of the initiating, second or third polynucleotide is about 25 to 100 nucleotides in length.
10. The method of claim 1, wherein the minus strand of the initiating, second or third polynucleotide is about 15 to 1000 nucleotides in length.
11. The method of claim 1, wherein the minus strand of the initiating, second or third polynucleotide is about 20 to 500 nucleotides in length.
12. The method of claim 1, wherein the minus strand of the initiating, second or third polynucleotide is about 25 to 100 nucleotides in length.
13. The method of claim 1, wherein the initiating polynucleotide is attached to a solid support.
14. A method of synthesizing a target polynucleotide comprising: a) providing a target polynucleotide sequence derived from a model sequence; b) identifying at least one initiating polynucleotide sequence present in the target polynucleotide sequence of a), wherein the initiating polynucleotide comprises: 1) a first plus strand oligonucleotide; 2) a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide; and 3) a minus strand oligonucleotide comprising a first contiguous sequence that is at least partially complementary to the first plus strand oligonucleotide and second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide; c) annealing the first plus strand oligonucleotide and the second plus strand oligonucleotide to the minus strand oligonucleotide of b) resulting in a partially double-stranded initiating polynucleotide comprised of a 5' overhang and a 3' overhang; d) identifying a second polynucleotide sequence present in the target polynucleotide sequence of a), wherein the second polynucleotide sequence is contiguous with the initiating polynucleotide sequence and comprises: 1) a first plus strand oligonucleotide; 2) a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide; and 3) a minus strand oligonucleotide comprising a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide; e) annealing the first plus strand oligonucleotide and the second plus strand oligonucleotide to the minus strand oligonucleotide of d) resulting in a partially double-stranded second polynucleotide, wherein at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating polynucleotide; f) identifying a third polynucleotide present in the target polynucleotide of a), wherein the third polynucleotide is contiguous with the initiating sequence and comprises: 1) a first plus strand oligonucleotide; 2) a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide; and 3) a minus strand oligonucleotide comprising a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide; g) annealing the first plus strand oligonucleotide and the second plus strand oligonucleotide to the minus strand oligonucleotide of f) resulting in a partially double-stranded second polynucleotide, wherein at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating polynucleotide and not complementary to an overhang of the second polynucleotide; h) contacting the initiating polynucleotide of c) with the second polynucleotide of e) and the third polynucleotide of g) under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, wherein the initiating sequence is extended bi-directionally; i) in the absence of primer extension, optionally contacting the mixture of h) with a ligase under conditions suitable for ligation; and j) optionally repeating b) through i) to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized.
15. A method for synthesizing a target polynucleotide, comprising:
a) providing a target polynucleotide sequence;
b) identifying at least one initiating polynucleotide present in the target polynucleotide of a), wherein the initiating polynucleotide comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide;
c) contacting the initiating polynucleotide under conditions suitable for primer annealing with a first oligonucleotide having partial complementarity to the 3' portion of the plus strand of the initiating polynucleotide, and a second oligonucleotide having partial complementarity to the 3' portion of the minus strand of the initiating polynucleotide;
d) catalyzing under conditions suitable for primer extension: 1) polynucleotide synthesis from the 3'-hydroxyl of the plus strand of the initiating polynucleotide; 2) polynucleotide synthesis from the 3'-hydroxyl of the annealed first oligonucleotide; 3) polynucleotide synthesis from the 3'-hydroxyl of the minus strand of the initiating polynucleotide; and 4) polynucleotide synthesis from the 3'-hydroxyl of the annealed second oligonucleotide, wherein the initiating sequence is extended bi-directionally thereby forming a nascent extended initiating polynucleotide;
e) contacting the extended initiating polynucleotide of d) under conditions suitable for primer annealing with a third oligonucleotide having partial complementarity to the 3' portion of the plus strand of the extended initiating polynucleotide, and a fourth oligonucleotide having partial complementarity to the 3' portion of the minus strand of the extended initiating polynucleotide;
f) catalyzing under conditions suitable for primer extension: 1) polynucleotide synthesis from the 3'-hydroxyl of the plus strand of the extended initiating polynucleotide; 2) polynucleotide synthesis from the 3'-hydroxyl of the annealed third oligonucleotide; 3) polynucleotide synthesis from the 3'-hydroxyl of the minus strand of the extended initiating polynucleotide; and 4) polynucleotide synthesis from the 3'-hydroxyl of the annealed fourth oligonucleotide, wherein the extended initiating sequence is extended bi-directionally thereby forming a nascent extended initiating polynucleotide; and
g) optionally repeating e) through f) as desired, resulting in formation of the target polynucleotide sequence.
16. The method of claim 15, wherein the target polynucleotide sequence encodes a target polypeptide.
17. The method of claim 16, wherein the target polypeptide is a protein.
18. The method of claim 17, wherein the protein is an enzyme.
19. The method of claim 15, wherein the initiating polynucleotide is identified by an algorithm.
20. A method of synthesizing a target polynucleotide comprising:
a) providing a target polynucleotide sequence;
b) identifying at least one initiating polynucleotide present in the target polynucleotide of a), wherein the initiating polynucleotide comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of at least a 5' overhang or a 3' overhang;
c) identifying a second polynucleotide present in the target polynucleotide of a), wherein the second polynucleotide is contiguous with the initiating polynucleotide and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang, wherein at least one overhang of the second polynucleotide is complementary to the overhang of the initiating polynucleotide;
d) contacting the initiating polynucleotide of b) with the second polynucleotide of c) under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, wherein the initiating sequence is extended uni-directionally;
e) in the absence of primer extension, optionally contacting the mixture of d) with a ligase under conditions suitable for ligation; and
f) optionally repeating b) through e) to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized.
21. The method of claim 15, wherein the plus strand of the initiating, second or third polynucleotide is about 15 to 1000 nucleotides in length.
22. The method of claim 15, wherein the plus strand of the initiating, second or third polynucleotide is about 20 to 500 nucleotides in length.
23. The method of claim 15, wherein the plus strand of the initiating, second or third polynucleotide is about 25 to 100 nucleotides in length.
24. The method of claim 15, wherein the minus strand of the initiating, second or third polynucleotide is about 15 to 1000 nucleotides in length.
25. The method of claim 15, wherein the minus strand of the initiating, second or third polynucleotide is about 20 to 500 nucleotides in length.
26. The method of claim 15, wherein the minus strand of the initiating, second or third polynucleotide is about 25 to 100 nucleotides in length.
27. The method of claim 15, wherein the initiating polynucleotide is attached to a solid support.
28. A method for isolating a target polypeptide encoded by a target polynucleotide, comprising:
a) providing a target polynucleotide sequence derived from a model sequence;
b) identifying at least one initiating polynucleotide present in the target polynucleotide of a), wherein the initiating polynucleotide comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang and a 3' overhang;
c) identifying a second polynucleotide present in the target polynucleotide of a), wherein the second polynucleotide is contiguous with the initiating sequence and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang, wherein at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating sequence;
d) identifying a third polynucleotide present in the target polynucleotide of a), wherein the third polynucleotide is contiguous with the initiating sequence and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang, wherein at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating sequence which is not complementary to an overhang of the second polynucleotide;
e) contacting the initiating polynucleotide of b) with the second polynucleotide of c) and the third polynucleotide of d) under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, wherein the initiating sequence is extended bi-directionally;
f) in the absence of primer extension, optionally contacting the mixture of e) with a ligase under conditions suitable for ligation;
g) optionally repeating b) through f) to sequentially add double-stranded polynucleotides to the extended initiating sequence through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized;
h) incorporating the target polynucleotide of g) in an expression vector;
i) introducing the expression vector of h) into a suitable host cell;
j) culturing the cell of i) under conditions and for such time as to promote the expression of the target polypeptide encoded by the target polynucleotide; and
k) isolating the target polypeptide.
29. The method of claim 28, wherein the target polypeptide is a chimeric protein.
30. The method of claim 28, wherein the target polypeptide is a fusion protein.
31. The method of claim 28, wherein the expression vector is a bacterial expression vector.
32. The method of claim 29, wherein the expression vector is an animal cell expression vector.
33. The method of claim 28, wherein the expression vector is an insect cell expression vector.
34. The method of claim 28, wherein the expression vector is a retroviral vector.
35. The method of claim 29, wherein the expression vector is contained in a host cell.
36. The method of claim 35, wherein the host cell is a prokaryotic cell.
37. The method of claim 35, wherein the host cell is a eukaryotic cell.
38. The method of claims 1, 14, 15 or 27, wherein the oligonucleotides are produced by synthesis on an automated DNA synthesizer.
39. A method of synthesizing a target polynucleotide comprising:
a) providing a target polynucleotide sequence derived from a model sequence;
b) chemically synthesizing a plurality of single-stranded oligonucleotides each of which is partially complementary to at least one oligonucleotide present in the plurality, wherein the sequence of the plurality of oligonucleotides is a contiguous sequence of the target polynucleotide;
c) contacting the partially complementary oligonucleotides of b) under conditions and for such time suitable for annealing, the contacting resulting in a plurality of partially double-stranded polynucleotides, wherein each double-stranded polynucleotide is comprised of a 5' overhang and a 3' overhang;
d) identifying at least one initiating polynucleotide derived from the model sequence, wherein the initiating polynucleotide is present in the plurality of double-stranded polynucleotides set forth in c);
e) in the absence of primer extension, subjecting a mixture comprising the initiating polynucleotide and 1) a double-stranded polynucleotide that will anneal to the 5' portion of said initiating sequence; 2) a double-stranded polynucleotide that will anneal to the 3' portion of the initiating polynucleotide; and 3) a DNA ligase under conditions suitable for annealing and ligation, wherein the initiating polynucleotide is extended bi-directionally;
f) sequentially annealing double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing, whereby the target polynucleotide is produced.
40. The method of claim 39, wherein the oligonucleotides are produced by synthesis on an automated DNA synthesizer.
41. A computer program, stored on a computer-readable medium, for generating a target polynucleotide sequence, the computer program comprising instructions for causing a computer system to:
a) identify an initiating polynucleotide sequence contained in the target polynucleotide sequence;
b) parse the target polynucleotide sequence into multiply distinct, partially complementary, oligonucleotides; and
c) control assembly of the target polynucleotide sequence by controlling the bi-directional extension of the initiating polynucleotide sequence by the sequential addition of partially complementary oligonucleotides resulting in a contiguous double-stranded polynucleotide.
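The bi-directional extension of step c) of claim 41 can be pictured as a schedule of addition cycles that moves outward from the initiating polynucleotide. The sketch below is illustrative only and assumes the parsed fragments are numbered 0 to n-1 along the target; the function name `bidirectional_order` and the fragment indexing are hypothetical and do not come from the specification.

```python
def bidirectional_order(n_fragments, initiating_index):
    """Return the fragments added in each cycle, moving outward from the
    initiating fragment in both directions.

    Each cycle adds the next fragment on the upstream side and the next
    fragment on the downstream side, while any remain on that side.
    """
    if not 0 <= initiating_index < n_fragments:
        raise ValueError("initiating_index must identify one of the fragments")
    cycles = []
    left = initiating_index - 1
    right = initiating_index + 1
    while left >= 0 or right < n_fragments:
        cycle = []
        if left >= 0:
            cycle.append(left)
            left -= 1
        if right < n_fragments:
            cycle.append(right)
            right += 1
        cycles.append(tuple(cycle))
    return cycles
```

For example, with six fragments and fragment 2 as the initiating polynucleotide, the first cycle adds fragments 1 and 3, the second adds 0 and 4, and the last adds only fragment 5, since the upstream side is already complete.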
42. The computer program of claim 41, wherein the parsing is performed by an algorithm.
43. The computer program of claim 42, wherein the algorithm comprises:
$Overlap = <STDIN>;
$seqlen = length($sequence);
}
$revcomp = "";
for ($i = $seqlen-1; $i >= 0; $i--) {
    $base = substr($sequence, $i, 1);
    if ($base eq "a") {$comp = "T";}
    elsif ($base eq "t") {$comp = "A";}
    elsif ($base eq "g") {$comp = "C";}
    elsif ($base eq "c") {$comp = "G";}
    elsif ($base eq "A") {$comp = "T";}
    elsif ($base eq "T") {$comp = "A";}
    elsif ($base eq "G") {$comp = "C";}
    elsif ($base eq "C") {$comp = "G";}
    else {$comp = "X";}
    $revcomp = $revcomp . $comp;
}
print OUT "Forward oligos\n"; print "Forward oligos\n";
$r = 1;
for ($i = 0; $i <= $seqlen-1; $i += $OL) {
    $oligo = substr($sequence, $i, $OL);
    print OUT "$oligoname F- $r $oligo\n"; print "$oligoname F- $r $oligo\n";
    $r = $r + 1;
}
$r = 1;
for ($i = $seqlen - $Overlap - $OL; $i >= 0; $i -= $OL) {
    print OUT "\n"; print "\n";
    $oligo = substr($revcomp, $i, $OL);
    print OUT "$oligoname R- $r $oligo"; print "$oligoname R- $r $oligo";
    $r = $r + 1;
}
$oligo = substr($revcomp, 1, $Overlap);
print OUT "$oligo\n"; print "$oligo\n";
wherein
$oligoname is the identifier name for the list and for each component oligonucleotide;
$OL is the length of each component oligonucleotide;
$Overlap is the length of the overlap in bases between each forward and each reverse oligonucleotide;
$sequence is the DNA sequence in bases;
$seqlen is the length of the DNA sequence in bases;
$bas is the individual base in a sequence;
$forseq is the sequence of a forward oligonucleotide;
$revseq is the sequence of a reverse oligonucleotide;
$revcomp is the reverse complemented sequence of the gene;
$oligonameF- [] is the list of parsed forward oligos; and
$oligonameR- [] is the list of parsed reverse oligos.
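The parsing scheme of claim 43 can be paraphrased as follows: forward oligonucleotides are consecutive windows of length $OL along the sequence, and reverse oligonucleotides are windows of the same length cut from the reverse complement, shifted by $Overlap bases so that each one bridges two forward oligonucleotides. The Python sketch below is an assumed paraphrase for illustration, not the claimed program; it returns lists instead of printing, and it omits the final leftover fragment that the claimed code prints after the reverse loop. The function and parameter names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence):
    """Reverse complement in upper case; any other character becomes 'X'."""
    return "".join(COMPLEMENT.get(base, "X") for base in reversed(sequence.upper()))


def parse_oligos(sequence, oligo_len, overlap):
    """Split a sequence into forward oligos and offset reverse oligos.

    Forward oligos are consecutive windows of oligo_len bases. Reverse oligos
    are windows of the reverse complement that start overlap bases in from
    the 5' end of the forward strand, so each spans a forward junction.
    """
    forward = [sequence[i:i + oligo_len].upper()
               for i in range(0, len(sequence), oligo_len)]
    revcomp = reverse_complement(sequence)
    reverse = []
    i = len(sequence) - overlap - oligo_len
    while i >= 0:
        reverse.append(revcomp[i:i + oligo_len])
        i -= oligo_len
    return forward, reverse


# Hypothetical 16-base input with 8-base oligos and a 4-base overlap.
forward_oligos, reverse_oligos = parse_oligos("AAAAAAAACCCCCCCC", 8, 4)
```

With this input the single reverse oligonucleotide is complementary to target positions 4 through 11, so it pairs with the last four bases of the first forward oligonucleotide and the first four bases of the second.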
44. The computer program of claim 43, wherein the forward sequence is optionally converted to upper case using an algorithm comprising:
$forseq = "";
for ($j = 0; $j <= $seqlen-1; $j++) {
    $bas = substr($sequence, $j, 1);
    if ($bas eq "a") {$cfor = "A";}
    elsif ($bas eq "t") {$cfor = "T";}
    elsif ($bas eq "c") {$cfor = "C";}
    elsif ($bas eq "g") {$cfor = "G";}
    elsif ($bas eq "A") {$cfor = "A";}
    elsif ($bas eq "T") {$cfor = "T";}
    elsif ($bas eq "C") {$cfor = "C";}
    elsif ($bas eq "G") {$cfor = "G";}
    else {$cfor = "X";}
    $forseq = $forseq . $cfor;
    print OUT "$j \n";
}
wherein
$seqlen is the length of the DNA sequence in bases;
$bas is the individual base in a sequence; and
$forseq is the sequence of a forward oligonucleotide.
45. A computer-assisted method for synthesizing a target polynucleotide encoding a target polypeptide derived from a model sequence using a programmed computer including a processor, an input device, and an output device, comprising:
a) inputting into the programmed computer, through the input device, data including at least a portion of the target polynucleotide sequence encoding a target polypeptide;
b) determining, using the processor, the sequence of at least one initiating polynucleotide present in the target polynucleotide sequence;
c) selecting, using the processor, a model for synthesizing the target polynucleotide sequence based on the position of the initiating sequence in the target polynucleotide sequence using overall sequence parameters necessary for expression of the target polypeptide in a biological system; and
d) outputting, to the output device, the results of the at least one determination.
46. The method of claim 45, further comprising predicting, using the processor, whether changing the model sequence to the target polynucleotide will have an effect on the target polypeptide encoded by the target polynucleotide based on at least one physical, structural or phylogenetic characteristic of the model sequence.
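Claim 46 does not say how the prediction is computed. One very simple proxy, offered here only as an illustrative assumption and not as the claimed approach, is to translate the model sequence and the target sequence with the standard genetic code and report the residues that differ. It ignores the physical, structural and phylogenetic characteristics the claim refers to. The names `translate` and `changed_residues` are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDDDEEEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def changed_residues(model_dna, target_dna):
    """List (position, model residue, target residue) for each residue that
    differs between the two translations; positions are 1-based."""
    model_protein = translate(model_dna)
    target_protein = translate(target_dna)
    if len(model_protein) != len(target_protein):
        raise ValueError("sequences encode polypeptides of different lengths")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(model_protein, target_protein))
            if a != b]
```

An empty result means every nucleotide change is synonymous, for instance a codon substitution made to suit the codon preferences of an expression host, so the encoded polypeptide is unchanged under this assumption.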
47. A method for automated synthesis of a target polynucleotide sequence, comprising:
a) providing a user with an opportunity to communicate a desired target polynucleotide sequence;
b) allowing the user to transmit the desired target polynucleotide sequence to a server;
c) providing the user with a unique designation;
d) obtaining the transmitted target polynucleotide sequence provided by the user.
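The server-side portion of claim 47 (receiving a transmitted sequence and issuing a unique designation) might look roughly like the following. This is a minimal sketch under stated assumptions: the claim names no storage mechanism, validation rule or identifier format, so the in-memory dictionary, the restriction to the letters A, C, G and T, and the use of a random UUID are all choices made for illustration. The function name `accept_order` is hypothetical.

```python
import uuid


def accept_order(raw_sequence, orders):
    """Validate a submitted sequence, store it, and return a unique designation.

    raw_sequence: text received from the user; whitespace is ignored.
    orders: a dict used here as a stand-in for the server's order store.
    """
    sequence = "".join(raw_sequence.split()).upper()
    if not sequence or set(sequence) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    designation = uuid.uuid4().hex  # random 32-character identifier
    orders[designation] = sequence
    return designation
```

The designation returned to the user is the key under which the transmitted sequence is later retrieved for the identification and assembly steps.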
48. The method of claim 47, further comprising:
f) identifying at least one initiating polynucleotide present in the target polynucleotide of e), wherein the initiating polynucleotide comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang and a 3' overhang;
g) identifying a second polynucleotide present in the target polynucleotide of e), wherein the second polynucleotide is contiguous with the initiating polynucleotide and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang, wherein at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating polynucleotide;
h) identifying a third polynucleotide present in the target polynucleotide of e), wherein the third polynucleotide is contiguous with the initiating sequence and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5' overhang, a 3' overhang, or a 5' overhang and a 3' overhang, wherein at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating polynucleotide which is not complementary to an overhang of the second polynucleotide;
i) contacting the initiating polynucleotide of f) with the second polynucleotide of g) and the third polynucleotide of h) under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, wherein the initiating sequence is extended bi-directionally;
j) in the absence of primer extension, optionally contacting the mixture of i) with a ligase under conditions suitable for ligation; and
k) optionally repeating f) through j) to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized.
49. The method of claim 47, further comprising:
f) identifying at least one initiating polynucleotide present in the target polynucleotide of e), wherein the initiating polynucleotide comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide;
g) contacting the initiating polynucleotide under conditions suitable for primer annealing with a first oligonucleotide having partial complementarity to the 3' portion of the plus strand of the initiating polynucleotide, and a second oligonucleotide having partial complementarity to the 3' portion of the minus strand of the initiating polynucleotide;
h) catalyzing under conditions suitable for primer extension: 1) polynucleotide synthesis from the 3'-hydroxyl of the plus strand of the initiating polynucleotide; 2) polynucleotide synthesis from the 3'-hydroxyl of the annealed first oligonucleotide; 3) polynucleotide synthesis from the 3'-hydroxyl of the minus strand of the initiating polynucleotide; and 4) polynucleotide synthesis from the 3'-hydroxyl of the annealed second oligonucleotide, wherein the initiating sequence is extended bi-directionally thereby forming a nascent extended initiating polynucleotide;
i) contacting the extended initiating polynucleotide of h) under conditions suitable for primer annealing with a third oligonucleotide having partial complementarity to the 3' portion of the plus strand of the extended initiating polynucleotide, and a fourth oligonucleotide having partial complementarity to the 3' portion of the minus strand of the extended initiating polynucleotide;
j) catalyzing under conditions suitable for primer extension: 1) polynucleotide synthesis from the 3'-hydroxyl of the plus strand of the extended initiating polynucleotide; 2) polynucleotide synthesis from the 3'-hydroxyl of the annealed third oligonucleotide; 3) polynucleotide synthesis from the 3'-hydroxyl of the minus strand of the extended initiating polynucleotide; and 4) polynucleotide synthesis from the 3'-hydroxyl of the annealed fourth oligonucleotide, wherein the extended initiating sequence is extended bi-directionally thereby forming a nascent extended initiating polynucleotide; and
k) optionally repeating f) through j) as desired, resulting in formation of the target polynucleotide sequence.
50. A method for automated synthesis of a polynucleotide, comprising:
a) providing a user with a mechanism for communicating a model polynucleotide sequence;
b) optionally providing the user with an opportunity to communicate at least one desired modification to the model sequence if desired;
c) allowing the user to transmit the model sequence and desired modification to a server;
d) providing the user with a unique designation;
e) obtaining the transmitted model sequence and desired modification provided by the user;
f) inputting into a programmed computer, through an input device, data including at least a portion of the model polynucleotide sequence;
g) determining, using the processor, the sequence of the model polynucleotide sequence containing the desired modification;
h) further determining, using the processor, at least one initiating polynucleotide sequence present in the model polynucleotide sequence;
i) selecting, using the processor, a model for synthesizing the modified model polynucleotide sequence based on the position of the initiating sequence in the model polynucleotide sequence; and
j) outputting, to the output device, the results of the at least one determination.
51. An isolated polynucleotide composition comprising:
a) an initiating polynucleotide comprising a plus strand and a minus strand, wherein the plus or minus strand is modified to incorporate a moiety that binds to a solid support;
b) a first primer suitable for primer extension having partial complementarity to the 3' portion of the plus strand of the initiating polynucleotide;
c) a second primer suitable for primer extension having partial complementarity to the 3' portion of the minus strand of the initiating polynucleotide; and
d) a solid support matrix, wherein each of the first and second primers consists of about 25 to 1000 nucleotides.
52. An isolated polynucleotide composition comprising:
a) an initiating polynucleotide comprising a plus strand and a minus strand, wherein the plus or minus strand is modified to incorporate a moiety that binds to a solid support;
b) a first primer suitable for primer extension having partial complementarity to the 3' portion of the plus strand of the initiating polynucleotide;
c) a second primer suitable for primer extension having partial complementarity to the 3' portion of the minus strand of the initiating polynucleotide; and
d) a solid support matrix, wherein each of the first and second primers consists of about 25 to 1000 nucleotides.
PCT/US2002/001649 2001-01-19 2002-01-18 Computer-directed assembly of a polynucleotide encoding a target polypeptide WO2002081490A2 (en)

Priority Applications (8)

Application Number Priority Date Filing Date Title
US10/250,894 US20040241650A1 (en) 2001-01-19 2002-01-18 Computer-directed assembly of a polynucleotide encoding a target polypeptide
JP2002579476A JP2004533228A (en) 2001-01-19 2002-01-18 Computer-based assembly of a polynucleotide encoding a target polypeptide
KR1020037009589A KR100860291B1 (en) 2001-01-19 2002-01-18 Computer-directed assembly of a polynucleotide encoding a target polypeptide
EP02739079A EP1385950B1 (en) 2001-01-19 2002-01-18 Computer-directed assembly of a polynucleotide encoding a target polypeptide
DK02739079T DK1385950T3 (en) 2001-01-19 2002-01-18 Computer controlled assembly of a polynucleotide encoding a target polypeptide
CA002433463A CA2433463A1 (en) 2001-01-19 2002-01-18 Computer-directed assembly of a polynucleotide encoding a target polypeptide
MXPA03006344A MXPA03006344A (en) 2001-01-19 2002-01-18 Computer-directed assembly of a polynucleotide encoding a target polypeptide.
DE60227361T DE60227361D1 (en) 2001-01-19 2002-01-18 COMPUTER MEDIATED ASSEMBLY OF POLYNUCLEOTIDES ENCODING A TARGETED POLYPEPTIDE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US26269301P 2001-01-19 2001-01-19
US60/262,693 2001-01-19

Publications (3)

Publication Number Publication Date
WO2002081490A2 (en) 2002-10-17
WO2002081490A3 (en) 2003-11-27
WO2002081490A8 (en) 2004-05-21

Family

ID=22998593

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/001649 WO2002081490A2 (en) 2001-01-19 2002-01-18 Computer-directed assembly of a polynucleotide encoding a target polypeptide

Country Status (13)

Country Link
US (9) US20040241650A1 (en)
EP (1) EP1385950B1 (en)
JP (1) JP2004533228A (en)
KR (1) KR100860291B1 (en)
AT (1) ATE399857T1 (en)
AU (1) AU2008201007A1 (en)
CA (1) CA2433463A1 (en)
DE (1) DE60227361D1 (en)
DK (1) DK1385950T3 (en)
ES (1) ES2309175T3 (en)
MX (1) MXPA03006344A (en)
PT (1) PT1385950E (en)
WO (1) WO2002081490A2 (en)


Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7879580B2 (en) * 2002-12-10 2011-02-01 Massachusetts Institute Of Technology Methods for high fidelity production of long nucleic acid molecules
US7932025B2 (en) * 2002-12-10 2011-04-26 Massachusetts Institute Of Technology Methods for high fidelity production of long nucleic acid molecules with error control
JP4093157B2 (en) * 2003-09-17 2008-06-04 株式会社日立製作所 Distributed inspection device and host inspection device
US20070148681A1 (en) * 2005-12-16 2007-06-28 Gene Oracle, Inc. Gene synthesis kit
US20070231805A1 (en) * 2006-03-31 2007-10-04 Baynes Brian M Nucleic acid assembly optimization using clamped mismatch binding proteins
WO2007136834A2 (en) * 2006-05-19 2007-11-29 Codon Devices, Inc. Combined extension and ligation for nucleic acid assembly
US7872120B2 (en) * 2007-08-10 2011-01-18 Venkata Chalapathi Rao Koka Methods for synthesizing a collection of partially identical polynucleotides
WO2009138954A2 (en) 2008-05-14 2009-11-19 British Columbia Cancer Agency Branch Gene synthesis by convergent assembly of oligonucleotide subsets
US8951939B2 (en) 2011-07-12 2015-02-10 Bio-Rad Laboratories, Inc. Digital assays with multiplexed detection of two or more targets in the same optical channel
US9132394B2 (en) 2008-09-23 2015-09-15 Bio-Rad Laboratories, Inc. System for detection of spaced droplets
US9417190B2 (en) 2008-09-23 2016-08-16 Bio-Rad Laboratories, Inc. Calibrations and controls for droplet-based assays
US20110224086A1 (en) * 2010-03-09 2011-09-15 Jose Pardinas Methods and Algorithms for Selecting Polynucleotides For Synthetic Assembly
JP6155419B2 (en) 2010-03-25 2017-07-05 バイオ−ラッド・ラボラトリーズ・インコーポレーテッド Droplet transport system for detection
CA2830443C (en) 2011-03-18 2021-11-16 Bio-Rad Laboratories, Inc. Multiplexed digital assays with combinatorial use of signals
WO2013155531A2 (en) 2012-04-13 2013-10-17 Bio-Rad Laboratories, Inc. Sample holder with a well having a wicking promoter
US9678948B2 (en) 2012-06-26 2017-06-13 International Business Machines Corporation Real-time message sentiment awareness
US9104656B2 (en) * 2012-07-03 2015-08-11 International Business Machines Corporation Using lexical analysis and parsing in genome research
US9460083B2 (en) 2012-12-27 2016-10-04 International Business Machines Corporation Interactive dashboard based on real-time sentiment analysis for synchronous communication
US9690775B2 (en) 2012-12-27 2017-06-27 International Business Machines Corporation Real-time sentiment analysis for synchronous communication
US10167366B2 (en) 2013-03-15 2019-01-01 Melior Innovations, Inc. Polysilocarb materials, methods and uses
TWI695067B (en) 2013-08-05 2020-06-01 美商扭轉生物科技有限公司 De novo synthesized gene libraries
WO2016019387A1 (en) * 2014-08-01 2016-02-04 The Regents Of The University Of California One-pot multiplex gene synthesis
CA2975855A1 (en) 2015-02-04 2016-08-11 Twist Bioscience Corporation Compositions and methods for synthetic gene assembly
US10669304B2 (en) 2015-02-04 2020-06-02 Twist Bioscience Corporation Methods and devices for de novo oligonucleic acid assembly
US9981239B2 (en) 2015-04-21 2018-05-29 Twist Bioscience Corporation Devices and methods for oligonucleic acid library synthesis
WO2017049231A1 (en) 2015-09-18 2017-03-23 Twist Bioscience Corporation Oligonucleic acid variant libraries and synthesis thereof
US11512347B2 (en) 2015-09-22 2022-11-29 Twist Bioscience Corporation Flexible substrates for nucleic acid synthesis
CN108603307A (en) 2015-12-01 2018-09-28 特韦斯特生物科学公司 functionalized surface and its preparation
GB2568444A (en) 2016-08-22 2019-05-15 Twist Bioscience Corp De novo synthesized nucleic acid libraries
JP6871364B2 (en) 2016-09-21 2021-05-12 ツイスト バイオサイエンス コーポレーション Nucleic acid-based data storage
GB2573069A (en) 2016-12-16 2019-10-23 Twist Bioscience Corp Variant libraries of the immunological synapse and synthesis thereof
WO2018156792A1 (en) 2017-02-22 2018-08-30 Twist Bioscience Corporation Nucleic acid based data storage
WO2018170169A1 (en) 2017-03-15 2018-09-20 Twist Bioscience Corporation Variant libraries of the immunological synapse and synthesis thereof
WO2018231864A1 (en) 2017-06-12 2018-12-20 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
CN111566209A (en) 2017-06-12 2020-08-21 特韦斯特生物科学公司 Seamless nucleic acid assembly method
SG11202002194UA (en) 2017-09-11 2020-04-29 Twist Bioscience Corp Gpcr binding proteins and synthesis thereof
CA3079613A1 (en) 2017-10-20 2019-04-25 Twist Bioscience Corporation Heated nanowells for polynucleotide synthesis
CN112041438A (en) 2018-01-04 2020-12-04 特韦斯特生物科学公司 DNA-based digital information storage
CN112639130A (en) 2018-05-18 2021-04-09 特韦斯特生物科学公司 Polynucleotides, reagents and methods for nucleic acid hybridization
EP3930753A4 (en) 2019-02-26 2023-03-29 Twist Bioscience Corporation Variant nucleic acid libraries for glp1 receptor
SG11202109283UA (en) 2019-02-26 2021-09-29 Twist Bioscience Corp Variant nucleic acid libraries for antibody optimization
WO2020257612A1 (en) 2019-06-21 2020-12-24 Twist Bioscience Corporation Barcode-based nucleic acid sequence assembly
JP2023517139A (en) * 2019-12-30 2023-04-21 源點生物科技股份有限公司 Apparatus and method for preparing nucleic acid sequences using enzymes
EP4091169A4 (en) * 2020-01-17 2024-02-14 Asklepios Biopharmaceutical Inc Systems and methods for synthetic regulatory sequence design or production
GB2619548A (en) * 2022-06-10 2023-12-13 Nunabio Ltd Nucleic acid and gene synthesis

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0316018A2 (en) * 1985-03-29 1989-05-17 Cetus Oncology Corporation Modification of DNA sequences
WO1990000626A1 (en) * 1988-07-14 1990-01-25 Baylor College Of Medicine Solid phase assembly and reconstruction of biopolymers
EP0385410A2 (en) * 1989-02-28 1990-09-05 Canon Kabushiki Kaisha Partially double-stranded oligonucleotide and method for forming oligonucleotide
WO1994012632A1 (en) * 1992-11-27 1994-06-09 University College London Improvements in nucleic acid synthesis by pcr
WO1999014318A1 (en) * 1997-09-16 1999-03-25 Board Of Regents, The University Of Texas System Method for the complete chemical synthesis and assembly of genes and genomes
WO2000042560A2 (en) * 1999-01-19 2000-07-20 Maxygen, Inc. Methods for making character strings, polynucleotides and polypeptides
WO2000049142A1 (en) * 1999-02-19 2000-08-24 Febit Ferrarius Biotechnology Gmbh Method for producing polymers

Family Cites Families (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4652639A (en) * 1982-05-06 1987-03-24 Amgen Manufacture and expression of structural genes
GB8810400D0 (en) 1988-05-03 1988-06-08 Southern E Analysing polynucleotide sequences
US5198346A (en) * 1989-01-06 1993-03-30 Protein Engineering Corp. Generation and selection of novel DNA-binding proteins and polypeptides
US5143854A (en) * 1989-06-07 1992-09-01 Affymax Technologies N.V. Large scale photolithographic solid phase synthesis of polypeptides and receptor binding screening thereof
US5837528A (en) * 1989-06-22 1998-11-17 Hoffmann La Roche, Inc. Bacterial strains which overproduce riboflavin
AU644619B2 (en) * 1989-12-21 1993-12-16 Advanced Technologies (Cambridge) Limited Modification of plant metabolism
US5747334A (en) * 1990-02-15 1998-05-05 The University Of North Carolina At Chapel Hill Random peptide library
CA2092317A1 (en) * 1990-09-28 1992-03-29 David H. Gelfand Purified thermostable nucleic acid polymerase enzyme from thermosipho africanus
JPH06504997A (en) 1990-12-06 1994-06-09 アフィメトリックス, インコーポレイテッド Synthesis of immobilized polymers on a very large scale
DE69225227T2 (en) * 1991-06-27 1998-10-29 Genelabs Tech Inc SCREENING TEST FOR DETECTING DNA-BINDING MOLECULES
WO1993005182A1 (en) * 1991-09-05 1993-03-18 Isis Pharmaceuticals, Inc. Determination of oligonucleotides for therapeutics, diagnostics and research reagents
DK0604552T3 (en) * 1991-09-18 1997-08-04 Affymax Tech Nv Process for the synthesis of different assemblies of oligomers
EP0916396B1 (en) * 1991-11-22 2005-04-13 Affymetrix, Inc. (a Delaware Corporation) Combinatorial strategies for polymer synthesis
US5783384A (en) * 1992-01-13 1998-07-21 President And Fellows Of Harvard College Selection of binding-molecules
US5733743A (en) * 1992-03-24 1998-03-31 Cambridge Antibody Technology Limited Methods for producing members of specific binding pairs
PT643775E (en) * 1992-05-28 2004-10-29 Ct For Molecular Biology And M QUINONE DERIVATIVES TO INCREASE CELL BIOENERGY
US5807683A (en) * 1992-11-19 1998-09-15 Combichem, Inc. Combinatorial libraries and methods for their use
US5714320A (en) * 1993-04-15 1998-02-03 University Of Rochester Rolling circle synthesis of oligonucleotides and amplification of select randomized circular oligonucleotides
EP0705334A1 (en) * 1993-06-14 1996-04-10 Basf Aktiengesellschaft Tight control of gene expression in eucaryotic cells by tetracycline-responsive promoters
US5591578A (en) * 1993-12-10 1997-01-07 California Institute Of Technology Nucleic acid mediated electron transfer
ZA95260B (en) * 1994-01-13 1995-09-28 Univ Columbia Synthetic receptors libraries and uses thereof
US5738996A (en) * 1994-06-15 1998-04-14 Pence, Inc. Combinational library composition and method
US5663046A (en) * 1994-06-22 1997-09-02 Pharmacopeia, Inc. Synthesis of combinatorial libraries
FR2732693B1 (en) * 1995-04-06 1997-05-09 Bio Veto Tests Bvt CONTAMINANT REVELATION INDICATOR AND METHOD OF APPLICATION TO THE PRODUCTION OF AN ANTIBIOGRAM DIRECTLY CARRIED OUT ON A SAMPLING
US5807754A (en) * 1995-05-11 1998-09-15 Arqule, Inc. Combinatorial synthesis and high-throughput screening of a Rev-inhibiting arylidenediamide array
US6110457A (en) * 1995-12-08 2000-08-29 St. Louis University Live attenuated vaccines based on cp45 HPIV-3 strain and method to ensure attenuation in such vaccines
US6670127B2 (en) * 1997-09-16 2003-12-30 Egea Biosciences, Inc. Method for assembly of a polynucleotide encoding a target polypeptide
US7321828B2 (en) 1998-04-13 2008-01-22 Isis Pharmaceuticals, Inc. System of components for preparing oligonucleotides
CA2340284A1 (en) * 1998-08-25 2000-03-02 The Scripps Research Institute Methods and systems for predicting protein function
US7244560B2 (en) * 2000-05-21 2007-07-17 Invitrogen Corporation Methods and compositions for synthesis of nucleic acid molecules using multiple recognition sites

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0316018A2 (en) * 1985-03-29 1989-05-17 Cetus Oncology Corporation Modification of DNA sequences
WO1990000626A1 (en) * 1988-07-14 1990-01-25 Baylor College Of Medicine Solid phase assembly and reconstruction of biopolymers
EP0385410A2 (en) * 1989-02-28 1990-09-05 Canon Kabushiki Kaisha Partially double-stranded oligonucleotide and method for forming oligonucleotide
WO1994012632A1 (en) * 1992-11-27 1994-06-09 University College London Improvements in nucleic acid synthesis by pcr
WO1999014318A1 (en) * 1997-09-16 1999-03-25 Board Of Regents, The University Of Texas System Method for the complete chemical synthesis and assembly of genes and genomes
WO2000042560A2 (en) * 1999-01-19 2000-07-20 Maxygen, Inc. Methods for making character strings, polynucleotides and polypeptides
WO2000042561A2 (en) * 1999-01-19 2000-07-20 Maxygen, Inc. Oligonucleotide mediated nucleic acid recombination
WO2000049142A1 (en) * 1999-02-19 2000-08-24 Febit Ferrarius Biotechnology Gmbh Method for producing polymers

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1385950A2 *

Cited By (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1470252A2 (en) * 2001-08-02 2004-10-27 Egea Biosciences, Inc. Method for assembly of a polynucleotide encoding a target polypeptide
EP1470252A4 (en) * 2001-08-02 2006-03-08 Egea Biosciences Inc Method for assembly of a polynucleotide encoding a target polypeptide
US9051666B2 (en) 2002-09-12 2015-06-09 Gen9, Inc. Microarray synthesis and assembly of gene-length polynucleotides
US10774325B2 (en) 2002-09-12 2020-09-15 Gen9, Inc. Microarray synthesis and assembly of gene-length polynucleotides
US10450560B2 (en) 2002-09-12 2019-10-22 Gen9, Inc. Microarray synthesis and assembly of gene-length polynucleotides
US10640764B2 (en) 2002-09-12 2020-05-05 Gen9, Inc. Microarray synthesis and assembly of gene-length polynucleotides
USRE48788E1 (en) 2003-03-14 2021-10-26 Lawrence Livermore National Security, Llc Chemical amplification based on fluid partitioning
USRE43365E1 (en) 2003-03-14 2012-05-08 Lawrence Livermore National Security, Llc Apparatus for chemical amplification based on fluid partitioning in an immiscible liquid
USRE45539E1 (en) 2003-03-14 2015-06-02 Lawrence Livermore National Security, Llc Method for chemical amplification based on fluid partitioning in an immiscible liquid
USRE47080E1 (en) 2003-03-14 2018-10-09 Lawrence Livermore National Security, Llc Chemical amplification based on fluid partitioning
USRE41780E1 (en) 2003-03-14 2010-09-28 Lawrence Livermore National Security, Llc Chemical amplification based on fluid partitioning in an immiscible liquid
US7041481B2 (en) 2003-03-14 2006-05-09 The Regents Of The University Of California Chemical amplification based on fluid partitioning
USRE46322E1 (en) 2003-03-14 2017-02-28 Lawrence Livermore National Security, Llc Method for chemical amplification based on fluid partitioning in an immiscible liquid
EP1771579A2 (en) * 2004-05-04 2007-04-11 Dna Twopointo Inc. Design, synthesis and assembly of synthetic nucleic acids
EP1771579A4 (en) * 2004-05-04 2011-04-20 Dna Twopointo Inc Design, synthesis and assembly of synthetic nucleic acids
US8825411B2 (en) 2004-05-04 2014-09-02 Dna Twopointo, Inc. Design, synthesis and assembly of synthetic nucleic acids
US10202608B2 (en) 2006-08-31 2019-02-12 Gen9, Inc. Iterative nucleic acid assembly using activation of vector-encoded traits
US10258989B2 (en) 2008-09-23 2019-04-16 Bio-Rad Laboratories, Inc. Method of making a device for generating droplets
US10258988B2 (en) 2008-09-23 2019-04-16 Bio-Rad Laboratories, Inc. Device for generating droplets
US9764322B2 (en) 2008-09-23 2017-09-19 Bio-Rad Laboratories, Inc. System for generating droplets with pressure monitoring
US9649635B2 (en) 2008-09-23 2017-05-16 Bio-Rad Laboratories, Inc. System for generating droplets with push-back to remove oil
US10512910B2 (en) 2008-09-23 2019-12-24 Bio-Rad Laboratories, Inc. Droplet-based analysis method
US11130134B2 (en) 2008-09-23 2021-09-28 Bio-Rad Laboratories, Inc. Method of performing droplet-based assays
US11633739B2 (en) 2008-09-23 2023-04-25 Bio-Rad Laboratories, Inc. Droplet-based assay system
US9636682B2 (en) 2008-09-23 2017-05-02 Bio-Rad Laboratories, Inc. System for generating droplets—instruments and cassette
US11130128B2 (en) 2008-09-23 2021-09-28 Bio-Rad Laboratories, Inc. Detection method for a target nucleic acid
US10279350B2 (en) 2008-09-23 2019-05-07 Bio-Rad Laboratories, Inc. Method of generating droplets
US11612892B2 (en) 2008-09-23 2023-03-28 Bio-Rad Laboratories, Inc. Method of performing droplet-based assays
US9623384B2 (en) 2008-09-23 2017-04-18 Bio-Rad Laboratories, Inc. System for transporting emulsions from an array to a detector
US9492797B2 (en) 2008-09-23 2016-11-15 Bio-Rad Laboratories, Inc. System for detection of spaced droplets
US8158391B2 (en) 2009-05-06 2012-04-17 Dna Twopointo, Inc. Production of an α-carboxyl-ω-hydroxy fatty acid using a genetically modified Candida strain
US10166522B2 (en) 2009-09-02 2019-01-01 Bio-Rad Laboratories, Inc. System for mixing fluids by coalescence of multiple emulsions
US10677693B2 (en) 2009-09-02 2020-06-09 Bio-Rad Laboratories, Inc. System for mixing fluids by coalescence of multiple emulsions
US10207240B2 (en) 2009-11-03 2019-02-19 Gen9, Inc. Methods and microfluidic devices for the manipulation of droplets in high fidelity polynucleotide assembly
US9968902B2 (en) 2009-11-25 2018-05-15 Gen9, Inc. Microfluidic devices and methods for gene synthesis
US11071963B2 (en) 2010-01-07 2021-07-27 Gen9, Inc. Assembly of high fidelity polynucleotides
US9925510B2 (en) 2010-01-07 2018-03-27 Gen9, Inc. Assembly of high fidelity polynucleotides
US10378048B2 (en) 2010-03-02 2019-08-13 Bio-Rad Laboratories, Inc. Emulsion chemistry for encapsulated droplets
US11060136B2 (en) 2010-03-02 2021-07-13 Bio-Rad Laboratories, Inc. Emulsion chemistry for encapsulated droplets
US11866771B2 (en) 2010-03-02 2024-01-09 Bio-Rad Laboratories, Inc. Emulsion chemistry for encapsulated droplets
US10099219B2 (en) 2010-03-25 2018-10-16 Bio-Rad Laboratories, Inc. Device for generating droplets
US10744506B2 (en) 2010-03-25 2020-08-18 Bio-Rad Laboratories, Inc. Device for generating droplets
US10272432B2 (en) 2010-03-25 2019-04-30 Bio-Rad Laboratories, Inc. Device for generating droplets
US9089844B2 (en) 2010-11-01 2015-07-28 Bio-Rad Laboratories, Inc. System for forming emulsions
US11084014B2 (en) 2010-11-12 2021-08-10 Gen9, Inc. Methods and devices for nucleic acids synthesis
US10457935B2 (en) 2010-11-12 2019-10-29 Gen9, Inc. Protein arrays and methods of using and making the same
US11845054B2 (en) 2010-11-12 2023-12-19 Gen9, Inc. Methods and devices for nucleic acids synthesis
US10982208B2 (en) 2010-11-12 2021-04-20 Gen9, Inc. Protein arrays and methods of using and making the same
WO2012084923A1 (en) 2010-12-24 2012-06-28 Geneart Ag Method for producing reading-frame-corrected fragment libraries
DE102010056289A1 (en) 2010-12-24 2012-06-28 Geneart Ag Process for the preparation of reading frame correct fragment libraries
US11939573B2 (en) 2011-04-25 2024-03-26 Bio-Rad Laboratories, Inc. Methods and compositions for nucleic acid analysis
US10760073B2 (en) 2011-04-25 2020-09-01 Bio-Rad Laboratories, Inc. Methods and compositions for nucleic acid analysis
US9885034B2 (en) 2011-04-25 2018-02-06 Bio-Rad Laboratories, Inc. Methods and compositions for nucleic acid analysis
US10190115B2 (en) 2011-04-25 2019-01-29 Bio-Rad Laboratories, Inc. Methods and compositions for nucleic acid analysis
US11702662B2 (en) 2011-08-26 2023-07-18 Gen9, Inc. Compositions and methods for high fidelity assembly of nucleic acids
US10308931B2 (en) 2012-03-21 2019-06-04 Gen9, Inc. Methods for screening proteins using DNA encoded chemical libraries as templates for enzyme catalysis
US10081807B2 (en) 2012-04-24 2018-09-25 Gen9, Inc. Methods for sorting nucleic acids and multiplexed preparative in vitro cloning
US10927369B2 (en) 2012-04-24 2021-02-23 Gen9, Inc. Methods for sorting nucleic acids and multiplexed preparative in vitro cloning
US11072789B2 (en) 2012-06-25 2021-07-27 Gen9, Inc. Methods for nucleic acid assembly and high throughput sequencing
US11629377B2 (en) 2017-09-29 2023-04-18 Evonetix Ltd Error detection during hybridisation of target double-stranded nucleic acid
US11352619B2 (en) 2017-10-13 2022-06-07 Ribbon Biolabs Gmbh Method for synthesis of polynucleotides using a diverse library of oligonucleotides
WO2019073072A1 (en) 2017-10-13 2019-04-18 Ribbon Biolabs Gmbh A novel method for synthesis of polynucleotides using a diverse library of oligonucleotides
WO2020208234A1 (en) 2019-04-10 2020-10-15 Ribbon Biolabs Gmbh A library of polynucleotides

Also Published As

Publication number Publication date
PT1385950E (en) 2008-08-12
ATE399857T1 (en) 2008-07-15
US20050221340A1 (en) 2005-10-06
KR100860291B1 (en) 2008-09-25
US20050191648A1 (en) 2005-09-01
MXPA03006344A (en) 2004-12-03
US7966338B2 (en) 2011-06-21
US20060154283A1 (en) 2006-07-13
EP1385950A2 (en) 2004-02-04
US20050244841A1 (en) 2005-11-03
EP1385950B1 (en) 2008-07-02
US20040241650A1 (en) 2004-12-02
US20030138782A1 (en) 2003-07-24
AU2008201007A1 (en) 2008-05-15
KR20030077582A (en) 2003-10-01
ES2309175T3 (en) 2008-12-16
CA2433463A1 (en) 2002-10-17
US20050118628A1 (en) 2005-06-02
DE60227361D1 (en) 2008-08-14
WO2002081490A3 (en) 2003-11-27
US20050053997A1 (en) 2005-03-10
WO2002081490A8 (en) 2004-05-21
DK1385950T3 (en) 2008-11-03
JP2004533228A (en) 2004-11-04
US20070054277A1 (en) 2007-03-08

Similar Documents

Publication Publication Date Title
EP1385950B1 (en) Computer-directed assembly of a polynucleotide encoding a target polypeptide
US6670127B2 (en) Method for assembly of a polynucleotide encoding a target polypeptide
CA2362939C (en) Method for producing polymers
US9644225B2 (en) Programmable oligonucleotide synthesis
AU2002311757A1 (en) Computer-directed assembly of a polynucleotide encoding a target polypeptide
AU2008201933A1 (en) Method for assembly of a polynucleotide encoding a target polypeptide
Frank Segmented solid supports: My personal addiction to Merrifield’s solid phase synthesis

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ CZ DE DE DK DK DM DZ EC EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase

Ref document number: 2433463

Country of ref document: CA

WWE WIPO information: entry into national phase

Ref document number: 2002311757

Country of ref document: AU

WWE WIPO information: entry into national phase

Ref document number: PA/a/2003/006344

Country of ref document: MX

WWE WIPO information: entry into national phase

Ref document number: 1020037009589

Country of ref document: KR

Ref document number: 2002579476

Country of ref document: JP

WWE WIPO information: entry into national phase

Ref document number: 028039157

Country of ref document: CN

WWE WIPO information: entry into national phase

Ref document number: 2002739079

Country of ref document: EP

WWP WIPO information: published in national office

Ref document number: 1020037009589

Country of ref document: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWE WIPO information: entry into national phase

Ref document number: 10250894

Country of ref document: US

WWP WIPO information: published in national office

Ref document number: 2002739079

Country of ref document: EP

CFP Corrected version of a pamphlet front page
CR1 Correction of entry in Section I

Free format text: IN PCT GAZETTE 42/2002 DUE TO A TECHNICAL PROBLEM AT THE TIME OF INTERNATIONAL PUBLICATION, SOME INFORMATION WAS MISSING UNDER (81). THE MISSING INFORMATION NOW APPEARS IN THE CORRECTED VERSION