US20080131943A1 - Polynucleotides for production of farnesyl dibenzodiazepinones

Publication number: US20080131943A1
Application number: US11/876,636
Authority: US
Inventors: Chris M. Farnet; Emmanuel Zazopoulos
Original assignee: Thallion Pharmaceuticals Inc
Current assignee: Thallion Pharmaceuticals Inc
Priority date: 2003-01-21
Filing date: 2007-10-22
Publication date: 2008-06-05
Also published as: ES2483897T3; AU2004206046A1; US20090263886A1; WO2004065591A1; US20050043297A1; MXPA05007743A; US7521222B2; AU2004206046B2; US7304054B2; US20060079512A1; CA2466340A1; CA2466340C; NZ541815A; US20060079509A1; US7101872B2; JP4913588B2; EP1585814B1; JP2006515874A; US20080199940A1; NZ561169A

Abstract

This invention provides genes and their encoded proteins, involved in the biosynthesis of farnesyl dibenzodiazepinones, including ECO-04601. The invention relates to expression vectors comprising the genes and to host cells transformed with these vectors. The invention further relates to methods of producing farnesyl dibenzodiazepinone compounds using the genes and proteins of the invention, for example, involving expression of biosynthetic pathway genes in transformed host cells.

Description

RELATED APPLICATIONS

This application is a divisional of U.S. patent application Ser. No. 11/330,123, filed Jan. 12, 2006, which is a continuation-in-part of U.S. patent application Ser. No. 10/762,107, filed Jan. 21, 2004, now issued as U.S. Pat. No. 7,101,872, which claims priority to each of U.S. Provisional Application No. 60/441,126, filed Jan. 21, 2003, U.S. Provisional Application No. 60/492,997, filed Aug. 7, 2003, and U.S. Provisional Application No. 60/518,286, filed Nov. 10, 2003. The entire disclosures of each of these applications are herein incorporated by reference.

SEQUENCE LISTING ON COMPACT DISK

The contents of the following submissions on compact discs are incorporated herein by reference in their entirety: A compact disc copy of the Sequence Listing (COPY 1) (file name: 3005-5US-50US.ST25.txt, date recorded Jan. 10, 2006, size: 298 KB) and a duplicate compact disc copy of the Sequence Listing (COPY 2) (file name: 3005-5US-50US.ST25.txt, date recorded Jan. 10, 2006, size: 298 KB).

FIELD OF THE INVENTION

The invention relates to novel polynucleotide sequences and their encoded proteins, which are involved in the biosynthesis of a farnesyl dibenzodiazepinone compound and analogs. The invention relates to the use of such polynucleotides and proteins to produce farnesyl dibenzodiazepinone compounds and analogs. One method of obtaining the compound is by cultivation of a novel modified strain of Micromonospora sp., i.e., 046-ECO11 or [S01]046; another method involves expression of biosynthetic pathway genes in transformed host cells. The present invention further relates to cosmids 046KM and 046KQ and their methods of use.

BACKGROUND OF THE INVENTION

The euactinomycetes are a subset of a large and complex group of Gram-positive bacteria known as actinomycetes. Over the past few decades these organisms, which are abundant in soil, have generated significant commercial and scientific interest as a result of the large number of therapeutically useful compounds produced as secondary metabolites. The intensive search for strains able to produce new secondary metabolites having potential therapeutic applications has led to the identification of hundreds of new species. Many of the euactinomycetes, particularly Streptomyces and the closely related Saccharopolyspora genera, have been extensively studied. Both of these genera produce a notable diversity of biologically active metabolites. Because of the commercial significance of these compounds, much is known about the genetics and physiology of these organisms.
Microbial genomic information is unique in that, unlike the organization of genomic information in higher life forms, microbial secondary metabolic biosynthetic genes are known to cluster together within the genome. This information allows identification of the gene locus encoding the enzymes responsible for the biosynthesis of a specific molecule. Equally, the identification of the genes present within a cluster allows prediction of the structure of the secondary metabolite. The identification of the genes and proteins responsible for the production of active molecules allows, for example, generation of structural analogs or improvement of the production process.
U.S. patent application Ser. No. 10/762,107 describes a dibenzodiazepinone secondary metabolite, specifically 10-farnesyl-4,6,8-trihydroxy-dibenzodiazepin-11-one (named ECO-04601) produced by a known euactinomycetes strain, Micromonospora sp. (IDAC 231203-01). Likewise, U.S. Pat. No. 5,541,181 (Ohkuma et al.) also discloses a dibenzodiazepinone secondary metabolite, specifically 5-farnesyl-4,7,9-trihydroxy-dibenzodiazepin-11-one (named “BU-4664L”), produced by a known euactinomycetes strain, Micromonospora sp. M990-6 (ATCC 55378). Both these dibenzodiazepinones have been reported to have anti-tumor activity.
Although many biologically active compounds have been identified from bacteria, there remains the need to obtain novel naturally occurring compounds with enhanced properties. Current methods of obtaining such compounds include screening of natural isolates and chemical modification of existing compounds, both of which are costly and time consuming. Current screening methods are based on general biological properties of the compound, which require prior knowledge of the structure of the molecules. Methods for chemically modifying known active compounds exist, but still suffer from practical limitations as to the type of compounds obtainable.
Thus, there exists a considerable need to obtain pharmaceutically active compounds in a cost-effective manner and with high yield. The present invention solves these problems by providing polynucleotides, polypeptides, vectors comprising the polynucleotides and host cells comprising the vectors for production of dibenzodiazepinones, as well as methods to generate farnesyl dibenzodiazepinones by de novo biosynthesis (heterologous or homologous expression of biosynthetic genes) or semi-synthesis rather than by chemical synthesis.

SUMMARY OF THE INVENTION

The invention further encompasses an isolated polynucleotide comprising one or more of SEQ ID NOs. 1, 64 and 73, wherein the polynucleotide encodes a polypeptide that participates in a biosynthetic pathway for a farnesyl dibenzodiazepinone.
The invention further encompasses an isolated polynucleotide comprising SEQ ID NOs. 1, 64 and 73, wherein the polynucleotide encodes a polypeptide that participates in a biosynthetic pathway for a farnesyl dibenzodiazepinone.
The invention further encompasses an isolated polynucleotide that encodes a polypeptide selected from the group consisting of SEQ ID NOs. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 71, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94 and 96.
The invention further provides an isolated nucleic acid comprising a nucleotide sequence identical or complementary to a polynucleotide encoding a polypeptide having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95% or at least 99% identity to a sequence selected from the group consisting of SEQ ID NOs. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 71, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94 and 96, said polypeptide having the same biological function as its corresponding protein.
The invention further provides an isolated nucleic acid comprising a nucleotide sequence hybridizing under low, moderate, high or very high stringency conditions to the complement of a polynucleotide encoding a sequence selected from the group consisting of SEQ ID NOs. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 71, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94 and 96, said polypeptide having the same biological function as its corresponding protein.
The invention provides an isolated, purified or enriched nucleic acid comprising a polynucleotide, or a nucleotide sequence complementary thereto, said polynucleotide encoding a polypeptide selected from an adenylating amide synthetase (ADSA) having at least 80%, at least 90%, or at least 95% identity to the adenylating amide synthetase of SEQ ID NO: 48; and an isoprenyl transferase (IPTN) having at least 80%, at least 90%, or at least 95% identity to the isoprenyl transferase of SEQ ID NO: 22. In one embodiment, the invention provides an expression vector comprising said ADSA or IPTN-encoding nucleic acid. In another embodiment, the invention provides host cells transformed with such a vector.
The invention further provides a polypeptide selected from an adenylating amide synthetase (ADSA) having at least 80%, at least 90%, or at least 95% identity to the adenylating amide synthetase of SEQ ID NO: 48; and an isoprenyl transferase (IPTN) having at least 80%, at least 90%, or at least 95% identity to the isoprenyl transferase of SEQ ID NO: 22.
In one embodiment, the isolated polynucleotide comprising SEQ ID No. 1 encodes a polypeptide selected from the group consisting of SEQ ID Nos. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60 and 62.
In another embodiment, the isolated polynucleotide comprising SEQ ID No. 64 encodes a polypeptide selected from the group consisting of SEQ ID NOS: 65, 67, 69 and 71.
In another embodiment, the isolated polynucleotide comprising SEQ ID No. 73 encodes a polypeptide selected from the group consisting of SEQ ID NOS: 74, 76, 78, 80, 82, 84, 86 and 88.
The invention further encompasses an isolated polypeptide of SEQ ID NO. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 71, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94 and 96.
The invention further provides an isolated polypeptide having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95% or at least 99% identity to a sequence selected from the group consisting of SEQ ID NOs. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 71, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94 and 96, said polypeptide having the same biological function as its corresponding protein.
In one embodiment, the polypeptide participates in a biosynthetic pathway for a farnesyl dibenzodiazepinone.
The invention further encompasses an expression vector comprising one or more of the polynucleotides described herein.
The invention further encompasses a recombinant prokaryotic organism comprising one or more such expression vectors.
In one embodiment, the organism is an actinomycete.
In another embodiment, the organism requires the expression vector to synthesize a farnesyl dibenzodiazepinone. That is, the organism is deficient in the ability to synthesize a farnesyl dibenzodiazepinone before transformation with a polynucleotide as described herein.
The invention further encompasses a method of making a farnesyl dibenzodiazepinone de novo in a prokaryote, comprising the steps of: (a) providing a prokaryote that is incapable of synthesizing a farnesyl dibenzodiazepinone; (b) transforming the prokaryote with an expression vector as described herein; and (c) culturing the prokaryote under conditions such that a polypeptide of the invention is expressed and catalyses the synthesis of a farnesyl dibenzodiazepinone compound or analog.
In one embodiment, the prokaryote is an actinomycete.
In another embodiment, the vector expresses a polypeptide of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 71, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94 and 96.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1: shows inhibition of tumor growth resulting from administration of 10 to 30 mg/kg of ECO-04601 to glioblastoma-bearing mice beginning one day after tumor cell inoculation.

FIG. 2: shows inhibition of tumor growth resulting from administration of 20-30 mg/kg of ECO-04601 to glioblastoma-bearing mice beginning ten days after tumor cell inoculation.

FIG. 3: shows micrographs of tumor sections from mice bearing glioblastoma tumors and treated with saline or ECO-04601. The cell density of tumors treated with ECO-04601 appears decreased, and nuclei from ECO-04601-treated tumor cells are larger and pyknotic, suggesting a cytotoxic effect.

FIG. 4: shows the biosynthetic locus of ECO-04601, isolated from Micromonospora sp. strain 046-ECO11, including the positions of cosmids 046KM and 046KQ.

FIGS. 5 to 8: show the different steps involved in the biosynthetic pathway of ECO-04601. Each of FIGS. 5 to 8 shows the three biosynthetic loci A, B and C where ORFs are represented by arrows. Highlighted ORFs are involved in the steps described in the schematic diagram. The biosynthetic enzymes involved in the steps depicted in schematic diagrams are indicated by their family designation and the respective ORF number in each of Loci A, B and C (e.g., 8/7/7).

FIG. 5: shows a schematic diagram of the biosynthetic pathway for the production of farnesyl-diphosphate, providing the farnesyl group of ECO-04601.

FIG. 6: shows a schematic diagram of the biosynthetic pathway for the production of 3-hydroxy-anthranilate-adenylate precursor of the dibenzodiazepinone group.

FIG. 7: shows a schematic diagram of the biosynthetic pathway for the production of 2-amino-6-hydroxy-[1,4]benzoquinone precursor of the core dibenzodiazepinone.

FIG. 8: shows a schematic diagram of the biosynthetic pathway for the assembly of the ECO-04601 precursors, farnesyl-diphosphate, 3-hydroxy-anthranilate-adenylate and 2-amino-6-hydroxy-[1,4]benzoquinone.

FIGS. 9 and 10: show clustal alignments respectively of isoprenyl transferase and adenylating amide synthetase enzymes of locus A with the corresponding enzymes present in loci B and C. In each of the clustal alignments: (i) an asterisk “*” indicates positions which have a single, fully conserved residue; (ii) a colon “:” indicates that one of the following strong groups is fully conserved in a specific position: (S, T or A); (N, E, Q or K); (N, H, Q or K); (N, D, E or Q); (Q, H, R or K); (M, I, L or V); (M, I, L or F); (H or Y); and (F, Y or W); and (iii) a period “.” indicates that one of the following weaker groups is fully conserved: (C, S or A); (A, T or V); (S, A or G); (S, T, N or K); (S, T, P or A); (S, G, N or D); (S, N, D, E, Q or K); (N, D, E, Q, H or K); (N, E, Q, H, R or K); (F, V, L, I or M); and (H, F or Y). The number at the end of each line indicates the position of the last amino acid of the line within the specific domain.
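The symbol rules in the legend above can be sketched in a few lines of Python. This is only an illustration of the legend, not the alignment program used to generate FIGS. 9 and 10. The function names, the treatment of gap characters ("-") as unconserved, and the three short example rows are assumptions introduced here and do not come from the sequence listing.

```python
# Conservation groups copied from the legend of FIGS. 9 and 10.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Return '*', ':', '.' or ' ' for one column of an alignment."""
    residues = set(column.upper())
    if "-" in residues:
        # Assumption: a column containing a gap is reported as not conserved.
        return " "
    if len(residues) == 1:
        return "*"  # single, fully conserved residue
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall inside one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall inside one weaker group
    return " "


def consensus_line(rows: list[str]) -> str:
    """Build the symbol line for a block of equal-length aligned rows."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    return "".join(column_symbol("".join(col)) for col in zip(*rows))


# Hypothetical three-row block (locus A, B and C rows are invented for illustration).
example_rows = ["MSTA", "MTTV", "MATT"]
print(consensus_line(example_rows))
```

For the invented block, the first and third columns are identical in all rows, the second column (S, T, A) falls in a strong group, and the fourth column (A, V, T) falls only in a weaker group.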

FIG. 9: shows an amino acid alignment comparing the isoprenyl transferase (IPTN) enzyme of locus A (SEQ ID NO: 22), isolated from Micromonospora sp. strain 046-ECO11, with the isoprenyl transferase enzyme of locus B (SEQ ID NO: 90) isolated from Micromonospora echinospora challisensis NRRL 12255, and the partial isoprenyl transferase enzyme of locus C (SEQ ID NO: 94) isolated from Streptomyces carzinostaticus neocarzinostaticus ATCC 15944.

FIG. 10: shows an amino acid alignment comparing the adenylating amide synthetase (ADSA) enzyme of locus A (SEQ ID NO: 48), isolated from Micromonospora sp. strain 046-ECO11, with the adenylating amide synthetase of locus B (SEQ ID NO: 92) isolated from Micromonospora echinospora challisensis NRRL 12255, and locus C (SEQ ID NO: 96) isolated from Streptomyces carzinostaticus neocarzinostaticus ATCC 15944.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides isolated and purified polynucleotides that encode farnesyl dibenzodiazepinone-producing enzymes, i.e., polypeptides from farnesyl dibenzodiazepinone-producing microorganisms, fragments thereof, vectors containing those polynucleotides, and host cells transformed with those vectors. These polynucleotides, fragments thereof, and vectors comprising the polynucleotides can be used as reagents in the production of farnesyl dibenzodiazepinones. The invention also relates to a method for producing new farnesyl dibenzodiazepinones, by selectively altering the genetic information of an organism or by feeding the proteins or a host cell transformed with vectors comprising nucleic acids encoding them, with close analogs of the key intermediates. Portions of the polynucleotide sequences disclosed herein are also useful as primers for the amplification of DNA or as probes to identify related domains from other farnesyl dibenzodiazepinone producing microorganisms.

I. Definitions

For convenience, the meanings of certain terms and phrases used in the specification, examples, and appended claims are provided below.
As used herein, the term “farnesyl dibenzodiazepinone” refers to a class of dibenzodiazepinone compounds containing a farnesyl moiety. The term includes, but is not limited to, the exemplified compound of the present invention, 10-farnesyl-4,6,8-trihydroxy-dibenzodiazepin-11-one, which is referred to herein as “ECO-04601.”
The terms “farnesyl dibenzodiazepinone-producing microorganism” and “producer of farnesyl dibenzodiazepinone,” as used herein, refer to a microorganism that carries genetic information necessary to produce a farnesyl dibenzodiazepinone compound, whether or not the organism naturally produces the compound. The terms apply equally to organisms in which the genetic information to produce the farnesyl dibenzodiazepinone compound is found in the organism as it exists in its natural environment, and to organisms (host cells) in which the genetic information is introduced by recombinant techniques.
Specific organisms contemplated herein include, without limitation, organisms of the family Micromonosporaceae, of which preferred genera include Micromonospora, Actinoplanes and Dactylosporangium; the family Streptomycetaceae, of which preferred genera include Streptomyces and Kitasatospora; the family Pseudonocardiaceae, of which preferred genera are Amycolatopsis and Saccharopolyspora; and the family Actinosynnemataceae, of which preferred genera include Saccharothrix and Actinosynnema; however the terms are intended to encompass all organisms containing genetic information necessary to produce a farnesyl dibenzodiazepinone compound. A preferred producer of a farnesyl dibenzodiazepinone compound includes microbial strain 046-ECO11, a deposit of which was made on Mar. 7, 2003, with the International Depository Authority of Canada (IDAC), Bureau of Microbiology, Health Canada, 1015 Arlington Street, Winnipeg, Manitoba, Canada R3E 3R2, under Accession No. IDAC 070303-01.
The term “gene” means the segment of DNA involved in producing a polypeptide chain; it includes regions preceding and following the coding region (leader and trailer) as well as, where applicable, intervening regions (introns) between individual coding segments (exons).
The terms “gene locus,” “gene cluster,” and “biosynthetic locus” refer to a group of genes or variants thereof involved in the biosynthesis of a farnesyl dibenzodiazepinone compound. Examples include the biosynthetic locus in strain 046-ECO11 that directs the production of ECO-04601, referred to herein as “046D” or “locus A”; the biosynthetic locus in Micromonospora echinospora challisensis NRRL 12255, referred to herein as “052E” or “locus B”; the biosynthetic locus in Streptomyces carzinostaticus neocarzinostaticus ATCC 15944, referred to herein as “237C” or “locus C”; and the corresponding biosynthetic locus from another farnesyl dibenzodiazepinone-producing microorganism. Genetic modification of a gene locus, gene cluster or biosynthetic locus refers to any genetic recombinant technique known in the art, including mutagenesis, inactivation, or replacement of nucleic acids, that can be applied to generate variants of ECO-04601.
A DNA or nucleotide “coding sequence” or “sequence encoding” a particular polypeptide or protein, is a DNA sequence which is transcribed and translated into a polypeptide or protein when placed under the control of an appropriate regulatory sequence.
“Oligonucleotide” refers to a nucleic acid, generally of at least 10, preferably at least 15 and more preferably at least 20 nucleotides in length, preferably no more than 100 nucleotides in length, that is hybridizable to a genomic DNA molecule, a cDNA molecule, or an mRNA molecule encoding a gene, mRNA, cDNA or other nucleic acid of interest.
A promoter sequence is “operably linked to” a coding sequence when RNA polymerase that initiates transcription at the promoter transcribes the coding sequence into mRNA.
The term “replicon” as used herein means any genetic element, such as a plasmid, cosmid, chromosome or virus, that behaves as an autonomous unit of polynucleotide replication within a cell. An “expression vector” or “vector” is a replicon in which another polynucleotide fragment is attached, such as to bring about the replication and/or expression of the attached fragment. “Plasmids” are designated herein by a lower case “p” preceded or followed by capital letters and/or numbers. The starting plasmids disclosed herein are commercially available, publicly available on an unrestricted basis, or can be constructed from available plasmids in accordance with published procedures. In addition, equivalent plasmids to those described herein are known in the art and will be apparent to the skilled artisan.
The terms “express” and “expression” mean allowing or causing the information in a gene or DNA sequence to become manifest, for example producing a protein by activating the cellular functions involved in transcription and translation of a corresponding gene or DNA sequence. A DNA sequence is expressed in or by a cell to form an “expression product” such as a protein. The expression product itself, e.g. the resulting protein, may also be said to be “expressed” by the cell. An expression product can be characterized as intracellular, extracellular or secreted.
“Digestion” of DNA refers to enzymatic cleavage of the DNA with a restriction enzyme that acts only at certain sequences in the DNA. The various restriction enzymes used herein are commercially available and their reaction conditions, cofactors and other requirements were used as would be known to the ordinary skilled artisan. For analytical purposes, typically 1 μg of plasmid or DNA fragment is used with about 2 units of enzyme in about 20 μl of buffer solution. For the purpose of isolating DNA fragments for plasmid construction, typically 5 to 50 μg of DNA are digested with 20 to 250 units of enzyme in a larger volume. Appropriate buffers and substrate amounts for particular enzymes are specified by the manufacturer. Incubation times of about 1 hour at 37° C. are ordinarily used, but may vary in accordance with the supplier's instructions. After digestion, gel electrophoresis may be performed to isolate the desired fragment.
The term “isolated” as used herein means that the material is removed from its original environment (e.g. the natural environment where the material is naturally occurring). For example, a naturally occurring polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide or polypeptide, which is separated from some or all of the coexisting materials in the natural system, is isolated. Such polynucleotides could be part of a vector and/or such polynucleotides or polypeptides could be part of a composition, and still be isolated in that the vector or composition is not part of the natural environment.
The term “restriction fragment” as used herein refers to any linear DNA generated by the action of one or more restriction enzymes.
The term “transformation” means the introduction of a foreign gene, foreign nucleic acid, DNA or RNA sequence to a host cell, so that the host cell will express the introduced gene or sequence to produce a desired substance, typically a protein or enzyme coded by the introduced gene or sequence. The introduced gene or sequence may also be called a “cloned” or “foreign” gene or sequence, and may include regulatory or control sequences, such as start, stop, promoter, signal, secretion, or other sequences used by a cell's genetic machinery. The gene or sequence may include nonfunctional sequences or sequences with no known function. A host cell that receives and expresses introduced DNA or RNA has been “transformed” and is a “transformant” or a “clone” or “recombinant”. The DNA or RNA introduced to a host cell can come from any source, including cells of the same genus or species as the host cell, or cells of a different genus or species.
The terms “recombinant polynucleotide” and “recombinant polypeptide” as used herein mean a polynucleotide or polypeptide which by virtue of its origin or manipulation is not associated with all or a portion of the polynucleotide or polypeptide with which it is associated in nature and/or is linked to a polynucleotide or polypeptide other than that to which it is linked in nature.
The term “host cell” as used herein refers to both prokaryotic and eukaryotic cells which are used as recipients of the recombinant polynucleotides and vectors provided herein. In one embodiment, the host cell is a prokaryote.
The terms “open reading frame” and “ORF” as used herein refer to a region of a polynucleotide sequence which encodes a polypeptide; this region may represent a portion of a coding sequence or a total coding sequence.
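As a rough illustration of the ORF definition, the following Python sketch scans the three forward reading frames of a DNA string for stretches that begin at a start codon and end at the next in-frame stop codon. It is not the annotation method used to identify the ORFs of loci A, B and C. The function name, the set of start codons (ATG, GTG and TTG, as commonly used by bacteria), the minimum-length cutoff and the test sequence are all assumptions made for the example, and the reverse strand is ignored for brevity.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: bacterial-style start codons; a real annotation pipeline may differ.
START_CODONS = {"ATG", "GTG", "TTG"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) positions of forward-strand ORFs.

    Positions are 0-based, `end` is exclusive and includes the stop codon.
    `min_codons` counts the codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


# Invented 16-nucleotide sequence containing one short ORF in the third frame.
dna = "CCATGAAATTTTAGCC"
for begin, end in find_orfs(dna):
    print(begin, end, dna[begin:end])
```

In the invented sequence the only ORF found is ATGAAATTTTAG, which starts at position 2 and runs through the TAG stop codon.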
As used herein and as known in the art, the term “identity” is the relationship between two or more polynucleotide sequences, as determined by comparing the sequences. Identity also means the degree of sequence relatedness between polynucleotide sequences, as determined by the match between strings of such sequences. Identity can be readily calculated (see, e.g., Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York (1998), and Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York (1993), both of which are incorporated by reference herein). While there exist a number of methods to measure identity between two polynucleotide sequences, the term is well known to skilled artisans (see, e.g., Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press (1987); and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M. Stockton Press, New York (1991)). Methods commonly employed to determine identity between sequences include, for example, those disclosed in Carillo, H., and Lipman, D., SIAM J. Applied Math. (1988) 48:1073. “Substantially identical,” as used herein, means there is a very high degree of homology (preferably 100% sequence identity) between subject polynucleotide sequences. However, polynucleotides having greater than 90%, or 95% sequence identity may be used in the present invention, and thus sequence variations that might be expected due to genetic mutation, strain polymorphism, or evolutionary divergence can be tolerated.
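To make the notion of percent identity concrete, the sketch below computes it for two sequences that have already been aligned (gaps written as "-"). It is one simple convention among several and is not necessarily the method of the cited references: here the denominator is the number of alignment columns in which at least one sequence has a residue, and the function names and example sequences are assumptions introduced for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: columns where both sequences have a gap are ignored, and a
    residue opposite a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair reaches a stated cutoff such as 90 or 95 percent."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Invented 10-nucleotide sequences differing at one position.
seq_1 = "ACGTACGTAC"
seq_2 = "ACGTTCGTAC"
print(percent_identity(seq_1, seq_2))     # 9 of 10 positions match
print(meets_threshold(seq_1, seq_2, 95))  # below the 95% cutoff
```

The same function applies to aligned amino acid sequences, which is how the identity thresholds recited for the polypeptides (for example, at least 80% identity to SEQ ID NO: 48) could be checked once an alignment is available.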

II. Method of Making a Farnesyl Dibenzodiazepinone by Fermentation

The farnesyl dibenzodiazepinone compounds of the present invention may be biosynthesized by various microorganisms. Microorganisms that may synthesize the compounds of the present invention include but are not limited to bacteria of the order Actinomycetales, also referred to as actinomycetes. Non-limiting examples of members belonging to the genera of Actinomycetes include Nocardia, Geodermatophilus, Actinoplanes, Micromonospora, Nocardioides, Saccharothrix, Amycolatopsis, Kutzneria, Saccharomonospora, Saccharopolyspora, Kitasatospora, Streptomyces, Microbispora, Streptosporangium, and Actinomadura. The taxonomy of actinomycetes is complex and reference is made to Goodfellow, Suprageneric Classification of Actinomycetes (1989); Bergey's Manual of Systematic Bacteriology, Vol. 4 (Williams and Wilkins, Baltimore, pp. 2322-2339); and to Embley and Stackebrandt, “The molecular phylogeny and systematics of the actinomycetes,” Annu. Rev. Microbiol. (1994) 48:257-289, each of which is hereby incorporated by reference in its entirety, for genera that may synthesize the compounds of the invention.
Farnesyl dibenzodiazepinone-producing microorganisms are cultivated in culture medium containing known nutritional sources for actinomycetes. Such media have assimilable sources of carbon and nitrogen, plus optional inorganic salts and other known growth factors, at a pH of about 6 to about 9. Suitable media include, without limitation, the growth media provided in Table 1. Microorganisms are cultivated at incubation temperatures of about 18° C. to about 40° C. for about 3 to about 40 days.

TABLE 1

Examples of Fermentation Media

Component	QB	MA	KH	RM	JA	FA	HI	CL

pH *¹	7.2	7.5	7	6.85	7.3	7.0	7.0	7.0
Glucose	12		10	10		10
Sucrose				100
Cane molasses						15
Corn starch					30
Soluble starch	10	25
Potato dextrin			20			40	20	20
Corn steep solid	5
Corn steep liquor	5				15
Dried yeast		2
Yeast extract			5				8.34
Malt extract					35
Pharmamedia ™	10				15
Glycerol							30	20
NZ-Amine A			5			10
Soybean powder		15
Fish meal								10
Bacto-peptone							2.5	5
MgSO₄•7H₂O						1
CaCO₃		4	1		2	2	3	2
NaCl		5
(NH₄)₂SO₄		2						2
K₂SO₄				0.25
MgCl₂•6H₂O				10
Na₂HPO₄						3
Casamino acid				0.1
Proflo oil ™ (mL/L)	4
MOPS				21
Trace element solution *² (mL/L)				2

Unless otherwise indicated all the ingredients are in g/L.
*¹The pH is to be adjusted as marked prior to the addition of CaCO₃.
*²Trace elements solution contains: ZnCl₂ (40 mg); FeCl₃•6H₂O (200 mg); CuCl₂•2H₂O (10 mg); MnCl₂•4H₂O; Na₂B₄O₇•10H₂O (10 mg); (NH₄)₆Mo₇O₂₄•4H₂O (10 mg) per litre.
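Because all Table 1 quantities are given per litre, preparing a different batch volume is a simple proportional calculation. The sketch below shows one way to do this for the KH medium. The values are those of the KH column as it reads in the tab layout of Table 1, so the column assignment is an assumption that should be checked against the original table. The function names and the 0.5 L batch size are likewise illustrative only.

```python
# KH medium in g/L, as its column reads in Table 1 (assumed column assignment).
KH_MEDIUM_G_PER_L = {
    "Glucose": 10.0,
    "Potato dextrin": 20.0,
    "Yeast extract": 5.0,
    "NZ-Amine A": 5.0,
    "CaCO3": 1.0,
}
KH_TARGET_PH = 7.0  # adjusted before CaCO3 is added, per footnote *1


def scale_recipe(recipe_g_per_l: dict[str, float], volume_l: float) -> dict[str, float]:
    """Return grams of each component needed for `volume_l` litres."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    return {name: conc * volume_l for name, conc in recipe_g_per_l.items()}


def addition_order(amounts: dict[str, float]) -> list[str]:
    """List components with CaCO3 last, since the pH is adjusted before it is added."""
    names = [name for name in amounts if name != "CaCO3"]
    if "CaCO3" in amounts:
        names.append("CaCO3")
    return names


batch = scale_recipe(KH_MEDIUM_G_PER_L, 0.5)  # hypothetical 0.5 L batch
for component in addition_order(batch):
    print(f"{component}: {batch[component]:.2f} g")
```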

The culture media inoculated with the farnesyl dibenzodiazepinone-producing microorganisms may be aerated by incubating the inoculated culture media with agitation, for example, shaking on a rotary shaker, or a shaking water bath. Aeration may also be achieved by the injection of air, oxygen or an appropriate gaseous mixture to the inoculated culture media during incubation. Following cultivation, the farnesyl dibenzodiazepinone compounds can be extracted and isolated from the cultivated culture media by techniques known to a skilled person in the art and/or disclosed herein, including, for example, centrifugation, chromatography, adsorption and filtration. For example, the cultivated culture media can be mixed with a suitable organic solvent such as n-butanol, n-butyl acetate or 4-methyl-2-pentanone; the organic layer can then be separated, for example by centrifugation, and the solvent removed by evaporation to dryness, optionally under vacuum. The resulting residue can optionally be reconstituted with, for example, water, ethanol, ethyl acetate, methanol or a mixture thereof, and re-extracted with a suitable organic solvent such as hexane, carbon tetrachloride, methylene chloride or a mixture thereof. Following removal of the solvent, the compounds may be further purified by the use of standard techniques, such as chromatography.

III. Method of Making a Farnesyl Dibenzodiazepinone by Recombinant Technology

In another embodiment, the present invention relates to nucleic acid molecules that encode proteins useful in the production of farnesyl benzodiazepinones. Specifically, the present invention provides recombinant DNA vectors and nucleic acid molecules that encode all or part of the biosynthetic locus in strain 046-ECO11, which directs the production of ECO-04601, and is referred to herein as “046D.” The invention further includes genetic modification of 046D using conventional genetic recombinant techniques, such as mutagenesis, inactivation, or replacement of nucleic acids, to produce chemical variants of ECO-04601.
The invention thus provides a method for making a farnesyl benzodiazepinone compound using a transformed host cell comprising a recombinant DNA vector that encodes one or more of the polypeptides of the present invention, and culturing the host cell under conditions such that farnesyl benzodiazepinone is produced. In one embodiment, the host cell is a prokaryote. In another embodiment, the host cell is an actinomycete. In another embodiment, the host cell is a Streptomyces host cell. In a further embodiment, the host cell is a non-Streptomyces actinomycete such as a Rhodococcus, a Mycobacterium, or an Amycolatopsis species.
The invention provides recombinant nucleic acids that produce a variety of farnesyl dibenzodiazepinone compounds that cannot be readily synthesized by chemical methodology alone. The invention allows direct manipulation of 046D biosynthetic locus via genetic engineering of the enzymes involved in the biosynthesis of a farnesyl dibenzodiazepinone according to the invention. The 046D biosynthetic locus is described in Example 5.
Farnesyl dibenzodiazepinones and analogs are also produced by feeding one or more key intermediates or biosynthetic precursors (as defined in FIGS. 5-8) or close structural analogs, to a host cell comprising a recombinant DNA vector that encodes one or more of the polypeptides of the present invention, and culturing the host cell under conditions such that the farnesyl benzodiazepinone or analog is produced. Key intermediates are contacted directly with an isolated protein of the invention to perform the necessary steps for the production of a farnesyl dibenzodiazepinone (e.g., the farnesyl diphosphate and dibenzodiazepinone precursors can be coupled using an IPTN protein of the invention).
Key intermediates may be commercially available or may be prepared using standard chemical procedures or using the proteins of this invention. For example, farnesyl diphosphate and 3-hydroxyanthranilic acid are commercially available (e.g., Fluka F6892 and Aldrich 148776). 3-Amino-5-hydroxybenzoic acid, a precursor of the 2-amino-6-hydroxybenzoquinone, is prepared as described in Herlt et al (1981), Aust. J. Chem., vol 34, 1319-1324.
Recombinant DNA Vectors
Vectors of the invention typically comprise the DNA of a transmissible agent, into which foreign DNA is inserted. A common way to insert one segment of DNA into another segment of DNA involves the use of specific enzymes called restriction enzymes that cleave DNA at specific sites (specific groups of nucleotides) called restriction sites. A “cassette” refers to a DNA coding sequence or segment of DNA that codes for an expression product that can be inserted into a vector at defined restriction sites. The cassette restriction sites are designed to ensure insertion of the cassette in the proper reading frame. Generally, a nucleic acid molecule that encodes a protein useful in the production of a farnesyl dibenzodiazepinone is inserted at one or more restriction sites of the vector DNA, and then is carried by the vector into a prokaryote, e.g. an actinomycete, by transformation (see below). A segment or sequence of DNA having inserted or added DNA, such as an expression vector, can also be called a “DNA construct”. A common type of vector is a “plasmid” which generally is a self-contained molecule of double-stranded DNA, usually of bacterial origin, that can readily accept additional (foreign) DNA and which can be readily introduced into a suitable host cell. A plasmid vector often contains coding DNA and promoter DNA and has one or more restriction sites suitable for inserting foreign DNA. Coding DNA is a DNA sequence that encodes a particular amino acid sequence for a particular protein or enzyme. In one embodiment of the invention, the coding DNA encodes polypeptides of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96 or 98 that may be useful for the biosynthesis of a farnesyl dibenzodiazepinone.
Promoter DNA of a recombinant vector is a DNA sequence that initiates, regulates, or otherwise mediates or controls the expression of the coding DNA. Promoter DNA and coding DNA may be from the same or different organisms. Recombinant cloning vectors will often include one or more replication systems for cloning or expression, one or more markers for selection in the host, e.g. antibiotic resistance, and one or more expression cassettes. Vector constructs may be produced using conventional molecular biology and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Sambrook, Fritsch & Maniatis, Molecular Cloning: A Laboratory Manual, Second Edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (herein “Sambrook et al., 1989”); DNA Cloning: A Practical Approach, Volumes I and II (D. N. Glover ed. 1985); F. M. Ausubel et al. (eds.), Current Protocols in Molecular Biology, John Wiley & Sons, Inc. (1994).
Examples of promoters that function in actinomycetes, e.g. Streptomyces, are taught in U.S. Pat. Nos. 5,830,695 and 5,466,590. Another example of a transcription promoter useful in Actinomycetes expression vectors is tipA, a promoter inducible by the antibiotic thiostrepton [c.f. Murakami, T., et al., (1989), J. Bacteriol, 171, 1459].
Transformation of Actinomycetes
A suitable transformation method for use with an actinomycete comprises forming the actinomycete culture into spheroplasts using lysozyme. A buffer solution containing recombinant DNA vectors and polyethylene glycol is then added, in order to introduce the vector into the host cells, by using either of the methods of Thompson or Kieser [c.f. Thompson, C. J., et al., (1982), J. Bacteriol., 151, 668-677 or Kieser, T. et al. (2000), “Practical Streptomyces Genetics”, The John Innes Foundation, Norwich], for example. A thiostrepton-resistance gene is frequently used as a selective marker in the transformation plasmid [c.f. Hopwood, D. A., et al., (1987), “Methods in Enzymology” 153, 116, Academic Press, New York], but the present invention is not limited thereto. Additional methods for the transformation of actinomycetes are taught in U.S. Pat. No. 5,393,665.
Assay for Farnesyl Dibenzodiazepinone or Biosynthetic Intermediates
Actinomycetes defective in farnesyl dibenzodiazepinone biosynthesis are transformed with one or more expression vectors encoding one or more proteins in the farnesyl benzodiazepinone biosynthetic pathway, thus restoring farnesyl benzodiazepinone biosynthesis by genetic complementation of the specific defect.
The presence or absence of farnesyl dibenzodiazepinone or intermediates in the biosynthetic pathway (see FIGS. 5 to 8) in a recombinant actinomycete can be determined using methodologies that are well known to persons of skill in the art. For example, ethyl acetate extracts of fermentation media used for the culture of a recombinant actinomycete are processed as described in Example 2, and fractions containing farnesyl dibenzodiazepinone or intermediates are detected by TLC on commercial Kieselgel 60 F₂₅₄ plates. Farnesyl dibenzodiazepinone and intermediate compounds are visualized by inspection of dried plates under UV light or by spraying the plates with a spray containing vanillin (0.75%) and concentrated sulfuric acid (1.5%, v/v) in ethanol and subsequently heating the plate. The exact identity of the compounds separated by TLC is then determined using gas chromatography-mass spectroscopy. Methods of mass spectroscopy are taught in the published U.S. Patent Application No. US2003/0052268.
Mutagenesis
The invention allows direct manipulation of 046D biosynthetic locus via genetic engineering of the enzymes involved in the biosynthesis of a farnesyl benzodiazepinone according to the invention.
A number of methods are known in the art that permit the random as well as targeted mutation of the DNA sequences of the invention (see, for example, Ausubel et al., Short Protocols in Molecular Biology (1995) 3rd Ed. John Wiley & Sons, Inc.). In addition, there are a number of commercially available kits for site-directed mutagenesis, including both conventional and PCR-based methods. Examples include the EXSITE™ PCR-Based Site-directed Mutagenesis Kit available from Stratagene (Catalog No. 200502), the QUIKCHANGE™ Site-directed Mutagenesis Kit from Stratagene (Catalog No. 200518), and the CHAMELEON® double-stranded Site-directed Mutagenesis Kit, also from Stratagene (Catalog No. 200509).
In addition the nucleotides of the invention may be generated by insertional mutation or truncation (N-terminal, internal or C-terminal) according to methodology known to a person skilled in the art.
Older methods of site-directed mutagenesis known in the art rely on sub-cloning of the sequence to be mutated into a vector, such as an M13 bacteriophage vector, that allows the isolation of single-stranded DNA template. In these methods, one anneals a mutagenic primer (i.e., a primer capable of annealing to the site to be mutated but bearing one or more mismatched nucleotides at the site to be mutated) to the single-stranded template and then polymerizes the complement of the template starting from the 3′ end of the mutagenic primer. The resulting duplexes are then transformed into host bacteria and plaques are screened for the desired mutation.
More recently, site-directed mutagenesis has employed PCR methodologies, which have the advantage of not requiring a single-stranded template. In addition, methods have been developed that do not require sub-cloning. Several issues must be considered when PCR-based site-directed mutagenesis is performed. First, in these methods it is desirable to reduce the number of PCR cycles to prevent expansion of undesired mutations introduced by the polymerase. Second, a selection must be employed in order to reduce the number of non-mutated parental molecules persisting in the reaction. Third, an extended-length PCR method is preferred in order to allow the use of a single PCR primer set. And fourth, because of the non-template-dependent terminal extension activity of some thermostable polymerases it is often necessary to incorporate an end-polishing step into the procedure prior to blunt-end ligation of the PCR-generated mutant product.
The protocol described below accommodates these considerations through the following steps. First, the template concentration used is approximately 1000-fold higher than that used in conventional PCR reactions, allowing a reduction in the number of cycles from 25-30 down to 5-10 without dramatically reducing product yield. Second, the restriction endonuclease Dpn I (recognition target sequence: 5′-Gm6ATC-3′, where the A residue is methylated) is used to select against parental DNA, since most common strains of E. coli Dam methylate their DNA at the sequence 5′-GATC-3′. Third, Taq Extender is used in the PCR mix in order to increase the proportion of long (i.e., full plasmid length) PCR products. Finally, Pfu DNA polymerase is used to polish the ends of the PCR product prior to intramolecular ligation using T4 DNA ligase.
A non-limiting example for the isolation of mutant polynucleotides is described in detail as follows:
Plasmid template DNA (approximately 0.5 pmole) is added to a PCR cocktail containing: 1× mutagenesis buffer (20 mM Tris HCl, pH 7.5; 8 mM MgCl₂; 40 μg/ml BSA); 12-20 pmole of each primer (one of skill in the art may design a mutagenic primer as necessary, giving consideration to those factors such as base composition, primer length and intended buffer salt concentrations that affect the annealing characteristics of oligonucleotide primers; one primer must contain the desired mutation, and one (the same or the other) must contain a 5′ phosphate to facilitate later ligation), 250 μM each dNTP, 2.5 U Taq DNA polymerase, and 2.5 U of Taq Extender (available from Stratagene; see Nielson et al. (1994) Strategies 7: 27, and U.S. Pat. No. 5,556,772). Primers can be prepared using the triester method of Matteucci et al., 1981, J. Am. Chem. Soc. 103:3185-3191, incorporated herein by reference. Alternatively, automated synthesis may be preferred, for example, on a Biosearch 8700 DNA Synthesizer using cyanoethyl phosphoramidite chemistry.
The PCR cycling is performed as follows: 1 cycle of 4 min at 94° C., 2 min at 50° C. and 2 min at 72° C.; followed by 5-10 cycles of 1 min at 94° C., 2 min at 54° C. and 1 min at 72° C. The parental template DNA and the linear, PCR-generated DNA incorporating the mutagenic primer are treated with DpnI (10 U) and Pfu DNA polymerase (2.5 U). This results in the DpnI digestion of the in vivo methylated parental template and hybrid DNA and the removal, by Pfu DNA polymerase, of the non-template-directed Taq DNA polymerase-extended base(s) on the linear PCR product. The reaction is incubated at 37° C. for 30 min and then transferred to 72° C. for an additional 30 min. Mutagenesis buffer (115 μl of 1×) containing 0.5 mM ATP is added to the DpnI-digested, Pfu DNA polymerase-polished PCR products. The solution is mixed and 10 μl are removed to a new microfuge tube and T4 DNA ligase (2-4 U) is added. The ligation is incubated for greater than 60 min at 37° C. Finally, the treated solution is transformed into competent E. coli according to standard methods.
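The DpnI selection step in the protocol above depends on the parental plasmid carrying at least one 5′-GATC-3′ site. The following Python sketch is illustrative only and is not part of the disclosed protocol; the function name and the toy sequence are hypothetical. It lists the GATC motifs in a template and, by default, treats the template as circular so that a motif spanning the plasmid origin is not missed. Because GATC is its own reverse complement, scanning one strand is sufficient.

```python
def find_gatc_sites(sequence, circular=True):
    """Return 0-based start positions of every GATC motif in the sequence.

    Only the presence of the motif is checked; whether DpnI actually cuts
    depends on Dam methylation of the template, which is not modelled here.
    """
    seq = sequence.upper()
    # For a circular plasmid, append the first three bases so that a motif
    # spanning the origin is also detected.
    search = seq + seq[:3] if circular else seq
    sites = []
    pos = search.find("GATC")
    while pos != -1 and pos < len(seq):
        sites.append(pos)
        pos = search.find("GATC", pos + 1)
    return sites


# Hypothetical toy template, for illustration only.
toy_plasmid = "TCAAGCTTGATCCGGAATTCGATCGA"
print(find_gatc_sites(toy_plasmid))                  # circular template
print(find_gatc_sites(toy_plasmid, circular=False))  # linear fragment
```

A template that returns an empty list would not be selected against by DpnI, in which case a different selection strategy would be needed.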
Methods of random mutagenesis, which will result in a panel of mutants bearing one or more randomly situated mutations, exist in the art. Such a panel of mutants may then be screened for those exhibiting reduced uracil detection activity relative to the wild-type polymerase (e.g., by measuring the incorporation of 10 nmoles of dNTPs into polymeric form in 30 minutes in the presence of 200 μM dUTP and at the optimal temperature for a given DNA polymerase). An example of a method for random mutagenesis is the so-called “error-prone PCR method”. As the name implies, the method amplifies a given sequence under conditions in which the DNA polymerase does not support high fidelity incorporation. The conditions encouraging error-prone incorporation vary for different DNA polymerases; however, one skilled in the art may determine such conditions for a given enzyme. A key variable for many DNA polymerases in the fidelity of amplification is, for example, the type and concentration of divalent metal ion in the buffer. The use of manganese ion and/or variation of the magnesium or manganese ion concentration may therefore be applied to influence the error rate of the polymerase.
Genes for desired mutant polypeptides generated by mutagenesis may be sequenced to identify the sites and number of mutations. For those mutants comprising more than one mutation, the effect of a given mutation may be evaluated by introduction of the identified mutation to the wild-type gene by site-directed mutagenesis in isolation from the other mutations borne by the particular mutant. Screening assays of the single mutant thus produced will then allow the determination of the effect of that mutation alone.

IV. Genes and Proteins for the Production of ECO-04601

As discussed in more detail below, the isolated, purified or enriched nucleic acids of one of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89 may be used to prepare one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88, respectively, or fragments comprising at least 50, 75, 100, 200, 300, 500 or more consecutive amino acids of one of the polypeptides of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88.
Accordingly, another aspect of the present invention is an isolated, purified or enriched nucleic acid which encodes one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 or fragments comprising at least 50, 75, 100, 150, 200, 300 or more consecutive amino acids of one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88. The coding sequences of these nucleic acids may be identical to one of the coding sequences of one of the nucleic acids of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89 or a fragment thereof, or may be different coding sequences which encode one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 or fragments comprising at least 50, 75, 100, 150, 200 or 300 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 as a result of the redundancy or degeneracy of the genetic code. The genetic code is well known to those of skill in the art and can be obtained, for example, from Stryer, Biochemistry, 3rd edition, W.H. Freeman & Co., New York.
The isolated, purified or enriched nucleic acid which encodes one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 may include, but is not limited to: (1) only the coding sequences of one of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89; (2) the coding sequences of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89 and additional coding sequences, such as leader sequences or proprotein; and (3) the coding sequences of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89 and non-coding sequences, such as non-coding sequences 5′ and/or 3′ of the coding sequence. Thus, as used herein, the term “polynucleotide encoding a polypeptide” encompasses a polynucleotide that includes only coding sequence for the polypeptide as well as a polynucleotide that includes additional coding and/or non-coding sequence.
The invention relates to polynucleotides based on SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89 but having polynucleotide changes that are “silent”, for example changes which do not alter the amino acid sequence encoded by the polynucleotides of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89. The invention also relates to polynucleotides which have nucleotide changes which result in amino acid substitutions, additions, deletions, fusions and truncations of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88. Such nucleotide changes may be introduced using techniques such as site directed mutagenesis, random chemical mutagenesis, exonuclease III deletion, and other recombinant DNA techniques.
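Whether a given nucleotide change is “silent” can be checked by translating the original and the altered coding sequence and comparing the results. The following Python sketch is illustrative only; it assumes the standard genetic code, and the function names and the short example sequences are hypothetical rather than taken from the Sequence Listing.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(coding_dna):
    """Translate an in-frame coding DNA sequence into one-letter amino acids."""
    seq = coding_dna.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def is_silent(original, variant):
    """Return True if the variant encodes the same amino acid sequence."""
    return translate(original) == translate(variant)


# Hypothetical example sequences, for illustration only.
original = "ATGGCCCTGAAA"  # Met-Ala-Leu-Lys
silent = "ATGGCGCTCAAG"    # synonymous codons at positions 2, 3 and 4
missense = "ATGGACCTGAAA"  # GCC (Ala) changed to GAC (Asp)

print(is_silent(original, silent))    # same protein
print(is_silent(original, missense))  # different protein
```

The same comparison applies to any pair of full-length coding sequences, provided both are supplied in frame.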
The isolated, purified or enriched nucleic acids of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89, the sequences complementary thereto, or a fragment comprising at least 100, 150, 200, 300, 400 or more consecutive bases of one of the sequences of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89, or the sequences complementary thereto may be used as probes to identify and isolate DNAs encoding the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88, respectively. In such procedures, a genomic DNA library is constructed from a sample microorganism or a sample containing a microorganism capable of producing a farnesyl dibenzodiazepinone. The genomic DNA library is then contacted with a probe comprising a coding sequence or a fragment of the coding sequence, encoding one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88, or a fragment thereof under conditions which permit the probe to specifically hybridize to sequences complementary thereto. In a preferred embodiment, the probe is an oligonucleotide of about 10 to about 30 nucleotides in length designed based on a nucleic acid of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89. Genomic DNA clones which hybridize to the probe are then detected and isolated.
Procedures for preparing and identifying DNA clones of interest are disclosed in Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. 1997; and Sambrook et al., Molecular Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory Press, 1989. In another embodiment, the probe is a restriction fragment or a PCR amplified nucleic acid derived from SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89.
The isolated, purified or enriched nucleic acids of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89 or the sequences complementary thereto may be used as probes to identify and isolate related nucleic acids. In some embodiments, the related nucleic acids may be genomic DNAs (or cDNAs) from potential farnesyl dibenzodiazepinone producers. In such procedures, a nucleic acid sample containing nucleic acids from a potential farnesyl dibenzodiazepinone producer is contacted with the probe under conditions that permit the probe to specifically hybridize to related sequences. The nucleic acid sample may be a genomic DNA (or cDNA) library from the potential farnesyl dibenzodiazepinone-producer. Hybridization of the probe to nucleic acids is then detected using any of the methods described above.
Hybridization may be carried out under conditions of low stringency, moderate stringency or high stringency. As an example of nucleic acid hybridization, a polymer membrane containing immobilized denatured nucleic acids is first prehybridized for 30 minutes at 45° C. in a solution consisting of 0.9 M NaCl, 50 mM NaH₂PO₄, pH 7.0, 5.0 mM Na₂EDTA, 0.5% SDS, 10×Denhardt's, and 0.5 mg/ml polyriboadenylic acid. Approximately 2×10⁷ cpm (specific activity 4-9×10⁸ cpm/μg) of ³²P end-labeled oligonucleotide probe are then added to the solution. After 12-16 hours of incubation, the membrane is washed for 30 minutes at room temperature in 1×SET (150 mM NaCl, 20 mM Tris hydrochloride, pH 7.8, 1 mM Na₂EDTA) containing 0.5% SDS, followed by a 30 minute wash in fresh 1×SET at Tm−10° C. for the oligonucleotide probe, where Tm is the melting temperature. The membrane is then exposed to autoradiographic film for detection of hybridization signals.
By varying the stringency of the hybridization conditions used to identify nucleic acids, such as genomic DNAs or cDNAs, which hybridize to the detectable probe, nucleic acids having different levels of homology to the probe can be identified and isolated. Stringency may be varied by conducting the hybridization at varying temperatures below the melting temperatures of the probes. The melting temperature of the probe may be calculated using the following formulas:
For oligonucleotide probes between 14 and 70 nucleotides in length the melting temperature (Tm) in degrees Celsius may be calculated using the formula: Tm=81.5+16.6(log [Na+])+0.41 (fraction G+C)−(600/N) where N is the length of the oligonucleotide.
If the hybridization is carried out in a solution containing formamide, the melting temperature may be calculated using the equation Tm=81.5+16.6(log [Na+])+0.41 (fraction G+C)−(0.63% formamide)−(600/N) where N is the length of the probe.
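The two melting temperature formulas above can be evaluated with a short calculation. The following Python sketch is illustrative only and rests on two assumptions that are not stated in the text: the logarithm is taken as base 10, and the G+C term is entered as a percentage (0 to 100), which is the conventional reading of the 0.41 coefficient. The function and parameter names are hypothetical.

```python
import math


def melting_temperature(na_molar, gc_percent, length, formamide_percent=0.0):
    """Estimate Tm in degrees Celsius.

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%G+C) - 0.63*(%formamide) - 600/N

    Assumptions (not stated in the source text): log is base 10 and the
    G+C content is given as a percentage rather than a fraction.
    With formamide_percent=0 this reduces to the formamide-free formula.
    """
    if na_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    if length <= 0:
        raise ValueError("probe length must be positive")
    return (
        81.5
        + 16.6 * math.log10(na_molar)
        + 0.41 * gc_percent
        - 0.63 * formamide_percent
        - 600.0 / length
    )


# Hypothetical 20-mer with 50% G+C in 1 M Na+, without and with 50% formamide.
print(round(melting_temperature(1.0, 50.0, 20), 1))
print(round(melting_temperature(1.0, 50.0, 20, formamide_percent=50.0), 1))
```

Under these assumptions the example 20-mer has a Tm of about 72° C. without formamide and about 40.5° C. in 50% formamide.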
Prehybridization may be carried out in 6×SSC, 5×Denhardt's reagent, 0.5% SDS, 0.1 mg/ml denatured fragmented salmon sperm DNA or 6×SSC, 5×Denhardt's reagent, 0.5% SDS, 0.1 mg/ml denatured fragmented salmon sperm DNA, 50% formamide. The compositions of the SSC and Denhardt's solutions are listed in Sambrook et al., supra.
Hybridization is conducted by adding the detectable probe to the hybridization solutions listed above. Where the probe comprises double stranded DNA, it is denatured by incubating at elevated temperatures and quickly cooling before addition to the hybridization solution. It may also be desirable to similarly denature single stranded probes to eliminate or diminish formation of secondary structures or oligomerization. The filter is contacted with the hybridization solution for a sufficient period of time to allow the probe to hybridize to cDNAs or genomic DNAs containing sequences complementary thereto or homologous thereto. For probes over 200 nucleotides in length, the hybridization may be carried out at 15-25° C. below the Tm. For shorter probes, such as oligonucleotide probes, the hybridization may be conducted at 5-10° C. below the Tm. Preferably, the hybridization is conducted in 6×SSC, for shorter probes. Preferably, the hybridization is conducted in 50% formamide containing solutions, for longer probes. All the foregoing hybridizations would be considered to be examples of hybridization performed under conditions of high stringency.
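The temperature guidance above (15-25° C. below the Tm for probes over 200 nucleotides, 5-10° C. below the Tm for shorter probes) can be expressed as a small helper. The following Python sketch is illustrative only; the function name is hypothetical and the Tm is assumed to have been determined separately.

```python
def hybridization_window(tm_celsius, probe_length):
    """Return (lowest, highest) hybridization temperature in degrees Celsius.

    Probes over 200 nucleotides: 15-25 degrees below Tm.
    Shorter probes (e.g. oligonucleotides): 5-10 degrees below Tm.
    """
    if probe_length > 200:
        return (tm_celsius - 25.0, tm_celsius - 15.0)
    return (tm_celsius - 10.0, tm_celsius - 5.0)


# Hypothetical probes, for illustration only.
print(hybridization_window(72.0, 20))   # short oligonucleotide probe
print(hybridization_window(85.0, 500))  # long probe
```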
Following hybridization, the filter is washed for at least 15 minutes in 2×SSC, 0.1% SDS at room temperature or higher, depending on the desired stringency. The filter is then washed with 0.1×SSC, 0.5% SDS at room temperature (again) for 30 minutes to 1 hour. Nucleic acids which have hybridized to the probe are identified by conventional autoradiography and non-radioactive detection methods.
The above procedure may be modified to identify nucleic acids having decreasing levels of homology to the probe sequence. For example, to obtain nucleic acids of decreasing homology to the detectable probe, less stringent conditions may be used. For example, the hybridization temperature may be decreased in increments of 5° C. from 68° C. to 42° C. in a hybridization buffer having a Na+ concentration of approximately 1 M. Following hybridization, the filter may be washed with 2×SSC, 0.5% SDS at the temperature of hybridization. These conditions are considered to be “moderate stringency” conditions above 50° C. and “low stringency” conditions below 50° C. A specific example of “moderate stringency” hybridization conditions is when the above hybridization is conducted at 55° C. A specific example of “low stringency” hybridization conditions is when the above hybridization is conducted at 45° C.
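The temperature series described above can be tabulated as follows. This Python sketch is illustrative only; the function names are hypothetical. Stepping down from 68° C. in 5° C. decrements gives 43° C. as the last value at or above 42° C., and a temperature of exactly 50° C. is reported as unspecified because the text assigns a label only above and below that value.

```python
def temperature_series(start=68, stop=42, step=5):
    """List hybridization temperatures from start down to stop in fixed steps."""
    temps = []
    t = start
    while t >= stop:
        temps.append(t)
        t -= step
    return temps


def classify_by_temperature(temp_celsius):
    """Label a hybridization temperature using the 50 degree C threshold."""
    if temp_celsius > 50:
        return "moderate stringency"
    if temp_celsius < 50:
        return "low stringency"
    return "unspecified"  # the text does not assign exactly 50 degrees C


for temp in temperature_series():
    print(temp, classify_by_temperature(temp))
```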
Alternatively, the hybridization may be carried out in buffers, such as 6×SSC, containing formamide at a temperature of 42° C. In this case, the concentration of formamide in the hybridization buffer may be reduced in 5% increments from 50% to 0% to identify clones having decreasing levels of homology to the probe. Following hybridization, the filter may be washed with 6×SSC, 0.5% SDS at 50° C. These conditions are considered to be “moderate stringency” conditions above 25% formamide and “low stringency” conditions below 25% formamide. A specific example of “moderate stringency” hybridization conditions is when the above hybridization is conducted at 30% formamide. A specific example of “low stringency” hybridization conditions is when the above hybridization is conducted at 10% formamide. Nucleic acids which have hybridized to the probe are identified by conventional autoradiography and non-radioactive detection methods. Examples of conditions of different stringency are also provided in Table 2.

TABLE 2

Very High Stringency
(detects sequences sharing at least 90% identity)

Hybridization	in 5×SSC	at 65° C.	16 hours
Wash twice	in 2×SSC	at room temperature	15 minutes each
Wash twice	in 0.5×SSC	at 65° C.	20 minutes each

High Stringency
(detects sequences sharing at least 80% identity)

Hybridization	in 5×SSC	at 65° C.	16 hours
Wash twice	in 2×SSC	at room temperature	20 minutes each
Wash once	in 1×SSC	at 55° C.	30 minutes

Low Stringency
(detects sequences sharing at least 50% identity)

Hybridization	in 6×SSC	at room temperature	16 hours
Wash twice	in 3×SSC	at room temperature	20 minutes each

The preceding methods may be used to isolate nucleic acids having at least 97%, at least 95%, at least 90%, at least 85%, at least 80%, or at least 70% sequence identity to a nucleic acid sequence selected from the group consisting of the sequences of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89. The isolated nucleic acid may have a coding sequence that is a naturally occurring allelic variant of one of the coding sequences described herein. Such allelic variant may have a substitution, deletion or addition of one or more nucleotides when compared to the nucleic acids of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87 and 89, or the sequences complementary thereto.
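Once a candidate nucleic acid has been isolated and aligned to a reference sequence, its percent identity can be compared against the thresholds listed above. The following Python sketch is illustrative only and does not represent the method of identity determination used in this disclosure. It assumes the two sequences have already been aligned to equal length with gaps written as "-", and it counts identity over all aligned columns; the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns that are gaps in both sequences are
    ignored; a column with a gap in one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned positions to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical aligned sequences, for illustration only.
reference = "ACGTACGTAC"
candidate = "ACGTTCGTAC"  # one substitution
print(percent_identity(reference, candidate))
```

Other conventions, for example excluding gapped columns from the denominator, give different values for the same alignment, so the convention should be stated whenever a threshold is applied.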
Additionally, the above procedures may be used to isolate nucleic acids which encode polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, or at least 70% identity to a polypeptide having the sequence of one of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 or fragments comprising at least 50, 75, 100, 150, 200 or 300 consecutive amino acids thereof.
Another aspect of the present invention is an isolated or purified polypeptide comprising the sequence of one of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 or fragments comprising at least 50, 75, 100, 150, 200 or 300 consecutive amino acids thereof. As discussed herein, such polypeptides may be obtained by inserting a nucleic acid encoding the polypeptide into a vector such that the coding sequence is operably linked to a sequence capable of driving the expression of the encoded polypeptide in a suitable host cell. For example, the expression vector may comprise a promoter, a ribosome binding site for translation initiation and a transcription terminator. The vector may also include appropriate sequences for modulating expression levels, an origin of replication and a selectable marker.
Promoters suitable for expressing the polypeptide or fragment thereof in bacteria include the E. coli lac or trp promoters, the lacI promoter, the lacZ promoter, the T3 promoter, the T7 promoter, the gpt promoter, the lambda P_R promoter, the lambda P_L promoter, promoters from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), and the acid phosphatase promoter. Fungal promoters include the α factor promoter. Eukaryotic promoters include the CMV immediate early promoter, the HSV thymidine kinase promoter, heat shock promoters, the early and late SV40 promoter, LTRs from retroviruses, and the mouse metallothionein-I promoter. Other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses may also be used.
Mammalian expression vectors may also comprise an origin of replication, any necessary ribosome binding sites, a polyadenylation site, splice donors and acceptor sites, transcriptional termination sequences, and 5′ flanking nontranscribed sequences. In some embodiments, DNA sequences derived from the SV40 splice and polyadenylation sites may be used to provide the required nontranscribed genetic elements.
Vectors for expressing the polypeptide or fragment thereof in eukaryotic cells may also contain enhancers to increase expression levels. Enhancers are cis-acting elements of DNA, usually from about 10 to about 300 bp in length, that act on a promoter to increase its transcription. Examples include the SV40 enhancer on the late side of the replication origin (bp 100 to 270), the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and the adenovirus enhancers.
In addition, the expression vectors preferably contain one or more selectable marker genes to permit selection of host cells containing the vector. Examples of selectable markers that may be used include genes encoding dihydrofolate reductase or genes conferring neomycin resistance for eukaryotic cell culture, genes conferring tetracycline or ampicillin resistance in E. coli, and the S. cerevisiae TRP1 gene.
The appropriate DNA sequence may be inserted into the vector by a variety of procedures. In general, the DNA sequence is ligated to the desired position in the vector following digestion of the insert and the vector with appropriate restriction endonucleases. Alternatively, appropriate restriction enzyme sites can be engineered into a DNA sequence by PCR. A variety of cloning techniques are disclosed in Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. 1997 and Sambrook et al., Molecular Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory Press, 1989. Such procedures and others are deemed to be within the scope of those skilled in the art.
The vector may be, for example, in the form of a plasmid, a viral particle, or a phage. Other vectors include derivatives of chromosomal, nonchromosomal and synthetic DNA sequences, viruses, bacterial plasmids, phage DNA, baculovirus, yeast plasmids, vectors derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, adenovirus, fowl pox virus, and pseudorabies. A variety of cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y., (1989), incorporated by reference in its entirety for all purposes.
Particular bacterial vectors which may be used include the commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017), pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden), pGEM1 (Promega Biotec, Madison, Wis., USA), pQE70, pQE60, pQE-9 (Qiagen), pD10, phiX174, pBluescript™ II KS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene), ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia), pKK232-8 and pCM7. Particular eukaryotic vectors include pSV2CAT, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG, and pSVL (Pharmacia). However, any other vector may be used as long as it is replicable and stable in the host cell.
The host cell may be any of the host cells familiar to those skilled in the art, including prokaryotic cells or eukaryotic cells. As representative examples of appropriate hosts, there may be mentioned: bacterial cells, such as E. coli, Streptomyces lividans, Streptomyces griseofuscus, Streptomyces ambofaciens, Rhodococcus, Amycolatopsis, Bacillus subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces, Bacillus, and Staphylococcus; fungal cells, such as yeast; insect cells such as Drosophila S2 and Spodoptera Sf9; animal cells such as CHO, COS or Bowes melanoma; and adenoviruses. The selection of an appropriate host is within the abilities of those skilled in the art, see for example Manual of Industrial Microbiology and Biotechnology, 2nd Edition, ASM Press, Washington D.C., incorporated by reference in its entirety, and more particularly Sections IV, V and VII.
The vector may be introduced into the host cells using any of a variety of techniques, including electroporation, transformation, transfection, transduction, viral infection, gene guns, or Ti-mediated gene transfer. Where appropriate, the engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying the genes of the present invention. Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter may be induced by appropriate means (e.g., temperature shift or chemical induction) and the cells may be cultured for an additional period to allow them to produce the desired polypeptide or fragment thereof.
Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract is retained for further purification. Microbial cells employed for expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such methods are well known to those skilled in the art. The expressed polypeptide or fragment thereof can be recovered and purified from recombinant cell cultures by methods including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography and lectin chromatography. Protein refolding steps can be used, as necessary, in completing configuration of the polypeptide. If desired, high performance liquid chromatography (HPLC) can be employed for final purification steps.
Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts (described by Gluzman, Cell, 23:175 (1981)), and other cell lines capable of expressing proteins from a compatible vector, such as the C127, 3T3, CHO, HeLa and BHK cell lines. The constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence. Polypeptides of the invention may or may not also include an initial methionine amino acid residue.
Alternatively, the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 or fragments comprising at least 50, 75, 100, 150, 200 or 300 consecutive amino acids thereof can be synthetically produced by conventional peptide synthesizers. In other embodiments, fragments or portions of the polynucleotides may be employed for producing the corresponding full-length polypeptide by peptide synthesis; therefore, the fragments may be employed as intermediates for producing the full-length polypeptides.
Cell-free translation systems can also be employed to produce one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 or fragments comprising at least 50, 75, 100, 150, 200 or 300 consecutive amino acids thereof using mRNAs transcribed from a DNA construct comprising a promoter operably linked to a nucleic acid encoding the polypeptide or fragment thereof. In some embodiments, the DNA construct may be linearized prior to conducting an in vitro transcription reaction. The transcribed mRNA is then incubated with an appropriate cell-free translation extract, such as a rabbit reticulocyte extract, to produce the desired polypeptide or fragment thereof.
The present invention also relates to variants of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 or fragments comprising at least 50, 75, 100, 150, 200 or 300 consecutive amino acids thereof. The term “variant” includes derivatives or analogs of these polypeptides. In particular, the variants may differ in amino acid sequence from the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 by one or more substitutions, additions, deletions, fusions and truncations, which may be present in any combination.
The variants may be naturally occurring or created in vitro. In particular, such variants may be created using genetic engineering techniques such as site directed mutagenesis, random chemical mutagenesis, exonuclease III deletion procedures, and standard cloning techniques. Alternatively, such variants, fragments, analogs, or derivatives may be created using chemical synthesis or modification procedures.
Other methods of making variants are also familiar to those skilled in the art. These include procedures in which nucleic acid sequences obtained from natural isolates are modified to generate nucleic acids that encode polypeptides having characteristics which enhance their value in industrial or laboratory applications. In such procedures, a large number of variant sequences having one or more nucleotide differences with respect to the sequence obtained from the natural isolate are generated and characterized. Preferably, these nucleotide differences result in amino acid changes with respect to the polypeptides encoded by the nucleic acids from the natural isolates.
The variants of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 may be variants in which one or more of the amino acid residues of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 are substituted with a conserved or non-conserved amino acid residue (preferably a conserved amino acid residue) and such substituted amino acid residue may or may not be one encoded by the genetic code.
Conservative substitutions are those that substitute a given amino acid in a polypeptide by another amino acid of like characteristics. Typically seen as conservative substitutions are the following replacements: replacements of an aliphatic amino acid such as Ala, Val, Leu and Ile with another aliphatic amino acid; replacement of a Ser with a Thr or vice versa; replacement of an acidic residue such as Asp or Glu with another acidic residue; replacement of a residue bearing an amide group, such as Asn or Gln, with another residue bearing an amide group; exchange of a basic residue such as Lys or Arg with another basic residue; and replacement of an aromatic residue such as Phe or Tyr with another aromatic residue.
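The replacement classes listed above can be expressed as a simple lookup. The following is a minimal sketch only: the group labels (for example "hydroxyl" for the Ser/Thr pair) and the function name are illustrative assumptions and do not appear in the specification, and the check covers only the classes named in the preceding paragraph.

```python
# Residue classes taken from the paragraph above; the dictionary keys are
# illustrative labels, not terms used in the specification.
CONSERVATIVE_GROUPS = {
    "aliphatic": {"Ala", "Val", "Leu", "Ile"},
    "hydroxyl": {"Ser", "Thr"},
    "acidic": {"Asp", "Glu"},
    "amide": {"Asn", "Gln"},
    "basic": {"Lys", "Arg"},
    "aromatic": {"Phe", "Tyr"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """Return True when two different residues belong to the same listed class."""
    if original == replacement:
        return False  # no substitution took place
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS.values()
    )


if __name__ == "__main__":
    print(is_conservative("Leu", "Ile"))  # both aliphatic
    print(is_conservative("Leu", "Asp"))  # aliphatic versus acidic
```

Residues outside the listed classes (for example Gly or Pro) are treated as non-conservative by this sketch simply because the paragraph does not place them in any class.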
Other variants are those in which one or more of the amino acid residues of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 include a substituent group. Still other variants are those in which the polypeptide is associated with another compound, such as a compound to increase the half-life of the polypeptide (for example, polyethylene glycol). Additional variants are those in which additional amino acids are fused to the polypeptide, such as a leader sequence, a secretory sequence, a proprotein sequence or a sequence that facilitates purification, enrichment, or stabilization of the polypeptide.
In some embodiments, the fragments, derivatives and analogs retain the same biological function or activity as the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88. In other embodiments, the fragment, derivative or analogue includes a fused heterologous sequence that facilitates purification, enrichment, detection, stabilization or secretion of the polypeptide that can be enzymatically cleaved, in whole or in part, away from the fragment, derivative or analogue.
Another aspect of the present invention relates to polypeptides or fragments thereof which have at least 70%, at least 80%, at least 85%, at least 90%, or more than 95% identity to one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 or a fragment comprising at least 50, 75, 100, 150, 200 or 300 consecutive amino acids thereof. It will be appreciated that amino acid “substantial identity” includes conservative substitutions such as those described above.
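As an illustration of how a percent identity figure such as those recited above might be computed, the sketch below counts identical positions in two sequences that have already been aligned. It is a hedged example: the gap character, the choice of denominator (all alignment columns that are not gap-against-gap) and the toy sequences are assumptions made for illustration; other conventions exist, and this passage does not prescribe one.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical toy alignment: one gap and one mismatch over ten columns.
    print(percent_identity("MKTA-IAKQR", "MKTAYIAKQK"))
```

A threshold such as "at least 70% identity" would then be a simple comparison against the returned value, under whichever alignment method and convention are actually adopted.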
The polypeptides or fragments having homology to one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 or a fragment comprising at least 50, 75, 100, 150, 200 or 300 consecutive amino acids thereof may be obtained by isolating the nucleic acids encoding them using the techniques described above.
Alternatively, the homologous polypeptides or fragments may be obtained through biochemical enrichment or purification procedures. The sequence of potentially homologous polypeptides or fragments may be determined by proteolytic digestion, gel electrophoresis and/or microsequencing. The sequence of the prospective homologous polypeptide or fragment can be compared to one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 or a fragment comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof.
The polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88 or fragments, derivatives or analogs thereof comprising at least 40, 50, 75, 100, 150, 200 or 300 consecutive amino acids thereof may be used in a variety of applications. For example, the polypeptides or fragments, derivatives or analogs thereof may be used to catalyze biochemical reactions as described elsewhere in the specification.

EXAMPLES

Unless otherwise indicated, all numbers expressing quantities of ingredients, properties such as molecular weight, reaction conditions, IC₅₀ and so forth used in the specification and claims are to be understood as being modified in all instances by the term “about”. Accordingly, unless indicated to the contrary, the numerical parameters set forth in the present specification and attached claims are approximations. At the very least, and not as an attempt to limit the application of the doctrine of equivalents to the scope of the claims, each numerical parameter should at least be construed in light of the number of significant figures and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the invention are approximations, the numerical values set in the examples, Tables and Figures are reported as precisely as possible. Any numerical values may inherently contain certain errors resulting from variations in experiments, testing measurements, statistical analyses and such.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

Example 1

Preparation of Production Culture

Unless otherwise noted, all reagents were purchased from Sigma Chemical Co. (St. Louis, Mo.) (Aldrich). Micromonospora spp. (deposit accession number IDAC 070303-01) was maintained on agar plates of ISP2 agar (Difco Laboratories, Detroit, Mich.). An inoculum for the production phase was prepared by transferring the surface growth of the Micromonospora spp. from the agar plates to 125-mL flasks containing 25 mL of sterile medium comprised of 24 g potato dextrin, 3 g beef extract, 5 g Bacto-casitone, 5 g glucose, 5 g yeast extract, and 4 g CaCO₃ made up to one liter with distilled water (pH 7.0). The culture was incubated at about 28° C. for approximately 60 hours on a rotary shaker set at 250 rpm. Following incubation, 10 mL of culture was transferred to a 2 L baffled flask containing 500 mL of sterile production medium containing 20 g/L potato dextrin, 20 g/L glycerol, 10 g/L fish meal, 5 g/L Bacto-peptone, 2 g/L CaCO₃, and 2 g/L (NH₄)₂SO₄, pH 7.0. Fermentation broth was prepared by incubating the production culture at 28° C. in a rotary shaker set at 250 rpm for one week.

Example 2

Isolation

500 mL of ethyl acetate was added to 500 mL of fermentation broth prepared as described in Example 1 above. The mixture was agitated for 30 minutes on an orbital shaker at 200 rpm to create an emulsion. The phases were separated by centrifugation and decantation. Between 4 and 5 g of anhydrous MgSO₄ was added to the organic phase, which was then filtered and the solvents removed in vacuo.
An ethyl acetate extract from 2 L of fermentation was mixed with HP-20 resin (100 mL; Mitsubishi Kasei Corp., Tokyo, Japan) in water (300 mL). Ethyl acetate was removed in vacuo, the resin was filtered on a Büchner funnel and the filtrate was discarded. The adsorbed HP-20 resin was then washed successively with 2×125 mL of 50% acetonitrile in water, 2×125 mL of 75% acetonitrile in water and 2×125 mL of acetonitrile.
Fractions containing ECO-04601 were evaporated to dryness and 100 mg was digested in 5 mL of the upper phase of a mixture prepared from chloroform, cyclohexane, methanol, and water in the ratios, by volume, of 5:2:10:5. The sample was subjected to centrifugal partition chromatography using a High Speed Countercurrent (HSCC) system (Kromaton Technologies, Angers, France) fitted with a 200 mL cartridge and prepacked with the upper phase of this two-phase system. The HSCC was run with the lower phase mobile and ECO-04601 was eluted at approximately one-half column volume. Fractions were collected and ECO-04601 was detected by TLC of aliquots of the fractions on commercial Kieselgel 60 F₂₅₄ plates. The compound could be visualized by inspection of dried plates under UV light or by spraying the plates with a spray containing vanillin (0.75%) and concentrated sulfuric acid (1.5%, v/v) in ethanol and subsequently heating the plate. Fractions contained substantially pure ECO-04601, although highly colored. A buff-colored sample could be obtained by chromatography on HPLC as follows.
6 mg of sample was dissolved in acetonitrile and injected onto a preparative HPLC column (XTerra ODS (10 μm), 19×150 mm, Waters Co., Milford, Mass.), with a 9 mL/min flow rate and UV peak detection at 300 nm. The column was eluted with acetonitrile/buffer (20 mM NH₄HCO₃) according to the gradient shown in Table 3.

TABLE 3

Time (min)	Water (%)	Acetonitrile (%)

0	70	30
10	5	95
15	5	95
20	70	30

Fractions containing ECO-04601 eluted at approximately 11.0 min and were combined, concentrated and lyophilized to give a yield of 3.8 mg compound.
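The gradient of Table 3 can be read as a piecewise schedule of solvent composition against time. The sketch below is illustrative only and assumes that the composition changes linearly between the listed time points, which the table itself does not state; the function and variable names are likewise assumptions.

```python
# (time in minutes, acetonitrile %) pairs copied from Table 3.
GRADIENT = [(0, 30), (10, 95), (15, 95), (20, 30)]


def acetonitrile_percent(t_min: float) -> float:
    """Acetonitrile percentage at time t_min, assuming linear ramps between points."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the programmed gradient")
    for (t0, p0), (t1, p1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: gradient segments cover the full range")


if __name__ == "__main__":
    for t in (0, 5, 11, 17.5, 20):
        acn = acetonitrile_percent(t)
        print(f"{t:>5} min: {acn:.1f}% acetonitrile, {100 - acn:.1f}% aqueous")
```

Under this linear-ramp assumption, the reported elution at about 11 minutes falls within the 95% acetonitrile hold between 10 and 15 minutes.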

Example 3

Elucidation of the Structure of ECO-04601

The structure of ECO-04601 above was derived from spectroscopic data, including mass, UV, and NMR spectroscopy. Mass was determined by electrospray mass spectrometry to be 462.6, and the UVmax was 230 nm with a shoulder at 290 nm. NMR data, including proton and multidimensional pulse sequences, were collected on a sample dissolved in MeOH-d4. Proton and carbon NMR data are detailed in Table 4 below.

TABLE 4

¹H and ¹³C NMR (δ_H, ppm) of ECO-04601 in MeOH-D₄

Assignment	¹H	¹³C	Group

1	7.15	122.3	CH
2	6.74	121.0	CH
3	6.83	116.9	CH
4	—	146.0	C—OH
4a	—	142.0	C
5a	—	126.0	C
6	—	148.2	C—OH
7	6.20	100.0	CH
8	—	153.0	C—OH
9	6.25	101.0	CH
9a	—	135.0	C
11	—	170.0	C(O)
11a	—	125.0	C
1′	4.52	48.7	CH₂
2′	5.35	121.1	CH
3′	—	138.5	C
4′	2.03	39.5	CH₂
5′	2.08	26.7	CH₂
6′	5.09	124.1	CH
7′	—	135.0	C
8′	1.95	39.6	CH₂
9′	2.02	26.3	CH₂
10′	5.06	124.4	CH
11′	—	130.9	C
12′	1.64	24.8	CH₃
1″	1.72	15.5	CH₃
2″	1.59	14.9	CH₃
3″	1.55	16.5	CH₃

A number of cross peaks in the 2D spectra of ECO-04601 are key in the structural determination. For example, the farnesyl chain is placed on the amide nitrogen by a strong cross peak between the proton signal of the terminal methylene of that chain at 4.52 ppm and the amide carbonyl carbon at 170 ppm in the gHMBC experiment. This conclusion is confirmed by a cross peak in the NOESY spectrum between the same methylene signal at 4.52 ppm and the aromatic proton signal at 6.25 ppm from one of the two protons of the tetrasubstituted benzenoid ring.
Based on the mass, UV and NMR spectroscopy data, the structure of the compound was determined to be the structure of ECO-04601.

Example 4

In Vivo Efficacy in a Glioma Model

The aim of this study was to test whether ECO-04601 when administered by the i.p. route prevents or delays tumor growth in C6 glioblastoma cell-bearing mice, and to determine an effective dosage regimen.
Animals: A total of 60 six-week-old female mice (Mus musculus nude mice), weighing between 18 and 25 g, were observed for 7 days before treatment. Animal experiments were performed according to ethical guidelines of animal experimentation (Charte du comité d'éthique du CNRS, juillet 2003) and the English guidelines for the welfare of animals in experimental neoplasia (WORKMAN, P., TWENTYMAN, P., BALKWILL, F., et al. (1998). United Kingdom Coordinating Committee on Cancer Research (UKCCCR) Guidelines for the welfare of animals in experimental neoplasia (Second Edition, July 1997); British Journal of Cancer, 77:1-10). Any dead or apparently sick mice were promptly removed and replaced with healthy mice. Sick mice were euthanized upon removal from the cage. Animals were maintained in rooms under controlled conditions of temperature (23±2° C.), humidity (45±5%), photoperiodicity (12 hrs light/12 hrs dark) and air exchange. Animals were housed in polycarbonate cages (5/single cage) that were equipped to provide food and water. Animal bedding consisted of sterile wood shavings that were replaced every other day. Food was provided ad libitum, being placed in the metal lid on the top of the cage. Autoclaved tap water was provided ad libitum. Water bottles were equipped with rubber stoppers and sipper tubes. Water bottles were cleaned, sterilized and replaced once a week. Two different numbers engraved on two earrings identified the animals. Each cage was labeled with a specific code.
Tumor Cell Line: The C6 cell line was cloned from a rat glial tumor induced by N-nitrosomethylurea (NMU) according to Premont et al. (Premont J, Benda P, Jard S., [3H] norepinephrine binding by rat glial cells in culture. Lack of correlation between binding and adenylate cyclase activation. Biochim Biophys Acta. 1975 Feb. 13; 381(2):368-76.) after a series of alternate culture and animal passages. Cells were grown as adherent monolayers at 37° C. in a humidified atmosphere (5% CO₂, 95% air). The culture medium was DMEM supplemented with 2 mM L-glutamine and 10% fetal bovine serum. For experimental use, tumor cells were detached from the culture flask by a 10 min treatment with trypsin-versene. The cells were counted in a hemocytometer and their viability assessed by 0.25% trypan blue exclusion.
Preparation of the Test Article: For the test article, the following procedure was followed for reconstitution (performed immediately preceding injection). The vehicle consisted of a mixture of benzyl alcohol (1.5%), ethanol (8.5%), propylene glycol (27%), PEG 400 (27%), dimethylacetamide (6%) and water (30%). The vehicle solution was first vortexed in order to obtain a homogeneous liquid. 0.6 mL of the vortexed vehicle solution was added to each vial containing the test article (ECO-04601). Vials were mixed thoroughly by vortexing for 1 minute and inverted and shaken vigorously. Vials were mixed again prior to injection into each animal.
Animal Inoculation with tumor cells: The experiment started at day 0 (D₀). On D₀, mice received a superficial intramuscular injection of C6 tumor cells (5×10⁵ cells) in 0.1 mL of DMEM complete medium into the upper right posterior leg.
Treatment Regimen and Results:
First series of experiments: In a first series of experiments, treatment started 24 hrs following inoculation of C6 cells. On the day of the treatment, each mouse was slowly injected with 100 μL of test or control articles by the i.p. route. For all groups, treatment was performed until the tumor volume of the saline-treated mice (group 1) reached approximately 3 cm³ (around day 16). Mice of group 1 were treated daily with a saline isosmotic solution for 16 days. Mice of group 2 were treated daily with the vehicle solution for 16 days. Mice of group 3 were treated daily with 10 mg/kg of ECO-04601 for 16 days. Mice of group 4 were treated every two days with 30 mg/kg of ECO-04601 and received 8 treatments. Mice of group 5 were treated every three days with 30 mg/kg of ECO-04601 and received 6 treatments. Measurement of tumor volume started as soon as tumors became palpable (>100 mm³; day 11 post-inoculation), and tumor volume was evaluated every second day until the end of the treatment using callipers. As shown in Table 5 and FIG. 1, the mean value of the tumor volume of all ECO-04601-treated groups (6 mice/group) was significantly reduced as demonstrated by the one-way analysis of variance (Anova) test followed by the non-parametric Dunnett's multiple comparison test comparing treated groups to the saline group. An asterisk in the P value column of Table 5 indicates a statistically significant value, while “ns” signifies not significant.

TABLE 5

ECO-04601 in vivo antitumor efficacy against C6 glioblastoma

Treatment	Treatment regimen	Tumor volume (mm³) (mean ± SEM)	% Inhibition	P value

Saline	Q1 × 16	3,004.1 ± 249.64	—	—
Vehicle solution	Q1 × 16	2,162.0 ± 350.0	28.0%	>0.05 ns
ECO-04601 (10 mg/kg)	Q1 × 16	1,220.4 ± 283.46	59.4%	<0.01 *
ECO-04601 (30 mg/kg)	Q2 × 8	1,236.9 ± 233.99	58.8%	<0.01 *
ECO-04601 (30 mg/kg)	Q3 × 6	1,184.1 ± 221.45	60.6%	<0.01 *
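The % Inhibition column of Table 5 is numerically consistent with the reduction in mean tumor volume relative to the saline group. The sketch below reproduces that arithmetic; the formula is inferred from the tabulated values rather than stated in the specification, and the function and variable names are illustrative.

```python
def percent_inhibition(treated_mean: float, control_mean: float) -> float:
    """Reduction in mean tumor volume relative to the control group, in percent."""
    return 100.0 * (1.0 - treated_mean / control_mean)


# Mean tumor volumes (mm^3) copied from Table 5; saline is the reference group.
SALINE_MEAN = 3004.1
TABLE_5_MEANS = {
    "Vehicle solution, Q1 x 16": 2162.0,
    "ECO-04601 10 mg/kg, Q1 x 16": 1220.4,
    "ECO-04601 30 mg/kg, Q2 x 8": 1236.9,
    "ECO-04601 30 mg/kg, Q3 x 6": 1184.1,
}

if __name__ == "__main__":
    for group, mean_volume in TABLE_5_MEANS.items():
        print(f"{group}: {percent_inhibition(mean_volume, SALINE_MEAN):.1f}% inhibition")
```

The same calculation applied to the means of the second series of experiments, with that series' own saline mean as the reference, gives figures matching those reported for it.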

Second series of experiments: In a second series of experiments, treatment started at day 10 following inoculation of C6 cells when tumors became palpable (around 100 to 200 mm³). Treatment was repeated daily for 5 consecutive days. On the day of the treatment, each mouse was slowly injected with 100 μL of test or control article by the i.p. route. Mice of group 1 were treated daily with saline isosmotic solution. Mice of group 2 were treated daily with the vehicle solution. Mice of group 3 were treated daily with 20 mg/kg of ECO-04601. Mice of group 4 were treated daily with 30 mg/kg of ECO-04601. Mice were treated until the tumor volume of the saline-treated control mice (group 1) reached around 4 cm³. Tumor volume was measured every second day until the end of the treatment using callipers. As shown in Table 6 and FIG. 2, the mean value of the tumor volume of all ECO-04601-treated groups (6 mice/group) was significantly reduced as demonstrated by the one-way analysis of variance (Anova) test followed by the non-parametric Dunnett's multiple comparison test comparing treated groups to the saline group. An asterisk in the P value column of Table 6 indicates a statistically significant value, while “ns” signifies not significant.
Histological analysis of tumor sections showed pronounced morphological changes between tumors from ECO-04601-treated mice and those from mice in the control groups. In tumors from mice treated with ECO-04601 (20-30 mg/kg), cell density was decreased and the nuclei of remaining tumor cells appeared larger and pycnotic while no such changes were observed for tumors from vehicle-treated mice (FIG. 3).

TABLE 6

ECO-04601 in vivo antitumor efficacy against C6 glioblastoma

Treatment	Treatment regimen	Tumor volume (mm³) (mean ± SEM)	% Inhibition	P value

Saline	Q1 × 5	4,363.1 ± 614.31	—	—
Vehicle solution	Q1 × 5	3,205.0 ± 632.37	26.5%	>0.05 ns
ECO-04601 (20 mg/kg)	Q1 × 5	1,721.5 ± 374.79	60.5%	<0.01 *
ECO-04601 (30 mg/kg)	Q1 × 5	1,131.6 ± 525.21	74.1%	<0.01 *

Example 5

Genes and Proteins for the Production of Farnesyl Dibenzodiazepinones

Micromonospora sp. strain 046-ECO11 is a representative microorganism useful in the production of the compound of the invention. Strain 046-ECO11 has been deposited with the International Depositary Authority of Canada (IDAC), Bureau of Microbiology, Health Canada, 1015 Arlington Street, Winnipeg, Manitoba, Canada R3E 3R2 on Mar. 7, 2003 and was assigned IDAC accession no. 070303-01. The biosynthetic locus for the production of ECO-04601 was identified in the genome of Micromonospora sp. strain 046-ECO11 using the genome scanning method described in U.S. Ser. No. 10/232,370, CA 2,352,451 and Zazopoulos et al., Nature Biotechnol., 21, 187-190 (2003).
The biosynthetic locus spans approximately 52,400 base pairs of DNA and encodes 43 proteins. More than 10 kilobases of DNA sequence were analyzed on each side of the locus and these regions were deemed to contain primary genes or genes unrelated to the synthesis of ECO-04601. As illustrated in FIG. 4, the locus is contained within three sequences of contiguous base pairs, namely Contig 1 having the 36,602 contiguous base pairs of SEQ ID NO: 1 and comprising ORFs 1 to 31 (SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61 and 63), Contig 2 having the 5,960 contiguous base pairs of SEQ ID NO: 64 and comprising ORFs 32 to 35 (SEQ ID NOS: 66, 68, 70 and 72), and Contig 3 having the 9,762 base pairs of SEQ ID NO: 73 and comprising ORFs 36 to 43 (SEQ ID NOS: 75, 77, 79, 81, 83, 85, 87 and 89). The order, relative position and orientation of the 43 open reading frames representing the proteins of the biosynthetic locus are illustrated schematically in FIG. 4. The top line in FIG. 4 provides a scale in base pairs. The gray bars depict the three DNA contigs (SEQ ID NOS: 1, 64 and 73) that cover the locus. The empty arrows represent the 43 open reading frames of this biosynthetic locus. The black arrows represent the two deposited cosmid clones covering the locus.
The biosynthetic locus will be further understood with reference to the sequence listing which provides contiguous nucleotide sequences and deduced amino acid sequences of the locus from Micromonospora sp. strain 046-ECO11. The contiguous nucleotide sequences are arranged such that, as found within the biosynthetic locus, Contig 1 (SEQ ID NO: 1) is adjacent to the 5′ end of Contig 2 (SEQ ID NO: 64), which in turn is adjacent to Contig 3 (SEQ ID NO: 73). The ORFs illustrated in FIG. 4 and provided in the sequence listing represent open reading frames deduced from the nucleotide sequences of Contigs 1, 2 and 3 (SEQ ID NOS: 1, 64 and 73). Referring to the Sequence Listing, ORF 1 (SEQ ID NO: 3) is the polynucleotide drawn from residues 2139 to 424 of SEQ ID NO: 1, and SEQ ID NO: 2 represents that polypeptide deduced from SEQ ID NO: 3. ORF 2 (SEQ ID NO: 5) is the polynucleotide drawn from residues 2890 to 4959 of SEQ ID NO: 1, and SEQ ID NO: 4 represents the polypeptide deduced from SEQ ID NO: 5. ORF 3 (SEQ ID NO: 7) is the polynucleotide drawn from residues 7701 to 5014 of SEQ ID NO: 1, and SEQ ID NO: 6 represents the polypeptide deduced from SEQ ID NO: 7. ORF 4 (SEQ ID NO: 9) is the polynucleotide drawn from residues 8104 to 9192 of SEQ ID NO: 1, and SEQ ID NO: 8 represents the polypeptide deduced from SEQ ID NO: 9. ORF 5 (SEQ ID NO: 11) is the polynucleotide drawn from residues 9192 to 10256 of SEQ ID NO: 1, and SEQ ID NO: 10 represents the polypeptide deduced from SEQ ID NO: 11. ORF 6 (SEQ ID NO: 13) is the polynucleotide drawn from residues 10246 to 11286 of SEQ ID NO: 1, and SEQ ID NO: 12 represents the polypeptide deduced from SEQ ID NO: 13. ORF 7 (SEQ ID NO: 15) is the polynucleotide drawn from residues 11283 to 12392 of SEQ ID NO: 1, and SEQ ID NO: 14 represents the polypeptide deduced from SEQ ID NO: 15. 
ORF 8 (SEQ ID NO: 17) is the polynucleotide drawn from residues 12389 to 13471 of SEQ ID NO: 1, and SEQ ID NO: 16 represents the polypeptide deduced from SEQ ID NO: 17. ORF 9 (SEQ ID NO: 19) is the polynucleotide drawn from residues 13468 to 14523 of SEQ ID NO: 1, and SEQ ID NO: 18 represents the polypeptide deduced from SEQ ID NO: 19. ORF 10 (SEQ ID NO: 21) is the polynucleotide drawn from residues 14526 to 15701 of SEQ ID NO: 1, and SEQ ID NO: 20 represents the polypeptide deduced from SEQ ID NO: 21. ORF 11 (SEQ ID NO: 23) is the polynucleotide drawn from residues 15770 to 16642 of SEQ ID NO: 1, and SEQ ID NO: 22 represents the polypeptide deduced from SEQ ID NO: 23. ORF 12 (SEQ ID NO: 25) is the polynucleotide drawn from residues 16756 to 17868 of SEQ ID NO: 1, and SEQ ID NO: 24 represents the polypeptide deduced from SEQ ID NO: 25. ORF 13 (SEQ ID NO: 27) is the polynucleotide drawn from residues 17865 to 18527 of SEQ ID NO: 1, and SEQ ID NO: 26 represents the polypeptide deduced from SEQ ID NO: 27. ORF 14 (SEQ ID NO: 29) is the polynucleotide drawn from residues 18724 to 19119 of SEQ ID NO: 1, and SEQ ID NO: 28 represents the polypeptide deduced from SEQ ID NO: 29. ORF 15 (SEQ ID NO: 31) is the polynucleotide drawn from residues 19175 to 19639 of SEQ ID NO: 1, and SEQ ID NO: 30 represents the polypeptide deduced from SEQ ID NO: 31. ORF 16 (SEQ ID NO: 33) is the polynucleotide drawn from residues 19636 to 21621 of SEQ ID NO: 1, and SEQ ID NO: 32 represents the polypeptide deduced from SEQ ID NO: 33. ORF 17 (SEQ ID NO: 35) is the polynucleotide drawn from residues 21632 to 22021 of SEQ ID NO: 1, and SEQ ID NO: 34 represents the polypeptide deduced from SEQ ID NO: 35. ORF 18 (SEQ ID NO: 37) is the polynucleotide drawn from residues 22658 to 22122 of SEQ ID NO: 1, and SEQ ID NO: 36 represents the polypeptide deduced from SEQ ID NO: 37. 
ORF 19 (SEQ ID NO: 39) is the polynucleotide drawn from residues 24665 to 22680 of SEQ ID NO: 1, and SEQ ID NO: 38 represents the polypeptide deduced from SEQ ID NO: 39. ORF 20 (SEQ ID NO: 41) is the polynucleotide drawn from residues 24880 to 26163 of SEQ ID NO: 1, and SEQ ID NO: 40 represents the polypeptide deduced from SEQ ID NO: 41. ORF 21 (SEQ ID NO: 43) is the polynucleotide drawn from residues 26179 to 27003 of SEQ ID NO: 1, and SEQ ID NO: 42 represents the polypeptide deduced from SEQ ID NO: 43. ORF 22 (SEQ ID NO: 45) is the polynucleotide drawn from residues 27035 to 28138 of SEQ ID NO: 1, and SEQ ID NO: 44 represents the polypeptide deduced from SEQ ID NO: 45. ORF 23 (SEQ ID NO: 47) is the polynucleotide drawn from residues 28164 to 28925 of SEQ ID NO: 1, and SEQ ID NO: 46 represents the polypeptide deduced from SEQ ID NO: 47. ORF 24 (SEQ ID NO: 49) is the polynucleotide drawn from residues 28922 to 30238 of SEQ ID NO: 1, and SEQ ID NO: 48 represents the polypeptide deduced from SEQ ID NO: 49. ORF 25 (SEQ ID NO: 51) is the polynucleotide drawn from residues 30249 to 31439 of SEQ ID NO: 1, and SEQ ID NO: 50 represents the polypeptide deduced from SEQ ID NO: 51. ORF 26 (SEQ ID NO: 53) is the polynucleotide drawn from residues 31439 to 32224 of SEQ ID NO: 1, and SEQ ID NO: 52 represents the polypeptide deduced from SEQ ID NO: 53. ORF 27 (SEQ ID NO: 55) is the polynucleotide drawn from residues 32257 to 32931 of SEQ ID NO: 1, and SEQ ID NO: 54 represents the polypeptide deduced from SEQ ID NO: 55. ORF 28 (SEQ ID NO: 57) is the polynucleotide drawn from residues 32943 to 33644 of SEQ ID NO: 1, and SEQ ID NO: 56 represents the polypeptide deduced from SEQ ID NO: 57. ORF 29 (SEQ ID NO: 59) is the polynucleotide drawn from residues 34377 to 33637 of SEQ ID NO: 1, and SEQ ID NO: 58 represents the polypeptide deduced from SEQ ID NO: 59. 
ORF 30 (SEQ ID NO: 61) is the polynucleotide drawn from residues 34572 to 34907 of SEQ ID NO: 1, and SEQ ID NO: 60 represents the polypeptide deduced from SEQ ID NO: 61. ORF 31 (SEQ ID NO: 63) is the polynucleotide drawn from residues 34904 to 36583 of SEQ ID NO: 1, and SEQ ID NO: 62 represents the polypeptide deduced from SEQ ID NO: 63. ORF 32 (SEQ ID NO: 66) is the polynucleotide drawn from residues 23 to 1621 of SEQ ID NO: 64, and SEQ ID NO: 65 represents the polypeptide deduced from SEQ ID NO: 66. ORF 33 (SEQ ID NO: 68) is the polynucleotide drawn from residues 1702 to 2973 of SEQ ID NO: 64, and SEQ ID NO: 67 represents the polypeptide deduced from SEQ ID NO: 68. ORF 34 (SEQ ID NO: 70) is the polynucleotide drawn from residues 3248 to 4270 of SEQ ID NO: 64, and SEQ ID NO: 69 represents the polypeptide deduced from SEQ ID NO: 70. ORF 35 (SEQ ID NO: 72) is the polynucleotide drawn from residues 4452 to 5933 of SEQ ID NO: 64, and SEQ ID NO: 71 represents the polypeptide deduced from SEQ ID NO: 72. ORF 36 (SEQ ID NO: 75) is the polynucleotide drawn from residues 30 to 398 of SEQ ID NO: 73, and SEQ ID NO: 74 represents the polypeptide deduced from SEQ ID NO: 75. ORF 37 (SEQ ID NO: 77) is the polynucleotide drawn from residues 395 to 1372 of SEQ ID NO: 73, and SEQ ID NO: 76 represents the polypeptide deduced from SEQ ID NO: 77. ORF 38 (SEQ ID NO: 79) is the polynucleotide drawn from residues 3388 to 1397 of SEQ ID NO: 73, and SEQ ID NO: 78 represents the polypeptide deduced from SEQ ID NO: 79. ORF 39 (SEQ ID NO: 81) is the polynucleotide drawn from residues 3565 to 5286 of SEQ ID NO: 73, and SEQ ID NO: 80 represents the polypeptide deduced from SEQ ID NO: 81. ORF 40 (SEQ ID NO: 83) is the polynucleotide drawn from residues 5283 to 7073 of SEQ ID NO: 73, and SEQ ID NO: 82 represents the polypeptide deduced from SEQ ID NO: 83. 
ORF 41 (SEQ ID NO: 85) is the polynucleotide drawn from residues 7108 to 8631 of SEQ ID NO: 73, and SEQ ID NO: 84 represents the polypeptide deduced from SEQ ID NO: 85. ORF 42 (SEQ ID NO: 87) is the polynucleotide drawn from residues 9371 to 8673 of SEQ ID NO: 73, and SEQ ID NO: 86 represents the polypeptide deduced from SEQ ID NO: 87. ORF 43 (SEQ ID NO: 89) is the polynucleotide drawn from residues 9762 to 9364 of SEQ ID NO: 73, and SEQ ID NO: 88 represents the polypeptide deduced from SEQ ID NO: 89.
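The coordinate convention used in the ORF listing above can be checked with a short sketch. It rests on two assumptions that are an interpretation rather than a statement of the Sequence Listing: an ORF whose first residue number exceeds its second is read on the complementary strand, and each span includes one terminal stop codon. The function name `orf_summary` and the dictionary layout are illustrative only.

```python
# Illustrative sketch: derive strand and encoded length from ORF coordinates.
# Assumption: start > end means the ORF is read on the complementary strand.
# Assumption: the span includes one stop codon, so residues = codons - 1.

def orf_summary(start, end):
    """Return (strand, nucleotide length, deduced amino acid count)."""
    strand = "+" if start < end else "-"
    length_nt = abs(end - start) + 1
    if length_nt % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    return strand, length_nt, length_nt // 3 - 1

# Coordinates of ORFs 1 to 4 on Contig 1 (SEQ ID NO: 1), as listed above.
orfs = {1: (2139, 424), 2: (2890, 4959), 3: (7701, 5014), 4: (8104, 9192)}
summaries = {n: orf_summary(*span) for n, span in orfs.items()}
for n, (strand, nt, aa) in summaries.items():
    print(f"ORF {n}: strand {strand}, {nt} nt, {aa} aa")
```

Under these assumptions ORFs 1 to 4 come out at 571, 689, 895 and 362 residues, which agrees with the "# aa" column of Table 7.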
Some open reading frames provided in the Sequence Listing, namely ORF 2 (SEQ ID NO: 5), ORF 5 (SEQ ID NO: 11), ORF 12 (SEQ ID NO: 25), ORF 13 (SEQ ID NO: 27), ORF 15 (SEQ ID NO: 31), ORF 17 (SEQ ID NO: 35), ORF 19 (SEQ ID NO: 39), ORF 20 (SEQ ID NO: 41), ORF 22 (SEQ ID NO: 45), ORF 24 (SEQ ID NO: 49), ORF 26 (SEQ ID NO: 53) and ORF 27 (SEQ ID NO: 55), initiate with non-standard initiation codons (e.g. GTG for valine, or CTG for leucine) rather than the standard initiation codon ATG (methionine). All ORFs are listed with the appropriate M, V or L amino acid at the amino-terminal position to indicate the specificity of the first codon of the ORF. It is expected, however, that in all cases the biosynthesized protein will contain a methionine residue, and more specifically a formylmethionine residue, at the amino-terminal position, in keeping with the widely accepted principle that protein synthesis in bacteria initiates with methionine (formylmethionine) even when the encoding gene specifies a non-standard initiation codon (e.g. Stryer, Biochemistry, 3rd edition, 1998, W.H. Freeman and Co., New York, pp. 752-754).
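The distinction between the residue listed for the first codon and the residue expected in the synthesized protein can be sketched as follows. The function name and the example fragments are hypothetical; the fragments are made up for illustration and are not taken from the Sequence Listing.

```python
# Illustrative sketch: report the residue listed for the first codon of an ORF
# and the residue expected at the amino terminus of the synthesized protein.
# Codon meanings follow the standard genetic code; names are hypothetical.

LISTED_RESIDUE = {"ATG": "M", "GTG": "V", "CTG": "L"}

def amino_terminal_residue(orf_dna):
    """Return (listed residue, residue expected in the synthesized protein)."""
    codon = orf_dna[:3].upper()
    if codon not in LISTED_RESIDUE:
        raise ValueError(f"{codon} is not one of the initiation codons discussed here")
    # Bacterial translation starts with (formyl)methionine whatever the codon.
    return LISTED_RESIDUE[codon], "M"

# Made-up fragments, not taken from the Sequence Listing.
examples = ["ATGACCGAG", "GTGAGCCGT", "CTGCCGGAA"]
results = [amino_terminal_residue(s) for s in examples]
for seq, (listed, expected) in zip(examples, results):
    print(f"{seq[:3]}: listed as {listed}, expected in protein as {expected}")
```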
ORF 32 (SEQ ID NO: 65) is incomplete and contains a truncation of 10 to 20 amino acids from its carboxy terminus. This is due to incomplete sequence information between Contigs 2 and 3 (SEQ ID NOS: 64 and 73, respectively).
E. coli DH10B vectors, each harbouring a cosmid clone (designated in FIG. 4 as 046KM and 046KQ respectively) of a partial biosynthetic locus for the farnesyl dibenzodiazepinone from Micromonospora sp. strain 046-ECO11 and together spanning the full biosynthetic locus for production of ECO-04601, were deposited with the International Depositary Authority of Canada, Bureau of Microbiology, Health Canada, 1015 Arlington Street, Winnipeg, Manitoba, Canada R3E 3R2 on Feb. 25, 2003. The cosmid clone designated 046KM was assigned deposit accession number IDAC 250203-06, and the cosmid clone designated 046KQ was assigned deposit accession number IDAC 250203-07. Cosmid 046KM covers residue 1 to residue 32,250 of Contig 1 (SEQ ID NO: 1). Cosmid 046KQ covers residue 21,700 of Contig 1 (SEQ ID NO: 1) to residue 9,762 of Contig 3 (SEQ ID NO: 73). The sequences of the polynucleotides comprised in the deposited strains, as well as the amino acid sequences of any polypeptides encoded thereby, are controlling in the event of any conflict with any description of sequences herein.
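The cosmid coverage stated above can be worked through for Contig 1 with a minimal sketch. Only the Contig 1 portion of cosmid 046KQ is modelled here; the variable and function names are illustrative, and the three ORF spans are taken from the ORF listing earlier in this example.

```python
# Illustrative sketch: cosmid coverage of Contig 1 (SEQ ID NO: 1).
CONTIG1_LENGTH = 36602                  # base pairs in SEQ ID NO: 1
COSMIDS = {
    "046KM": (1, 32250),
    "046KQ": (21700, CONTIG1_LENGTH),   # 046KQ continues into Contigs 2 and 3
}

def overlap(a, b):
    """Number of residues shared by two inclusive intervals."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return max(0, hi - lo + 1)

def cosmids_containing(start, end):
    """Names of cosmids that wholly contain an ORF given in either orientation."""
    lo, hi = min(start, end), max(start, end)
    return [name for name, (s, e) in COSMIDS.items() if s <= lo and hi <= e]

shared_bp = overlap(COSMIDS["046KM"], COSMIDS["046KQ"])
orf9 = cosmids_containing(13468, 14523)
orf19 = cosmids_containing(24665, 22680)
orf31 = cosmids_containing(34904, 36583)
print(shared_bp, orf9, orf19, orf31)
```

On these figures the two cosmids share 10,551 residues of Contig 1, so an ORF such as ORF 19 is carried by both clones, while ORF 9 is carried only by 046KM and ORF 31 only by 046KQ.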
The deposit of the deposited strains has been made under the terms of the Budapest Treaty on the International Recognition of the Deposit of Micro-organisms for Purposes of Patent Procedure. The deposited strains will be irrevocably and without restriction or condition released to the public upon the issuance of a patent. The deposited strains are provided merely as a convenience to those skilled in the art and are not an admission that a deposit is required for enablement, such as that required under 35 U.S.C. §112. A license may be required to make, use or sell the deposited strains, and compounds derived therefrom, and no such license is hereby granted.
In order to identify the function of the proteins coded by the genes forming the biosynthetic locus for the production of ECO-04601, the gene products of ORFs 1 to 43, namely SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 71, 74, 76, 78, 80, 82, 84, 86 and 88, were compared, using the BLASTP version 2.2.10 algorithm with the default parameters, to sequences in the National Center for Biotechnology Information (NCBI) nonredundant protein database and the DECIPHER® database of microbial genes, pathways and natural products (Ecopia BioSciences Inc., St.-Laurent, QC, Canada).
The accession numbers of the top GenBank™ hits of this BLAST analysis are presented in Table 7 along with the corresponding E values. The E value is the expected number of chance alignments with an alignment score at least equal to the observed alignment score. An E value of 0.00 indicates a perfect homolog. The E values are calculated as described in Altschul et al., J. Mol. Biol., 215, 403-410 (1990). The E value assists in the determination of whether two sequences display sufficient similarity to justify an inference of homology.
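How an E value scales with alignment score and search-space size can be illustrated with the commonly used bit-score form, E = m · n · 2^(−S′). This is a sketch only: it assumes raw sequence lengths in place of the effective lengths that BLAST itself uses, and the query and database sizes are hypothetical rather than those of the analysis reported here.

```python
# Illustrative sketch of the bit-score form of the BLAST E value.
# Assumption: raw sequence lengths stand in for the effective lengths
# that BLAST itself uses, so results are only approximate.

def expected_chance_hits(bit_score, query_len, database_len):
    """E = m * n * 2**(-S'), with S' the normalized (bit) score."""
    return query_len * database_len * 2.0 ** (-bit_score)

# Hypothetical numbers: a 351-residue query against a database of 1e9 residues.
weak = expected_chance_hits(30.0, 351, 1_000_000_000)
strong = expected_chance_hits(300.0, 351, 1_000_000_000)
print(f"bit score 30:  E = {weak:.2e}")
print(f"bit score 300: E = {strong:.2e}")
```

Each additional bit of score halves the expected number of chance alignments, which is why strong matches carry very small E values.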

TABLE 7

Sequence comparison and ORF correlation

ORF	SEQ ID NO	Family	# aa	GenBank homology	Probability	% Identity (% Similarity)	Proposed function of GenBank match

1	2	ABCC	571	NP_736627.1	1E−107	45% (56%)	ABC transporter Corynebacterium efficiens
				590aa
				NP_600638.1	5E−80	37% (52%)	ABC transporter Corynebacterium efficiens
				510aa
				NP_600638.1	3E−12	30% (43%)	ABC transporter Corynebacterium efficiens
				510aa
2	4	RECH	689	CAC93719.1	3E−17	36% (55%)	regulator[Lechevalieria aerocolonigenes]
				923aa
				BAC55205.1	3E−12	30% (48%)	transcriptional activator [Streptomyces sp.
				943aa
				NP_631154.1	3E−07	46% (63%)	regulator. [Streptomyces coelicolor A3(2)
				932aa
3	6	REGD	895	CAC93719.1	3E−20	28% (43%)	regulator [Lechevalieria aerocolonigenes]
				923aa
				BAC55205.1	1E−15	29% (36%)	activator [Streptomyces sp. TP-A0274]
				943aa
				NP_733725.1	3E−12	28% (41%)	regulator [Streptomyces coelicolor A3(2)]
				908aa
4	8	IDSA	362	NP_601376.2	2E−80	49% (65%)	GGPP synthase [Corynebacterium glutamicum
				371aa
				NP_738677.1	3E−79	48% (62%)	polyprenyl synthase, Corynebacterium efficiens
				366aa
				NP_216689.1	2E−78	46% (61%)	idsA2 [Mycobacterium tuberculosis H37Rv]
				352aa
5	10	MVKA	354	BAB07790.1	2E−71	46% (59%)	mevalonate kinase [Streptomyces sp. CL190]
				345aa
				BAB07817.1	5E−66	45% (57%)	mevalonate kinase [Kitasatospora griseola]
				334aa
				NP_720650.1	3E−36	29% (48%)	mevalonate kinase [Streptococcus mutans
				332aa
6	12	DMDA	346	BAB07791.1	2E−88	58% (65%)	diphosphomevalonate decarboxylase
				350aa			[Streptomyces sp.
				BAB07818.1	2E−69	53% (61%)	mevalonate diPH decarboxylase
				300aa			[Kitasatospora griseola]
				NP_785307.1	3E−44	34% (46%)	diphosphomevalonate decarboxylase
				325aa			[ Lactobacillus plantarum
7	14	MVKP	369	BAB07792.1	4E−93	50% (60%)	phosphomevalonate kinase [Streptomyces
				374aa			sp. CL190]
				BAB07819.1	6E−77	48% (56%)	phosphomevalonate kinase [Kitasatospora
				360aa			griseola]
				AAG02442.1	2E−31	29% (42%)	3 phosphomevalonate kinase [Enterococcus
				368aa			faecalis]
8	16	IPPI	360	Q9KWF6	1E−128	66% (74%)	Isopentenyl-diphosphate delta-isomerase
				364aa
				Q9KWG2	1E−128	66% (77%)	Isopentenyl-diphosphate delta-isomerase
				363aa
				NP_814639.1	5E−73	44% (61%)	isopentenyl diphosphate isomerase
				347aa			[ Enterococcus faecalis
9	18	HMGA	351	BAA70975.1	1E−165	82% (91%)	3-hydroxy-3-methylglutaryl coenzyme A
				353aa			reductase [Streptomyces sp.]
				BAA74565.1	1E−160	81% (89%)	3-hydroxy-3-methylglutaryl coenzyme A
				353aa			reductase [Kitasatospora griseola]
				BAA74566.1	1E−155	80% (86%)	3-hydroxy-3-methylglutaryl coenzyme A
				353aa			reductase [Streptomyces sp.]
10	20	KASH	391	BAB07795.1	1E−148	67% (78%)	3-hydroxy-3-methylglutaryl CoA synthase
				389aa			[Streptomyces sp. CL190]
				BAB07822.1	1E−136	70% (78%)	HMG-CoA synthase [Kitasatospora griseola]
				346aa
				CAD24420.1	6E−79	43% (54%)	HMG-CoA synthase [Paracoccus
				388aa			zeaxanthinifaciens]
11	22	IPTN	290	NP_631248.1	5E−22	28% (44%)	hypothetical protein [Streptomyces
				295aa			coelicolor A3(2)]
				AAN65239.1	5E−06	25% (40%)	cloQ [Streptomyces roseochromogenes
				324aa			subsp. oscitans]
12	24	SPKG	370	AAM78435.1	5E−48	54% (63%)	two-component sensor [Streptomyces
				344aa			coelicolor A3(2)]
				NP_630507.1	5E−48	54% (63%)	sensor kinase [Streptomyces coelicolor
				382aa			A3(2)]
				ZP_00058991.1	9E−34	44% (58%)	Signal transduction histidine kinase
				407aa			[Thermobifida fusca]
13	26	RREB	220	NP_630508.1	3E−79	67% (81%)	regulatory protein [Streptomyces coelicolor
				224aa			A3(2)]
				ZP_00058992.1	4E−67	59% (75%)	Response regulator [Thermobifida fusca]
				221aa
				NP_625364.1	6E−66	60% (74%)	response regulator [Streptomyces
				221aa			coelicolor A3(2)]
14	28	UNES	131	No hit	—	—	—
15	30	UNEZ	154	NP_649459.2	7.6E−02	38% (60%)	CG1090-PB [Drosophila melanogaster]
				628aa
				NP_730819.1	7.6E−02	38% (60%)	CG1090-PA [Drosophila melanogaster]
				473aa
				AAM11079.1	7.6E−02	38% (60%)	GH23040p [Drosophila melanogaster]
				428aa
16	32	OXDS	661	NP_242948.1	1E−52	30% (46%)	unknown conserved protein [Bacillus
				500aa			halodurans]
				ZP_00091617.1	3E−32	29% (41%)	Putative multicopper oxidases [Azotobacter
				480aa			vinelandii]
				NP_252457.1	1E−31	28% (42%)	metallo-oxidoreductase [Pseudomonas
				463aa			aeruginosa PA01]
17	34	UNFD	129	NP_437360.1	7E−33	60% (72%)	bleomycin resistance protein family
				127aa			[Sinorhizobium meliloti]
				AAO91879.1	1E−31	58% (74%)	unknown [uncultured bacterium]
				123aa
				NP_103287.1	1E−23	48% (62%)	unknown protein [Mesorhizobium loti]
				131aa
18	36	UNFA	178
19	38	CSMB	661	ZP_00137697.1	1E−166	51% (66%)	Anthranilate/para-aminobenzoate synthase
				769aa			[Pseudomonas aeruginosa
				NP_250594.1	1E−166	51% (66%)	phenazine biosynthesis protein PhzE
				627aa			[Pseudomonas aeruginosa PA01]
				ZP_00137701.1	1E−166	51% (66%)	Anthranilate/para-aminobenzoate synthas
				687aa			[Pseudomonas aeruginosa
20	40	AAKD	427	P41403	1E−64	38% (51%)	Aspartokinase (Aspartate kinase)
				421aa
				ZP_00057166.1	2E−64	37% (52%)	Aspartokinases [Thermobifida fusca]
				445aa
				AAD49567.1	6E−64	37% (52%)	aspartokinase subunit A [Amycolatopsis
				421aa			mediterranei]
21	42	ALDB	274	NP_275722.1	2E−53	45% (64%)	conserved protein [Methanothermobacter
				266aa			thermautotrophicus]
				NP_614692.1	2E−52	43% (61%)	Fructose-1,6-bisphosphate aldolase
				270aa			[Methanopyrus kandleri AV19]
				NP_615406.1	2E−50	43% (61%)	fructose-bisphosphate aldolase
				267aa			[Methanosarcina acetivorans str. C2A]
22	44	UNFC	367	NP_275723.1	4E−46	38% (56%)	conserved protein [Methanothermobacter
				378aa			thermautotrophicus]
				NP_614691.1	2E−45	39% (55%)	alternative 3-dehydroquinate synthase
				402aa			[Methanopyrus kandleri
				NP_248244.1	2E−43	40% (59%)	conserved hypothetical protein
				361aa			[Methanococcus jannaschii
23	46	HYDK	253	NP_577771.1	4E−14	31% (49%)	metal-dependent hydrolase [Pyrococcus
				247aa			furiosus DSM 3638]
				NP_142108.1	1E−12	33% (52%)	hypothetical protein PH0093 [Pyrococcus
				247aa			horikoshii]
				NP_125791.1	1E−11	28% (50%)	hypothetical protein [Pyrococcus abyssi]
				248aa
24	48	ADSA	438	NP_070499.1	2E−41	35% (49%)	coenzyme F390 synthetase
				433aa			[Archaeoglobus fulgidus
				NP_618724.1	5E−41	34% (50%)	coenzyme F390 synthetase
				434aa			[Methanosarcina acetivorans
				NP_632700.1	7E−41	35% (50%)	Coenzyme F390 synthetase
				437aa			[Methanosarcina mazei Goe1]
25	50	HOXV	396	ZP_00027430.1	8E−76	42% (59%)	2-polyprenyl-6-methoxyphenol hydroxylase
				442aa			[Burkholderia fungorum]
				NP_627457.1	1E−71	38% (51%)	salicylate hydroxylase [Streptomyces
				420aa			coelicolor A3(2)]
				ZP_00033877.1	2E−68	37% (51%)	2-polyprenyl-6-methoxyphenol hydroxylase
				403aa			[Burkholderia fungorum]
26	52	SDRA	261	NP_391080.1	6E−58	46% (57%)	2,3-dihydro-2,3-dihydroxybenzoate
				261aa			dehydrogenase [Bacillus subtilis]
				ZP_00059512.1	1E−55	45% (56%)	Dehydrogenase [Thermobifida fusca]
				260aa
				AAG31126.1	9E−55	46% (56%)	MxcC [Stigmatella aurantiaca]
				257aa
27	54	DHBS	224	Q51790	7E−60	56% (72%)	isochorismatase
				207aa
				Q51518	1E−58	56% (71%)	isochorismatase
				207aa
				NP_391077.1	2E−58	52% (69%)	isochorismatase [Bacillus subtilis]
				312aa
28	56	SDRA	233	NP_103491.1	9E−21	32% (49%)	acyl-carrier protein reductase
				242aa			[Mesorhizobium loti]
				AAL14912.1	1E−15	28% (44%)	short-chain dehydrogenase [Rhizobium
				245aa			leguminosarum bv. trifolii]
				NP_902480.1	7E−15	29% (44%)	oxidoreductase [Chromobacterium
				235aa			violaceum
29	58	UNIQ	246	S18541	4.5E−02	29% (43%)	hypothetical protein 3 - Streptomyces
				281aa			coelicolor
				NP_629228.1	5.9E−02	29% (43%)	hypothetical protein [Streptomyces
				281aa			coelicolor A3(2)]
30	60	UNFE	111	ZP_00058149.1	1E−10	36% (48%)	membrane protein [Thermobifida fusca]
				130aa
				NP_737701.1	1E−09	33% (46%)	hypothetical protein [Corynebacterium
				120aa			efficiens
				NP_827629.1	7E−09	33% (49%)	hypothetical protein [Streptomyces
				118aa			avermitilis MA-4680]
31	62	EFFT	559	ZP_00058148.1	2E−67	32% (49%)	Predicted symporter [Thermobifida fusca]
				537aa
				NP_626090.1	4E−66	31% (49%)	transport protein [Streptomyces coelicolor
				544aa			A3(2)]
				NP_827630.1	7E−63	31% (49%)	sodium-dependent symporter [Streptomyces
				549aa			avermitilis
32	65	HOYH	532	AAM96655.1	2E−92	39% (53%)	2,4-dihydroxybenzoate monooxygenase
				544aa			[Sphingobium chlorophenolicum]
				ZP_00029353.1	1E−73	35% (49%)	2-polyprenyl-6-methoxyphenol hydroxylase
				543aa			[Burkholderia fungorum]
				NP_769326.1	5E−62	33% (48%)	blr2686 [Bradyrhizobium japonicum] dbj
				569aa
33	67	DAHP	423	T03226	1E−111	54% (68%)	hypothetical protein - Streptomyces
				391aa			hygroscopicus
				ZP_00137693.1	3E−87	45% (61%)	DAHP synthase [Pseudomonas aeruginosa
				405aa			UCBPP-PA14]
				NP_250592.1	1E−86	45% (61%)	phenazine biosynthesis protein PhzC
				405aa			[Pseudomonas aeruginosa
34	69	REGG	340	BAC53615.1	1E−67	46% (62%)	regulator protein [Streptomyces
				346aa			kasugaensis]
				S44506	3E−66	46% (60%)	regulator protein - Streptomyces
				424aa			glaucescens
				AAK81822.1	1E−65	44% (59%)	transcriptional regulator [Streptomyces
				348aa			lavendulae]
35	71	UNFJ	493	ZP_00073237.1	7E−35	27% (43%)	RTX toxins [Trichodesmium erythraeum
				678aa			IMS101]
				NP_484716.1	3E−05	23% (37%)	similar to vanadium chloroperoxidase
				433aa			[Nostoc sp.
				ZP_00067005.1	7.4E−02	27% (37%)	hypothetical protein [Microbulbifer
				667aa			degradans 2-40]
36	74	RECI	112	NP_627088.1	3E−17	48% (59%)	hypothetical protein. [Streptomyces
				125aa			coelicolor A3(2)]
				NP_846017.1	7E−15	40% (59%)	hypothetical protein [Bacillus anthracis str.
				109aa			Ames]
				NP_241272.1	9E−15	37% (58%)	unknown conserved protein [Bacillus
				174aa			halodurans]
37	76	UNIQ	325	NP_422203.1	1E−03	39% (59%)	hypothetical protein [Caulobacter
				187aa			crescentus CB15]
38	78	OXAH	663	ZP_00058724.1	0E+00	57% (67%)	Acyl-CoA dehydrogenases [Thermobifida fusca]
				659aa
				AAB97825.1	5E−93	46% (56%)	acyl-CoA oxidase [Myxococcus xanthus]
				433aa
				AAF14635.1	5E−85	37% (52%)	1 acyl-CoA oxidase [Petroselinum crispum]
				694aa
39	80	ABCA	537	T14162	9E−62	37% (47%)	hABC transport protein - Mycobacterium
				574aa			smegmatis
				NP_624808.1	4E−60	35% (46%)	ABC transporter [Streptomyces coelicolor
							A3(2)]
				NP_822745.1	8E−32	31% (42%)	ABC transporter [Streptomyces avermitilis
							MA-4680]
40	82	ABCA	596	T14180	1E−107	40% (51%)	exiT protein - Mycobacterium smegmatis
				1122aa
				AAC82548.1	1E−107	40% (51%)	unknown [Mycobacterium smegmatis]
				589aa
				NP_624810.1	3E−97	37% (48%)	ABC-transporter [Streptomyces coelicolor
				601aa			A3(2)]
41	84	UNIQ	507	NP_831570.1	8E−07	24% (44%)	methyltransferases [Bacillus cereus
				676aa
				NP_655735.1	2E−06	23% (44%)	ubiE/COQ5 methyltransferase family
				676aa			[Bacillus anthracis
				NP_844290.1	2E−06	23% (44%)	hypothetical protein [Bacillus anthracis str.
				681aa			Ames]
42	86		232	NP_830809.1	8E−08	22% (35%)	Transporter, LysE family [Bacillus cereus]
				208aa
				NP_844737.1	2E−07	22% (35%)	homoserine/threonine efflux protein[Bacillus
				210aa			anthracis
				NP_655752.1	1E−06	22% (36%)	LysE, LysE type translocator [Bacillus
				208aa			anthracis
43	88		132	NP_827272.1	4E−09	36% (49%)	hypothetical protein [Streptomyces
				127aa			avermitilis MA-4680]
				NP_246491.1	5E−02	22% (47%)	unknown [Pasteurella multocida]
				112aa

The ORFs encoding proteins involved in the biosynthesis of farnesyl dibenzodiazepinones are assigned a putative function and grouped together in families based on sequence similarity to known proteins. To correlate structure and function, the protein families are given a four-letter designation used throughout the description and figures as indicated in Table 8. The meaning of the four-letter designations is as follows: AAKD designates an amino acid kinase; ABCA and ABCC designate ABC transporters; ADSA designates an amide synthetase; ALDB designates an aldolase function; CSMB designates a chorismate transaminase; DAHP designates a 3,4-dideoxy-4-amino-D-arabino-heptulosonic acid 7-phosphate synthase activity; DHBS designates a 2,3-dihydro-2,3-dihydroxybenzoate synthase activity; DMDA designates a diphosphomevalonate decarboxylase; EFFT designates an efflux protein; HMGA designates a 3-hydroxy-3-methylglutaryl-CoA reductase; HOXV designates a monooxygenase activity; HOYH designates a hydroxylase/decarboxylase activity; HYDK designates a hydrolase activity; IDSA designates an isoprenyl diphosphate synthase; IPPI designates an isopentenyl diphosphate isomerase; IPTN designates an isoprenyltransferase; KASH designates a 3-hydroxy-3-methylglutaryl-CoA synthase; MVKA designates a mevalonate kinase; MVKP designates a phosphomevalonate kinase; OXAH designates an acyl-CoA oxidase; OXDS designates an oxidoreductase; RECH, RECI, REGD, REGG and RREB designate regulators; SDRA designates a dehydrogenase/ketoreductase; SPKG designates a sensory protein kinase; UNES, UNEZ, UNFA, UNFC, UNFD, UNFE, UNFJ and UNIQ designate proteins of unknown function.

	TABLE 8

	FAMILY	FUNCTION:

	AAKD	amino acid kinase; strong homology to primary aspartate kinases, converting L-aspartate to 4-phospho-L-aspartate
	ABCA	ABC transporter
	ABCC	ABC transporter
	ADSA	adenylating amide synthetase
	ALDB	aldolase; similarity to fructose-1,6-bisphosphate aldolase that generates D-glyceraldehyde-3Ph, precursor of D-erythrose-4Ph involved in the shikimate pathway
	CSMB	chorismate transaminase, similarity to anthranilate synthase
	DAHP	DAHP synthase, class II; involved in formation of aminoDAHP from PEP and erythrose-4-phosphate
	DHBS	2,3-dihydro-2,3-dihydroxybenzoate synthase (isochorismatase)
	DMDA	diphosphomevalonate decarboxylase (mevalonate pyrophosphate decarboxylase)
	EFFT	efflux protein
	HMGA	HMG-CoA reductase; converts 3-hydroxy-3-methylglutaryl-CoA to mevalonate plus CoA in isoprenoid biosynthesis
	HOXV	FAD monooxygenase; shows homology to a variety of monooxygenases including salicylate hydroxylases, zeaxanthin epoxidases
	HOYH	hydroxylase/decarboxylase; FAD-dependent monooxygenase
	HYDK	hydrolase
	IDSA	isoprenyl diphosphate synthase, catalyzes the addition of 2 molecules of isopentenyl pyrophosphate to dimethylallyl pyrophosphate to generate farnesyl diphosphate (FPP)
	IPPI	isopentenyl diphosphate isomerase, catalyzes the isomerization of IPP to produce dimethylallyl diphosphate
	IPTN	isoprenyltransferase; catalyzes covalent N-terminal attachment of isoprenyl units to amide groups of nitrogen-containing heterocycle rings
	KASH	HMG-CoA synthase; condenses acetyl-CoA with acetoacetyl-CoA to form 3-hydroxy-3-methylglutaryl-CoA
	MEBI	membrane protein
	MVKA	mevalonate kinase; converts mevalonate to 5-phosphomevalonate in the mevalonate pathway of isoprenoid biosynthesis
	MVKP	phosphomevalonate kinase; converts 5-phosphomevalonate to 5-diphosphomevalonate in the mevalonate pathway of isoprenoid biosynthesis
	OXAH	acyl CoA oxidase
	OXDS	oxidoreductase
	RECH	regulator
	RECI	regulator; similarity to PadR transcriptional regulators involved in repression of phenolic acid metabolism
	REGD	transcriptional regulator; relatively large regulators with an N-terminal ATP-binding domain containing Walker A and B motifs and a C-terminal LuxR type DNA-binding domain
	REGG	regulator
	RREB	response regulator; similar to response regulators that are known to bind DNA and act as transcriptional activators
	SDRA	dehydrogenase/ketoreductase, NAD-dependent
	SPKD	sensory protein kinase, two component system
	SPKG	sensory protein kinase, two component system
	UNES	unknown function
	UNEZ	unknown function
	UNFA	unknown function
	UNFC	unknown function
	UNFD	unknown function
	UNFE	putative membrane protein
	UNFJ	unknown function
	UNIQ	unknown function

Biosynthesis of ECO-04601 involves the action of various enzymes that synthesize the three building blocks of the compound, namely the farnesyl-diphosphate component (FIG. 5), the 3-hydroxy-anthranilate-adenylate component (FIG. 6) and the 2-amino-6-hydroxy-benzoquinone component (FIG. 7) that are subsequently condensed to form the final compound (FIG. 8).
The farnesyl-diphosphate biosynthesis involves the concerted action of seven enzymes (FIG. 5). ORF 10 (KASH) (SEQ ID NO: 20) encodes a hydroxymethylglutaryl-CoA synthase that catalyzes an aldol addition of acetyl-CoA onto acetoacetyl-CoA to yield 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA). This product is subsequently reduced through the action of ORF 9 (HMGA) (SEQ ID NO: 18) to form mevalonic acid (MVA). ORF 5 (MVKA) (SEQ ID NO: 10) phosphorylates mevalonate to 5′-phosphomevalonate using ATP as the phosphate donor. The next step in the farnesyl-diphosphate biosynthesis is the phosphorylation of 5′-phosphomevalonate to 5′-pyrophosphomevalonate (DPMVA), catalyzed by ORF 7 (MVKP) (SEQ ID NO: 14). Subsequent decarboxylation of 5′-pyrophosphomevalonate catalyzed by ORF 6 (DMDA) (SEQ ID NO: 12) yields isopentenyl diphosphate (IPP), which is then converted to dimethylallyldiphosphate (DMADP) through the isomerase activity of ORF 8 (IPPI) (SEQ ID NO: 16). The final step in the biosynthesis of farnesyl-diphosphate is the condensation of one molecule of dimethylallyldiphosphate with two molecules of isopentenyl diphosphate catalyzed by the isoprenyl diphosphate synthase ORF 4 (IDSA) (SEQ ID NO: 8). The described pathway involved in synthesis of farnesyl-diphosphate is entirely consistent with related mevalonate pathways described in other actinomycete species (Takagi et al., J. Bacteriol. 182, 4153-4157, (2000)).
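The seven-step sequence described above can be written out as an ordered list to confirm that each product feeds the next step, and to work through the carbon count of the final condensation. The tuple layout, the shortened compound names and the helper `is_linear` are illustrative choices, not part of the disclosure.

```python
# Illustrative sketch: the farnesyl-diphosphate pathway as ordered steps.
# Each entry is (ORF, family, substrate(s), product); names are shortened.
STEPS = [
    (10, "KASH", "acetyl-CoA + acetoacetyl-CoA", "HMG-CoA"),
    (9, "HMGA", "HMG-CoA", "mevalonate"),
    (5, "MVKA", "mevalonate", "5-phosphomevalonate"),
    (7, "MVKP", "5-phosphomevalonate", "5-pyrophosphomevalonate"),
    (6, "DMDA", "5-pyrophosphomevalonate", "IPP"),
    (8, "IPPI", "IPP", "DMADP"),
    (4, "IDSA", "DMADP + 2 IPP", "farnesyl-diphosphate"),
]

def is_linear(steps):
    """True if every product appears among the substrates of the next step."""
    return all(prev[3] in nxt[2] for prev, nxt in zip(steps, steps[1:]))

# IPP and DMADP are five-carbon units, so one DMADP plus two IPP gives C15.
C5 = 5
farnesyl_carbons = 1 * C5 + 2 * C5

print(is_linear(STEPS), len(STEPS), farnesyl_carbons)
```

The count of 15 carbons corresponds to the farnesyl group, consistent with a final condensation of one dimethylallyldiphosphate with two isopentenyl diphosphate units.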
Biosynthesis of the 3-hydroxy-anthranilate component involves the use of precursors derived from the shikimate pathway (FIG. 6). Chorismic acid is transaminated through the action of ORF 19 (CSMB) (SEQ ID NO: 38) to form aminodeoxyisochorismic acid. This enzyme resembles anthranilate synthases and is likely to catalyze specifically the transfer of the amino group using glutamine as the amino donor. The next step involves isochorismatase activity and is mediated by ORF 27 (DHBS) (SEQ ID NO: 54). This reaction consists of the removal of the pyruvate side chain from aminodeoxyisochorismic acid to form 6-amino-5-hydroxy-cyclohexa-1,3-dienecarboxylic acid. This compound is subsequently oxidized through the action of ORF 26 (SDRA) (SEQ ID NO: 52), yielding 3-hydroxy-anthranilic acid. ORF 24 (ADSA) (SEQ ID NO: 48) catalyzes the activation of 3-hydroxy-anthranilic acid through adenylation, generating the 3-hydroxy-anthranilate-adenylate component (FIG. 6).
Biosynthesis of the 2-amino-6-hydroxy-benzoquinone component of the farnesyl dibenzodiazepinone requires components derived from the aminoshikimate pathway. FIG. 7 depicts the series of enzymatic reactions involved in the biosynthesis of this constituent. ORF 21 (ALDB) (SEQ ID NO: 42) resembles aldolases involved in the generation of precursors of D-erythrose-4-phosphate, which is part of the aminoshikimate pathway used for the generation of 2-amino-6-hydroxy-[1,4]-benzoquinone. ORF 33 (DAHP) (SEQ ID NO: 67) catalyzes the initial step in the aminoshikimate pathway, which corresponds to the formation of 3,4-dideoxy-4-amino-D-arabino-heptulosonic acid 7-phosphate (amino DAHP) from phosphoenolpyruvate (PEP) and erythrose 4-phosphate (E-4Ph). Subsequent reactions leading to 3-amino-5-hydroxy-benzoic acid are catalyzed by enzymes provided by primary metabolism biosynthetic pathways present in Micromonospora sp. strain 046-ECO11. ORF 25 (HOXV) (SEQ ID NO: 50) hydroxylates 3-amino-5-hydroxy-benzoic acid at position 2, generating 3-amino-2,5-dihydroxy-benzoic acid. This intermediate is further modified by ORF 32 (HOYH) (SEQ ID NO: 65), which catalyzes a decarboxylative oxidation reaction yielding 6-amino-benzene-1,2,4-triol. A final oxidation reaction is performed by ORF 16 (OXDS) (SEQ ID NO: 32), yielding 2-amino-6-hydroxy-[1,4]-benzoquinone (FIG. 7).
Assembly of the three components resulting in the farnesyl dibenzodiazepinone is catalyzed by ORFs 24 and 11 (FIG. 8). ORF 24 (ADSA) (SEQ ID NO: 48) catalyzes the condensation of the adenylated 3-hydroxy-anthranilate with the 2-amino-6-hydroxy-[1,4]-benzoquinone component. A spontaneous condensation between the free amino group of the 3-hydroxy-anthranilate and one of the carbonyl groups present on the 2-amino-6-hydroxy-[1,4]-benzoquinone component occurs yielding a dibenzodiazepinone intermediate. This compound is further modified through transfer of the farnesyl group of the farnesyl-diphosphate intermediate onto the nitrogen of the amide of the dibenzodiazepinone catalyzed by ORF 11 (IPTN) (SEQ ID NO: 22) and resulting in the formation of the farnesyl dibenzodiazepinone (FIG. 8).
Additional ORFs, namely ORF 2 (RECH) (SEQ ID NO: 4), ORF 3 (REGD) (SEQ ID NO: 6), ORF 12 (SPKG) (SEQ ID NO: 24), ORF 13 (RREB) (SEQ ID NO: 26), ORF 34 (REGG) (SEQ ID NO: 69) and ORF 36 (RECI) (SEQ ID NO: 74) are involved in the regulation of the biosynthetic locus encoding the farnesyl dibenzodiazepinone. Other ORFs, namely ORF 1 (ABCC) (SEQ ID NO: 2), ORF 31 (EFFT) (SEQ ID NO: 62), ORFs 39 and 40 (ABCA) (SEQ ID NOS: 80 and 82, respectively) and ORF 42 (SEQ ID NO: 86) are involved in transport. Other ORFs involved in the biosynthesis of the farnesyl dibenzodiazepinone include ORF 20 (AAKD) (SEQ ID NO: 40), ORF 23 (HYDK) (SEQ ID NO: 46), ORF 38 (OXAH) (SEQ ID NO: 78) as well as ORFs 14, 15, 17, 18, 22, 29, 30, 35, 37, 41 and 43 (SEQ ID NOS: 28, 30, 34, 36, 44, 58, 60, 71, 76, 84 and 88, respectively) of unknown function.

Example 6

Farnesyl Dibenzodiazepinone Loci from Actinomycetes Species

A. Correlation of Loci A, B and C
Loci related to the biosynthetic locus present in Micromonospora sp. strain 046-ECO11 as described in Example 5 (referred to herein as locus A) and directing the biosynthesis of farnesyl dibenzodiazepinones related to ECO-04601 were detected in the genomes of two actinomycetes using the genome scanning method described in U.S. Ser. No. 10/232,370, CA 2,352,451 and Zazopoulos et al., Nature Biotechnol., 21, 187-190 (2003).
Locus B (052E) was detected in Micromonospora echinospora challisensis NRRL 12255. The locus spans approximately 38,000 base pairs of DNA and encodes 33 proteins. Locus C (237C) was detected in Streptomyces carzinostaticus neocarzinostaticus ATCC 15944. This locus spans approximately 37,000 base pairs of DNA and encodes 33 proteins. More than 10 kilobases of DNA sequence were analyzed on each side of the two loci and these regions were deemed to contain primary genes.
In order to identify the function of the proteins encoded by the genes forming biosynthetic loci B and C, the gene products of their ORFs 1 to 33 were compared, using the BLASTP version 2.2.10 algorithm with the default parameters, to sequences in the National Center for Biotechnology Information (NCBI) nonredundant protein database and the DECIPHER® database of microbial genes, pathways and natural products (Ecopia BioSciences Inc. St.-Laurent, QC, Canada).
The ORFs encoding proteins present in loci A, B, and C are assigned a putative function and grouped together in families based on sequence similarity to known proteins. To correlate structure and function, the protein families are given a four-letter designation used throughout the description and figures as indicated in Table 8 of Example 5.
Comparison of loci A, B and C clearly indicates that all three loci are related and encode similar enzymatic functions. Therefore, the compounds produced by the enzymes encoded by loci B and C are structurally closely related to ECO-04601. Table 9 correlates the protein families of loci B and C to those of locus A. All 33 ORFs found in locus B have counterparts in locus A. Similarly, all 33 ORFs present in locus C have counterpart proteins in locus A, with the exception of ORFs 30, 31, and 32 that encode a sensory protein kinase protein, a response regulator and a membrane protein. These observations suggest that the compounds produced by loci B and C encoded proteins share a high degree of similarity with ECO-04601.

TABLE 9

Loci A, B and C ORFs function and correlation

Family	A	B	C
ABCC	1	—	—
RECH	2	1	1
REGD	3	2	2
IDSA	4	3	3
MVKA	5	4	4
DMDA	6	5	5
MVKP	7	6	6
IPPI	8	7	7
HMGA	9	8	8
KASH	10	—	9
IPTN	11	9	10
SPKG	12	15	12
RREB	13	16	11
UNES	14	10	33
UNEZ	15	14	—
OXDS	16	13	—
UNFD	17	12	—
UNFA	18	11	—
CSMB	19	17	14
AAKD	20	18	15
ALDB	21	19	16
UNFC	22	20	17
HYDK	23	21	18
ADSA	24	22	19
HOXV	25	23	20
SDRA	26	24	21
DHBS	27	25	22
SDRA	28	26	23
UNGA	29	27	24
UNFE	30	28	25
EFFT	31	29	26
HOYH	32	30	27
DAHP	33	31	28
REGG	34	32	—
UNFJ	35	33	13/29
RECI	36	—	—
UNIQ	37	—	—
OXAH	38	—	—
ABCA	39	—	—
ABCA	40	—	—
UNIQ	41	—	—
SPKD	—	—	30
RREB	—	—	31
MEBI	—	—	32
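
Table 9 lends itself to a simple lookup structure for finding the locus B or locus C counterpart of a locus A ORF. The sketch below is illustrative only: the rows are transcribed from Table 9 (with "-" standing for the dash that marks a missing counterpart), while `parse_cell`, `BY_LOCUS_A` and `counterparts` are hypothetical helper names introduced here.

```python
# Table 9 transcribed as: family, locus A ORF, locus B ORF, locus C ORF.
# "-" stands for the dash used in the table (no counterpart).
TABLE_9 = """
ABCC 1 - -
RECH 2 1 1
REGD 3 2 2
IDSA 4 3 3
MVKA 5 4 4
DMDA 6 5 5
MVKP 7 6 6
IPPI 8 7 7
HMGA 9 8 8
KASH 10 - 9
IPTN 11 9 10
SPKG 12 15 12
RREB 13 16 11
UNES 14 10 33
UNEZ 15 14 -
OXDS 16 13 -
UNFD 17 12 -
UNFA 18 11 -
CSMB 19 17 14
AAKD 20 18 15
ALDB 21 19 16
UNFC 22 20 17
HYDK 23 21 18
ADSA 24 22 19
HOXV 25 23 20
SDRA 26 24 21
DHBS 27 25 22
SDRA 28 26 23
UNGA 29 27 24
UNFE 30 28 25
EFFT 31 29 26
HOYH 32 30 27
DAHP 33 31 28
REGG 34 32 -
UNFJ 35 33 13/29
RECI 36 - -
UNIQ 37 - -
OXAH 38 - -
ABCA 39 - -
ABCA 40 - -
UNIQ 41 - -
SPKD - - 30
RREB - - 31
MEBI - - 32
"""


def parse_cell(cell):
    """Return None for '-', an int for a single ORF, or the raw text for '13/29'."""
    if cell == "-":
        return None
    return int(cell) if cell.isdigit() else cell


ROWS = [
    dict(zip(("family", "A", "B", "C"), [parts[0], *map(parse_cell, parts[1:])]))
    for parts in (line.split() for line in TABLE_9.strip().splitlines())
]

# Family codes repeat (e.g. SDRA, ABCA), so rows are keyed by locus A ORF number.
BY_LOCUS_A = {row["A"]: row for row in ROWS if row["A"] is not None}


def counterparts(locus_a_orfs, locus):
    """Look up the locus 'B' or 'C' ORF matching each locus A ORF."""
    return [BY_LOCUS_A[orf][locus] for orf in locus_a_orfs]


print(counterparts([19, 21, 24, 26, 27], "B"))  # [17, 19, 22, 24, 25]
print(counterparts([19, 21, 24, 26, 27], "C"))  # [14, 16, 19, 21, 22]
```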

FIG. 5 depicts the three biosynthetic loci A, B and C. All ORFs are represented by arrows whose orientation indicates the direction of transcription of each ORF; highlighted ORFs are involved in the biosynthesis of the farnesyl unit. ORFs 4, 5, 6, 7, 8, 9 and 10 in locus A participate in the synthesis of the farnesyl unit present in the farnesyl dibenzodiazepinone. Counterparts of these ORFs are found in locus B (ORFs 3, 4, 5, 6, 7 and 8) as well as in locus C (ORFs 3, 4, 5, 6, 7, 8 and 9). As shown in FIG. 5, proteins encoded by these ORFs participate in an orderly fashion in the biosynthesis of the farnesyl-diphosphate component starting with acetoacetyl-CoA and acetyl-CoA. All enzymes necessary for the synthesis of farnesyl-diphosphate are present in all three loci with the exception of a hydroxymethylglutaryl-CoA synthase (KASH), which is absent from locus B. The product of this enzymatic reaction, 3-hydroxy-3-methylglutaryl-CoA, is provided by an alternative biosynthetic pathway of the primary metabolism of the microorganism or by a hydroxymethylglutaryl-CoA synthase located elsewhere in the genome. The described pathway involved in synthesis of farnesyl-diphosphate is entirely consistent with related mevalonate pathways described in other actinomycete species (Takagi et al., J. Bacteriol. 182, 4153-4157, (2000) and FIG. 5).
FIG. 6 depicts ORFs 19, 21, 24, 26 and 27 in locus A involved in the biosynthesis of the 3-hydroxy-anthranilate component of the farnesyl dibenzodiazepinone. Counterparts of these ORFs are found in locus B (ORFs 17, 19, 22, 24 and 25) as well as in locus C (ORFs 14, 16, 19, 21 and 22). As shown in FIG. 6, proteins encoded by these ORFs participate in an orderly fashion in the biosynthesis of the 3-hydroxy-anthranilate-adenylate component starting with precursors from the pentose phosphate pathway and chorismic acid. In particular, the enzyme responsible for the adenylation of 3-hydroxy-anthranilic acid (ADSA), which corresponds to ORFs 24, 22 and 19 in loci A, B and C respectively, is present in all three loci, as are the remaining enzymes that participate in the biosynthesis of the 3-hydroxy-anthranilate component present in dibenzodiazepinones.
FIG. 7 highlights ORFs 16, 21, 25, 32 and 33 in locus A involved in the biosynthesis of the 2-amino-6-hydroxy-[1,4]benzoquinone component of the farnesyl dibenzodiazepinone. Counterparts of these ORFs are found in locus B (ORFs 13, 19, 23, 30 and 31) as well as in locus C (ORFs 16, 20, 27 and 28), with the exception of the ORF corresponding to the oxidoreductase (OXDS), which is present only in loci A and B. As shown in FIG. 7, proteins encoded by these ORFs participate in an orderly fashion in the biosynthesis of the 2-amino-6-hydroxy-[1,4]benzoquinone component starting with precursors from the pentose phosphate pathway and 3,4-dideoxy-4-amino-D-arabino-heptulosonic acid 7-phosphate (amino DAHP).
FIG. 8 highlights ORFs 11 (SEQ ID NO: 22) and 24 (SEQ ID NO: 48) in locus A involved in the assembly of all three components, 3-hydroxy-anthranilate, 2-amino-6-hydroxy-[1,4]benzoquinone and farnesyl-diphosphate, to form the farnesyl dibenzodiazepinone. Counterparts of these ORFs are found in locus B (ORFs 9 (SEQ ID NO: 90) and 22 (SEQ ID NO: 92)) as well as in locus C (ORFs 10 (SEQ ID NO: 94) and 19 (SEQ ID NO: 96)). The isoprenyltransferase ORF 10 of locus C (SEQ ID NO: 94) is partial and represents the N-terminal part of the protein. IPTN ORFs 11 (SEQ ID NO: 22), 9 (SEQ ID NO: 90) and 10 (SEQ ID NO: 94) in loci A, B and C respectively catalyze the transfer of the farnesyl unit onto the core element of the farnesyl dibenzodiazepinone and related compounds produced by loci B and C. ADSA ORFs 24 (SEQ ID NO: 48), 22 (SEQ ID NO: 92) and 19 (SEQ ID NO: 96) in loci A, B and C respectively catalyze the condensation of 3-hydroxy-anthranilate and 2-amino-6-hydroxy-[1,4]benzoquinone and farnesyl-diphosphate to form the dibenzodiazepinone core element of ECO-04601 and related compounds produced by loci B and C.
B. Clustal™ Alignments
Alignments of isoprenyl transferases (IPTN) and adenylating amide synthetases (ADSA) of loci A, B and C, presented in FIGS. 9 and 10 respectively, were generated by the Clustal™ alignment method.
FIG. 9 shows an alignment of ORFs 11 (SEQ ID NO: 22), 9 (SEQ ID NO: 90, which represents the polypeptide deduced from SEQ ID NO:91) and 10 (SEQ ID NO: 94, which represents the polypeptide deduced from SEQ ID NO:95) in loci A, B and C respectively, highlighting the phylogenetic relatedness of these three proteins. The amino acid sequence of all three proteins is extremely conserved as shown by the codes on the fourth line, suggesting that these proteins share a well-conserved and related isoprenyltransferase enzymatic function. The following consensus amino acid sequence (also as SEQ ID NO: 98) that represents all three sequences was generated using the hmmemit algorithm (HMMER, Washington University in St-Louis, School of Medicine, MO, USA, http://hmmer.wustl.edu):
“AaELysviEesARILdvaCsrDrvwpiLsaYGDaFaHpaavvAFRvAtalRHvGELD CRFttHPddRDPYAIALsrGLtPktdHPvGsLLsevqeRIPvesyGiDFGvvGGFKKiYafFtPDe LqevaaLAgiPamPRsLAgnadFFeRyGlddrvGvlGiDYPartvnvyfndvpaesfesetirstlreiGma epsermI kIGekafGlyvtlGwdsseiericyaaattdIttIpvpvepeiekfvksvpyGGedrkfvyGvaltpkGey ykleshykwkpGavdfi”
FIG. 10 shows an alignment of ORFs 24 (SEQ ID NO: 48), 22 (SEQ ID NO: 92, which represents the polypeptide deduced from SEQ ID NO: 93) and 19 (SEQ ID NO: 96, which represents the polypeptide deduced from SEQ ID NO: 97) in loci A, B and C respectively, highlighting the phylogenetic relatedness of these three proteins. The amino acid sequence of all three proteins is extremely conserved as shown by the codes on the fourth line, suggesting that these proteins share a well-conserved and related adenylating amide synthetase enzymatic function. The following consensus amino acid sequence (also as SEQ ID NO: 99) that represents all three sequences was generated using the hmmemit algorithm:

“VneprssLPrLGqWhGpEDLrrLqEKqLaqtvtWAaRsPFYRdRLds

gAlPvtaaDLAdLPLttKqDLRDnYPFGmLAvPkERLAtYHEssGtAGr

PtPsYYtAeDWtDLAERFARKWiGmsAeDvFLvRtPYALLLtGHLAH

AAgRLrGAtvvPGDnRsLAmPYARvvRvmHDLgvtLtWsvPtECLiW

AAAAtAAGHRPdvDFPALRALFvGGEPltdARRrRisRLWGvPviEE

YGstEtGsLAGECPeGRIHLWADRALFEvYDPdtGtvrAdGdGqLvv

tPLfREAmPLLRYnLEDnvsvsYDDCaCGWkLPtvrvLGRaAFGyRv

GattitqHrLEElvFsLPeahrvvFWRAkAEPavLRiEiEvaeeHRv

AAeAELtasvRaaFGvDsevtGLaPGtLiPreALtsmPDvvKPRsLF

GPDEDWgKALLYY”

The amino acids shown for the consensus sequences (SEQ ID NOs: 98 and 99) are the highest-probability amino acids at each position according to the HMM (hidden Markov model). Highly conserved residues (those with a probability of >0.5) are shown by capital letters while other residues (lowercase letters) are deduced by the program from the most common amino acid found at the specific position in the aligned proteins (HMMER User's Guide, Sean Eddy, October 2003, Washington University School of Medicine, MO, USA, p 23-24).
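
Because the case of each letter encodes whether a residue is highly conserved, the conserved positions of a consensus sequence can be recovered with a few lines of code. The following is a minimal sketch under that reading of the capitalization convention; the function names are hypothetical, whitespace from line wrapping is ignored, and the fragment used is simply the first segment of the consensus given for SEQ ID NO: 98.

```python
def residues(consensus: str) -> str:
    """Drop whitespace and other non-letter characters left by line wrapping."""
    return "".join(ch for ch in consensus if ch.isalpha())


def conserved_positions(consensus: str) -> list[int]:
    """1-based positions whose residue is upper case (probability > 0.5)."""
    return [i for i, aa in enumerate(residues(consensus), start=1) if aa.isupper()]


def conserved_fraction(consensus: str) -> float:
    """Fraction of consensus positions reported as highly conserved."""
    seq = residues(consensus)
    return sum(aa.isupper() for aa in seq) / len(seq)


# First segment of the IPTN consensus (SEQ ID NO: 98) as printed above.
fragment = "AaELysviEesARILdvaCsrDrvwpiLsaYGDaFaHpaavvAFRvAtalRHvGELD"
print(conserved_positions(fragment)[:5])
print(f"{conserved_fraction(fragment):.2f}")
```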

Example 7

Labeled 3-Hydroxyanthranilic Acid Feeding

This experiment was designed to confirm the farnesyl dibenzodiazepinone biosynthetic pathway involves a 3-hydroxyanthranilate intermediate. First, labeled 4,6-dideuterio-3-hydroxyanthranilic acid was prepared. Then the labeled intermediate was fed to the Micromonospora sp. strain, the product was purified (see Example 2) and the results were analyzed. The following is an exemplary procedure to accomplish the feeding experiment:
A. Preparation of 4,6-dideuterio-3-hydroxyanthranilic acid
3-Hydroxyanthranilic acid (108 mg, Sigma-Aldrich) was suspended in D₂O (2 mL). Potassium t-butoxide (154 mg) was added to give a brown solution. The solution was stirred at 100° C. under nitrogen for about 6 days. The reaction mixture was cooled to room temperature. The solution was acidified to pH 6 with 10N hydrochloric acid and a white solid precipitated. The solid was filtered and dried in vacuo (93 mg). The ¹H NMR of the isolated product showed about 92-96% reduction of the proton signals (doublets) at the 4 and 6 positions. The ¹H NMR signal of the unchanged proton (5 position) also reflected the incorporation of the two deuterium atoms; coupling to the 4 and 6 protons was nearly lost (triplet changed to a singlet having two very small side peaks).
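As a rough check on the quantities in this preparation, the masses can be converted to molar amounts. The sketch below is not part of the original procedure: it assumes standard atomic masses, the molecular formulas C7H7NO3 for 3-hydroxyanthranilic acid and C4H9KO for potassium t-butoxide, and that the isolated solid is the pure dideuterated free acid. All helper names are hypothetical.

```python
# Standard atomic masses in g/mol (assumed values, D = deuterium).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "D": 2.014,
               "N": 14.007, "O": 15.999, "K": 39.098}

HAA = {"C": 7, "H": 7, "N": 1, "O": 3}             # 3-hydroxyanthranilic acid
HAA_D2 = {"C": 7, "H": 5, "D": 2, "N": 1, "O": 3}  # 4,6-dideuterio product
KOTBU = {"C": 4, "H": 9, "K": 1, "O": 1}           # potassium t-butoxide


def molar_mass(formula):
    """Molar mass in g/mol from an element -> count mapping."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


def mmol(mass_mg, formula):
    """Millimoles contained in mass_mg of a compound."""
    return mass_mg / molar_mass(formula)


base_equivalents = mmol(154, KOTBU) / mmol(108, HAA)
isolated_yield = mmol(93, HAA_D2) / mmol(108, HAA)

print(f"base equivalents: {base_equivalents:.2f}")
print(f"isolated yield:   {isolated_yield:.0%}")
```

Under these assumptions the base is used at roughly two equivalents and the isolated yield is about 85%.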
B. 4,6-dideuterio-3-hydroxyanthranilic Acid Feeding

B.1. Culture Conditions:

To prepare a vegetative culture, Micromonospora sp. 046-ECO11 was grown on ISP2 agar (Difco) for 10 to 15 days, and the surface growth from the agar plate was homogenized and transferred to a 125 ml flask containing three glass beads (5 mm diameter) and 25 ml of sterile medium KH (composed of 10 g glucose, 20 g potato dextrin, 5 g yeast extract, 5 g NZ-Amine A, and 1 g CaCO₃ made up to one liter with tap water and adjusted to pH 7 with 1 M NaOH). This vegetative culture was incubated at 28° C. for about 70 hours on a shaker at 250 rpm with a 1-inch throw.
Following incubation, 18 ml was used to inoculate 2 L baffled flasks each containing 600 ml of sterile Hi production medium consisting of 20 g potato dextrin, 30 g glycerol, 2.5 g Bacto-peptone, 8.34 g yeast extract, and 3 g CaCO₃ made up to one liter with distilled water and adjusted to pH 7.0 with 1 M NaOH. The culture was incubated at 28° C. for about 96 hours on a shaker at 250 rpm with a 1-inch throw.
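Since the media are specified per liter but dispensed in smaller volumes, the per-flask amounts follow by simple scaling. The sketch below illustrates this for the Hi production medium and the 600 ml flasks described above; the function names are hypothetical, and the recipe values are those given in the text.

```python
# Hi production medium, grams per liter, as listed in the text.
HI_MEDIUM_G_PER_L = {
    "potato dextrin": 20.0,
    "glycerol": 30.0,
    "Bacto-peptone": 2.5,
    "yeast extract": 8.34,
    "CaCO3": 3.0,
}


def scale_recipe(recipe_g_per_l, volume_ml):
    """Grams of each component needed for volume_ml of medium."""
    return {name: grams * volume_ml / 1000 for name, grams in recipe_g_per_l.items()}


def inoculum_percent(inoculum_ml, medium_ml):
    """Inoculum volume expressed as a percentage of the medium volume."""
    return 100 * inoculum_ml / medium_ml


per_flask = scale_recipe(HI_MEDIUM_G_PER_L, 600)
for name, grams in per_flask.items():
    print(f"{name}: {grams:.2f} g per 600 ml flask")
print(f"inoculum: {inoculum_percent(18, 600):.1f}% (v/v)")
```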

B.2. Feeding Experiment:

Vegetative cultures of Micromonospora sp. 046-ECO11 prepared in medium KH as explained above were used to inoculate Hi medium (four 125-mL flasks containing 25 mL). The medium was fed with 4,6-D₂-3-hydroxyanthranilic acid at 0.5 mg/mL before inoculation with the vegetative culture at the 2% level. Control cultures without the labeled compound were prepared for each medium in the same way. The effect of adding 4,6-D₂-3-hydroxyanthranilic acid on the production titre and growth was measured by adding the unlabeled compound to each medium in the same fashion. The purified compound obtained from each experiment was tested by ¹H-NMR for the incorporation ratio of the labeled substrate.
C. Results:
The purified farnesyl dibenzodiazepinone from the feeding experiment was analyzed by both ¹H NMR and mass spectrometry. The ¹H NMR spectrum (in DMSO-d₆) was compared to that of the unlabelled standard. About 31% reduction in the intensity of the signals at 6.82 and 7.06 ppm in DMSO-d₆ (corresponding to proton signals at 6.83 and 7.14 ppm in MeOH-d₄) was observed, which reflected a 31% incorporation of deuterium at these positions. Mass spectral analysis gave about 47% incorporation of the deuterium-labeled precursor.
The result indicated a direct incorporation of 3-hydroxyanthranilate as a precursor in the biosynthesis of ECO-04601.
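The NMR-based incorporation figure follows from the relative loss of proton signal intensity at the labeled positions. The sketch below shows that arithmetic only; the function name is hypothetical, and the integral values are invented placeholders chosen to reproduce the reported reduction of about 31%, not measured data.

```python
def deuterium_incorporation(integral_fed, integral_standard):
    """Fractional signal loss at a labeled position relative to the unlabeled standard.

    Both integrals are assumed to be normalized to the same reference signal
    (for example, a proton that is not exchanged).
    """
    if integral_standard <= 0:
        raise ValueError("standard integral must be positive")
    return 1.0 - integral_fed / integral_standard


# Hypothetical normalized integrals for one aromatic proton signal.
incorporation = deuterium_incorporation(integral_fed=0.69, integral_standard=1.00)
print(f"incorporation: {incorporation:.0%}")
```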

Example 8

Methods of Using the Deposited Cosmids

Two deposits of E. coli DH10B vectors (046KM and 046KQ), having deposit accession numbers IDAC 250203-06 and IDAC 250203-07 respectively, each contain a cosmid clone and together span the whole biosynthetic locus of ECO-04601. The coverage of the locus by each deposited cosmid is described in Example 5 and shown on FIG. 4.
Culture conditions to be employed for growing the deposited cosmid-containing DH10B™ E. coli are understood by a person of skill in the art (Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press). As a non-limiting example, upon receiving a sample of the deposited strain, either as a frozen glycerol stock, as an agar stab or in liquid media, a small aliquot of the strain is gathered using a sterile metal loop and thereafter streaked onto selective agar media on freshly prepared growth plates (e.g. disposable plastic Petri® plates). The aliquot is streaked so that single bacterial colonies can be isolated. A number of different growth media can be used, provided that the media contain an appropriate amount of a selective agent, for example an antibiotic. Standard growth media are known in the art, such as standard Luria Bertani (LB) media (10 grams of NaCl, 10 grams of tryptone, 5 grams of yeast extract and 20 grams of agar, with pH adjusted to 7.0 with 5.0 N NaOH and deionized water added to a final volume of 1.0 liter, autoclaved then cooled to 55° C., followed by addition of 10 mL of 10-mg/mL filter-sterilized ampicillin or 5 mL of 10-mg/mL filter-sterilized kanamycin). Plates with streaked bacteria are incubated overnight (approximately 16 hours) at 37° C. to allow for growth of the bacterial colonies.
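The antibiotic additions in the LB recipe above translate into final working concentrations by a simple dilution calculation. The sketch below is illustrative; the function name is hypothetical, and it assumes the stock volume adds to the 1.0 liter of medium.

```python
def final_concentration_ug_per_ml(stock_mg_per_ml, stock_ml, medium_ml):
    """Final antibiotic concentration after adding stock_ml of stock to medium_ml."""
    total_ug = stock_mg_per_ml * stock_ml * 1000  # mg -> micrograms
    return total_ug / (medium_ml + stock_ml)


ampicillin = final_concentration_ug_per_ml(10, 10, 1000)
kanamycin = final_concentration_ug_per_ml(10, 5, 1000)
print(f"ampicillin: {ampicillin:.1f} ug/mL")
print(f"kanamycin:  {kanamycin:.1f} ug/mL")
```

These work out to roughly 100 µg/mL ampicillin and 50 µg/mL kanamycin.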
Cosmid DNA containing insert DNA is prepared from the above-noted strains by methods that are known in the art. As a non-limiting example, a single bacterial colony is selected from an agar plate (as referred to above) and re-streaked onto a fresh agar plate, containing the appropriate selective agent as noted above, and allowed to grow overnight at 37° C. From this second agar plate, a single bacterial colony is selected and inoculated into 2.0 to 5.0 ml of liquid broth containing the appropriate amount of a selective agent, for example LB broth (prepared as per LB media, but lacking agar) containing ampicillin or kanamycin in a concentration as noted in the preceding paragraph, in order to generate a liquid starter culture of the single bacterial colony. This starter culture is grown to late logarithmic stage (approximately 8 hours), at which time an aliquot of the starter culture is withdrawn and diluted, by a factor of 500 to 1000, into a volume of broth containing the selective agent and grown with vigorous shaking (approximately 300 revolutions per minute) to late logarithmic/stationary phase (approximately 10 to 12 hours) to achieve a cell density of approximately 3 to 4×10⁹ cells per ml. Cell density is estimated by taking an aliquot of the liquid culture and obtaining an OD₆₀₀ reading using a spectrophotometer, or by centrifuging the liquid culture and thereafter measuring the weight of the resulting bacterial pellet. Typically, a 1.0 liter volume of a liquid culture of E. coli that is grown overnight at 37° C. and 300 rpm to a cell density of approximately 3 to 4×10⁹ cells per ml will correspond to a pellet weight of approximately 3 g/l. Depending on the amount of insert-bearing cosmid DNA that is required, a person skilled in the art would understand that either a liquid “mini-culture” of 2.0 to 5.0 ml or a liquid “maxi-culture” of 500 ml may need to be grown to yield the desired amount of cosmid DNA.
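The dilution and cell-density figures above can be turned into working volumes with a short calculation. The following sketch uses hypothetical function names and takes the 500 ml maxi-culture, the 500- to 1000-fold dilution and the density of 3 to 4×10⁹ cells per ml from the text.

```python
def starter_volume_ml(culture_ml, dilution_factor):
    """Volume of starter culture needed to inoculate culture_ml at 1:dilution_factor."""
    return culture_ml / dilution_factor


def total_cells(density_per_ml, culture_ml):
    """Approximate number of cells in a culture of the given density and volume."""
    return density_per_ml * culture_ml


for factor in (500, 1000):
    print(f"1:{factor} dilution into 500 ml needs {starter_volume_ml(500, factor)} ml of starter")

low, high = total_cells(3e9, 500), total_cells(4e9, 500)
print(f"expected cells in a 500 ml maxi-culture: {low:.1e} to {high:.1e}")
```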
Cosmid DNA, bearing the insert DNA of interest, is isolated from the bacteria grown in liquid cultures, as described in the preceding paragraph, using procedures that are known in the art. Non-limiting examples include the use of commercially available kits, for example the QIAGEN® Large-Construct Kit (QIAGEN Inc., Catalogue No. 12462) or Perfectprep® BAC 96 Kit (catalogue order number 955150431) available from Eppendorf North America (Westbury, N.Y.). Alternatively, the insert-bearing cosmid DNA is isolated by following procedures detailed for a traditional alkaline lysis method as described in Birnboim and Doly (1979) Nucleic Acids Research 7(6): 1513-1523, or in a cosmid-specific manual (e.g. the SuperCos™ 1 Cosmid Vector Kit Instruction Manual published online at www.stratagene.com). As an example of an alkaline lysis procedure, insert-bearing cosmid-containing bacterial cells from a 5.0 ml culture are collected by centrifugation (using an appropriate, sterile centrifuge tube) for 2 minutes followed by aspiration of the supernatant and resuspension of the pellet by vortexing in 200 μl of an ice cold solution of 50 mM glucose, 10 mM EDTA, 25 mM Tris-HCl (pH 8.0). Following resuspension of the bacteria, 400 μl of a freshly prepared solution of 0.2 N NaOH, 1% SDS is added and the contents gently mixed by inversion (vortexing must be avoided), followed by incubation on ice for 5 minutes. Following incubation on ice, 300 μl of ice-cold potassium acetate (approximate pH 4.8) is added, and the tube gently inverted twice and incubated on ice for a further 5 minutes. The tube is then centrifuged for 5 minutes at 4° C. and 500 μl of the supernatant is transferred to a fresh (sterile) tube. The transferred supernatant is deproteinated by extraction with phenol-chloroform, keeping the upper phase to which is then added 1.0 ml of ethanol. 
The tube is left standing at room temperature for 5 minutes, and thereafter microfuged for 30 minutes, followed by aspiration of the liquid from the tube. The remaining DNA pellet is washed in 70% ethanol, centrifuged (in a microfuge), and after aspiration of the liquid and drying (avoiding complete dryness) of the pellet, the DNA is resuspended in 50 μl of Tris-EDTA (TE). DNA concentration is estimated by taking an OD₂₆₀ reading on a 1/100 diluted aliquot of the purified insert-bearing cosmid DNA. The insert-bearing cosmid DNA is thereafter used in any number of downstream applications that would be appreciated by a person skilled in the art.
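Spectrophotometric estimation of double-stranded DNA conventionally relies on absorbance at 260 nm, with one absorbance unit taken as about 50 µg/mL. The sketch below applies that conversion to a 1/100 dilution and a 50 µl resuspension volume as in the procedure above; the function names and the example reading of 0.10 are hypothetical.

```python
# Conventional conversion factor for double-stranded DNA (assumed here).
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration_ug_per_ml(a260, dilution_factor):
    """Concentration of the undiluted sample from the absorbance of a diluted aliquot."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def dna_yield_ug(concentration_ug_per_ml, volume_ul):
    """Total DNA in a sample of the given volume."""
    return concentration_ug_per_ml * volume_ul / 1000


# Hypothetical reading of 0.10 on a 1/100 dilution of the 50 ul preparation.
conc = dsdna_concentration_ug_per_ml(0.10, 100)
print(f"concentration: {conc:.0f} ug/mL")
print(f"total yield:   {dna_yield_ug(conc, 50):.1f} ug")
```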
Segments or regions of the insert DNA can be generated by performing a restriction digestion on the insert-bearing cosmid DNA using protocols that are known to those of skill in the art. The segments or regions of the insert DNA may be of interest to the person of skill in the art because the particular nucleotide sequence may be that of a gene or genes to be manipulated for a downstream application. Likewise, the segments or regions of the insert DNA may be of interest because the particular nucleotide sequence may be that of an entire biosynthetic locus, or a portion thereof, that encodes the production of a natural product. It is possible that the nucleotide sequence of the insert DNA encodes one or more modules, which may be comprised of one or more domains, of a nonribosomal peptide synthetase or a polyketide synthase locus that encodes the production of a bioactive natural product.
As an example that is not intended to be limiting, if the sequence of the insert DNA is known, the presence of particular restriction enzyme sites within the insert DNA is determined and the region (i.e. the fragment) of DNA situated between two restriction enzyme sites is cut or digested from the cosmid DNA. Generally, it is preferred in the art to use a restriction enzyme that recognizes a six base pair (bp) DNA recognition sequence as opposed to a four base pair recognition site, as there will be fewer restriction sites in a given stretch of DNA for a six-bp restriction enzyme, thereby offering less chance of digesting the cosmid (i.e. the vector) DNA per se. Selection of a given restriction enzyme may also be dependent upon whether the ends of the generated DNA fragment are to be blunt or are to possess overhangs so as to facilitate sub-cloning of the DNA fragment. Restriction digestion conditions are known to those skilled in the art. While not intending to be limiting, a digestion is usually performed using a minimum of 0.2 μg of DNA. If the DNA fragment to be generated is to be used as a probe, for example in Southern blotting, then an amount of DNA of at least 10 μg will be required for digestion. A restriction digestion can usually be performed in a reaction volume of 10 μl to 50 μl, using a requisite number of units of the given restriction endonuclease plus the particular buffer for the restriction enzyme and a necessary amount of sterile water to give the desired reaction volume. One unit of a restriction endonuclease will digest 1 μg of DNA in one hour, and it is common to use a ten-fold excess of the restriction enzyme to ensure complete digestion, provided that the volume of the restriction enzyme used does not exceed 10% of the final reaction volume. 
Upon addition of the restriction enzyme as the last component of the reaction mixture, the tube containing the mixture should be gently flicked with a finger to ensure proper mixing of the tube contents, followed by a brief centrifugation and incubation of the tube at 37° C., or at an elevated temperature of 50-65° C. if the restriction enzyme is one isolated from a thermophilic bacterium, for a time span ranging from one to four hours. The reaction time may be extended if desired. Reaction termination and deproteination may be accomplished by heat inactivating the restriction enzyme followed by phenol-chloroform extraction of the reaction (as described above), or by using a commercially available kit such as the MinElute® Reaction Cleanup Kit from QIAGEN.
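The rules of thumb in this passage (rarer cutting by six-bp enzymes, a ten-fold enzyme excess, and an enzyme volume of at most 10% of the reaction) can be expressed as a short planning calculation. The sketch below is illustrative only: the function names are hypothetical, the enzyme stock concentration of 10 units/μl is an assumed example value, and the site-frequency estimate assumes random sequence with equal base frequencies, which is only a rough approximation for GC-rich actinomycete DNA.

```python
def expected_sites(length_bp, site_length_bp):
    """Expected number of recognition sites in random DNA with equal base frequencies."""
    return length_bp / 4 ** site_length_bp


def plan_digest(dna_ug, reaction_ul, enzyme_units_per_ul, excess=10.0):
    """Enzyme units and volume for a digest, applying the 10%-of-volume limit.

    One unit is taken to digest 1 ug of DNA in one hour, as stated in the text.
    """
    units = dna_ug * excess
    enzyme_ul = units / enzyme_units_per_ul
    return {
        "units": units,
        "enzyme_ul": enzyme_ul,
        "within_volume_limit": enzyme_ul <= 0.10 * reaction_ul,
    }


# A locus of about 38,000 bp cut by six-bp versus four-bp recognition enzymes.
print(f"6-bp cutter: ~{expected_sites(38_000, 6):.0f} sites")
print(f"4-bp cutter: ~{expected_sites(38_000, 4):.0f} sites")

# 0.2 ug of DNA in a 20 ul reaction with an assumed 10 units/ul enzyme stock.
print(plan_digest(dna_ug=0.2, reaction_ul=20, enzyme_units_per_ul=10))
```

With the same assumed stock, a 10 μg digest would need 10 μl of enzyme, so the reaction volume or the enzyme concentration would have to be increased to stay within the 10% limit.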
Downstream uses of the insert DNA are discussed in Section VII above and include: Labeling and use of the fragments as probes to detect the presence of the given gene or the expression of the given gene in a different organism; Use of the fragment in hybridization experiments; PCR amplification of the insert DNA or regions of interest of the insert DNA; Mutagenesis of the particular DNA segment of interest in order to produce substitutions, additions, deletions, fusions or truncations in the expressed polypeptide, which can be accomplished by random chemical mutagenesis, site directed mutagenesis, error-prone PCR, exonuclease II deletion, oligonucleotide mutagenesis for PCR; Generation of variant forms of the peptide of interest with conservative vs. non-conservative changes in the amino acid sequence to result in the production of novel end-product compounds; Cloning and use of the DNA sequence of interest in a heterologous expression system (yeast, mammalian, insect, plant expression vectors) for the production of the peptide of interest, and the creation of tagged (e.g. His, c-myc, Ni-tagged, etc.) fusion proteins; Use of the peptide that is produced to raise polyclonal or monoclonal antibodies (via the production of hybridomas).
Antibodies (Ab's) are also used as probes to isolate interacting proteins; Ab's are generated against the peptides resulting from the heterologous expression of the DNA sequence of interest. Proteins that may potentially interact with that encoded by the DNA sequence of interest may also be identified by yeast two-hybrid screening as described in U.S. Pat. No. 5,283,173.
All patents, patent applications, and published references cited herein are hereby incorporated by reference in their entirety. While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

Claims

1. An isolated polynucleotide comprising a polynucleotide sequence, or a polynucleotide sequence complementary thereto, selected from the group consisting of:

a) a polynucleotide encoding a polypeptide having at least 90% sequence identity to a polypeptide consisting of amino acids 1-438 of SEQ ID NO: 48 and having adenylating amide synthetase activity;

b) a polynucleotide encoding a polypeptide having at least 90% sequence identity to a polypeptide consisting of amino acids 1-290 of SEQ ID NO: 22 and having isoprenyl transferase activity;

c) a polynucleotide comprising the nucleic acid sequence of SEQ ID NO:47; and

d) a polynucleotide comprising the nucleic acid sequence of SEQ ID NO:23.

2. An isolated polynucleotide comprising the nucleic acid sequence of SEQ ID NO:47.

3. An isolated polynucleotide comprising the nucleic acid sequence of SEQ ID NO:23.

4. The isolated polynucleotide of claim 1, wherein said polypeptide of a) has at least 95% sequence identity to a polypeptide consisting of amino acids 1-438 of SEQ ID NO: 48.

5. The isolated polynucleotide of claim 1, wherein said polypeptide of a) has at least 99% sequence identity to a polypeptide consisting of amino acids 1-438 of SEQ ID NO: 48.

6. The isolated polynucleotide of claim 1, wherein said polypeptide of b) has at least 95% sequence identity to a polypeptide consisting of amino acids 1-290 of SEQ ID NO: 22.

7. The isolated polynucleotide of claim 1, wherein said polypeptide of b) has at least 99% sequence identity to a polypeptide consisting of amino acids 1-290 of SEQ ID NO: 22.

8. A purified polypeptide selected from the group consisting of:

a) a polypeptide comprising amino acids 1-290 of SEQ ID NO: 22; and

b) a polypeptide having at least 90% sequence identity to a polypeptide comprising amino acids 1-290 of SEQ ID NO: 22 and having an isoprenyl transferase activity; and

c) a polypeptide encoded by a polynucleotide, the complement of which hybridizes under stringent conditions to a polynucleotide encoding a polypeptide comprising amino acids 1-290 of SEQ ID NO: 22, and having an isoprenyl transferase activity.

9. A purified polypeptide comprising amino acids 1-290 of SEQ ID NO: 22.

10. The purified polypeptide of claim 8, wherein said polypeptide of b) has at least 95% identity to a polypeptide comprising amino acids 1-290 of SEQ ID NO: 22.

11. An expression vector comprising a polynucleotide of claim 1.

12. The expression vector of claim 11, wherein said polynucleotide encodes a polypeptide having at least 90% sequence identity to a polypeptide comprising amino acids 1-438 of SEQ ID NO: 48 and having adenylating amide synthetase activity.

13. The expression vector of claim 11, wherein said polynucleotide encodes a polypeptide having at least 90% sequence identity to a polypeptide comprising amino acids 1-290 of SEQ ID NO: 22, and having isoprenyl transferase activity.

14. An isolated host cell transformed with an expression vector of claim 11.

15. The isolated host cell of claim 14, wherein said host cell is a bacterial host cell.

16. A method for producing a farnesyl dibenzodiazepinone compound, comprising:

a) providing a prokaryote transformed with an expression vector of claim 11; and

b) culturing the prokaryote under conditions such that (i) an adenylating amide synthetase or an isoprenyl transferase is expressed, and (ii) a farnesyl dibenzodiazepinone compound is synthesized.

17. The method of claim 16, wherein said prokaryote is E. coli.

18. The method of claim 16, wherein said prokaryote is an actinomycete.

19. An isolated polynucleotide encoding:

a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88; or

b) a polypeptide having at least 85% sequence identity to a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 41, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 65, 67, 69, 70, 71, 74, 76, 78, 80, 82, 84, 86 and 88, and having the same biological function as the corresponding polypeptide.

20. A cosmid selected from the group consisting of cosmid 046KM deposited under IDAC accession no. 250203-06 and cosmid 046KQ deposited under IDAC accession no. 250203-07.

21. The cosmid of claim 20, wherein said cosmid is inserted into a prokaryotic host for expressing a product.