WO1991016683A1 - System and method for determining three-dimensional structures of proteins - Google Patents

System and method for determining three-dimensional structures of proteins Download PDF

Info

Publication number
WO1991016683A1
WO1991016683A1 PCT/US1991/002786 US9102786W WO9116683A1 WO 1991016683 A1 WO1991016683 A1 WO 1991016683A1 US 9102786 W US9102786 W US 9102786W WO 9116683 A1 WO9116683 A1 WO 9116683A1
Authority
WO
WIPO (PCT)
Prior art keywords
conformation
xyz
ica
protein
residues
Prior art date
Application number
PCT/US1991/002786
Other languages
French (fr)
Inventor
Jeffrey Skolnick
Original Assignee
Scripps Clinic And Research Foundation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Scripps Clinic And Research Foundation filed Critical Scripps Clinic And Research Foundation
Publication of WO1991016683A1 publication Critical patent/WO1991016683A1/en

Links

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20Protein or domain folding
    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K1/00General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Definitions

  • This invention relates to modeling systems generally, and particularly to computer-based
  • the structure (native conformation) of the protein particularly the conformation of the outer sites or sidechains (which are linked to the backbone and inner structures of the protein) often determines the capacity of the protein to interact with other proteins.
  • One factor which directly influences conformation is protein folding. Deciphering the rules through which the building blocks (amino acid sequences) of the protein affect folding promises significant improvements in the design of proteins, many with a host of new catalytic functions useful, for example, in the chemical, food processing, pharmaceutical, and other industries.
  • an improved computer-based system is provided that is capable of processing a full sequence of amino acid residues of a protein (e.g., plastocyanin), representing free
  • the system comprises an input means such as a keyboard for specifying (entering) selected amino acid sequences and other data such as temperature and fold preferences, a RAM (random access memory) for storing such data, a ROM (read-only memory) with a stored program, a CRT (cathode ray tube) display unit and/or printer, an optional auxiliary disc storage device for storage of relevant data bases, and a microprocessor for processing the entered data, for simulating, under control of the stored program, the folding of the protein from its unfolded state to its folded (tertiary) state, and for displaying via the display unit (or printer) tertiary conformations of the protein in three dimensions.
  • a keyboard for specifying (entering) selected amino acid sequences and other data such as temperature and fold preferences
  • a RAM random access memory
  • ROM read-only memory
  • CRT cathode ray tube
  • auxiliary disc storage device for storage of relevant data bases
  • a microprocessor for processing the entered data, for simulating,
  • a novel lattice is employed for representing
  • ⁇ -carbons located a distance of ⁇ 5 units from each other.
  • the ⁇ -carbons represent a chain or backbone of the protein.
  • Each ⁇ -carbon is shown to occupy a central cubic lattice side plus six adjacent cubic lattice sites defining a surface of interaction (e.g., an area or volume having a surface of finite size).
  • Each sidechain is represented as being embedded in the lattice and occupying a selected number (four) of lattice sites located relative to the central site, the number of sites occupied by the sidechain being proportional to the number of sites defining the surface of interaction.
  • the system determines the tertiary conformation of the protein using Monte Carlo dynamics with an asymmetric Metropolis sampling criterion.
  • the system (a) generates a three-dimensional representation of an unfolded conformation consisting of an ⁇ -carbon backbone and sidechains, (b) produces (in accordance with local conformational preferences of the
  • the system modifies each conformation by moving randomly selected residues (beads) and inter-residue bond vectors to different selected lattice sites by performing various type moves (single-bead jump-type moves, two-bead end-flip moves, chain-rotation type moves, and translation wave-type moves).
  • Fig. 1 is a diagramatic illustration of a globular protein in its native folded conformation.
  • Fig. 2 is a diagramatic illustration of a full sequence of amino acid residues of which the protein represented in Fig. 1 is comprised.
  • Fig. 3 is a block diagram of the system of the present invention.
  • Fig. 4 is a block diagram showing a perspective view of a cubic lattice model employed in the system of Fig. 3.
  • Fig. 5 is a block diagram showing a segment of a protein model comprising an ⁇ -carbon and
  • Fig. 6 is a diagramatic illustration of an ⁇ -carbon backbone of a protein segment.
  • Fig. 7 is a diagramatic illustration of an ⁇ -carbon of the protein backbone segment shown in Fig. 6.
  • Figs. 8A-8C are diagramatic illustrations of selected simple arrangements of an ⁇ -carbon backbone and associated sidechains.
  • Fig. 9 is a diagramatic illustration of a jump-type move made by a randomly selected residue (bead) within the lattice of Fig. 4, effecting a change in conformation of the protein model.
  • Fig. 10 is a diagramatic illustration of a rotation-type move made by a pair of randomly
  • Fig. 11 is a diagramatic illustration of a translation-type (wave-type) move made by a U-shaped segment within the lattice of Fig. 4, effecting a change in conformation of the protein model.
  • Figs. 12A-12D are diagrammatic illustrations of the folding of a selected segment of a protein to a ⁇ -barrel conformation.
  • Figs. 13A-13C are graphs showing an average number of native contact pairs between sidechains versus time.
  • Figs. 14A and 14B are graphical representations
  • FIG. 1 illustrations of a folding pathway defined by a sequence as it folds from an unfolded state to a folded (native) state.
  • Figs. 15A-15F are block diagrams (flow charts) showing a method employed by the system of Fig. 3 in simulating protein folding.
  • Fig. 16 is a block diagram showing an alternate embodiment of the processor of Fig. 15.
  • FIG. 1 A simplified representation of a globular protein (e.g., plastocyanin) in its native (natural, folded) form is shown in Figure 1.
  • Figure 2 A simplified representation of a full sequence of amino acid residues of which the protein is comprised is shown in Figure 2. The protein becomes unfolded
  • Temperature may be specified in any unit (whether fahrenheit, centigrade, or Kelvin) and at any level or value (whether in or outside the transition range of the protein) as explained hereinafter. Generally, depending on the native biological conditions of the particular protein molecule being investigated, the temperatures that are specified are those in and bordering the transition region of the protein
  • a section or segment 11 of a full sequence (e.g., a sequence of a protein much like that depicted in Fig. 2) is shown in stick form (without associated residues or atomic structure).
  • the section 11 includes an ⁇ -carbon segment 13 and a sidechain ( ⁇ -carbon) segment 15 representative of each amino acid residue of the protein.
  • the protein segments may be viewed as embodied within a cubic reference framework or lattice model (Fig. 4), constructed from vectors of the type (1,0,0), (0,1,0), (0,0,1), the distance between any two adjacent points being unity.
  • the ⁇ - carbon atoms 13 when linked as shown in Fig. 6 form the backbone 14 of the protein. As shown in Figs. 4 and 7, each ⁇ -carbon 13 may be viewed as occupying a central cubic site 17 plus six adjacent cubic sites 18-23, defining a finite surface of interaction.
  • Adjacent ⁇ -carbon centers may be viewed as linked by a 210-type lattice vector 25, as shown in Fig. 4.
  • the backbone 14 (Fig. 6) represents a structure of finite thickness about which a somewhat inflexible, hard core envelope of a chain of residues develop.
  • the conformation of the backbone at the i th ⁇ -carbon is specified in terms of r 2 ⁇ 1 , the square of the distance between adjacent ⁇ -carbons (i-1 and ⁇ -carbons, and ⁇ represents a bond angle that one of the ⁇ -carbons make with respect to the other, as shown in Fig. 6.
  • the distance between consecutive ⁇ -carbons equals ⁇ 5 units.
  • Selected values of r 2 ⁇ are 6, 8, 10, 12, 14, 16, and 18, expressed in model units, indicating various accessible bond angle states. These values represent internal orientational states corresponding to actual (known) physical conformations.
  • each ⁇ -carbon has attached to it a sidechain 15, constructed for example in a helix conformation 27, or in a ⁇ -strand conformation 29. From the central vertex portion 31 of the ⁇ -carbon, the sidechain 15 is formed,
  • any intrinsic preference of the protein model for a particular conformation may be represented by the individual preferences of the respective residues for the various bond angle states.
  • the term local conformational preferences shall mean the relative preferences which each local group of residues (i.e., a selected residue plus two flanking (adjacent) residues on either side of the selected residue) exhibit for the different
  • conformational states As indicated previously, these states are represented by the value r 2 ⁇ of the lattice model. Since for every residue i there are seven distinct values of r 2 ⁇ , corresponding to 18 distinct local conformational states, the local energetic preference (denoted as parameter e ⁇ (r 2 ⁇ 1 )) for each of the states (r 2 ⁇ values) must be
  • the torsional (dihedral angle) potential of a residue i.e., its tendency to undergo an angle of rotation or twist
  • the torsional potential of a residue i.e., its tendency to undergo an angle of rotation or twist
  • X -1 for left-handed chirality (L).
  • L left-handed chirality
  • conformation could also be specified by the vectors b 1 , b i+1 , b i+2 as shown in Figure 8) . As many as 324 rotational states exist for each internal bond.
  • rotational states are all assigned a relative energy value ⁇ ⁇ (r 2 ⁇ 1 , r 2 ⁇ i+1 , r 2 ⁇ i X). Generally, all such rotational states are statistically weighted. Where the majority of the conformations are taken to be isoenergetic (with a small bias toward a small subset of conformations that are native), the short and intermediate range energetic preferences may be represented as e(r 2 ⁇ ,1 , r 2 ⁇ i+1 , r 2 ⁇ 1 ) .
  • the seven lattice sites that define the ⁇ -carbon (Fig. 7) and the four lattice sites (Fig. 5) that define the surface 24 of the sidechain interact repulsively (i.e., with strong, hard core repulsion) with all the other ⁇ -carbons and their respective sidechains. In other words, no more than one
  • r k1 represents the distance between the k th and 1 th such centers, then the soft core repulsive energy e rep between the pair may be expressed as:
  • interaction energy parameter E c is introduced which allows for secondary structure stabilization when any part of the ⁇ -carbon hard core envelope of the l th residue is at a distance of 3 units from the ⁇ -carbon center of the k rh residue.
  • the cooperative interaction energy e ckl may be given by:
  • ⁇ ckl ⁇ c dot (b k ,b l ) + dot (b k+1 ,b l ) +
  • Sidechains may be hydrophobic, hydrophilic or inert. Pairs of hydrophobic
  • hydrophilic pairs interact weakly (i.e., weakly attractive or repulsive with no change in quality to behavior).
  • hydrophobicity index h(i) ⁇ 0
  • hydrophilic residues were assigned a positive hydrophobicity index h(i) > 0.
  • am(i,j) -h(i) . h(j) . ⁇
  • the subscripts phobe-phobe mentioned above represent interaction between two hydrophobic residues, phobe-phil
  • hydrophobicity scale Based on the frequency of occurrence of contacts between sidechain pairs in protein crystal structures, the MJ scale is used to determine effective inter-residue contact energies.
  • short-range interactions shall mean interactions between adjacent residues in the chain and does not include effects of their neighbors (i.e., neighboring residues in the chain).
  • Medium-range interactions shall mean interactions between first, second, and third nearest-neighbor residue groups in the chain.
  • Long-range interactions shall mean interactions between residues (not ⁇ - carbons) which are positioned greater than three nearest neighbors apart down the chain but which are spatially close (i.e., within 3 ⁇ A of each other).
  • a native conformation may comprise one of a number of isoenergetic states. It is the juxtaposition of short-medium-and-long-range interactions, together with other factors described herein that produce the final result, namely a stable, folded conformation.
  • B i (k) is used to represent the i th stretch of k residues in the sequence.
  • the k residues are represented as having identical ⁇ ⁇ and ⁇ ⁇ values and a marginal (short and intermediate range) preference for ⁇ -state conformation. range) preference for ⁇ -state conformation.
  • Putative band regions are denoted by b i (j), and consist of j residues located at the interface between putative ⁇ -stretches i and i+1. Chain Dynamics. Modification of Conformations
  • a clock is used to sequentially choose the particular outcome.
  • the minimum size unit selected for rotation consists of three beads, and the maximum size unit is 2+wave.
  • the value of the parameter "wave" is generally 4, it is chosen so that the size of the unit undergoing the rotation is the size of a mean element of secondary structure.
  • the particular size of the unit ( ⁇ +1) undergoing the attempted rotation is chosen by the value of an external clock parameter, and
  • a particular bead I at one end of the rotating unit, is chosen at random. For beads less than n/2, the unit undergoing the rotation is 1-5. For beads greater than n/2, the unit undergoing rotation is I+ ⁇ . If ib represents the first residue at the beginning of the rotating unit, and iend represents the residue at the end of the rotating unit, then if the bond angle state between the vectors b ib and b iend-1 is a 14-18 state, the rotation is attempted. (The range of values of r 2 ⁇ i is chosen so that the rotation is physically possible.) The rotation is implemented by interchanging the two bond vectors (e.g., vectors 35, 37 joining randomly selected bead 39 shown in Fig.
  • the initial set of bond vectors joining residues ib to iend is (b ib , b ib+1 , ... b iend-2 , b iend-1 ).
  • the final set of bond vectors is (b iend-1 , b ib+1 , ...
  • the defect insertion point lies to the left of I, and the other half the time it lies to the right of it.
  • the value 4 is selected for wave.
  • the bond vectors b z 41 and b I+3 43 are then sliced out of the chain, thereby deleting two beads 43 and 47, provided that b I+1 49 and b I+2 51, which will form the new bond angle state or vertex l+1 53, satisfy the local geometric constraints of the chain.
  • two bonds 49, 51 are inserted into the chain.
  • the new set of four vectors are (v, b JJ-1 , b JJ , -v), where the vector v is chosen at random. Note that the intervening bond factors between 1+4 and JJ-2 are left unchanged. A new conformation is then generated by renumbering the residues so that their identity is conserved. As before, both excluded volume and local bond angle criteria must be satisfied in order for the
  • E new is calculated and compared to the energy of the old conformation E old .
  • E new represents the sum of the individual energies, and is expressed as:
  • E new E ⁇ + ⁇ ⁇ + E c + E s
  • E ⁇ ⁇ ⁇ ⁇
  • E s also referred to as E side
  • E old represents the initial total value, then successive previous total values with which E new is compared.
  • T temperature
  • S entropy
  • k B represents Boltzmann's constant and T represents the absolute temperature of the protein.
  • T represents the absolute temperature of the protein.
  • sampling scheme or criterion is applied in conjunction with a dynamic Monte Carlo technique (as described for example in Monte Carlo Methods in Statistical Physics by K. Binder, cited above).
  • a single Monte Carlo dynamics time step consists of N attempts at move type (i) (jump-type move) mentioned above, two attempts at move type (ii) where each of the chain ends are subjected to move type (ii), one attempt at move type (iii), and one attempt at move type (iv).
  • the protein model is started out in a randomly generated high temperature (T) state. It is then cooled down, equilibrated, cooled further, until collapse to a folded conformation occurs.
  • T high temperature
  • Monte Carlo time steps are sampled.
  • the set of elemental moves employed in the simulation satisfies the well known stochastic kinetics master equation describing the dynamics of the system. (Refer, for example, to Appendix B.) In the limit (after a large number of steps), an equilibrium distribution of states is generated.
  • thermodynamics of folding With respect to the thermodynamics of folding, a detailed explanation is presented below.
  • the protein By restricting the protein to the lattice, it may be treated as a rotational isometric state model of the protein.
  • the transition from the denatured to the native state is treated in the context of a two- state model.
  • the free energies of the denatured state A D , and the native state A N are calculated as follows: A D is calculated by neglecting all tertiary interactions in the denatured state (although
  • a N is approximated by the energy of the native state E N .
  • Z D J ⁇ V D,i J, as defined in Appendix C.
  • ⁇ S N 2 > and ⁇ S D 2 > are the mean square radii of gyration in the native and denatured state
  • conformational transitions can be approximated by a two-state model, or can be determined directly from folding trajectories.
  • Figs. 12A-D show a segment with backbone
  • ⁇ -carbons 101 and interacting sidechain sites 103 are also shown in the top view. Also shown in the top view are hydrophobic core 105 with the interdigitating sidechains 107, 109. Also shown are the corresponding conformations 111, 113 with ⁇ -carbons alone.
  • the first of the three native turns is shown to involve the eight through eleventh residues with backbone bond angle conformations 18, 8, 18, and 10, respectively.
  • the central turn is shown to involve a crossover connection between the two anti-parallel ⁇ -strands, and involves the eighteenth through twentieth residues with backbone bond angle
  • the residues are hydrophobic (hydrophilic).
  • the first strand consists of the first through eighth residue.
  • the ninth through eleventh turn residues are all hydrophilic.
  • the second strand runs from the twelfth to the eighteenth residue, with all the even (odd) residues hydrophobic (hydrophilic).
  • the nineteenth and twentieth turn residues are, respectively, hydrophilic and hydrophobic.
  • the third strand runs from the twenty-first to the twenty-sixth residue.
  • the twenty-seventh through twenty-ninth are turn residues, all of which are hydrophilic.
  • the fourth strand runs from the thirtieth to the thirty-seventh residue.
  • the first and last residues are virtual residues (i.e., they are devoid of sidechains, but they do occupy excluded volume). They may be regarded as capping the two ends, and are included so that the bond angle state for the real residues (the second and thirty-sixth residue) may be defined.
  • the subject of equilibrium folding the requirements for equilibrium folding of a region of the chain to its unique, native structure (e.g., the four-member ⁇ -barrel structure) is
  • parameter ⁇ ⁇ (16) was found equal to zero for the B i state and -0.25/T for all the other states.
  • the parameter ⁇ ⁇ (16,16,37) .6/T, and is zero for all other states.
  • ⁇ -conformations are locally preferred for the B i portions of the primary sequence, and that the torsional angle preference ⁇ ⁇ (for native-like conformations in the B i region) is locally favored by a ratio of 1:1.75 over that in the turn region.
  • the times indicated in the figure are in units of 500 Monte Carlo steps, and the fully native molecule contains twenty contact pairs.
  • conformational properties such as the energy, the instantaneous value of the radius of gyration, the total number of contact pairs N c,tot also undergo sharp changes in value that is a
  • turn propensities of 1% or lower are sufficient to yield folding to the native barrel of Figure 12.
  • the chain generally thrashes about for over many millions of time steps (e.g., over 50 million) without finding a unique folded form.
  • FIG. 14A-B A trajectory of a sample having the primary sequence ⁇ ⁇ > ⁇ ⁇ , 1., 1.75, is shown in Figs. 14A-B.
  • the three-member ⁇ -barrel is the long-lived
  • the mechanism of assembly is best described as punctuated, on-site construction.
  • unfolding is the reverse of folding.
  • unfolding starts with either one of the external strands becoming denatured or an internal stand closest to the denatured tail becoming unfolded.
  • Figs. 3 and 15 a system and method are shown and described for simulating protein folding and determining three-dimensional (tertiary) structures of proteins.
  • the system comprises an input means 57 such as a keyboard for specifying (entering) selected amino acid sequences and other data such as
  • a RAM random access memory
  • ROM read-only memory
  • CRT cathode ray tube
  • auxiliary disk storage device 67 for storage of relevant data bases
  • microprocessor 69 for performing, under control of the stored program, the steps of processing the entered data, simulating the folding of the protein from its unfolded state to its folded (tertiary) state, and displaying via the display unit (or printer) tertiary conformations of the protein in three dimensions.
  • the system inputs (specifies) the data for processing, stores the data in memory then processes it as shown in Figs. 15A-F.
  • Sample data of the type which may be input to the system is shown in Appendix E.
  • the system In processing the data, the system generates a tertiary interaction matrix as shown in Appendix E and produces, in addition to a display of the protein's tertiary structure, a sample output as shown in Appendix E for tracking the simulation.
  • the system operates under the control of a stored
  • Figs. 15A-F in response to the specified data the system generates a random conformation of backbone and sidechain elements
  • the system modifies the conformation by interchanging a randomly selected pair of bond vectors. In other words, it proceeds to change the rotation angle ⁇ . Thereafter, the system proceeds to determine the coordinates of lowest energy conformation which satisfy the Metropolis criterion. It does this by first calculating the total energy (E new ) of the new modified conformation then comparing the total energy E new with the total energy of the old conformation E old . If E old is greater than E new , then the
  • Step B the system directly proceeds to the next step (Step B).
  • the system proceeds to choose a bead at random to move within the lattice. Before moving the bead, the system tests if the move (which is a jump-type move) would violate the excluded volume criterion. If no, it proceeds with the move. If yes, it does not proceed with the move, and proceeds instead to choose the next bead until all the beads in the chain have been checked for modification (movement). If the move would not violate the excluded volume criterion, the conformation is modified by moving the bead to a new lattice site. In other words, the bead would make a jump move which would change its
  • the system calculates the total energy of the new conformation, that is, the total energy E new in a similar manner as indicated earlier.
  • E new is then compared with E old , the energy of the previous conformation before the move. If E old is greater than E new , then the coordinates of the old conformation are replaced with the coordinates of the new, and the next bead move is checked. If E old is greater than E new , then the Metropolis criterion is applied (and the random number R is generated, and the probability P is calculated in the same manner as indicated earlier, as shown in Fig. 15A-F), and the random number R is compared with the probability P. If R is less than P, the coordinates of the old conformation are replaced with the coordinates of the new and the next bead move is checked. If R is not less than P, the next bead move is checked and the loop is
  • step C the system proceeds to process the two end beads. It identifies the first end bead then checks if an end flip-type move would violate the excluded volume criterion. If no, it proceeds with the move. Otherwise, it aborts the move and proceeds to check the second end bead. In the event the move of the first end bead would not violate the excluded volume criterion, the system proceeds to modify the conformation by performing an end-flip move that changes the coordinates of the end bead. It then proceeds to determine the coordinates of the lowest energy conformation which satisfies the Metropolis criterion in the same manner as it did for the rotational and jump-type moves. After
  • the system checks if both end beads are processed. If the second end beads remain to be processed, the system identifies the second end bead and proceeds to check whether an end flip move of the second end bead would violate the excluded volume criterion. If it would violate the criterion and both end beads have been considered, it then proceeds to the next step (step D). If it does not violate the criterion, then the system proceeds to modify the conformation by performing an end-flip move of the second end bead changing the coordinates of the second end bead. It then proceeds to determine the coordinates of the lowest energy conformation which satisfy the
  • step D the system selects a bond at random then searches for a U-shaped segment. It then checks, after finding the U-shaped segment, whether a move of a translation (wave motion) type move would violate the excluded volume criterion. If not, it proceeds with the
  • step E If it does violate the excluded volume criterion, it aborts the move and proceeds to check if all the jump-type moves were made. If all were made, it proceeds to the next step (step E).
  • the system proceeds to modify the conformation by performing the translation/wave- motion-type move changing the coordinates of the beads defining the U-shaped segment.
  • the system determines the coordinates of lowest energy
  • Sample display output is presented in Fig. 1.
  • Sample printed output is presented in Appendix E.
  • FIG. 16 An alternative embodiment of the system is presented in Fig. 16 comprising a keyboard 151 for entering data representing temperature and amino acid sequences, a RAM 153 for storing the entered data, and a unit 155 for generating a representation of a lattice, including unit 157 for positioning lattice sites, and unit 159 for positioning ⁇ -carbons
  • the system includes a unit 161 for combining the generated lattice
  • the allowed values of r 2 ⁇ are 6,8,10,12,14,16 and 18. Any other value of r 2 ⁇ is rejected as not realistic or not representable on the 210 lattice.
  • four sidechain vectors are constructed. The center of sidechain interaction is located at the site defined by a diamond lattice vector d 34, of the type ( ⁇ 1, ⁇ 1, ⁇ 1), which points from the center of the ⁇ -carbon to the point ( ⁇ 1, ⁇ 1, ⁇ 1).
  • the other three vectors f 1 , f 2 and f 336,38,40 are of the fcc type , whose sum is twice that of the diamond lattice vector d 34.
  • the vector d has left-handed chirality (L). With respect to the backbone, vector d points toward the N-terminus of the sequence.
  • the orientation angle is generally not less than 60°.
  • d x isgn(p x +w x )
  • d z isgn(p z +w z ).
  • ⁇ i ⁇ represents a first set of vectors
  • p( ⁇ i ⁇ ,t) represents the probability of finding a set of
  • k f represents rate of increase of the set (i) in size
  • k b represents rate of decrease of the set ⁇ i ⁇ in size
  • ⁇ i' ⁇ ) represents the probability of occupying set ⁇ i ⁇ upon moving from set ⁇ i' ⁇ ;
  • ⁇ i ⁇ ) represents the probability of occupying set ⁇ i' ⁇ upon moving from set ⁇ i ⁇ ; and the relationship between k f and k b may be expressed as:
  • k B represents Boltzmann's constant
  • T temperature (in degree Kelvin) of the protein.
  • a bead represents an amino acid residue comprising a full sidechain (i.e., four lattice sites) and backbone segment (i.e., seven lattice sites).
  • a bead is shown, for example, in Figures 5 and 9.
  • i and i+1 represents a first pair of vectors
  • i' and i'+1 represents a second pair of vectors
  • represents the bond angle between vectors (bonds)
  • a conformation may be modified by rotational and/or translational motion of one or more beads, as shown for example in Figures 10 and 11. Appendix C
  • the second vector can be
  • a pseudo inner product may be defined (by analogy to ortho- normal basis sets) as follows: (C-2) if the two vectors i and j are allowed, and (c-3) if the two vectors i and j are not allowed.
  • the internal partition function of the denatured state, Z 0 D may be obtained from where J is a row vector of dimension 24, consisting of a 1 followed by twenty-three zeros, J is a column vector consisting of twenty-four ones, and U D,i is a 24 x 24 matrix associated with the ith residue, each row of which contains 18 non-zero elements and 6 zero elements.
  • U D,i may be expressed as:
  • the configurational partition function can be written as the product of the internal bond angle partition functions associated with each bond angle state z ⁇ ,i : (C-6)
  • Equation C-4 The matrix product in equation C-4 is of the form: U0. 2 (1,k)U 0 . 3 (k,l) ⁇ U 0.N-2 (U,r)U 0.N-1 (r,s) (C-7)
  • the chain is divided into statistical weight matrices associated with pairs of bonds. That is, the partition function is calculated as (C-11)
  • J* 576 is a row vector of dimensionality 576 whose first ter is unity and remaining terms are zero.
  • J 576 is a column vector o dimensionality 576, all of whose elements are unity.
  • U ⁇ i is a 576 by 576 matrix.
  • ⁇ .K(j,k,l,N) (j,k)(k,l)(l,le)xp( -( ⁇ ⁇ ,-1 (k,l) /k B T)U 0 (j,k,l,N-1)
  • R D -k B Tln(Z D ).
  • c ncglyshort.f is a version of ncthermshort.f but which introduces c thermalization into the wave displacements
  • c ah is the l./temp in the thermalization step of the waves and c rotations
  • ICONF(I,J) (VX(I)+VX(J))**2+(VY(I)+VY(J))**2+(VZ(I)+VZ(J))**2
  • IDOTP IABS(VX(I)*VX(J)+VY(I)*VY(J)+VZ(I)*VZ(J))
  • ICONA ( ICONF(I,J)-4)/2
  • N2 VECTOR(WX2,WY2,WZ2)
  • ICONA (ICONF(I,J)-4)/2
  • DO LJ 2,LENF1 READ(12,*)K,STATN(LJ), IDIS(LJ), ASTR(LJ), IHAND(LJ)
  • APLPB APLPB/ATEMP

Abstract

The system comprises an input means (57) such as a keyboard for specifying (entering) selected amino acid sequences and other data such as temperature and fold preferences, a RAM (random access memory) (59) for storing such data, a ROM (read-only memory) (61) with a stored program, a CRT (cathode ray tube) (63) display unit and/or printer (65) an optional auxiliary disc storage device (67) for storage of relevant data bases, and a microprocessor (69) for processing the entered data, for simulating, under control of the stored program, the folding of the protein from its unfolded state to its folded (tertiary) state, and for displaying via the display unit (or printer) tertiary conformations of the protein in three dimensions.

Description

SYSTEM AND METHOD FOR DETERMINING
THREE-DIMENSIONAL STRUCTURES OF PROTEINS Background of the Invention
This invention relates to modeling systems generally, and particularly to computer-based
simulation systems used in determining three- dimensional structures (tertiary native
conformations) of globular protein molecules.
The value of determining structure or conformation of proteins is well known. For example, in 1961 a Nobel Prize was awarded to Max Perutz for his work in determining the structure of the
hemoglobin protein in blood. From this discovery, we now understand more about sickle cell hemoglobin and how drugs can be designed to treat patients with this disorder.
The prediction of antigenic determinants also is based on the prediction of protein tertiary structure. One such scientific work is reported, for example, by Hopp and Woods in "Prediction of protein antigenic determinants from amino acid sequences", Proceedings of the National Academy of Science USA 78, pp. 3824-3828 (1981), and in "A Computer Program for Predicting Protein Antigenic Determinants",
Molecular Immunology Vol. 20, No. 4, pp. 483-489 (1983).
The structure (native conformation) of the protein, particularly the conformation of the outer sites or sidechains (which are linked to the backbone and inner structures of the protein) often determines the capacity of the protein to interact with other proteins. One factor which directly influences conformation is protein folding. Deciphering the rules through which the building blocks (amino acid sequences) of the protein affect folding promises significant improvements in the design of proteins, many with a host of new catalytic functions useful, for example, in the chemical, food processing, pharmaceutical, and other industries.
As a tool, computer systems are sometimes used to combine and display protein structures. One such system, used to convert two polypeptide chains to a single polypeptide chain, is described for example in U.S. Patent No. 4,704,692, entitled
"Computer Based System and Method for Determining and Displaying Possible Chemical Structures for
Converting Double- or Multiple-Chain Polypeptides to Single-Chain Polypeptides", issued November 3, 1987 to inventor Robert C. Ladner. Computer systems have also been used to investigate protein structures and predict protein folding. A few of such uses have been reported in Protein Folding by N. Go et al., pp. 167-81, ed. by N. Jaenicke, Amsterdam, Holland
(1980); Biopolymers by S. Miyazawa et al., 21:1333- 63, (1982); and Journal of Molecular Biology, by M. Levitt, 104:59-107 (1976).
These systems often (a) cannot process a full sequence of amino acid residues of a protein or protein segment (i.e., cannot process or otherwise represent the interactions of all the residues of the protein or protein segment; this task often becomes intractable, the system generally becomes unduly burdened by the many degrees of freedom of the residues), (b) cannot complete the folding process (because of inability of the system to recognize false, local energy - minima conditions), (c) cannot represent tertiary conformations in three dimensions, (d) cannot represent interactions between sidechains, (e) do not display the pathway taken by a protein in folding, or (f) do not permit free (unconstrained) interactions between residues for more realistic simulation of real proteins.
What is needed and would be useful, therefore, is a computer-based system that would eliminate the above-mentioned deficiencies, and provide a faster way of determining protein
structures, thereby increasing the productivity of many scientists and encouraging the undertaking of many more needed investigations , including
investigation of structures of protein sequences obtained from mapping of the human genome. Summary of the Invention
Accordingly, an improved computer-based system is provided that is capable of processing a full sequence of amino acid residues of a protein (e.g., plastocyanin), representing free
(unconstrained) interactions between residues and between sidechains, tracking an entire folding operation (pathway) from the protein's unfolded
(denatured) state to its fully folded (native) state, and displaying tertiary conformations of the protein in three dimensions.
The system comprises an input means such as a keyboard for specifying (entering) selected amino acid sequences and other data such as temperature and fold preferences, a RAM (random access memory) for storing such data, a ROM (read-only memory) with a stored program, a CRT (cathode ray tube) display unit and/or printer, an optional auxiliary disc storage device for storage of relevant data bases, and a microprocessor for processing the entered data, for simulating, under control of the stored program, the folding of the protein from its unfolded state to its folded (tertiary) state, and for displaying via the display unit (or printer) tertiary conformations of the protein in three dimensions.
A novel lattice is employed for representing
(framing) the various conformations of the protein as it folds from an unfolded sequence of amino acid residues to a tertiary structure. The model
comprises a cubic arrangement of 24-nearest-neighbor lattice sites, with adjacent sites located a unit distance from each other, and adjacent α-carbons located a distance of √5 units from each other. The α-carbons represent a chain or backbone of the protein. Each α-carbon is shown to occupy a central cubic lattice side plus six adjacent cubic lattice sites defining a surface of interaction (e.g., an area or volume having a surface of finite size).
Each sidechain is represented as being embedded in the lattice and occupying a selected number (four) of lattice sites located relative to the central site, the number of sites occupied by the sidechain being proportional to the number of sites defining the surface of interaction.
In response to specification of temperature and the amino acid sequence of the protein, the system determines the tertiary conformation of the protein using Monte Carlo dynamics with an asymmetric Metropolis sampling criterion. The system, (a) generates a three-dimensional representation of an unfolded conformation consisting of an α-carbon backbone and sidechains, (b) produces (in accordance with local conformational preferences of the
residues, and the lowest total energy of interactions between close sidechain pairs which satisfies the criterion) successive likely conformations at the temperature, according to the total energy of each conformation, (c) selects from the successive likely conformations the lowest total-free-energy tertiary conformation which satisfies said criterion, and (d) determines the coordinates of the selected tertiary conformation for display. In producing successive likely conformations, the system modifies each conformation by moving randomly selected residues (beads) and inter-residue bond vectors to different selected lattice sites by performing various type moves (single-bead jump-type moves, two-bead end-flip moves, chain-rotation type moves, and translation wave-type moves).
By the method employed by this system, simulation of protein folding and prediction of tertiary structure are not only performed with greater success and accomplished faster than by many existing methods, but the simulation itself becomes more manageable (tractable).
Brief Description of the Drawings
Fig. 1 is a diagramatic illustration of a globular protein in its native folded conformation.
Fig. 2 is a diagramatic illustration of a full sequence of amino acid residues of which the protein represented in Fig. 1 is comprised.
Fig. 3 is a block diagram of the system of the present invention.
Fig. 4 is a block diagram showing a perspective view of a cubic lattice model employed in the system of Fig. 3.
Fig. 5 is a block diagram showing a segment of a protein model comprising an α-carbon and
sidechain in a cubic lattice of the type shown in Fig. 4. Fig. 6 is a diagramatic illustration of an α-carbon backbone of a protein segment.
Fig. 7 is a diagramatic illustration of an α-carbon of the protein backbone segment shown in Fig. 6.
Figs. 8A-8C are diagramatic illustrations of selected simple arrangements of an α-carbon backbone and associated sidechains.
Fig. 9 is a diagramatic illustration of a jump-type move made by a randomly selected residue (bead) within the lattice of Fig. 4, effecting a change in conformation of the protein model.
Fig. 10 is a diagramatic illustration of a rotation-type move made by a pair of randomly
selected bond vectors within the lattice of Fig. 4, effecting a change in conformation of the protein model.
Fig. 11 is a diagramatic illustration of a translation-type (wave-type) move made by a U-shaped segment within the lattice of Fig. 4, effecting a change in conformation of the protein model.
Figs. 12A-12D are diagrammatic illustrations of the folding of a selected segment of a protein to a β-barrel conformation.
Figs. 13A-13C are graphs showing an average number of native contact pairs between sidechains versus time.
Figs. 14A and 14B are graphical
illustrations of a folding pathway defined by a sequence as it folds from an unfolded state to a folded (native) state.
Figs. 15A-15F are block diagrams (flow charts) showing a method employed by the system of Fig. 3 in simulating protein folding.
Fig. 16 is a block diagram showing an alternate embodiment of the processor of Fig. 15.
Detailed Description of the Invention
A simplified representation of a globular protein (e.g., plastocyanin) in its native (natural, folded) form is shown in Figure 1. A simplified representation of a full sequence of amino acid residues of which the protein is comprised is shown in Figure 2. The protein becomes unfolded
(denatured) when it is heated to an elevated
temperature, and it refolds to its native form when the temperature is lowered to a selected level.
Temperature may be specified in any unit (whether fahrenheit, centigrade, or Kelvin) and at any level or value (whether in or outside the transition range of the protein) as explained hereinafter. Generally, depending on the native biological conditions of the particular protein molecule being investigated, the temperatures that are specified are those in and bordering the transition region of the protein
(typically, in and above 35°C-45ºC).
Given a sequence of amino acid residues of a known or unknown protein, it would be useful , for example in the designing of a drug, to know to what protein form (structure, conformation) the sequence would fold if selected residues were changed
(modified).
To determine the probable tertiary structure (three-dimensional conformation) to which a given sequence or modified sequence would fold, a
simulation of the folding operation could be
performed on a computer system of the type shown in Fig. 3. The system uses a "210" lattice model, as shown in Fig. 4. The system is described in detail hereinafter. Prior to description of the system. however, to facilitate understanding of the
invention, other aspects of the invention (such as lattice arrangement, types of movement of segments (residues) of protein within the lattice,
orientational states of a segment, and inter-residue interaction) are described below.
Lattice Model, and Positioning of Protein
Conformation
Referring now to Fig. 5, a section or segment 11 of a full sequence (e.g., a sequence of a protein much like that depicted in Fig. 2) is shown in stick form (without associated residues or atomic structure). The section 11 includes an α-carbon segment 13 and a sidechain (β-carbon) segment 15 representative of each amino acid residue of the protein.
The protein segments may be viewed as embodied within a cubic reference framework or lattice model (Fig. 4), constructed from vectors of the type (1,0,0), (0,1,0), (0,0,1), the distance between any two adjacent points being unity. The α- carbon atoms 13 when linked as shown in Fig. 6 form the backbone 14 of the protein. As shown in Figs. 4 and 7, each α-carbon 13 may be viewed as occupying a central cubic site 17 plus six adjacent cubic sites 18-23, defining a finite surface of interaction.
Adjacent α-carbon centers may be viewed as linked by a 210-type lattice vector 25, as shown in Fig. 4.
The backbone 14 (Fig. 6) represents a structure of finite thickness about which a somewhat inflexible, hard core envelope of a chain of residues develop. The conformation of the backbone at the ith α-carbon is specified in terms of r2 Θ1, the square of the distance between adjacent α-carbons (i-1 and α-carbons, and θ represents a bond angle that one of the α-carbons make with respect to the other, as shown in Fig. 6. In model units, the distance between consecutive α-carbons equals √5 units.
Selected values of r2 Θ are 6, 8, 10, 12, 14, 16, and 18, expressed in model units, indicating various accessible bond angle states. These values represent internal orientational states corresponding to actual (known) physical conformations.
As shown in Figs. 5 and 8, each α-carbon has attached to it a sidechain 15, constructed for example in a helix conformation 27, or in a β-strand conformation 29. From the central vertex portion 31 of the α-carbon, the sidechain 15 is formed,
comprising four lattice vector points (1,1,0),
(1,1,0), (1,1,0), and (1,1,1) 33. Three points represent fcc-type (face center cubic) lattice vectors, i.e., vectors of the type (±1, ±1, 0). The fourth point represents a diamond lattice vector of the type (±1, ±1, ±1). This latter vector serves as the center of hydrophobic or hydrophilic interactions (explained hereinafter). The orientation of the sidechain depends on the backbone conformation, i.e., depends on r2 Θ. At least two of the three fee
vectors comprising the sidechain are shown in an L- conformation (i.e., with left-handed chirality). The diamond lattice-type vector is always shown in the L- conformation. (For a more detailed description of lattice rules which should be followed when
constructing conformations, refer to Appendix A.) For the calculations described hereinafter, either the residues are glycine, in which case there is no sidechain, or the residues have a sidechain of uniform size. Interactions Between Residues
The following is a description of how the 210 lattice model (Fig. 4) is used to denote
interactions between elements (residues) of a given backbone conformation, and to denote the energy of such interactions. To specify the conformation of the backbone of a chain, composed of n residues on an α-carbon representation, n-2 bond angles (Θ) and n-3 torsional angles (Φ) must be specified. To determine the conformation of the first and last residues, a virtual residue is appended to each end of the chain. These virtual residues are represented as inert.
They occupy space but are devoid of sidechains.
Thus, with the addition of the two fictitious
(virtual) residues, n bond angles and n-1 torsional angles can now be used to specify the backbone conformation of the chain. (For convenience in denoting segments, the residues of the chain may be numbered from 1 to n.)
With respect to expressing (representing) a preference for a given conformation, any intrinsic preference of the protein model for a particular conformation may be represented by the individual preferences of the respective residues for the various bond angle states. In the description that follows, the term local conformational preferences shall mean the relative preferences which each local group of residues (i.e., a selected residue plus two flanking (adjacent) residues on either side of the selected residue) exhibit for the different
conformational states. As indicated previously, these states are represented by the value r2 Θ of the lattice model. Since for every residue i there are seven distinct values of r2 Θ, corresponding to 18 distinct local conformational states, the local energetic preference (denoted as parameter eΘ(r2 Θ1)) for each of the states (r2 Θ values) must be
specified. If it is desired to reduce the number of such adjustable parameters (that is, parameters requiring specification), the conformations (except conformations where εΘ (r2 Θ1)=0) may be made
isoenergetic and assigned the value εΘ > 0.
In addition to bond angle, the torsional (dihedral angle) potential of a residue (i.e., its tendency to undergo an angle of rotation or twist) must be specified. The torsional potential
associated with the ith residue is specified in terms of residues (i-1) through (i+2). Actually, a
dihedral angle potential must be specified in the model for all residues from residue 2 (corresponding to real residue 1) to residue n-2 (corresponding to real residue n-1). Because the model is confined to a lattice, it is convenient to describe the torsional potential associated with the ith residue in terms. of: (a) r2 Θ, r2 Θi+1, the bond angle states i and i+1, (b) r2Φ, the square of the distance between α-carbons i-1 and i+2, and (c) the handedness of the dihedral angle, X = +1 for right-handed chirality (R) or
X = -1 for left-handed chirality (L). For example, a planar state having Φ = 0 is specified by (16, 16,
37, -1). That is, the square of the distance between α-carbons i-1 and i+1 is 16, between α-carbons i and i+2 is 16, and between α-carbons i-1 and i+2 is 37. (For definiteness in the calculation, a dihedral angle of 0 is taken to be left-handed. This
conformation could also be specified by the vectors b1, bi+1, bi+2 as shown in Figure 8) . As many as 324 rotational states exist for each internal bond.
These rotational states are all assigned a relative energy value εΦ (r2 Θ1, r2 Θi+1, r2 ΦiX). Generally, all such rotational states are statistically weighted. Where the majority of the conformations are taken to be isoenergetic (with a small bias toward a small subset of conformations that are native), the short and intermediate range energetic preferences may be represented as e(r2 Θ,1, r2 Θi+1, r2 Φ1) .
The seven lattice sites that define the α-carbon (Fig. 7) and the four lattice sites (Fig. 5) that define the surface 24 of the sidechain interact repulsively (i.e., with strong, hard core repulsion) with all the other α-carbons and their respective sidechains. In other words, no more than one
sidechain or α-carbon can simultaneously occupy a given lattice site. (This is generally referred to as the excluded volume criterion.) Such a model may be viewed as having a backbone of finite thickness. In addition to the hard core repulsion, described above, there is a weak (soft core) repulsive
interaction between non-bonded α-carbon backbone centers located within a distance of √5 model units of each other. If rk1 represents the distance between the kth and 1th such centers, then the soft core repulsive energy erep between the pair may be expressed as:
∞ ; r2 k1 = 0 , 1, 2 , 4
εrep ε rep ; r2 k1 = 3
3 εrep ; r2 k1 = 5
Figure imgf000014_0001
0 ; otherwise
Figure imgf000014_0002
rep typically takes on the value of 6 in the
calculations that follow.)
Following description of the lattice, bond angle, bond angle states, and torsional angles, a description of tertiary interactions between the residues in a three-dimensional setting is presented next. To represent the effect of hydrogen bonding and dipolar-type interactions, a cooperative
interaction energy parameter Ec is introduced which allows for secondary structure stabilization when any part of the α-carbon hard core envelope of the lth residue is at a distance of 3 units from the α-carbon center of the krh residue.
If a pseudodot product between two vectors is defined as:
dot(bk, bl) = 1 ; if bk = ±bl
0 ; otherwise
Figure imgf000015_0003
Figure imgf000015_0001
then, the cooperative interaction energy eckl may be given by:
εckl = εc dot (bk,bl) + dot (bk+1,bl) +
dot (bk,bl+1) + dot (bk+1,bl+1
Figure imgf000015_0004
Figure imgf000015_0002
where, ec represents an energetic preference
parameter which is applied, uniformly, to all residue pairs independent of their conformation.
Sidechain Interactions
In the preceding section, the subject of interactions relating to backbone conformation was discussed. In the following section, the subject of interactions between sidechains is discussed.
Sidechain interactions are treated as being
independent of backbone conformation. Interactions between any pair of side chains is allowed if the interacting sidechain sites lie at a distance of √2 from each other. Sidechains may be hydrophobic, hydrophilic or inert. Pairs of hydrophobic
sidechains interact with an attractive potential of mean force; hydrophobic/hydrophilic pairs interact with a repulsive potential of mean force; and
hydrophilic pairs interact weakly (i.e., weakly attractive or repulsive with no change in quality to behavior).
With respect to the calculation of sidechain-sidechain interaction energy, the following rules (scales) were employed in one calculation:
glycines were assumed to lack sidechains and were assigned a hydrophobicity index h(i) = 0.
Hydrophobic residues were assigned a negative
hydrophobicity index h(i) < 0, and hydrophilic residues were assigned a positive hydrophobicity index h(i) > 0. For all sidechains that were greater than two residues apart down the chain, the
sidechain-sidechain interaction matrix am(i,j), representing the interaction energy between the ith and the jth pair of sidechains, was given in the form:
am(i,j) = -h(i) . h(j) . ε
where ε = εphobe-phobe > 0, if h(i) and h(j) were both negative (that is, if both were hydrophobic).
ε = εphobe-phil > 0 if one residue is hydrophobic and the other hydrophilic, and ε = -εphil-Phii, (with ephil-phil > 0), if both h(i) and h(j) are positive, that is, if both sidechains are hydrophilic. The subscripts phobe-phobe mentioned above represent interaction between two hydrophobic residues, phobe-phil
represents interaction between a hydrophobic residue and a hydrophilic residue, and phil-phil represents interaction between two hydrophilic residues. (As indicated above and in the program listing shown in Appendix D, tertiary interactions between any spatially close pair of sidechains are implemented using a modified Miyazawa-Jernigan (MJ)
hydrophobicity scale. Based on the frequency of occurrence of contacts between sidechain pairs in protein crystal structures, the MJ scale is used to determine effective inter-residue contact energies.)
As used below, short-range interactions shall mean interactions between adjacent residues in the chain and does not include effects of their neighbors (i.e., neighboring residues in the chain). Medium-range interactions shall mean interactions between first, second, and third nearest-neighbor residue groups in the chain. Long-range interactions shall mean interactions between residues (not α- carbons) which are positioned greater than three nearest neighbors apart down the chain but which are spatially close (i.e., within 3·A of each other).
Both native and non-native interactions are allowed between non-bonded pairs of residues that are specially close enough to interact. No criterion or constraint is imposed to drive the simulation towards any predetermined native conformation. Based on long or short interactions, a native conformation may comprise one of a number of isoenergetic states. It is the juxtaposition of short-medium-and-long-range interactions, together with other factors described herein that produce the final result, namely a stable, folded conformation.
As described hereinafter, all of the energetic parameters, εΘ, εΦ , εrep, εphobe-phobe , εphobe-phil, εphil-phil are uniformly scaled by a reduced temperature factor, T.
With respect to specifying other
characteristics of the primary sequence of amino acid residues, the following conventions are used. In a simplified model, the term Bi(k) is used to represent the ith stretch of k residues in the sequence. The k residues are represented as having identical εΘ and εΦ values and a marginal (short and intermediate range) preference for β-state conformation. range) preference for β-state conformation.
Consistent with β-sheet formation, Bi(k) also
represents an alternating odd/even pattern of
hydrophilic and hydrophobic residues.
Where a sequence of k residues are locally indifferent to whether they are in an α-helix or in a β-sheet, the term ABi(k) may be used to denote the ith-stretch in the amino acid sequence containing k residues in an alternating hydrophobic/hydrophilic pattern, such that εΦ(12) = εΘ(16) for all k
residues. Where a sequence of k residues has an alternating hydrophobic/hydrophilic pattern and locally prefers α-helical state conformation, such that εΘ(12) = 0 and εΘ(16) > 0, this is denoted by the shorthand notation Ai(k).
Putative band regions are denoted by bi(j), and consist of j residues located at the interface between putative β-stretches i and i+1. Chain Dynamics. Modification of Conformations
The dynamics of the chain are simulated by a (pseudo) random sequence of conformational
rearrangements (moves) (i) through (iv) described below. In all such moves, the bead (amino acid residue) on which the move is performed is chosen at random.
(i) Examples of single bead jumps (also referred to as flips, spike or kink moves) are shown in Figure 9. Also, a representative set of single- bead modifications is listed in Table I. These moves are constructed by conserving the vector bi-1 + bi (i.e., not changing the magnitude nor direction of imaginary vector (bi-1+bi)). The moves are made in a manner which maintains the bond angle associated with the ith residue but changes the bond angles of the i- five distinct possible outcomes (associated with r2 Θi = 12), each of the moves is coded with five outcomes, some of which are degenerate (i.e., their
conformations, each has the same energy). A clock is used to sequentially choose the particular outcome.
New conformations of jumps (kinks) are also generated at random. After a move has been selected, it is only accepted if the adjacent bond angles are allowed (i.e., r2 Θi+1, and r2 Θ-1 must lie in the range 6-18). If the move satisfies these local geometric
constraints, then the sites (seven backbone sites plus four sidechain sites) into which the bead will jump are checked to insure that they are unoccupied. Otherwise, the move is rejected (not made).
A list of sample single-bead, modified vector values is presented in Table I.
Figure imgf000020_0001
(ii) With respect to two-bead end flips (in which the two end bonds are transformed to a new set of vectors), the set of two vectors is chosen at random from the twenty-four possible orientations of the lattice vectors. In this case, the two new end bond vectors must satisfy the allowed local bond angle criteria. If they do not, the move is be rejected. Further, the two end residues in their new conformation must not violate excluded volume
constraints.
The above-mentioned moves (i) and (ii) satisfy the correct dynamics for the athermal random coil state in the absence of hydrodynamic
interactions.
(iii) Turning now to chain rotations, an example of this type of move is shown in Fig. 10. The minimum size unit selected for rotation consists of three beads, and the maximum size unit is 2+wave. (The value of the parameter "wave" is generally 4, it is chosen so that the size of the unit undergoing the rotation is the size of a mean element of secondary structure.) The particular size of the unit (δ+1) undergoing the attempted rotation is chosen by the value of an external clock parameter, and
sequentially varies from the minimum to maximum size. A particular bead I, at one end of the rotating unit, is chosen at random. For beads less than n/2, the unit undergoing the rotation is 1-5. For beads greater than n/2, the unit undergoing rotation is I+δ . If ib represents the first residue at the beginning of the rotating unit, and iend represents the residue at the end of the rotating unit, then if the bond angle state between the vectors bib and biend-1 is a 14-18 state, the rotation is attempted. (The range of values of r2 Θi is chosen so that the rotation is physically possible.) The rotation is implemented by interchanging the two bond vectors (e.g., vectors 35, 37 joining randomly selected bead 39 shown in Fig. 10) . The initial set of bond vectors joining residues ib to iend is (bib, bib+1, ... biend-2, biend-1). The final set of bond vectors is (biend-1 , bib+1, ...
biend-2, bib). The new conformation is checked to insure that it can join the remainder of the chain without violating bond angle restrictions and
excluded volume restrictions.
(iv) Internal wave-like motions such as are shown in Figure 11 are also performed. These moves serve to propagate defects down the subchain by deleting a defect at one end of the subchain and creating the defect at the other end of the subchain. The defect propagation procedure is performed by the system as follows. I denotes a bead chosen at random. The system first determines if a U-shaped defect exists (i.e., does bl = -bI+3?). If not, attempt at wave-like motion is abandoned. If a defect exists, the system then picks a place where the defect should be inserted. The chosen point is at JJ = 1+2 ± (5+5), with δ varying between 0 and wave-1. About half of the time, the defect insertion point lies to the left of I, and the other half the time it lies to the right of it. As mentioned before, typically, the value 4 is selected for wave. As shown in Fig. 11, the bond vectors bz 41 and bI+3 43 are then sliced out of the chain, thereby deleting two beads 43 and 47, provided that bI+1 49 and bI+251, which will form the new bond angle state or vertex l+1 53, satisfy the local geometric constraints of the chain. Next, two bonds 49, 51 are inserted into the chain. If the original vectors associated with beads JJ-1 and JJ are bJJ-1 and bJJ, the new set of four vectors are (v, bJJ-1, bJJ, -v), where the vector v is chosen at random. Note that the intervening bond factors between 1+4 and JJ-2 are left unchanged. A new conformation is then generated by renumbering the residues so that their identity is conserved. As before, both excluded volume and local bond angle criteria must be satisfied in order for the
conformation to be accepted.
After each of the elemental moves (i) - (iv), described above, the energy of the new
conformation, Enew, is calculated and compared to the energy of the old conformation Eold. Enew represents the sum of the individual energies, and is expressed as:
Enew = EΘ + ΣΦ + Ec + Es where EΘ = Σ εΘ
N
EΦ = ∑tor
E=
Figure imgf000023_0001
ε ckl
and Es (also referred to as Eside)
1/2i,j Σ am(i,j)
(The term Eold represents the initial total value, then successive previous total values with which Enew is compared.)
With respect to free energy (as distinct from total energy), the system attempts to find a free energy minimum, given as:
Free energy = Total energy - TS
where T represents temperature, and S represents entropy.
If Enew is less than Eold, then the
conformation is accepted. Otherwise, a Metropolis sampling criterion is applied (as described for example in Monte Carlo Methods in Statistical Physics 2nd ed. by K. Binder, Springer-Verlag, Berlin, New York, 1986). In which event, a random number R uniformly distributed between 0 and 1 is generated. If R is less than the probability P, where
P-= EXP
Figure imgf000024_0001
then the conformation is accepted; otherwise, it is rejected. Here, kB represents Boltzmann's constant and T represents the absolute temperature of the protein. Thus, a standard asymmetric Metropolis sampling scheme (criterion) is employed. As
described below, the sampling scheme or criterion is applied in conjunction with a dynamic Monte Carlo technique (as described for example in Monte Carlo Methods in Statistical Physics by K. Binder, cited above).
A single Monte Carlo dynamics time step consists of N attempts at move type (i) (jump-type move) mentioned above, two attempts at move type (ii) where each of the chain ends are subjected to move type (ii), one attempt at move type (iii), and one attempt at move type (iv). In the simulation, the protein model is started out in a randomly generated high temperature (T) state. It is then cooled down, equilibrated, cooled further, until collapse to a folded conformation occurs. For each simulation run in the transition region between unfolded and folded states, at least 1.25 x 106 Monte Carlo time steps are sampled. The set of elemental moves employed in the simulation satisfies the well known stochastic kinetics master equation describing the dynamics of the system. (Refer, for example, to Appendix B.) In the limit (after a large number of steps), an equilibrium distribution of states is generated.
With respect to the thermodynamics of folding, a detailed explanation is presented below. By restricting the protein to the lattice, it may be treated as a rotational isometric state model of the protein. First, the transition from the denatured to the native state is treated in the context of a two- state model. The free energies of the denatured state AD, and the native state AN are calculated as follows: AD is calculated by neglecting all tertiary interactions in the denatured state (although
pentane-like effects are included). In the
calculation of AD, long range excluded volume effects are neglected. For the calculation of AN, small local fluctuations about the native state are
neglected, and AN is approximated by the energy of the native state EN.
In the context of a two-state model for folding, the fraction of molecules in the native state, fN, is given by
(2) exp { - (EN - AD) }/
fN = [ 1+exp { - (EN - AD) } ] . where AD is given as :
AD = KBTln(ZD) (3)
Figure imgf000025_0002
(The term ZD may be expressed as ZD = Jπ VD,iJ, as defined in Appendix C.)
In the context of the two-state model, the mean square radius of gyration <S2>, defined as
(4)
<S2> =
Figure imgf000025_0001
with |ri-rcm| representing the distance of the ith bead from the center mass rcm, may be expressed as
(5) <S2> = fN<SN 2>+(1-fN)<SD 2>
where <SN 2> and <SD 2> are the mean square radii of gyration in the native and denatured state,
respectively.
The above explanation may be used to select appropriate temperature values for use in the
simulation. Substantial computer time can be saved by avoiding high temperatures associated with the denatured state. Also, temperatures that are too low can drastically quench the system. Conformational Transitions
As shown below, conformational transitions can be approximated by a two-state model, or can be determined directly from folding trajectories.
In the following paragraphs, the
requirements for folding to a unique conformation
(e.g., a four-member β-barrel state) are described. Figs. 12A-D show a segment with backbone
α-carbons 101 and interacting sidechain sites 103. Also shown in the top view are hydrophobic core 105 with the interdigitating sidechains 107, 109. Also shown are the corresponding conformations 111, 113 with α-carbons alone.
The first of the three native turns is shown to involve the eight through eleventh residues with backbone bond angle conformations 18, 8, 18, and 10, respectively. The central turn is shown to involve a crossover connection between the two anti-parallel β-strands, and involves the eighteenth through twentieth residues with backbone bond angle
conformations 14, 10, and 18. The remaining outer turn is shown to involve residues the twenty-sixth through twenty-ninth residues in bond angle
conformations 12, 14, 14 and 8. The remainder of the bond angle states are all 16-type states. Thus, a planar β-sheet is assumed. Within an anti-parallel β-hairpin, the α-carbons are shifted with respect to each other by one lattice unit. This allows for the interdigitation of the side chains mentioned above. In the fully native conformation, there are twenty contacts between neighboring sidechains (i.e., twenty pairs of sidechain interacting sites that are a distance of √2 from each other).
In the conformation considered here, the pattern of hydrophobic and hydrophilic residues is the same. The model chain consists of N=37 residues. In each of the strands, all of the even (odd)
residues are hydrophobic (hydrophilic). The first strand consists of the first through eighth residue. The ninth through eleventh turn residues are all hydrophilic. The second strand runs from the twelfth to the eighteenth residue, with all the even (odd) residues hydrophobic (hydrophilic). The nineteenth and twentieth turn residues are, respectively, hydrophilic and hydrophobic. The third strand runs from the twenty-first to the twenty-sixth residue. The twenty-seventh through twenty-ninth are turn residues, all of which are hydrophilic. The fourth strand runs from the thirtieth to the thirty-seventh residue. The first and last residues (one and thirty-seven) are virtual residues (i.e., they are devoid of sidechains, but they do occupy excluded volume). They may be regarded as capping the two ends, and are included so that the bond angle state for the real residues (the second and thirty-sixth residue) may be defined. Turning now to the subject of equilibrium folding, the requirements for equilibrium folding of a region of the chain to its unique, native structure (e.g., the four-member β-barrel structure) is
described. The interplay of an intrinsic native turn propensity and a short- and medium-range preference for β-sheet formation is described.
In one simulation operation, for the sequence B1(7)b1(4) B2(6)b2(3) B3(5)b3 (4) B4 (8) the
parameter εΘ(16) was found equal to zero for the Bi state and -0.25/T for all the other states. For the Bi state the parameter εΦ (16,16,37) = .6/T, and is zero for all other states. For the turns bi: εΘ=0 for the native conformation, and εΘ=.25/T for all other conformations. Similarly,
εΦ = .6(1.75)/T = -1.05/T. εphil-phil = .25/T,
εphil-phob = 1-/T, and ephob-phob = -.75/T. The
cooperativity parameter εc = -.15/T. In the native conformation, the total short range free energy
EΘ = 0, the total torsional energy Etor = -25.8/T, the total sidechain interaction free energy arising from hydrophobic interactions Eside = -14.25/T, and the cooperative interaction free energy Ec = -11.25/T. Thus, the total energy of the native state
EN = -51.3/T. A summary of the conformational properties of this sequence, as well as all the other types of primary sequences, is presented in Table II. The primary sequence is designated by a shorthand notation εα > εβ, 1. , 1.75. This notation indicates that, based on bond angle preferences,
β-conformations are locally preferred for the Bi portions of the primary sequence, and that the torsional angle preference εΦ (for native-like conformations in the Bi region) is locally favored by a ratio of 1:1.75 over that in the turn region.
Figure imgf000029_0001
In the absence of long-range interactions, there is a negligible intrinsic preference for the native conformation. To address this point,
reference is made to equations 2-5. Using equations 2-5, the transition midpoint (including tertiary interactions) is predicted to be near T = 0.576.
Employing equation (3), it is found that at this temperature AD = -88.44, and that EN (without
tertiary interactions) equals -44.79. The fraction of molecules in the native conformation which would be present if all tertiary interactions are turned off (that is, the equilibrium population based on short and medium range interactions embodied in EΘ and Etor alone) is given by
(6) f°H = exp(-Etor) / exp(-AD).
Using equation 6, f°N is found to have the value:
N = 1.11 x 10-19.
Thus, there appears to be a negligible preference for the native state in the absence of long range interactions, suggesting that finding of the native conformation is by no means guaranteed by the above choice of short and medium range
interaction parameters. Rather, this chain will thrash about until it finds the native state.
The next subject described below is the nature of the conformational transition itself. In Figs. 13A-C, the average number of native contact pairs between sidechains (Nc) versus time, is plotted for a chain under denaturing conditions at T = 0.6, in the thermal transition region at T = 0.58, and under strongly renaturing conditions at T = 0.545. The times indicated in the figure are in units of 500 Monte Carlo steps, and the fully native molecule contains twenty contact pairs. Under denaturing conditions, Nc fluctuates around zero, characteristic of a relatively short, unfolded chain. In the transition region, the system starts out unfolded, and then around t/5000 = 118, it undergoes a rapid transition in about 6,500 Monte Carlo time steps to the fully native molecule. For the remainder of the time, it stays in the native state. Other
conformational properties (not shown), such as the energy, the instantaneous value of the radius of gyration, the total number of contact pairs Nc,tot also undergo sharp changes in value that is a
characteristic of an all-or none transition (i.e., a transition where the intermediates between the denatured and fully folded states are marginally populated). On further cooling to T = 0.545, the chain becomes fully native, with minor oscillations in Nc arising from the fluctuations of the ends residues of the chain.
Decreasing the turn propensity for native- like states decreases the stability of the native conformation and decreases the transition
temperature. In the transition region, however, not only are native in-register four member β-barrels observed, but so are out-of-register conformations in which one of the exterior strands is two residues out-of-register, shifting the native contact between sidechains two and thirty-six to a non-native contact of residues two through thirty-four in one case, and to a non-native sidechain contact of residues four through thirty-six in the other case. In the former case, the outer turn began at residue twenty-five instead of residue twenty-six, thus, pushing the outer strand beyond the end of the barrel; and in the latter case, the turn began at residue twenty-eight and involved five residues, producing a bulge. Out of a total of six conformational transitions to a folded state, four folded directly to the native conformation, and two produced the out-of-register states described above.
The out-of-register state associated with residues four through thirty six occurred at
relatively high temperature and folded in about
65,000 Monte Carlo steps. It remained folded for 315,000 time units before unfolding in about 165,000 time units.
Many out-of-register conformations have the same number of contacts between hydrophobic
sidechains as in the native state; they differ in the cooperative free energy between the strands and in the local conformational preferences. Dropping the turn preference, increases the population of these out-of-register states. It is seen, therefore, that in the absence of some intrinsic preference for secondary structure, many in-register and out-of- register conformations can be generated, and it is the marginal intrinsic turn propensities which act to select from among them one conformation as the unique folded form. Based on tertiary interactions between hydrophobic sidechains alone, many otherwise
degenerate conformations can be generated. Here, a marginal preference for β-strand secondary structure plus the presence of turn neutral regions are
required for folding to occur to a unique native state. Here, turn propensities of 1% or lower (see below, and Table II) are sufficient to yield folding to the native barrel of Figure 12.
It has been found that as the local propensity for β-states decreases, there is an increasing population of non-native turns and out-of- register states, even though the native turn
population increases as T decreases. To fold the system to the global free energy minimum that
corresponds to the native conformation, therefore, the free energy of out-of-register conformations should be increased relative to in-register
conformations. As the local preference for β-states decreases, it becomes easier to form non-native turns; this appears to be the origin of the out-of- register states. Therefore, since the number of contacts between sidechains is approximately the same for the in-register and out-of-register cases, what determines the native conformation is the number of cooperative-type interactions, εc, plus the
differences in local conformational preferences.
Where the local preference difference is decreased, a number of out-of-register states that are in deep local minima is observed.
For a primary sequence of the type εα = εβ ; 0, 1.5 (which is similar to the above cases, except that the torsional potential in the putative β-strand is disregarded), the β- and α-states are locally isoenergetic. The particular sequence of the ABi. stretches are induced by tertiary interactions. In all cases, the folded conformations turn out to be β-barrels. Thus, tertiary interactions taken with local turn propensities provide for selection of β-collapsed states. Where the transition temperature is reduced, the native turn populations become greater. For example, the calculated turn population of native turn one is about 10% at T = 0.40. Based on tertiary interactions alone, the unique native state is not achieved. This is most likely due to the degeneracy in sidechain contacts between the in- register and the two residue out-of-register
conformers. If native turn propensity is
sufficiently augmented, it appears that tertiary interactions plus intrinsic turn propensities are sufficient to yield the unique native state.
Further, if the short-range interactions favoring β- strand formation are decreased, turn formation at a non-native location becomes more likely and, thus, the intrinsic turn propensity must be augmented (see Table II) to insure the recovery of a unique
conformational state.
Next examined were sequences of the type A1(7)b1(4)A(6)b2(3)A3(5)b3(4)A4(8); that is, molecules having the sequence εα < εβ ; 0 , 1.6; 0.05, where the nature of the conformational transition for model proteins whose β-strands in the denatured state locally favor α-helix conformation, but whose amino acid pattern still consists of alternating
hydrophobic and hydrophilic residues. For ki, it has been found that εΘ(12) = 0, εΘ(16) = 0.05/T, and for all the others εΘ = .25/T. Furthermore, it was found that εΦ = 0 for all the residues in Ai. These
systems (where the local preference is for an α-helix conformation but the global free energy minimum conformation is that of a β-strand) spend substantial time trapped in relatively deep local minima. As the local preference for helical conformations is
increased in the putative β-strand forming regions, while the unique four-member β-barrel is sometimes obtained, the chain generally thrashes about for over many millions of time steps (e.g., over 50 million) without finding a unique folded form.
An important indication from these simulation results is that a marginal local turn preference plus tertiary interactions are sufficient to produce unique native conformations, even in the extreme situation where the local conformational preference is for helices rather than β-sheet. If the native conformation is in thermodynamic
equilibrium, then it is deemed to be at the lowest free energy state (conformation), independent of how the free energy is divided. That is, while it is conceptually convenient to divide the free energy into short-, medium- and long-range interaction contributions, it is the sum of these contributions, i.e., the total free energy, that determines the equilibrium conformation. The approach taken by the simulations show that the local minima problem can be surmounted to recover the lowest free energy
structure, which overrides local considerations if there is a marginal turn propensity for native-like turns. Thus, turns appear to play an extremely important role in determining the ability to recover a unique native conformation. Folding Pathway (Trajectory)
Turning now to a discussion of the folding pathway, it is seen that the sequence defines an observable pathway (trajectory) as it folds and makes the transition from its denatured (unfolded)
conformation state to its native (folded)
conformation state. A trajectory of a sample having the primary sequence εα > εβ, 1., 1.75, is shown in Figs. 14A-B. The conformations at different times, are shown at different orientations that aid in the visualization of the folding process. At t = 585,800 Monte Carlo time units, folding is seen to initiate from the central turn 115 between the β-hairpin composed of strands two 117 and three 119. (Folding is not unidirectional. β-strands may dissolve, as well as form, during the course of assembly.) If the conformation at t = 585,900 is compared with that at t = 586,000, it will be seen that a slight
dissolution of the β-hairpin 121 has occurred. By t = 586,300, the first β-hairpin 121 is almost fully assembled. However, by t = 586,550, the majority of one of the two strands in the β-hairpin dissolves and, then, reforms at t = 586,600. Then, there is a pause as the random coil tail 123 thrashes about, until the next native-like turn 125 forms.
By t - 587,700, three of the four β-strands 117, 119, 127 are essentially in place. Thus, assembly to the three-member β-barrel intermediates takes 1,900 time steps from the beginning of folding. Throughout this process, the excluded volume of the chain hinders assembly. Most of the configurations of the
denatured tail are nonproductive; the tail thrashes about until t = 591,800 when it works its way into a position (s) that permits native state assembly.
After which, the assembly becomes more rapid and, by t = 592,250, the fully folded molecule forms. Thus, the three-member β-barrel is the long-lived
intermediate, living for 4,550 time steps or 71% of the total elapsed time from the start of folding.
The mechanism of assembly is best described as punctuated, on-site construction.
With respect to unfolding of a tertiary structure, in all instances unfolding is the reverse of folding. Typically, unfolding starts with either one of the external strands becoming denatured or an internal stand closest to the denatured tail becoming unfolded. Computer System and Method
Referring now to Figs. 3 and 15, a system and method are shown and described for simulating protein folding and determining three-dimensional (tertiary) structures of proteins.
The system comprises an input means 57 such as a keyboard for specifying (entering) selected amino acid sequences and other data such as
temperature and fold preferences, a RAM (random access memory) 59 for storing such data, a ROM (read- only memory) 61 with a stored program, a CRT (cathode ray tube) display unit 63 and/or printer 65, an optional auxiliary disk storage device 67 for storage of relevant data bases, and a microprocessor 69 for performing, under control of the stored program, the steps of processing the entered data, simulating the folding of the protein from its unfolded state to its folded (tertiary) state, and displaying via the display unit (or printer) tertiary conformations of the protein in three dimensions.
A user enters the amino acid sequence data file from the auxiliary storage unit). In response to entry of the sequence data, the system inputs (specifies) the data for processing, stores the data in memory then processes it as shown in Figs. 15A-F. Sample data of the type which may be input to the system is shown in Appendix E. In processing the data, the system generates a tertiary interaction matrix as shown in Appendix E and produces, in addition to a display of the protein's tertiary structure, a sample output as shown in Appendix E for tracking the simulation. As indicated above, the system operates under the control of a stored
program. A listing of the program is shown in
Appendix D.
Turning now to Figs. 15A-F, in response to the specified data the system generates a random conformation of backbone and sidechain elements
(residues). It does this by generating a set of random bond angles, then generating the coordinates of the backbone and sidechains as a starting chain in a 210 lattice (Fig. 4). The system then checks to determine if the excluded volume criterion is met , after which, it constructs an interaction table, a sample of which is shown in Appendix E. It proceeds to construct the interaction table by first
establishing respective bond angle preferences, then establishing dihedral (rotational) angle preferences followed by establishing side-chain interaction criteria. The system then stores the temperature, bond angle, lattice coordinates, preferences, and interaction data in a table or matrix like that shown in Appendix E. Thereafter, the system reads the data from the table and constructs, by means of Monte Carlo simulation, a random conformation; following which, the system calculates the total energy of the conformation represented as (Eold). Thereafter, the system selects (at random by Monte Carlo simulation) a pair of bond vectors for rotation. It then checks if the rotation would violate the excluded volume criterion. If it would, the rotation is not
attempted, and the system proceeds to the next step. If it would not violate the excluded volume
criterion, another check is made to determine if the bond angles subtended by the bond vectors are between 14 and 18; if they are, it attempts the rotation.
Otherwise, it does not attempt the rotation and proceeds to the next step. In performing rotation, the system modifies the conformation by interchanging a randomly selected pair of bond vectors. In other words, it proceeds to change the rotation angle Φ . Thereafter, the system proceeds to determine the coordinates of lowest energy conformation which satisfy the Metropolis criterion. It does this by first calculating the total energy (Enew) of the new modified conformation then comparing the total energy Enew with the total energy of the old conformation Eold. If Eold is greater than Enew, then the
coordinates of the old conformation are replaced with the coordinates of the new conformation. The system then proceeds to the next step (step B) which is be described below. If Eold is not greater than Enew then, in compliance with the Metropolis criterion, a random number R is generated and the probability P = e
Figure imgf000038_0001
is calculated. That probability is compared with the random number R. If R is less than P, the
coordinates of the old conformation is replaced with the coordinates of the new conformation and the system proceeds to the next step (step B). If, however, R is not less than P, the system directly proceeds to the next step (Step B). At the next step, the system proceeds to choose a bead at random to move within the lattice. Before moving the bead, the system tests if the move (which is a jump-type move) would violate the excluded volume criterion. If no, it proceeds with the move. If yes, it does not proceed with the move, and proceeds instead to choose the next bead until all the beads in the chain have been checked for modification (movement). If the move would not violate the excluded volume criterion, the conformation is modified by moving the bead to a new lattice site. In other words, the bead would make a jump move which would change its
coordinates and associated bond angle Θ. After the move is made and the conformation is modified
thereby, the system calculates the total energy of the new conformation, that is, the total energy Enew in a similar manner as indicated earlier. Enew is then compared with Eold, the energy of the previous conformation before the move. If Eold is greater than Enew, then the coordinates of the old conformation are replaced with the coordinates of the new, and the next bead move is checked. If Eold is greater than Enew, then the Metropolis criterion is applied (and the random number R is generated, and the probability P is calculated in the same manner as indicated earlier, as shown in Fig. 15A-F), and the random number R is compared with the probability P. If R is less than P, the coordinates of the old conformation are replaced with the coordinates of the new and the next bead move is checked. If R is not less than P, the next bead move is checked and the loop is
repeated until all bead moves (i.e., the moves of all n beads) have been checked, at which time if all bead moves have been checked the system proceeds to the next step (step C). At this next step, the system proceeds to process the two end beads. It identifies the first end bead then checks if an end flip-type move would violate the excluded volume criterion. If no, it proceeds with the move. Otherwise, it aborts the move and proceeds to check the second end bead. In the event the move of the first end bead would not violate the excluded volume criterion, the system proceeds to modify the conformation by performing an end-flip move that changes the coordinates of the end bead. It then proceeds to determine the coordinates of the lowest energy conformation which satisfies the Metropolis criterion in the same manner as it did for the rotational and jump-type moves. After
determining the coordinates of the lowest energy conformation which satisfy the Metropolis criterion, the system checks if both end beads are processed. If the second end beads remain to be processed, the system identifies the second end bead and proceeds to check whether an end flip move of the second end bead would violate the excluded volume criterion. If it would violate the criterion and both end beads have been considered, it then proceeds to the next step (step D). If it does not violate the criterion, then the system proceeds to modify the conformation by performing an end-flip move of the second end bead changing the coordinates of the second end bead. It then proceeds to determine the coordinates of the lowest energy conformation which satisfy the
Metropolis criterion, after which it proceeds to the next step (step D). At this next step, the system selects a bond at random then searches for a U-shaped segment. It then checks, after finding the U-shaped segment, whether a move of a translation (wave motion) type move would violate the excluded volume criterion. If not, it proceeds with the
modification. If it does violate the excluded volume criterion, it aborts the move and proceeds to check if all the jump-type moves were made. If all were made, it proceeds to the next step (step E).
However, if the move would not violate the excluded volume criterion, the system proceeds to modify the conformation by performing the translation/wave- motion-type move changing the coordinates of the beads defining the U-shaped segment. The system then determines the coordinates of lowest energy
conformation which satisfy the Metropolis criterion, after which it proceeds to check if all the jump-type moves were made. If all the jump-type moves are not made (completed), it starts the loop again. One complete loop is represented by one rotational move, n jump-type moves, two end-flip moves, and one U- shaped move. After the loops have been completed and all moves made and/or aborted, the system checks to determine if the chain is still positioned near the center of the lattice. If it isn't, it moves the chain to the center of the lattice and adjusts its coordinates accordingly. Thereafter, the system displays a three-dimensional representation of the protein structure and repeats the process
(processing) for a predetermined number of times.
However, if upon checking whether the chain is still positioned near the center of the lattice, it finds that it remained at the position near the center of the lattice, the system immediately proceeds to displaying the three-dimensional representation of the protein, then repeats the process. After the three-dimensional coordinates of the tertiary protein structure are generated for display, a graphics program such as SYBYL (which is commercially
available from Tripost Associates Corporation of St. Louis, Missouri) is used by the system to display the tertiary structure corresponding to the coordinates. Sample display output is presented in Fig. 1. Sample printed output is presented in Appendix E.
An alternative embodiment of the system is presented in Fig. 16 comprising a keyboard 151 for entering data representing temperature and amino acid sequences, a RAM 153 for storing the entered data, and a unit 155 for generating a representation of a lattice, including unit 157 for positioning lattice sites, and unit 159 for positioning α-carbons
relative to the lattice sites. The system includes a unit 161 for combining the generated lattice
representation and the sequence of residues, a unit 163 for producing representations of protein
structures, and a unit 165 for comparing the protein structure representations to a predetermined
criterion and for selecting one of the protein structure representations for display.
APPENDIX A
The following is a description of various lattice model rules which must be followed for constructing conformations of various sidechains linked to various backbone configurations.
As shown in Figure 8, let the ith bond vector bi connect α-carbons (i-1) and (i) . Then, for a given backbone
conformation, r2 Θ may be defined as follows: r2 Θ = (bi+bi+1 ) 2
On the 210 lattice, the allowed values of r2 Θ are 6,8,10,12,14,16 and 18. Any other value of r2 Θ is rejected as not realistic or not representable on the 210 lattice. For a given backbone conformation, four sidechain vectors are constructed. The center of sidechain interaction is located at the site defined by a diamond lattice vector d 34, of the type (±1,±1,±1), which points from the center of the α-carbon to the point (±1,±1,±1). The other three vectors f1 , f2 and f336,38,40 are of the fcc type , whose sum is twice that of the diamond lattice vector d 34. The vector d has left-handed chirality (L). With respect to the backbone, vector d points toward the N-terminus of the sequence. The orientation angle is generally not less than 60°.
Pseudovector p is defined as the cross-product of bi+1 and bi: p = bi+1
Figure imgf000043_0001
bi and w is defined as: w = bi-bi+1 Appendix A (Cont'd)
The general procedure for the calculation d, f1, f2 and f3 is given as follows: If d = (dx,dy,dz), then f1 = (dx,dy,0)
f2 = (dx,0,dz)
f3 = (0,dy,dz).
In the following, use is made of the function isgn(x), where: isgn(x) = 1 x≥0
-1 x<0 .
If r2 Θ = 14, then dx = isgn(px)
dy = isgn(py)
dz = isgn(pz)
If r2 Θ = 8,12 or 16, then dx = isgn(px-2bx,i+1)
dy = isgn(py-2by,i+1)
dz = isgn(pz-2bz,i+1) where bi+1 = (bx,i+1, by ,i+1, bz ,i+1).
If r2 Θ = 6 or 10, then dx = isgn(px+wx)
dy = isgn(py+wy)
dz = isgn(pz+wz) Appendix A (Cont'd)
And, if r2 Θ = 18, and if px • py
Figure imgf000045_0001
0, then dx = isgn(px)
dy = isgn(py)
dz = isgn(pz)
Otherwise, dx = isgn(px+wx)
dy = isgn(py+wy)
dz = isgn(pz+wz).
APPENDIX B
A generalized master equation is shown below:
(1) = Σ ...Σ kfp({i}|{i'})q({ri,})-kbp({i'}|{i})q(ri,))
Figure imgf000046_0001
Figure imgf000046_0005
where
{i} represents a first set of vectors;
(i'} represents a second set of vectors;
p({i},t) represents the probability of finding a set of
vectors {i} at a time t;
kf represents rate of increase of the set (i) in size
(membership) due to move of bead from set {i'} to set (i);
kb represents rate of decrease of the set {i} in size
due to move of bead to set {i'} from set {i};
{ri} and {r'i) represent coordinates of the set of
bond vectors {i} and {i'};
q represents an excluded volume function
Figure imgf000046_0002
= 1; if are unoccupied
Figure imgf000046_0004
0; if { } are occupied
Figure imgf000046_0003
p({i}|{i'}) represents the probability of occupying set {i} upon moving from set {i'};
p({i')|{i}) represents the probability of occupying set {i'} upon moving from set {i}; and the relationship between kf and kb may be expressed as:
kf (U{i}-U{i'})/
kB = exp(- kBT Appendix B (Cont' d)
where U{i} represents the total energy of the protein
in the ith conformation;
U{i'} represents the total energy of the protein
in the i'th conformation;
kB represents Boltzmann's constant; and
T represents temperature (in degree Kelvin) of the protein.
A bead represents an amino acid residue comprising a full sidechain (i.e., four lattice sites) and backbone segment (i.e., seven lattice sites). A bead is shown, for example, in Figures 5 and 9. In terms of the above equation, the probability of finding a set of vectors {i,i+1} at a time t in a two-bond jump- type move of a bead from one coordinate position (ri) to another coordinate (ri, ) may be expressed as: = Σ kf P(i;i+1|i';i'+1;Θ)q(ri)-kb P(i';i'+1|
Figure imgf000047_0001
i',i'+1 i;i+1;Θ)q(ri,)
Θi,,i,+1= Θi,i+1
where,
i and i+1 represents a first pair of vectors;
i' and i'+1 represents a second pair of vectors; and
Θ represents the bond angle between vectors (bonds)
i and i+1 and between i' and i'+1.
In addition to the single-bead jump-type move described above, a conformation may be modified by rotational and/or translational motion of one or more beads, as shown for example in Figures 10 and 11. Appendix C
Calculation of the Denatured State Free Energy
In this appendix, an expression for the free energy of the unfolded state of a model protein confined to a 210 lattice is calculated. Two cases are examined. The first corresponds to the situation when the torsional potential εΦ equals zero, and the second corresponds to the more general case when εΦ is nonzero.
With respect to the lattice, each of the twenty-four
possible vectors connecting the lattice sites may be given a number one through twenty-four, as follows:
1=(2,1,0) 13= [0,-1,-2)
2=(2,0,1) 14= [0,-2,-1)
3=(2,-1,0) 15= [0,1,-2)
4=(2,0,-1) 16= (0,-2,1)
5=(1,2,0) 17= (-1,2,0)
6=(1,0,2) 18= (-1,0,2)
7-(1,-2,0) 19= (-1,-2,0) (C-1)
8=(1,0,-2) 20= (-1,0,-2)
9=(0,1,2) 21= (-2,1,0)
10=(0,2,1) 22= (-2,0,1)
11=(0,-1,2) 23= (-2,-1,0)
12=(0,2,-1) 24= (-2,0,-1).
To specify the conformation of the chain, given the location of the first bead, a sequence of N-1 numbers, ranging from 1 to 24, is specified with the first bond vector (vector 1) chosen arbitrarily as vector (2,1,0), the second vector must satisfy the constraint 6 ≤r2 Θ≤18. There are 18 such possibilities, and there are four states such that r2 Θ =6. The second vector can be
(0,-2,±1) and (-1,0, ±2). There are two such possibilities when r2 Θ =8, namely (0,-1, ±2). When r2 Θ =10, there are two
possibilities as well, (-1,2,0) and (1,-2,0). If r2 Θ =12, again, there are two possibilities with the allowed second vectors being (0,1,±2). Turning to the r2 Θ =14 case, there are a total of four possibilities (0,2,±1) and (1,0, ±2). If r2 Θ =16, there is one possibility, (2,-1,0). Finally, for r2 Θ =18, there are three possibilities (2,0,±1) and (1,2,0). In general, for a given vector number i, there are eighteen allowed vectors; subsequent allowed vectors vary depending on the particular vector that precedes them.
A pseudo inner product may be defined (by analogy to ortho- normal basis sets) as follows:
Figure imgf000049_0003
(C-2) if the two vectors i and j are allowed, and
Figure imgf000049_0002
(c-3) if the two vectors i and j are not allowed.
Denatured state partition function eΦ =0
In the absence of a torsional potential that serves to couple adjacent bond angle states (and which, therefore,
introduces cooperativity into the model), the internal partition function of the denatured state, Z0 D, may be obtained from
Figure imgf000049_0001
where J is a row vector of dimension 24, consisting of a 1 followed by twenty-three zeros, J is a column vector consisting of twenty-four ones, and UD,i is a 24 x 24 matrix associated with the ith residue, each row of which contains 18 non-zero elements and 6 zero elements. UD,i may be expressed as:
UDl(k,I) = (k,l)exp(-εΘ,l(k,l)/kΘT). (C-5)
As shown below, the configurational partition function can be written as the product of the internal bond angle partition functions associated with each bond angle state z θ,i:
Figure imgf000050_0005
(C-6)
The matrix product in equation C-4 is of the form: U0.2(1,k)U0.3(k,l)●●U0.N-2(U,r)U0.N-1(r,s) (C-7)
Figure imgf000050_0006
Given that the sum of all the elements in the columns is independent of the row index (i.e., each row has the same set of bond angle states that must be summed over), the sum of the products can be expressed as the product of the sums, as follows: (C-8)
Figure imgf000050_0004
which is identical to equation C-6 because z Θ,i is the same as (C-9)
Figure imgf000050_0007
Thus, the separability of the partition function is established. The free energy of the denatured state is simply εΦ30
Figure imgf000050_0003
(C-10)
Figure imgf000050_0002
To include the effect of non-zero εΦ into the calculation of the partition function, the chain is divided into statistical weight matrices associated with pairs of bonds. That is, the partition function is calculated as (C-11)
Figure imgf000050_0001
where J*576 is a row vector of dimensionality 576 whose first ter is unity and remaining terms are zero. J576 is a column vector o dimensionality 576, all of whose elements are unity. lu =N if N is even, and lu =N-1 if N is odd. UΦ i is a 576 by 576 matrix.
For convenience in setting up UΦ i, the torsional angles are labeled from 3 to N-1, rather than from 2 to N-2 as in the main text. For i=2, one merely has to account for the bond angle associated with the εecond residue. Choosing the first bond as vector 1, the only non-zero elements of UΦ 2 are
U2(l,J) = (l,J)exp(-εΘ.2(l.J)/kΘT). (C-12)
We next consider the case where 2<i<lu. Let the bond
vectors associated with residues i-3 , i-2, i-1 and i be labelled by j,k,l,m, respectively. The jth bond vector connects residues i-3 to i-2. The rows of UΦ i (row, column) are obtained from j and k by
row =(j-1)24+k (C-13) col = (l-1)24+m (C-14)
In defining the statistical weight matrix UΦ(j,k,l,i)
associated with the torsional potential due to the particular sequence of the three bonds j,k,l (where k goes from vertex i-1 to i), the distance ri-2,i+1. between residues i-2 to i+1 is
considered. If the square of this distance is less than 3, then due to the hard core stearic repulsion,
UΦ(j,k,l,i)=0 (C-15)
If r2 i-2,i+1=3, then
UΦ(J,k.l,i) = (j.k)(k,I)exp[-(ε,(j,k,I)+3εrep)/kBT] (C-16)
If r2 i-2,i+1=5, then UΦ(j.k,l,i) = (j,k)(k,I)exp[-(ε,(j,k,I)+εrep)/kBT]. (C-17)
For all other r2 i-2,i+1,
UΦ(j.k,l,i) = (j,k)(k,!)exp[-(ε,(j,k,l))/kBT] (C-18)
Thus, local short range repulsions are accounted for in the treatment as well. For 2<i<lu, if lu is even, and for 2<i≤lu is odd, then
Uf(j.k,l,m) = (j,k)(k,i)(l,m)exp(-Θ,l-l(k,l)+εΘ.l(l,m))/kBT)UΘ(j,k,l,i-1)UΘ(k,l,m,i) (C-19)
If i=lu, and lu is even then, since vertex i is at the end of the chain, it is necessary to only account for the last bond angle and torsional angle associated with vertex N-l. To make this last matrix conformable with the previous matrices (e.g., vector type 1), an extra bond is appended at the end of the chain, giving:
^.K(j,k,l,N)=(j,k)(k,l)(l,le)xp(-(εΘ,-1(k,l)/kBT)U0(j,k,l,N-1)
(C-20)
From the above definitions of U, J and Z, it is seen where the free energy AD of the denatured state can be determined from the equation:
RD=-kBTln(ZD).
APPENDIX D c only left handed diamond lattice vectors can interact
c revised to include a finer trajectory
c
c generalized to include other favoring of torsional potentials c uses full hydrophobic hydrophobic interaction matrix
c based on Miyazawa Jernigan interaction scale
c 8/19/B9
c **FIXES THE PROBLEM OF GLYCINES AT POSITIONS 3 AND LEKF-2 c **PRESENT IN ALL PREVIOUS VERSIONS
c ncglyshort.f is a version of ncthermshort.f but which introduces c thermalization into the wave displacements
c generates short trajectories
c like jstherm but it also
c calculates the number of native contacts
c
c ***************************************************************************************** c SIDECHAINS ONLY A DISTANCE OF THE SQUARE ROOT OF TWO CAN INTERACT c all other faces of the sidechain are hard core
c produces equivalent interaction for 12 and 16 states
c should produce shifted sq(10) for beta barrel like spates c with hydrophoblic core
c program uses setind.f and ergd.f
c *************************************************************************** c * * c * P R O T E I N 201 * c * THE NEXT GENERATION * c * * c *************************************************************************** c WITH GLYCINE (NO SIDE GROUP) CODED AS := 0 (zero) HYDDROPHCEICITY c NO GLYCINE ASSUMED ON THE THREE ENDS SEGMENTS. ALLOWS FCR 6 STATE c PROGRAM SIMULATES SIMPLIFIED MODELS OF GLOBULAR PROTEINS BASED ON c THE " 21 0 " LATTICE ALPHA-CARBON REPRESENTATION INCLUDES SOME c DETAILS OF A SEQUENCE DESCRIPTION. HAS BUILD-IN CHIRALITY OF THE c AMINOACIDS. ASYMETRIC METROPOLIS SCHEME WITH -A-VARlETY OF LOCAL c REARANGEMENTS OF MAIN (AND SIDE GROUPS) CHAIN BACKBONE. EDITED BY c AK - FEB. 1989 ST. LOUIS.
c REPULSIVE INTERACTIONS SQRT(5)
c WITH 'WAVE' MOTIONS, HYDROGEN BONDS, COOPERATIVITY, SIDE GROUPS.. c THREE (+1) SITE SIDE GROUPS
c NOTE THAT THIS PROGRAM USES EREP5,EHB, setini, REMOVES, LOOKG,ERGG c setind.f allows for interactions between left handed chirality c diamond lattice vector
c vaxran version
c ***************************************************************************
c this version of program was created on 5/12/89
c *************************************************************************** c 4/18/89 c constructs the torsional potential in the progarm
c PHISEQ used
c ah is the l./temp in the thermalization step of the waves and c rotations
c at the head of the INPUT file
c ******************************************************************************* c
c THE LATERAL TRANSLATION OF A STRING ADDED
c
c WITH SPECIFICATION OF THE TORSIONAL POTENTIAL FOR SEQOENCE c THIS IS GIVEN IN APH( 24 , 24 , 24 , 'LENGTH') ARRAY WHICH HAS TO BE c PREPARED AS AN INPUT FILE FILENAME='PKIPAT' USE AKPHIMAKE c
c LIST OF BACKBONE VECTORS - USE FOR ANALYSE OF LOCAL GEOMETRY c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
Figure imgf000054_0001
c
IMPLICIT INTEGER (I-Z) REAL vaxran
double precision etot , etot2 , cv, anct , ant
real asumr2 , asums2 , as2
LOGICAL GOODCLOOK
param ster ( ndim=150 )
DIMENSION ASTR(ndim), IDIS(ndim),STATN(ndim)
DIMENSION astrr(ndim),RIDIS(ndim), RSTATN(ndim), RIHAND(ndim)
DIMENSION XYZ(ndim, ndim, ndim), X(ndim),Y(ndim), Z(ndim),ihand(ndim)
DIMENSION VECTOR(-2:2,-2:2.-2:2), VX(24),VY.24),VZ(24)
DIMENSION ICONF(24,24),GOODC(24,24)
DIMENSION VECT1(24,24,5),VECT2 (24,24,5)
DIMENSION SIDGRl(24,24), SIDGR2(24,24),SIDGR3 (24,24)
DIMENSION ICA(0:ndim), STLX(13),STLY(13),STLZ(13)
DIMENSION AC(ndim, 20), AM(ndim, ndim), IHYD(ndim), IC6 (ndim)
DIMENSION IC8 (ndim), IC10(ndim), IC12(ndim), IC14 (ndim), IC16 (ndim), IC18(nd
DIMENSION PRODV(24,24),ICAO(ndim), APH(24, 24, 24,ndim.
DIMENSION XNEW(ndim), YNEW(ndim), ZNEW(ndim), INDGL( ndim)
dimension iflip(20,5),inc(ndim,ndim)
dimension SIX(24, 24)
dimension S1Y(24,24)
dimension S1Z(24,24)
dimension xt(ndim),yt(ndim),zt(ndim)
DIMENSION AM(ndim,ndim), IHYD(ndim), IC6(ndim), ahyd(ndim,ndim)
c XYZ - OCCUPANCY LIST WITH SIDE GROUPS (0,-1, INDEX!!!)
c X, Y, Z - EXPLICITE COORDINATES OF I-LENT BEADS
c ICONF - R2 (VECTOR CODE, VECTOR CODE)
c ICA - EXPLICATE VECTORS DOWN THE CHAIN
c APH - ENERGY OF A GIVEN SEQUENCE OF THREE BONDS, DEPENDSc ON CONFORMATION AND THE NUMBER OF THE RESIDUE
c
DATA VX /4*2,4*1,8*0,4*-1,4*-2/
DATA VY /1, 0,-1,0,2,0,-2,0,1,2,-1,2,-1,-2,1,-2,2,0,-2,0,1,0,-1,0/
DATA VZ /0,1,0,-1,0,2,0,-2,2,1,2,-1,-2,-1,-2,1,0,2,0,-2,0,1,0,-1/c
c FCC LATTICE VECTORS (AND 000)
DATA STLX /4*0,-1,1,-1,1,-1,1,-1,1,0/
DATA STLY /-1,1,-1,1,1,-1,4*0,-1,1,0/
DATA STLZ /1,-1,-1,1,2*0,1,-1,-1,1,3*0/
c
c TETRAHEDRAL LATTICE VECTORS
c DATA TLX /1,-1,1,-1,1,-1,1,-1/
c DATA TLY /-1,1,-1,1,1,-1,1,-1/
c DATA TLZ /-1,1,1,-1,-1,1,1,-1/
c
c CODING THE VECTORS TO THE ARRAY
c
DO XX=-2,2
DO YY=-2,2
DO ZZ=-2,2 VECTOR(XX,YY, ZZ)=0
ENDDO
ENDDO
ENDDO
VECTOR(2,1,0)=1
VECTOR(2,0,1)=2
VECTOR(2,-1,0)=3
VECTOR(2,0,-1)=4
VECTOR(1,2,0)=5
VECTOR(1,0,2)=6
VECTOR(1,-2,0)=7
VECTOR(1,0,-2)=8
VECTOR(0,1,2)=9
VECTOR( 0,2,1)=10
VECTOR(0,-1,2)=11
VECTOR( 0,2,-1)=12
VECTOR(0,-1,-2)=13
VECTOR(0,-2,-1)=14
VECTOR(0,1,-2)=15
VECTOR(0,-2,1)=16
VECTOR(-1,2,0)=17
VECTOR(-1,0,2)=18
VECTOR(-1,-2,0)=19
VECTOR(-1,0,-2)=20
VECTOR(-2,1,0)=21
VECTOR(-2,0,1)=22
VECTOR(-2,-1,0)=23
VECTOR(-2,0,-1)=24
c ..................................................................................................
c
c LIST OF CONFORMATIONS - THE SUM OF TWO VECTORS
c ..................................................................................................
DO 1=1,24
DO J=1,24
ICONF(I,J)=(VX(I)+VX(J))**2+(VY(I)+VY(J))**2+(VZ(I)+VZ(J))**2
IDOTP=IABS(VX(I)*VX(J)+VY(I)*VY(J)+VZ(I)*VZ(J))
IF(IDOTP.EQ.5) PRODV(I,J)=1
ENDDO
ENDDO
c
c THE CODE OF A VECTOR READS AS CODE=VECTOR(X,Y,Z) (1 TO 24 ) c AND VICE VERSA COORDINATES READ AS X=VX(CODE)
c
c ..................................................................................................
c
c LIST OF ACCEPTABLE CONFORMATIONS 6-18 (LOGICAL TABLE) c ..................................................................................................
c DO 1=1 , 24
DO J=1 , 24
IF(ICONF(I,J) .LT.6.0R.ICONF(I,J).GT.18) THEN
c 6,8,10,12,14,16, AND R2=18 ALLOWED
GOODC(I,J)=. FALSE.
ELSE
GOODC(I,J)=.TRUE.
END IF ENDDO ENDDO
c
c .................................................................................................................................... c
c FLIP-TWIST ARRAY GIVES A DIRECT PREDICTION OF THE NEW CONF. STATE c VECT1(I,J,K) GIVES A FIRST VECTOR AFTER JUMP FROM SEQUENCE OF I-J c TO NEW STATES (SOMETIMES DEGENERATED) K-l,..5 < READS AS A CODE > c ..................................................................................................................................c
DO 1=1,24
DO J-1,24
IF(GOODC(I,J)) THEN
WX=VX(I)
WY=VY(I)
WZ=VZ (I)
NX=VX(J)
NY=VY(J)
NZ=VZ(J)
VECT1(I,J,1)=J
VECT1(I,J,4)=J
VΞCT1(I,J,5)=J
VECT2(I,J,1)=I
VECT2(I,J,4)=I
VECT2(I,J,5)=I
ICONA= ( ICONF(I,J)-4)/2
GO TO (6,1,2,3,2,5,2) ICONA
c CONFORMATION F.2-6c FOUR POSSIBILE ARRANGEMENTS
6 SX=WX+NX
SY=WY+NY
SZ=WZ+NZ
IF(IABS(SX).EQ.2) THEN
IF(SY.NE.SZ) THEN
WY=-WY
WZ=-WZ
NZ=-NZ
NY=-NY
ENDIF WX1=WX
WX2=NX
WY1=WZ WZ1=WY
WY2=NZ
WZ2=NY
GO TO 15
ENDIF
IF(IABS(SY).EQ.2) THEN
IF(SX. NE.SZ) THEN
WX=-WX
WZ=-WZ
NX=3NX
NZ=-NZ
ENDIF
WY1-WY
WY2-NY
WX1-WZ
WZ1-WX
WX2-NZ
WZ2-NX
GO TO 15
ENDIF IF(IABS(SZ).EQ.2) THEN
IF(SX.NE.SY) THEN
WX=-WX
WY=-WY
NX=-NX
NY=-NY
ENDIF
WZ1=WZ
WZ2=NZ
WX1=WY
WY1=WX
WX2=NY
WY2=NX
ENDIF
15 N1=VECTOR(WX1,WY1,WZ1)
N2=VECTOR(WX2,WY2,WZ2)
VECT1(I,J,2)=N1
VECT2(I,J,2)=N2
VECT1(I,J,3)=N2
VECT2(I,J,3)=N1
GO TO 7
c CONFORMATION R2=8 1 MX=1
MY=1
MZ=1
IF(IABS(WX). EQ.1) MX=-1
IF(IABS(WY).EQ.1) MY=-1
IF(IABS(WZ).EQ.1) MZ=-1
PX=WX*MX
PY=WY*MY PZ=WZ*MZ
I2=VECTOR(PX,PY,PZ)
LX=NX*MX
LY=NY*MY
LZ=NZ*MZ
J2=VECTOR(LX,LY,LZ)
VECT1(I,J,2)=I2
VECT2(I,J,2)=J2
VECT1(I,J,4)=I2
VECT2(I,J,4)=J2
VECT1(I,J,5)=J2
VECT2(I,J,5)=I2
VECT1(I,J,3)=J2
VΞCT2(I,J,3)=I2
GO TO 7
c
c CONFORMATION R2=10 c CONFORMATION R2=14c CONFORMATION R2=18 2 VECT1(I,J,2)=J
VECT2(I,J,2)=I
VECT1(I,J,3)=J
VECT2(I,J,3)=I
GO TO 7
c3 CONFORMATION R2=12
TEMPCO=3*WX*NX+2*WY*NY+WZ*NZ
SX=WX+NX
SY=WY+NY
SZ=WZ+NZ
c TEMPCO=3 X AXIS DIRECTION IN THE ORIGINAL STATE c =2 Y
c11 =1 Z DIRECTION
GO TO (13,12,11) TEMPCO
WX1=SX
WX2=0
WZ1=0
WZ2=SZ
WY1=SY/2
WY2=SY/2
KX1=SX
KX2=0
KY1=0
KY2=SY
KZ1=SZ/2
KZ2=SZ/2
GO TO 14
12 WY1=SY
WY2=0
WZ1=0
WZ2=SZ WX1=SX/2
WX2=SX/2
KY1=SY
KY2=0
KX1=0
KX2=SX
KZ1=SZ/2
KZ2=SZ/2
GO TO 14
13 WZ1=SZ
WZ2=0
WX1=0
WX2=SX
WY1=SY/2
WY2=SY/2
KZ1=SZ
KZ2=0
KY1=0
KY2=SY
KX1=SX/2
KX2=SX/2
14 Nl-VECTOR(WX1,WY1,WZ1)
N2-VECTOR(WX2,WY2,WZ2)
VECT1(I,J,2)=N1
VECT2(I,J,2)=N2
M1=VECTOR(KX1, KY1, KZ1)
M2=VECTOR(KX2, KY2, KZ2)
VECT1(I,J,3)=M1
VECT2(I,J,3)=M2
VECT1(I,J,4)=N2
VECT2(I,J,4)=N1
VECT1(I,J,5)=M2
VECT2(I,J,5)=M1
GO TO 7
c CONFORMATlON R2=16 5 SX=WX+NX
SY=WY+NY
SZ=WZ+NZ
TEMPCO=(3*IABS(SX)+2*IABS(SY)-TABS (SZ))/4
GO TO (23,22,21) TEMPCO
21 WX1=WX
WX2=NX
WY1=WZ
WY2=NZ
WZ1=WY
WZ2=NY
KX1=WX
KX2=NX
KY1=NZ
KY2=WW KZ1=NY
KZ2=WY
GO TO 24
22 WY1=WY
WY2=NY
WX1=WZ
WX2=NZ
WZ1=WX
WZ2=NX
KY1=WY
KY2=NY
KX1=NZ
KX2=WZ
KZ1=NX
KZ2=WX
GO TO 24
23 WZ1=WZ
WZ2=NZ
WX1=WY
WX2=NY
WY1=WX
WY2=NX
KZ1=WZ
KZ2=NZ
KX1=NY
KX2=WY
KY1=NX
KY2=WX
24 Nl-VECTOR (WX1, WY1, WZ1)
N2-VECTOR (WX2, WY2, WZ2)
VECT1(I,J,2)=N1
VECT2(I,J,2)=N2
VECT1(I,J,4)=N2
VECT2(I,J,4)=N1
M1=VECTOR (KX1, KY1, KZ1)
M2=VECTOR (KX2, KY2, KZ2)
VECT1(I,J,3)=MI
VECT2(I,J,3)=M2
VECT1(I,J,5)=M2
VECT2(I,J,5)=M1
CONTINUE
c7 CONFORMATION IS NOT ACCETABLE
ELSE
DO K=1,5
VECT1(I,J,K)=0
VECT2(I,J,K)=0
ENDDO
END IF
ENDDO
ENDDO c
c ...................................................................................................................................... c
c SIDE GROUPS - EXPLICITE DEFINITION BASED ON CONFORMATION R2 c ...................................................................................................................................... c
c SIDGRl (24, 24) CONTAINS CODES OF 110 VECTORS c SIDGR2(24,24) CONTAINS CODES OF 110 VECTORS c SIDGR3 ( 24 , 24 ) CONTAINS CODES OF 110 VECTORS c
DO 1=1,24
DO J=1,24
IF(.NOT.GOODC(I,J)) GO TO 40
X1=VX(I)
Y1=VY(I)
Z1=VZ(I)
X2=VX(J)
Y2=VY(J)
Z2=VZ(J)
ICONA=(ICONF(I,J)-4)/2
PX=Y1*Z2-Y2*Z1
PY=X2*Z1-Z2*X1
PZ=X1*Y2-Y1*X2
PX=-PX
PY=-PY
PZ=-PZ
WX=X1-X2
WY=Y1-Y2
WZ=Z1-Z2
GO TO (33,32,33,32,31,32,36) ICONA
c CONFORMATION R2=14 31 SUMAX=PX
SUMAY=PY
SUMAZ=PZ
GC TO 39
c CONFORMATION R2=8 c CONFORMATION R2=12 c CONFORMATION R2=16 32 SUMAX=PX-2*X2
SUMAY=PY-2*Y2
SUMAZ=PZ-2*Z2
GO TO 39
c CONFORMATION R2=6 c CONFORMATION R2=103 c
33 SUMAX=PX+WX
SUMAY=PY+WY
SUMAZ-PZ+WZ GO TO 39
c CONFORMATION R2=18 c
36 IF(PX*PY.NE.O) THEN
c THE CASE OF DOWN THE AXIS CONFORMATION
SUMAX=PX
SUMAY=PY
SUMAZ=PZ
GO TO 39
ENDIF
c THE CASE OF 330 CONFORMATION
SUMAX=PX+WX
SUMAY=PY+WY
SUMAZ=PZ+WZ
c
c
39 SUX=ISIGN(1,SUMAX)
SUY=ISIGN(1,SUMAY)
SUZ=ISIGN(1,SUMAZ)
X1=SUX
X2=SUX
X3=0
Y1=SUY
Y2=0
Y3=SUY
Z1=0
Z2=SUZ
Z3=SUZ
c
c GIVES THE CODE OF (STLX,STLY,STLZ)V, VALUE 1,2,..12
ICODT=9*X1+3*Y1+Z1
IF(ICODT.LT.O) ICODT=-1-ICODT
SIDGRl ( I,J)=ICODT
ICODT=9*X2-3*Y2+Z2
IF(ICODT.LT.O) ICODT=-1-ICODT
SIDGR2(I,J)=ICODT
ICODT=9*X3+3*Y3+Z3
IF(ICODT.LT.0) ICODT=-1-ICODT
SIDGR3(I,J)=ICODT
c insert of check for handedness
x4=(x1+x2+x3)/2
y4=(y1+y2+y3)/2
z4=(z1+z2+z3)/2
SlX(i, j)=x4
SlY(i,j)=y4
SlZ(i,j)=z4
CONTINUE ENDDO ENDDO c INPUT INPUT INPUT INPUT INPUT INPUT INPUT INPUT INPUTc --------------------------------------------------------------------------------------------- c SET UP OF THE VECTOR REPRESENTATION OF THE CHAINc
OPEN(UNIT=5,FILE='INPUT',STATUS='OLD')
OPEN(UNIT=10,FILE='FILEDAT',STATUS='OLD')
OPEN(UNIT=6,FILE='OUTPUT',STATUS='OLD')
OPEN(UNIT=1,FILE='SEQUENCE',STATUS=*OLD')
OPEN(UNIT=11,FILE='contact',STATUS='OLD')
OPEN(UNIT=12,FILE='PHISEQK',STATUS=*OLD')
OPEN(UNIT=14,FILE='PHISEQHR',STATUS-'OLD')
OPEN(UNIT=13,FILE='TRACE',STATUS='OLD')
open(unit=15, file=' hydmap', status='old')
open(unit=16, file=' output', status='old')
READ(10,*) LENF
LENF1=LENF=1
LENF2=LENF=2
AL2=LENF2
LENF3=LENF=3
AL3=LENF3
LENF4=LENF=4
AL4=LENF4
AL6=LENF=6
AL9=LENF=9
MIX=1
LENHA=LENF/2
do i=1, 100
STATN(i)=0.d0
IDIS(i)=0.d0
ASTR(i)=0.d0
IKAND(i)=0.d0
RSTATN(i)=0.d0
RIDIS(i)=0.d0
astrr(i)=0.d0
RIHAND(i)=0.d0
do j=1,100
do k=1,100
xyz(k, j ,i)=0.d0
end do
end do
end do c
c ************************SEQUENCE READING*********************************************** c
c ************************READING OF TORSIONAL POTENTIALS************************* c
c E EXPLICIT CONSTRUCTION OF APH
DO LJ=2,LENF1 READ(12,*)K,STATN(LJ), IDIS(LJ), ASTR(LJ), IHAND(LJ)
END DO
c generalized to include other conformational prefrences
read(14,*)other
if(other .eq. 0) go to 542
do lj=1, other
READ(14,*)K,RSTATN(k),RIDIS(k),ASTRR(k),RIHAND(k)
END DO
continue
c ********************************** SEQUENCE READING**********************************c
c CAUTION::: THE REVERSE PATTERN IS NOT ALLOWED +K HAS TO BEc ASSUMED AS A PREFERENCE FOR A GIVEN STATE c
c
c STATN - R2( 1-1,1+1)
c IDIS - R2(l-1,I+2)
c ASTR- STRENGTH OF PREFERENCE FOR THE DIHEDRAL ANGLE
c aa plays the role of the thermalization factor
read(5,*)ah
WRITE(6, 8120)
8120 FORMAT(1X, ' ** THE THREE SIDE GROUP PROGRAM
AND GLI glypredict.f full hydrophobic interaction matrix',/',
* 'uses not necessarily native conf in torsions',/)
NB=L-0
INDGL(l)=0
INDGL(LENF)=0
DO I=2,LENF1
INDGL(I)=1
READ(l,*) K,IC6(I),
* IC8 (I), IC10 (I), ICl2 (I), ICl4 (I), ICl6 (I), ICl8 (I), IHYD(I)
IF(IHYD(I).EQ.0) THEN
INDGL(I)=0
NBGL=NBGL+1
ENDIF ENDDO
c
c *************************INPUT FILE***********************************************************c
READ(5,*) RANDOM, NCYCLE, PHOT
READ(5,*) AC6,AC8,ACl0,ACl2,ACl4,ACl6,ACl8
READ( 5,*) APLPB, BPLPL, CFBPB, AREP, AH3, APHI
READ(5,*) ATEMP,WAVEL
c
WRITE (6,8020) RANDOM, NCYCLE, PHOT,WAVEL 8020 F0RMAT(1X,' ** THE THREE SIDE GROUP PROGRAM AND GLI. **',/, 'DIAMOND LATTICE SITES INTERACT ',/,
* '** glypredict with .5 *jemigan **',
* IX,' RANDOM SEED =',16,' NUMBER OF CYCLES',215,/,
* IX, ' MAXIMUM WAVE LENGTH =',14,/ )
write(6,8999)ah
8999 format(lx,' temp/tempthermal =',1f8.4)
WRITE(6,8021) AC6,AC8,AC10 ,AC12,AC14,AC16,AC18
8021 FORMAT (5X,/,3X,' ENERGY OF STATE 6 =',F6.2,/,
* 3X, E NNEERRGGYY O0F. STATE 8 = ' , F6 . 2 , /,
* 3X, ENERGY OF STATE 10=' , F6 .2 , /'
* 3X, ENERGY OF STATE 12= ' ,F6.2,/,
* 3X. ENERGY OF STATE 14= ',F6.2,/,
* 3X, ENERGY OF STATE 16= ',Fr6o..2<.,,//,,
* 3X,' ENERGY OF STATE 18=',F6.2,/)
WRITE(6, 8022) APLPB,BPLPL,CPBPB
8022 FORMAT(3X,' PHIL-PHOB, PHIL-PHIL, PHOB-PHOB ,4F8.3,/)
WRITE(6,8024) AREP,AHB,APHI
8024 FORMAT(3X,' REPULSIVE INT. AND COOPER.+H-BOND ' ,2F 8.3,/
* ,3X,' SCALING FACTOR FOR DIHEDRAL ANGLE POTENT IAL', F8.3,/) WRITE(6,8023) ATEMP
8023 FORMAT(1X,/,3X,' TEMPERATURE OF THE SYSTEM F8.3 ,/ ) c
c construction of native contact map
do i=l,lenf2
do j=i,lenf1
inc(i,j)=0.d0
inc(j,i)=0.d0
end do
end do
read(11,*)ntot
write(6,2039)ntot
2039 format(1x, 'not=1',13,/, 'the native contact pairs are' )
do i=1,ntot
read(11,*)j,k
inc(j,k)=1
write(6,*)j,k
end do
c
c *********** SET THE CURRENT FORCE OF INTERACTIONS ************** c
A APHI=APHI/ATEMP
AHB=AHB/ATEMP
AC6=AC6/ATEMP
AC8=AC8/ATEMP
AC10=AC10/ATEMP
AC12=AC12/ATEMP
AC14=AC14/ATEMP
AC16=AC16/ATEMP AC18=AC18/ATEMP
APLPB=APLPB/ATEMP
BPLPL=BPLPL/ATEMP
CPBPB=CPBPB/ATEMP
AREP=AREP/ATEMP
do i-l,lenf2
read(15,*)(ahyd(i,j),j-i,lenf2)
c write(6,*)(ahyd(i,j),j-i,lenf2)
do j-i,lenf2
ahyd(j,i)-ahyd(i,j)
end do
end do
DO I=2,LENF1
imi=i-1
DO J=2,lenf1
jml=j-1
IF(IABS(I-J).GT.2) THEN
am(i,j)=ahyd(iml,jml)/ateimp
ELSE
AM(I,J)=0.
c A PRIORI NO INTERACTIONS OF SIDE GROUPS WHEN/I-J/<3
ENDIF
ENDDO
ENDDO
c
DO l=2 LENF1
AC(l,6)=lC6(I)*AC6
AC(I,8)=IC8(I)*AC8
AC(I,10)=IC10(I)-*AC10
AC(I,12)=IC12(I)*AC12
AC(I,14)=IC14(I)*AC14
AC(I,16)=IC16(I)*AC16
AC(I,18)=IC18(I)*AC18
ENDDO
c
c ********************READING OF TORSIONAL POTENTIALS********* * ** ** ** ** *** ** **c
DO 100 l=1,24
Xl=VX(I)
Yl=VY(I)
Zl=VZ(I)
DO 1200 J=1,24
IF<GOODC(I,J)) THEN
X2=VX(J)
Y2=VY(J)
Z2-VZ(J)
c CROSS PRODUCT OF THE TWO FIRST VECTORS PX=Y1*Z2-Y2*Z1
PY=Z1*X2-Z2*X1
PZ=X1*Y2-Y1*X2
stl=iconf(i,j)
DO 300 K=1,24
IF(.NOT.GOODC(J,K)) GO TO 300
X3=VX(K)
Y3=VY(K)
Z3=VZ(K)
IHAN=PX*X3+PY*Y3+PZ*Z3
IHAN=SIGN(1,IHAN)
st2=iconf(j,k)
c............................................................................................................................................................
c
DO 401 INDEX=2,LENF2
B=0.
aσh(i,j,k,index)=0.
IF(STATN(INDEX) .EQ. STl.AND. STATN( INDEX+1). EQ. ST2) THEN
KX=X1+X2+X3
KY=Y1+Y2+Y3
KZ=Z1+Z2+Z3
R2-KX*KX+KY*KY+KZ*KZ
IF(R2.EQ. IDIS ( INDEX). and. ihan . ec. ihand(index))B-ASTR(INDEX)
ENDIF APH(I,J,K,INDEX)=(B)*APHI
IF(RSTATN(INDEX).EQ. ST1.AND. RSTATN(INDEX-1). EQ. ST2) THEN
KX=X1+X2+X3
KY=Y1+Y2+Y3
KZ=Z1+Z2+Z3
R2=KX*KX+KY*KY*+KZ*KZ
IF(R2.EQ.RIDIS(INDEX). and. ihan. eq.Rihand(index))B=astrR(INDEX)
ENDIF APH(I,J,K,INDEX)=(B)*APHI
401 CONTINUE
c ...................................................................................................................................................... c
300 CONTINUE
END IF
1200 CONTINUE
100 CONTINUE
ICA(0)=1
c this is because the sinralicity of APH reading, value irrelevant
ICA(LENF)=1
c
c ***********************INITIAL CONFORMATION**************************
c
c c caution (zero initialization assumed)
MAX=100
MID=MAX/2
SX=0
SY=0
SZ=0
DO I=1,LENF
READ(10,*) X(I),Y(I),Z(I)
SX=SX+X(I)
SY=SY+Y(I)
SZ=SZ+Z(I)
ENDDO
SX=SX/LENF
SY=SY/LENF
SZ=SZ/LENF
XSHIFT=MID-SX
YSHIFT=MID-SY
ZSHIFT=MID-SZ
DO I=1,LENF
X(I)=X(I)+XSHIFT
Y(I)=Y(I)+YSHIFT
Z(I)=Z(I)+ZSHIFT
ENDDO
DO I=1,LENF1
J=I+l
WX=X(J)-X(I)
WY=Y(J)-Y(I)
WZ=Z(J)-Z(I)
ICA(I)=VECTOR(WX,WY,WZ)
ENDDO CALL setin(XYZ,INDGL(l),X(l),Y(l),Z(l),13,13,13,1)
CALL setin(XYZ, INDGL(LENF),X(LENF),Y(LENF),Z(LENT),13,13,13,1) DO J-2,LENF1
I=J-l
II=ICA(I)
JJ=ICA(J)
IF(GOODC(II,JJ)) THEN
S1=SIDGR1(II,JJ)
S2=SIDGR2(II,JJ)
S3=SIDGR3(II,JJ)
CALL setin(XYZ, INDGL(J),X(J),Y(J),Z(J),S1,S2,S3,J)
WRITE(6, 8001) I,J
8001 FORMAT(5X, 'WRONG INPUT CHAIN - VECTORS ',214)
GO TO 9000
END IF ENDDO
c
c............CALCULATION OF THE ENERGY OF INITIAL STATE c
E=0.
ENERG=0.
DO J=2 , LENF1
I=J-1
II=ICA ( I )
JJ=ICA(J)
c ROTATIONAL CONTRIBUTION
JCONF=ICONF( II ,JJ)
ENERG=ENERG+AC(J,JCONF)
c INTERACTIONS OF SIDE GROUPS
IX=X(J)+SlX(ii,jj)
IY=Y(J)+SlY(ii,jj)
IZ=Z(J)+SlZ(ii,jj)
E=E+ERG(XYZ,INDGL(J),AM,IX,IY,IZ,J)
4501 continue
ENERG=ENERG+E/2.
c COOPERATIVE AND HYDROGEN BOND
E=0.
DO I=2,LENF1
E=E+EHB(XYZ,ICA,PRODV,X(I) ,Y(I) ,Z(I) ,I,AHB)
ENDDO ENERG=ENERG+E/2
c REPULSIVE INTERACTIONS
E=0.
DO I=2.LENF1
E=E+EREPUL(XYZ,X(I),Y(I),Z(I),1.)
ENDDO
c this is because the implicite symmetry of repulsive interactions c which is taken into account in the remainder of the pro gram.
E=(E-AL3*2.)/2.
ENERG=ENERG+E*AREP
c
c D DIHEDRAL POTENTlAL
DO J=2 ,LENF2
II=ICA(J-1 )
JJ=ICA(J)
KK=ICA(J÷l )
ENERG=ENERG+APH (ll,JJ,KK ,J )
ENDDO
c /////////////////////////////////////////////////////////////////////////////////////////////////////////
RN1=RAND0M*2+7531
RN2=RAND0M*2+8883
RN3=RANDOM*6+ 7907 c ***************************************************** ********************** c * * c * DYNAMICS OF THE CHAIN * c * * c *************************************************************************** c
c
c MAIN CLOCK OF THE ALGORITHM
c
ICLOCK=1
c
QROT=0
QWAVE=0
QKINK=0
QEND=0
c
asumr2=0.
etot-0.d0
etot2=0.d0
sxd=0.d0
syd=0.d0
szd=0.d0
anct=0.d0
ant=0.d0
write(6,931)
931 format(lx, 'iterm= R2= AS2= ENERGY= native any contacts')
DO 7777 ITERM=1, NCYCLE
iclock=iclock+1
c
DO 7700 IDUMI-1,100
iclock=iclock+1
c
DO 7770 IPHCO=1,PHOT
c
IF(ICLOCK.GT.2000) ICLOCK=ICLOCK-vaxran(rn2)*1000
c
c ................LATERAL WAVE DISPLACEMENT........................ c
c s set up of the thermalization move
if (vaxran(rn2) .gt. .01) then
af=1.d0
else
af=ah
end if
IVA=MOD( ICLOCK,WAVEL) + 3 I=INT(vaxran(rnl)*AL6)+ 3
IF(I.GT.LENHA) THEN
IFIRST=I-IVA
ILAST=I
ELSE
IFIRST=I
ILAST=I+IVA
ENDIF
WI=ICA(IFIRST)
JL=ILAST-1
WJ=ICA(JL)
JCONF=ICONF(WI,WJ)
IF(JCONF.LT.14.OR.JCONF.GT.18) GO TO 7001
IF(.NOT.GOODC(ICA(IFIRST-l),WJ)) GO TO 770001
GO TO 7001
TO 7001
TO 7001 c REMOVE THE STRING
DO K=IFIRST,ILAST
II=ICA(K-l)
KK=ICA(K)
IKS1=SIDGR1(II,KK)
IKS2=SIDGR2(II,KK)
IKS3=SIDGR3(II,KK)
XJ=X(K)
YJ=Y(K)
ZJ=Z(K)
CALL REMOVE(XYZ,INDGL(K) ,XJ,YJ, ZJ,IKS1, IKS2, IKS3)
ENDDO
c SETIN AND EXCLUDED c VOLUME TEST c THE NEW VECTORS
ICA(IFIRST)=WJ
ICA(JL)=WI
IFA=IFIRST-1
XJ=X(IFA)
YJ=Y(IFA)
ZJ=Z(IFA)
DO J=IFIRST,ILAST
II=ICA(J-1)
JJJ=ICA(J)
XJ=XJ+VX(II)
YJ=YJ+VY(II)
ZJ=ZJ+VZ(II)
XNEW(J)=XJ YNEW(J)=YJ
ZNEW(J)=ZJ
S1=ΓIDGRI(II,JJJ)
S2=SIDGR2(H,JJJ)
S3=SIDGR3(II,JJJ)
IF(LOOK(XYZ,INDGL(J),XJ,YJ,ZJ,SL,S2,S3)) THEN
c THEN REMOVE AND TERMINATE
IF(J.EQ.IFIRST) GO TO 2004
DO K=IFIRST,J-1
KK=ICA(K-l)
KKK=ICA(K)
S1=SIDGR1(KK,KKK)
S2=SIDGR2(KK,KKK)
S3=SIDGR3(KK,KKK)
CALL REMOVE(XYZ, INDGL(K),XNEW(K),YNEW(K),ZNEW(K),S1,S2,S3)
ENDDO
2004 ICA(IFIRST)=WI
ICA(JL)=WJ
DO I=IFIRST,ILAST
II]ICA(I-l)
JJJ]ICA(I)
S1]SIDGR1(II,JJJ)
S2]SIDGR2(II,JJJ)
S3=SIDGR3(II,JJJ)
CALL setin(XYZ,INDGL(I),X(I),Y(I),Z(I),S1,S2,S3,I)
ENDDO
GO TO 7001
ELSE
c SET NEW BEAD
CALL setin(XYZ, INDGL(J) ,XJ,YJ,ZJ,S1,S2,S3,J)
ENDIF
ENDDO
c THE NEW STRl NG KEEPS EXCLUDED VOLUMEc
c
c COMPUTATION OF ENERGY OF THE NEW CONFORMATION AND REMOVE STRINGc
ENEW=0.
ER=0.
E=0.
DO J=IFIRST,ILAST
I=J-l
II=ICA(I)
JJ=ICA(J)
XJ=XNEW(J)
YJ=YNEW(J) ZJ=ZNEW(J)
S1=SIDGR1(II,JJ)
S2=SIDGR2(II,JJ)
S3=SIDGR3(II,JJ)
JCONF=ICONF(II,JJ)
ENEW=ENEW+AC(J,JCONF) +APH (II,JJ,ICA(J+l),J)
c INTERACTIONS OF SIDE GROUPS
IX=XJ+SlX(ii,jj)
IY=YJ+Sly(ii,jj)
IZ=ZJ+Slz(ii,jj)
E=E+ERG(XYZ,INDGL(J),AM,IX,IY,IZ,J) c COOPERATIVE AND HYDROGEN BOND
E=E+EHB (XYZ , ICA, PRODV, XJ, YJ , ZJ, J, AHB)
c REPULSIVE INTERACTIONS
ER=ER+EREPUL(XYZ,XJ,YJ,ZJ,AREP)
CALL REMOVE(XYZ,INDGL(J),XJ,YJ,ZJ,S1,S2,S3) ENDDO
ENEW=ENEW+APH (ICA( IFIRST-2), ICA( IFA), ICA( IFIRST), IFA) ENEW=ENEW+E+ER
c
c
c
c COMPUTATION OF THE OLD ENERGY AND SETIN OF THE CHAIN PIECE c
c THE OLD VECTORS
ICA( IFIRST)=WI
ICA(JL)=WJ
EOLD=0.
ER=0.
E=0.
DO J=IFIRST,ILAST
XJ=X(J)
YJ=Y(J)
ZJ=Z(J)
II=ICA(J-i)
JJJ=ICA(J)
S1=sIDGR1(II,JJJ)
S2=SIDGR2 (ll,JJJ)
S3-SIDGR3(II,JJJ)
c s4=sidgr4 (ii,jjj)
c tx=tlx(s4)
c ty=tiy(s4)
c tz=tlz(s4)
CALL setin (XYZ , INDGL(J), XJ, YJ ,ZJ,S1,S2,S3,L)c
JCONF=ICONF(II, JJJ)
EOLD=EOLD+AC( J, JCONF ) +APH ( II , JJJ,lCA(J+1),J)
c INTERACTIONS OF SIDE GROUPS
IX=XJ+SIX(ii,jjj)
IY=YJ+Sly(ii,jjj)
IZ=ZJ+Slz(il,jjj) E=E+ ERG (XYZ , INDGL(J ) , AM, IX , IY , IZ , J )
c COOPERATIVE AND HYDROGEN BOND
E=E+EHB(XYZ, ICA,PRODV,XJ,YJ, ZJ,J,AHB)
cc REPULSIVE INTERACTIONS
ER=ER+EREPUL(XYZ,XJ,YJ, ZJ,AREP)
ENDDO
EOLD=EOLD+APH(ICA(IFIRST-2), ICA(IFA),ICA(IFIRST),IFA)
EOLD=EOLD+E+ER
c
c METROPOLIS CRITERION
c
DE=ENEW-EOLD
IF(EXP(-DE*af).GT.vaxran(rn3)) THEN
c ACCE P TED
QROT=QROT+1
ENERG=ENERG+DE
DO J=IFIRST, ILAST
II=ICA(J-1)
JJ=ICA(J)
S1=SIDGR1(II,JJ)
S2=SIDGR2(II,JJ)
S3=SIDGR3(II,JJ)
CALL REMOVE(XYZ,INDGL(J),X(J),Y(J), Z (J), S1, S2, S3)
ENDDO
c THE NEW VECTORS
ICA(IFIRST)=WJ
ICA(JL)=WI
DO J=IFIRST, ILAST
XJ=XNEW(J)
YJ=YNEW(J)
ZJ=ZNEW(J)
X(J)=XJ
Y(J)=YJ
Z(J)=ZJ
II=ICA(J-1)
JJ=ICA(J)
S1=SIDGR1(II,JJ)
S2=SIDGR2(II,JJ)
S3=SIDGR3(II,JJ)
CALL setin(XYZ, INDGL(J),XJ,YJ, ZJ, S1,S2, S3,J) ENDDO ENDIF
7001 DO 7000 IDUMA=1,LENF4
ICLOCK=ICLOCK+1
c
I=INT(vaxran(rnl)*AL4)+2
c RUNS FROM 2 TO LENF-3 (VΕCTOR INDEX RUNS FROM 1 TO LENF-1)
J=I+1
KINK=MOD(ICLOCK, 5 )+1
c DEFINES KIND OF KINK OF THE VECTORS I-J IV=ICA(I)
JV=ICA(J)
IIV=VECTl(IV,JV,KINK)
IP=I-1
IPV=ICA(IP)
IF(.NOT.GOODC(IPV,IIV)) GO TO 7000
JN=J+1
JNV=ICA(JN)
JJV=VECT2( IV,JV,KINK)
IF(.NOT.GOODC(JJV,JNV)) GO TO 7000
c
c CONFORMATION IS OK - CHECK THE EXCLUE VOLUME c REMOVE THE STRING
JL=J
ifirst=i
ilast=jn
DO K=IFIRST,ILAST
II=ICA(K-1)
KK=ICA(K)
IKS1=SIDGR1(II,KK)
IKS2=SIDGR2(II,KK)
IKS3=SIDGR3(II,KK)
XJ=X(K)
YJ=Y(K)
ZJ=Z(K)
CALL REMOVE(XYZ, INDGL(K) ,XJ,YJ, ZJ, IKS1, IKS2, IKS3) ENDDO
c SETIN AND EXCLUDEDc VOLUME TESTc THE NEW VECTORS
ICA(IFIRST)=iiv
ICA(JL)=jjv
IFA=IFIRST-1
XJ=X(IFA)
YJ=Y(IFA)
ZJ=Z(IFA)
DO J=IFIRST, ILAST
II=ICA(J-1)
JJJ=ICA(J)
XJ=XJ+VX(II)
YJ=YJ+VY(II)
ZJ=ZJ+VZ(II)
XNEW(J)=XJ
YNEW(J)=YJ
ZNEW(J)=ZJ
S1=SIDGR1(II,JJJ)
S2=SIDGR2(II,JJJ) S3=SIDGR3(II,JJJ)
IF(LOOK(XYZ,INDGL(J),XJ,YJ,ZJ,S1,S2,S3)) THEN
c THEN REMOVE AND TERMINATE
IF(J.EQ. IFIRST) GO TO 2204
DO K=IFIRST,J-1
KK=ICA(K-l)
KKK=ICA(K)
S1=SIDGR1(KK,KKK)
S2=SIDGR2 (KK, KKK)
S3=SIDGR3(KK,KKK)
CALL REMOVE(XYZ, INDGL(K), XNEW(K),YNEW(K), ZNEW(K), S1, S2,S3)
ENDDO
2204 ICA(IFIRST)=IV
ICA(JL)=JV
DO I=IFIRST, ILAST
II=ICA(I-1)
JJJ=ICA(I)
S1=SIDGR1(II,JJJ)
S2=SIDGR2(II,JJJ)
S3=SIDGR3(II,JJJ)
CALL setin(XYZ,INDGL(I),X(I),Y(I),Z(I),S1,S2,S3,
ENDDO
GO TO 7000
ELSE
c SET NEW BEAD
CALL setin(XYZ, INDGL(J), XJ,ZJ,S1,S2,S3,J)
ENDIF ENDDO
c THE NEW STRING KEEPS EXCLUDED VOLUMEc
c
c COMPUTATION OF ENERGY OF THE NEW CONFORMATION AND REMOVE STRINGc
ENEW=0.
ER=0.
E=0.
DO J=IFIRST, ILAST
I=J-l
II=ICA(I)
JJ=ICA(J)
XJ=XNEW(J)
YJ=YNEW(J)
ZJ=ZNEW(J)
S1=SIDGR1(II,JJ)
S2=SIDGR2(II,JJ)
S3=SIDGR3(II,JJ) c ROTATIONAL CONTRIBUTION JCONF-ICONF( II ,JJ)
ENEW-ENEW+AC(J,JCONF)+APH( II,JJ, ICA(J+1),J)
c INTERACTIONS OF SIDE GROUPS
IX-XJ+Slx(ii,jj)
IY-YJ÷SlY(ii,jj)
IZ-ZJ+Slz(ii,jj)
E-E+ERG(XYZ, INDGL(J),AM, IX, IY, IZ,J)c COOPERATIVE AND HYDROGEN BOND
E-E+EHB(XYZ ,ICA, PRODV, XJ,YJ, ZJ,J,AHB)
c REPULSIVE INTERACTIONS
ER-ER+EREPUL( XYZ , XJ,YJ, ZJ,AREP)
CALL REMOVE(XYZ , INDGL(J),XJ,YJ,ZJ,S1,S2,S3) ENDDO
ENEW=ENEW+APH(ICA(IFIRST-2), ICA(IFA), ICA( IFIRST), IFA) ENEW=ENEW+E+ER
c
c
c
c COMPUTATION OF THE OLD ENERGY AND SETIN OF THE CHAIN PIECE c
c THE OLD VECTORS
ICA( IFIRST)=IV
ICA(JL)=JV
EOLD=0.
ER=0.
E=0.
DO J=IFIRST, ILAST
XJ=X(J)
YJ=Y(J)
ZJ=Z(J)
II=ICA(J-1)
JJJ=ICA(J)
S1=SIDGR1(II,JJJ)
S2=SIDGR2(II,JJJ)
S3=SIDGR3(II,JJJ)
c s4=sidgr4 (ii,jjj)
c tx=tix(s4)
c ty=tly(s4)
c tz=tlz(s4)
CALL setin(XYZ,INDGL(J),XJ,YJ,ZJ,S1,S2,S3,J) c
JCONF=ICONF(II,JJJ)
EOLD=EOLD+AC(J,JCONF)+APH (II,JJJ,ICA(J+1),J)
c INTERACTIONS OF SIDE GROUPS
IX=XJ+Slx(ii,jjj) IY=YJ+SlY(ii,jjj)
IZ=ZJ+Slz(ii,jjj)
E=E+ERG(XYZ,INDGL(J),AM,IX,IY,IZ,J)
c COOPERATIVE AND HYDROGEN BOND
E=E+EHB(XYZ, ICA, PRODV, XJ, YJ, ZJ,J,AHB)
c REPULSIVE INTERACTIONS
ER=ER+EREPUL(XYZ,XJ,YJ, ZJ,AREP)
ENDDO
EOLD=EOLD+APH(ICA(IFIRST-2), ICA(IFA), ICA(IFIRST), IFA)
EOLD=EOLD+E÷ER
c
c METROPOLIS CRITERION
c
DE-ENEW-EOLD
IF(EXP(-DE).GT.vaxran(rn3)) THEN
c ACCEPTED
iflip(iconf(iv,jv),kink)=iflip(iconf(iv,jv),kink)+1
QKINK=QKINK÷1
ENERG=ENERG+DE
DO J=IFIRST, ILAST
II=ICA(J-1)
JJ=ICA(J)
S1=SIDGR1(II,JJ)
S2=SIDGR2(II,JJ)
S3=SIDGR3(II,JJ)
CALL REMOVE(XYZ,INDGL(J),X(J),Y(J),Z(J),S1,S2,S3) ENDDO
c THE NEW VECTORS
ICA(IFIRST)=IIV
ICA(JL)=JJV
DO J=IFIRST, ILAST
XJ=XNEW(J)
YJ=YNEW(J)
ZJ=ZNEW(J)
X(J)=XJ
Y(J)=YJ
Z(J)=ZJ
II=ICA(J-1)
JJ=ICA(J)
S1=SIDGR1(II,JJ)
S2=SIDGR2(II,JJ)
S3=SIDGR3(II,JJ)
CALL setin(XYZ, INDGL(J),XJ, YJ, ZJ, S1,S2,S3,J)
ENDDO
ENDIF
c 7000 CONTINUE
c idummy=1
c if(idummy .eq. 1) go to 7770
c
c END FLIPS (TWO BONDS REARANGEMENTS)c
c N-TERMINUS (TAIL)
c
JV3=ICA(3)
60 NV2=INT(vaxran(rn1)*24.)+1
IF(.NOT.GOODC(NV2,JV3)) GO TO 60
61 NVl=INT(vaxran(rn3)*24.)+1
IF(.NOT.GOODC(NVl,NV2)) GO TO 61
c CONFORMATION IS OK. CHECK THE EXCLUDED VOLUME
CALL REMOVE(XYZ,INDGL(1),X(1),Y(1),Z(1),13,13,13) ICAl=ICA(1)
ICA2=ICA(2)
PK21=SIDGR1(ICAl, ICA2)
PK22=SIDGR2 (ICAl, ICA2)
PK23=SIDGR3 (ICAl, ICA2) c **************************
CALL REMOVE(XYZ, INDGL( 2 ),X(2),Y(2),Z(2), PK21, PK22, PK23) c CHECK THE ROTATION OF SIDE GROUP ON THIRD BEAD
c 8/19/89
c oninvoke if no glycines are here
c
if(indgl(3) .eσ.0) go to 3040
PK31=SΪDGR1(ICA2,JV3)
PK32=SIDGR2 (ICA2,JV3)
PK33=SIDGR3(ICA2,JV3)
c *************************************************
c * ***************************
SX1=X(3)+STLX(PK31)
SX2=X(3)+STLX(PK32)
SX3=X(3)+STLX(PK33)
SY1=Y(3)+STLY(PK31)
SY2=Y(3)+STLY(PK32)
SY3=Y(3 )+STLY(PK33 )
SZ1=Z(3)+STLZ(PK31)
SZ2==(3)+STLZ(PK32)
SZ3-Z(3)+STLZ(PK33)
XYZ(SX1,SY1,SZ1)=0
XYZ(SX2,SY2,SZ2)=0
XYZ(SX3,SY3,SZ3)=0
xone=(sx1+sx2+sx3-x(3))/2
yone=(sy1+sy2+sy3-y(3))/2
zone=(sz1+sz2+sz3-z(3))/2 xyz(xone,yone,zone)=0
NK31=SIDGRKNV2,JV3 )
NK32= SIDGR2 (NV2,JV3)
NK33=SIDGR3 (NV2,JV3)
c s4=sidgr4(nv2,jv3)
c tx=tlx(s4)
c ty=tly(s4)
c tz=tlz(s4)
c **********************
MX1=X(3)+STLX(NK31)
MX2=X(3)+STLX(NK32)
MX3=X(3)+STLX(NK33)
MY1=Y(3)+STLY(NK31)
MY2=Y(3)+STLY(NK32)
MY3=Y(3)+STLY(NK33)
MZ1=Z (3)+STLZ (NK31)
MZ2=Z(3)+STLZ(NK32)
MZ3=Z(3)+STLZ(NK33)
IF(XYZ(MX1,MY1,MZ1).NE.0) GO TO 64
IF(XYZ(MX2,MY2,MZ2) .NE.O) GO TO 64
IF(XYZ(MX3,MY3,MZ3).NE.O) GO TO 64
mxone=(mxl+mx2+mx3-x(3))/2
myone=(myl+my2+my3-y(3))/2
mzone=(mzl+mz2+mz3-z(3))/2
if(xyz(mxone,myone,mzone) .ne.0) go to 64
XYZ(MX1,MY1,MZ1)=-1
XYZ (MX2,MY2,MZ2)=-1
XYZ(MX3,MY3,MZ3)=-1
xyz(mxone,myone,mzone)=3
c...
c end of check of sidechain conformation if sidechain there is not c a glycine.
3040 continue
NK21=SIDGR1 (NV1, NV2 )
NK22=SIDGR2 (NV1, NV2 )
NK23=SIDGR3 (NV1, NV2)
c *******************************
WX2=VX(NV2)
WY2=VY(NV2)
WZ2=VZ(NV2)
X2=X(3)-WX2
Y2=Y(3)-WY2
Z2=Z(3)-WZ2
IF(LOOK(XYZ,INDGL(2),X2,Y2,Z2,NK21,NK22,NK23)) GO TO 63 WX1=VX(NV1)
WY1=VY(NV1)
WZ1=VZ (NV1)
X1=X2-WX1 YI-Y2-WY1
Z1-Z2-WZ1
IF(LOOK(XYZ,INDGL(1),X1,Y1,Z1,13,13,13)) GO TO 63 c
c........ OLD CONFORMATIONAL ENERGY (LOCAL)
c
IC3=ICONF(ICA2,JV3)
IC2=ICONF(ICA1,ICA2)
COLD=AC(2,IC2)+AC(3,IC3)
c s4=sidgr4 (ical, ica2)
c tx=tlx(s4)
c ty=tly(s4)
c tz=tlz(s4)
c PK23=SIDGR3(ICA1,ICA2)
c ipk21=ihanl(ical,ica2)
QX1=X(2)+slx(ical,ica2)
Qy1=y(2)+sly(ical,ica2)
Qz1=z(2)+slz(ical,ica2)
SuX1=X(3)+S1X(ica2,jv3)
Suy1=y(3)+siy(ica2,jv3)
Suz1=z(3)+Slz(ica2,jv3)
EOLD-COLD
+ERG (XYZ,INDGL(3),AM,SuXl, SuY1 , SuZ1 , 3) c * +ERG(XYZ,INDGL(3),AM,SuX3,SuY3,SuZ3,3)
+ERG (XYZ, INDGL(2),AM, QX1,QY1, QZ1, 2)
+APH(ICA1, ICA2 ,JV3, 2)+APH(ICA2,JV3, ICA(4),3) * +EREPUL(XYZ,X(2),Y(2),Z(2),AREP)
+EHB (XYZ, ICA, PRODV,X(3),Y(3), Z(3), 3,AKB)
+EHB(XYZ,ICA,PRODV,X(2),Y(2),Z(2),2,AHB)
c
c........ ..NEW CONFORMATIONAL ENERGY (LOCAL)
c
ICA(1)=NV1
ICA(2)=NV2
IC3=lCONF(NV2,JV3)
IC2=ICONF(NV1,NV2)
CNEW=AC(2,IC2)+AC(3, IC3)
c* *************************************************
c s4=sidgr4 (nv1,nv2)
c tx=tlx(s4)
c ty=tly(s4) c tz=tiz( s4 )
LX1=X2+Slx( nv1,nv2)
Lyl=y2+Sly(nv1,nv2)
Lzl=z2+Slz(nv1,nv2)
MuXl=X(3)+SlX(nv2, jv3)
Muyl=y(3)+Sly(nv2,jv3)
Muzl=z (3)+Slz(nv2,jv3)
ENEW-CNEW
* +ERG (XYZ, INDGL(3),AM,muX1,muY1,MuZ1,3) * +ERG (XYZ, INDGL(2),AM,LX1,LY1,LZ1,2)
* +APH (NV1, NV2, JV3, 2)+APH(NV2,JV3,ICA(4),3) * +EREPUL(XYZ, X2, Y2, Z2,AREP)
* +EHB(XYZ, ICA, PRODV, X(3),Y(3),Z(3) ,3,AHB) * +EHB (XYZ, ICA, PRODV, X2, Y2, Z2, 2,AHB) c.......................METROPOLIS CRITERION
c
DE=ENEW-EOLD
IF(EXP(-DE) .LT.vaxran (rN3)) GO TO 63
ENERG=ENERG+DE
c
c SET-IN THE NEW CONFORMATION OF THE TAIL
X(1)=X1
Y(1)=Y1
Z(1)=Z1
X(2)=X2
Y(2)=Y2
Z(2)=Z2
CALL setin(XYZ,INDGL(l),X1, Y1, Z1, 13, 13, 13,1; CALL SETIN(XYZ, INDGL(2), X2, Y2, Z2, NK21, NK22, NK23 ,2;
QEND=QEND+1
GO TO 79
c
c SET-IN THE CLD CONFORMATION OF THE TAIL 63 if(indgl(3) . eq. 0) go to 641
XYZ (MX1,MY1. MZ1)=0
XYZ (MX2,MY2,MZ2)=0
XYZ(MX3,MY3,MZ3)=0
xyz (mxone,myone,mzcne)=0
64 XYZ(SX1,SY1,SZ1)=-1
XYZ(SX2,SY2,SZ2)=-1
XYZ(SX3,SY3,SZ3)=-1 xyz(xone,yone, zone)=3
641 continue
ICA(1)=ICA1
ICA(2)=ICA2
CALL setin(XYZ,INDGL(1),X(1),Y(1),Z(1),13,13,13,1) CALL SETIN(XYZ, INDGL(2),X(2),Y(2),Z(2), PK21, PK22, PK23, 2) c
c C-TERMINUS (HEAD)
c
79 JV3=ICA(LENF3)
80 NV2=INT(vaxran(ml)*24.)+1
IF(.NOT.GOODC(JV3,NV2)) GO TO 80
81 NVl=INT(vaxran(rn2)*24.)+1
IF(.NOT.GOODC(NV2,NV1)) GO TO 81
c CONFORMATION IS OK. CHECK THE EXCLUDED VCLUME
CALL REMOVE(XYZ , INDGL(LENF), X(LENF), Y(LENF), Z (LENF), 13,13,13)
ICA2=ICA(LENF2)
ICAl=ICA(LENF1)
PK21=SIDGR1(ICA2, ICA1)
PK22=SIDGR2(ICA2, ICA1)
PK23=SIDGR3 (ICA2, ICA1)
c
IIII=INDGL(LENF1)
CALL REMOVE(XYZ, II", X(LENF1), Y(LENF1), Z (LENF1), PK21,PK22, PK23)c CHECK THE ROTATION OF SIDE GROUP ON TH IRD BEAD
if(indgl(lenf2) .eg. 0) go to 6045
PK31=SIDGR1(JV3,ICA2)
PK32=SIDGR2(JV3, ICA2)
PK33=SIDGR3 (JV3, ICA2)
SX1=X(LENF2)+STLX(PK31)
SY1=Y(LENF2)+STLY(PK3i)
SZ1=Z(LENF2)+STLZ(PK31)
SX2=X(LENF2)+STLX(PK32 )
SY2=Y(LENF2)+STLY(PK32)
SZ2=Z(LENF2)+STLZ(PK32)
SX3=X(LENF2 )+STLX(PK33 )
SY3=Y(LENF2)+STLY(PK33)
SZ3-Z(LENF2)+STLZ(PK33)
XYZ(SX1,SY1,SZ1)=C
XYZ(SX2,SY2,SZ2)=0
XYZ(SX3,SY3,SZ3)=0
xone=(sx1+sx2+sx3-x(ienf2))/2
yone=(sy1-usy2+sy3-y(ienf2))/2
zone=(sz1+sz2+sz3-z(lenf2)) /2
xyz(xone,yone,zone)=0
NK31=SIDGR1(JV3, NV2)
NK32=SIDGR2(JV3, NV2) )
NK33-SIDGR3 (JV3 ,NV2) MX1=X(LENF2)+STLX(NK31)
MY1=Y(LENF2)+STLY(NK31)
MZ1=Z(LENF2)+STLZ(NK31)
MX2=X(LENF2)+STLX(NK32)
MY2=Y(LENF2)+STLY(NK32)
MZ2=Z (LENF2)+STLZ(NK32)
MXJ=X(LENF2)+STLX(NK33)
MY3=Y(LENF2)+STLY(NK33)
MZ3=Z (LENF2)+STLZ(NK33)
IF(XYZ(MX1,MY1,MZ1).NE.0) GO TO 84
IF(XYZ(MX2,MY2,MZ2).NE.O) GO TO 84
IF(XYZ(MX3,MY3,MZ3).NE.O) GO TO 84
mxone=(mx1+mx2+mx3-x(lenf2))/2
myone=(my1+my2+my3-y(lenf2))/2
mzone=(mz1+mz2+mz3-z(lenf2))/2
if(xyz(mxone,mvone,mzone) .ne.0) go to 84
XYZ(MX1,MY1,MZ1)=-1
XYZ(MX2,MY2,MZ2)=-1
XYZ(MX3,MY3,MZ3 )=-1
xyz(mxone,myone,mzone)=lenf2
c....
6045 continue
NK21=SIDGR1(NV2, NV1)
NK22=SIDGR2 (NV2, NV1)
NK23=SIDGR3 (NV2,NV1)
WX2=VX(NV2)
WY2=VY(NV2)
WZ2=VZ(NV2)
X2=X(LENF2)+WX2
Y2=Y(LENF2)+WY2
Z2=Z(LENF2)+WZ2
IF(LOOK(XYZ, IIII, X2, Y2, Z2, NK21, NK22, NK23)) GO TO 83 WX1=VX(NVl)
WY1=VY(NVl)
WZ1=VZ(NVI)
X1=X2+WX1
Y1=Y2+WY1
Z1=Z2+WZ1
IF(LOOK(XYZ, INDGL(LENF), X1, Y1, 21, 13, 13, 13) ) GO TO 83 c
c.......... .OLD CONFORMATIONAL ENERGY (LOCAL)
c
IC3=ICONF(JV3,ICA2)
IC2=ICONF(ICA2, ICA1)
C0LD-AC(LENF1, IC2)+AC(LENF2, IC3)
c s4-sidgr4 (ica2,ical)
c tx-tlx(s4)
c ty-tly(s4)
c t tz=tlz (s4)
QX1=X(lenf1)+S1X(ica2,ical)
Qy1=y(lenf1)+Sly(ica2,ical)
Qzl=z (lenf1)+Slz (ica2,ical) SuX1=X(LENF2)+SlX(jv3,ica2)
Suy1=y(LENF2)+SlY(jv3,ica2)
Suz1=z(LENF2)+S1z(jv3,ica2) c QX1=X(LENF1)+STLX( PK21)*
c QY1=Y(LENF1)+STLY(PK21)
c QZ1=Z(LENF1)+STLZ(PK21)
c QX2=X (LENF1)+STLX(PK22)
c QY2=Y(LENF1)+STLY(PK22)
c QZ2=Z (LENF1)+STLZ (PK22)
c QX3=X (LENF1 )+STLX(PK23)
c QY3=Y(LENF1)+STLY(PK23 )
c QZ3=Z(LENF1)+STLZ(PK23)
EOLD=COLD
* -ERG ( XYZ, INDGL(LENF2) ,AM,SuX1,suY1,suZ1,LENF2 * +APH (ICA(LENF4), JV3, ICA2, LENF3)
* +APH (JV3, ICA2, ICAl, LENF2)
* +ERG ( XYZ, IIII,AM, CX1, CY1, CZ1, LENF1) * +EREPUL(XYZ, X(LENF1),Y(LENF1), Z (LENF1), AREP)
* +EHB (XYZ, ICA, PRODV, X (LENF2), Y(LENF2), Z (LENF2), LENF2, AHB) * +EHB (XYZ, ICA, PRODV, X (LENF1), Y( LENF1), Z (LENF1), LENF1, AHB)c....... ..NEW CONFORMATIONAL ENERGY (LOCAL)
c
ICA(LENF1 )=NV1
ICA(LENF2)=NV2
IC3=ICONF(JV3, NV2)
IC2=ICONF(NV2, NV1)
CNEW=AC(LENF1, IC2)+AC(LENF2, IC3)
LX1=X2+slx(nv2, nv1)
Ly1=y2+sly(nv2, nv1)
Lz1=z2+slz(nv2, nv1)
MuX1=X(lenf2)+SLX (jv3, nv2)
MuYL=y(lenf2)+Sly(jv3, nv2)
Muz1=z(lenf2)+Slz(jv3, nv2)
c LX1=X2+STLX(NK21)
c LY1=Y2+STLY(NK21)
c LZ1=Z2+STLZ(NK21)
c LX2=X2+STLX(NK22)
c LY2=Y2+STLY (NK22)
c LZ2=Z2+STLZ(NK22) c LX3-X2-STLX(NK23)
c LY3=Y2+STLY(NK23)
c LZ3=Z2+STLZ(NK23)
ENEW=CNEW
* +ERG (XYZ, INDGL(LENF2), AM,muX1,muY1,muZ1, LENF2) * +APH (ICA(LENF4),JV3, NV2,LENF3) * +APH (JV3, NV2, NV1, LENF2)
* +ERG (XYZ, IIII,AM,LX1, LY1,LZ1,LENF1) * +EREPUL(XYZ, X2, Y2, Z2,AREP)
* +EHB (XYZ, ICA, PRODV, X (LENF2), Y(LENF2), Z(LENF2),LENF2, AHB) * +EHB (XYZ, ICA, PRODV, X2, Y2, Z2, LENF1, AHB) c
c.................. ..METROPOLIS CRITERION
c
DE=ENEW-EOLD
IF(EXP(-DE) .LT.vaxran (rn3)) GO TO 83
ENERG=ENERG+DE
c
c SET-IN THE NEW CONFORMATION OF THE HEAD
X(LENF)=X1
Y(LENF)=Y1
Z(LENF)=Z1
X( LENF1)=X2
Y(LENF1)=Y2
Z(LENF1)=Z2
CALL setin(XYZ, INDGL(LENF), X1, Y1, Z1, 13, 13, 13, 1)
CALL SETIN(XYZ, IIII, X2,Y2, Z2, NK21, NK22, NK23,LENF1)
QEND=QEND+1
GO TO 7007
c
c SET-IN THE CLD CONFORMATION OF THE TAIL
83 if(indg1(lenf2) .eq. 0) go to 675
XYZ(MX1,MY1,MZ1)=0
XYZ (MX2,MY2,MZ2)=0
XYZ(MX3,MY3,MZ3)=0
xyz(mxone,myone,mzone)=0
84 XYZ(SX1,SY1,SZ1)=-1
XYZ(SX2,SY2,SZ2)=-1
XYZ(SX3,SY3,SZ3)=-1
xyz(xone,yone,zone)=lenf2
675 continue ICA(LENFI)=ICA1
ICA(LENF2)=ICA2
CALL setin (XYZ , INDGL(LENF), X(LENF),Y(LENF), Z (LENF), 13, 13, 13, 1) CALL SETIN(XYZ ,1111, X(LENFI ), Y (LENFI ), Z (LENFI ), PK21, PK22 , PK23,LENF1)c
c WAVE LIKE MOTION OF THE CHAIN FRAGMENT, VARIOUS CONFORMATIONS
c
7007 I=INT(AL9'vaxran(rn2))=3
if( vaxran(rn3) .gt. .01) then
af=1.d0
else
af =ah
end if
c
1-2 IS THE CENTRAL BEAD OF THE PIECE TO BE CUT-OFF
c JJ - IS THE CENTRAL ONE OF THE PIECE TO BE CCNSTRUCTED
c SEARCH FOR U-SHAPED (OF VARIOUS WIDTH) CONFORMATIONS
c
IV2=ICA(I)
IV5-ICA(I-3)
VX2=VX(IV2)
VX5=VX(IV5)
IF(VX2.Ne.-VX5) GO TO 7770
VY2=VYCIV2)
VY5=VY(IV5.
IF(VY2.NE.-VY5) GO TO 7770
VZ2=VZ(IV2)
VZ5=VZ(IV5)
IF (VZ2. N E. - V Z5) GO TO 7770
c LOOK FOR THE SECOND END
IVA=MOD(ICLCCK,WAVEL)
MIX=-MIX
JJ=I+2-MIX* (5+IVA)
IF (JJ.L T.4.OR.JJ. G T. L E NF 3 ) GO TO 777 0
c ACCEPTED DOWN THE CHAIN CHOICE c I-END CONSTRUCTION (CUT-OFF)
IV3=ICA(I+2)
IV4=ICA(I+I)
c KINK FERFCRMEEED (KINK==1)
IVl=ICA(I-1)
I4=I+4
IV6=ICA(I4)
IF(GOODC(IV1,IV3). AND GOODC (IV4, IV6)) GOTO 200
c ELSE TRY KINK FLIP OF THE TOP
INV3=VECTI(IV3, IV4, KINK)
IV4=VECT2 (IV2, IV4, KINK)
IV3=INV3
IF(.NOT.GOODC (IV!, IV3)) GO TO 7770 LF(.NOT. GOODC (IY4,IV6)) GO TO 7770c CONSTRUCT THE NEW JJ- END 200 J1=JJ-1
JV1=ICA(J1)
JV2=ICA(JJ)
JVL=ICA(JJ-2)
J3=JJ+1
JVP=ICA(J3)
201 V=INT(vaxran(rnl)*24.)+1
IF(.NOT.GOODC(JVL,V)) GO TO 201
WVX=VX(V)
WVY=VY(V)
WVZ=VZ(V)
VM=VECTOR(-WVX, -WVY, -WVZ)
IF(.NOT. GOODC (VM,JVP)) GO TO 201
c
DO KINK=1,5
JNl=VECT1 (JV1,JV2, KINK)
IF(GOODC(V,JN1)) THEN
JN2=VECT2 (JV1,JV2, KINK)
IF(GOODC(JN2,VM)) GO TO 202
END IF
ENDDO GO TO 7770
c
c MODIFICATION OF THE BOND STRING ARRAY ICA, STORE THE OLD ONEc
202 IF(MIX.GT.0) THEN
IFIRST=1
ILAST=J3
ELSE
IFIRST=J1
ILAST=I4
ENDIF
DO J=IFIRST-1, ILAST
ICAO(J)=ICA(J)
ENDDO
c
IF(MIX.GT.0) THEN
ICA(I)=IV3
ICA(I+1)=IV4
DO J=I4,JJ-2
ICA(J-2)=ICAO(J)
ENDDO ICA(JJ)=VM
ICA(J1)=JN2
ICA(J1-1)=JN1
ICA(J1-2)=V ELSE
ICA(I4-1)=IV4
ICA(I4-2)=IV3
DO J=J3,I-1
ICA(J+2)=ICAO(J)
ENDDO
ICA(J1)=V
ICA(JJ)=JN1
ICA(J3)=JN2
ICA(J3+1)=VM
END IF
c REMDVE THE STRING
DO K=IFIRST, ILAST
II=ICAO(K-1)
KK=ICAO(K)
IKS1=SIDGR1(II,KK)
IKS2=SIDGR2(II,KK)
IKS3=SIDGR3(II,KK)
XJ=X (K)
YJ=Y(K)
ZJ=Z(K)
CALL REMOVE(XYZ, INDGL(K),XJ,YJ, ZJ, IKS1, IKS2, IKS3 )
ENDDO
c SETIN AND EXCLUDED c VOLUME TEST
IFA=XFIRST-1
XJ=X(IFA)
YJ=Y(IFA)
ZJ=Z(IFA)
II=ICA(J-1)
JJJ=ICA(J)
XJ=XJ+VX(II)
YJ=YJ+VY(II)
ZJ=ZJ+VZ(II)
XNEW(J)=XJ
YNEW(J)=YJ
ZNEW(J)=ZJ
IK≤1=SIDGRI(II,JJJ)
IKS2=SIDGR2 (II,JJJ)
IKS3=SIDGR3 (II,JJJ)
IF (LOCK (XYZ , IN DGL (J), XJ, YJ ,ZJ, IKS1, IKS2, IKS3)) THEN c T THENREMOVE AND TERMINATE
IF(J.EQ. IFIRST) GO TO 204
DO K= IFIRST, J-1
KK=ICA(K-1)
KKK=ICA(K)
IKS1=SIDGR1 (KK, KKK) IKS2=SIDGR2 (KK, KKK)
IKS3=SIDGR3(KK,KKK)
CALL REMOVE(XYZ, INDGL(K), XNEW(K), YNEW(K), ZNEW(K), IKS1, IKS2, IKS3)
ENDDO
204 DO I=IFIRST, ILAST
ICA(I)=ICAO(I)
II=ICA(I-1)
JJJ=ICA(I)
IKS1=SIDGR1(II,JJJ)
IKS2=SIDGR2(II,JJJ)
IKS3=SIDGR3 (II,JJJ)
CALL setin(XYZ, INDGL(I), X(I), Y (I), Z(I), IKS1, IKS2, IKS3, L)
ENDDO
GO TO 7770
ELSE
c SET NEW BEAD
CALL SETIN(XYZ, INDGL(J), XJ, YJ, ZZ, IKS1, IKS2, IKS3, J)
ENDIF
ENDDO
c THE NE W ST RING K EE P S EXCLUTED VOLUME c
c
c COMPUTATION OF ENERGYOF THE NEW CONFORMATION AND REMOVE STRING c
ENEW=0.
ER=0.
E=0.
DO J=IFIRST, ILAST
I=J-1
II=ICA(I)
JJ=ICA(J)
XJ=XNEW(J)
YJ=YNEW(J)
ZJ=ZNEW(J)
IKS1=SIDGR1(II,JJ)
IKS2=SIDGR2(II,JJ)
IKS3=SIDGR3(II,JJ) c ROTATIONAL CONTRIBUTI ON
JCONF=ICONF(II,JJ)
ENEW=ENEW+AC(J,JCONF)+APH (II, JJ, ICA(J+1), J)
c INTERACTIONS OF SIDE GROUPS
IX1=XJ+Slx(ii,jj)
Iy1=yJ-Sly(ii,jj)
I Z 1 =zJ+Slz (ii,jj) E=E+ ERG ( XYZ , INDGL ( J ) , AM, IX1 , IY1 , IZ1 , J )
c COOPERATIVE AND HYDROGEN BOND
E-E+EHB(XYZ, ICA, PRODV, XJ,YJ, ZJ,J ,AHB)
c REPULSIVE INTERACTIONS
ER-ER+EREPUL( XYZ , XJ,YJ, ZJ,AREP )
CALL REMOVE(XYZ, INDGL(J) ,XJ,YJ, ZJ, IKS1, IKS2, IKS3)
ENDDO
ENEW=ENEW+APH ( ICA( IFIRST-2), ICA(IFA), ICA(IFFRST), IFA)
ENEW=ENEW+E+ER
c
c
c
c COMPUTATION OF THE CLD ENERGY AND SETIN CF THE CHAIN PIECE c
DO J=IFIRST, ILAST
I-ICA(J)
ICA(J)=ICAO(J)
ICAO(J)=I
ENDDO
c NEW ICA STORED IN ICAO AT THIS POINT
EOLD=0.
ER=0.
E=0.
DO J=IFIRST, ILAST
XJ=X(J)
YJ=Y(J)
ZJ=Z(J)
II=ICA(J-1)
JJJ=ICA(J)
IKS1=SIDGR1 (II,JJJ)
IKS2=SIDGR2(II,JJJ)
IKS3=SIDGR3 (II,JJJ)
c s4=sidgr4(ii,jjj)
c tx=rlx(s4)
c ty=tly(s4)
c tz=tlz(s4)
CALL setin(XYZ, INDGL(J),XJ,YJ, ZJ, IKS1, IKS2, IKS3,J) c
JCONF=ICONF(II,JJJ)
EOLD=EOLD+AC(J,JCONF) +APH( II ,JJJ, ICA(J+l),J)
c INTERACTIONS OF SIDE GROUPS
IX1=XJ+Slx(ii,jjj)
Iy1-yJ+Ξly(ii,jjj)
Iz1-zJ+Slz(ii,jjj)
E=E+ERG(XYZ, INDGL(J),AM, IX1, IY1, IZ1,J) c COOPERATIVE AND HYDROGEN BOND E=E-EHB(XYZ, ICA, PRODV, XJ, YJ , ZJ,J ,AHB)
c REPULSIVE INTERACTIONS
ER=ER+EREPUL(XYZ , XJ, YJ, ZJ,AREP)
ENDDO
EOLD=EOLD+APH(ICA(IFIRST-2), ICA(IFA), ICA( IFIRST), IFA) EOLD=EOLD+E+ER
c
c METROPOLIS CRITERION
c
DE=ENEW-EOLD
IF(EXP (-DE*af) . GT.vaxran(rn3)) THEN
c
QWAVE-QWAVE+1
ENERG-ENERG+DE
DO J=IFIRST, ILAST
II=ICA(J-1)
JJ=ICA(J)
IKS1=SIDGR1(II,JJ)
IKS2=SIDGR2(II,JJ)
IKS3=SIDGR3(II,JJ)
CALL REMOVE(XYZ, INDGL(J), X(J ,Y(J),Z(J), IKS1,IKS2,IKS3)
ENDDO
DO J=IFIRST, ILAST
XJ=XNEW(J)
YJ=YNEW(J)
ZJ=ZNEW(J)
X(J)=XJ
Y(J)=YJ
Z(J)=ZJ
II=ICAO(J-1)
JJ=ICAO(J)
IKS1=SIDGR1(II,JJ)
IKS2=SIDGR2(II,JJ)
IKS3=SIDGR3(II,JJ)
CALL setin (XYZ, INDGL(J), XJ, YJ , ZJ, IKS1, IKS2, IKS3, J)
ICA(J)=ICAO(J)
ENDDO
ENDIF
c
7770 CONTINUE
c
c OPTIONAL NORMALISATION OF THE COORDINATES
SX=0
SY=0
SZ=0
DO I=1,LENF
SX=SX+X(I)
SY=SY+Y(I)
SZ=SZ+Z(I) zt(i)=z(i)-szd
end do
vrite(13,*)(xt(i),yt(i),zt(i), i=1, lenf)
R2-(X(LENF)-X(l))**2+(Y(LENF)-Y(l))**2+(Z(LENF)-Z(l))**2
asuιnr2=asumr2+r2
etot2=etot2+energ*energ
etot-etot+energ
AS2=0.
SX=0
SY=0
SZ=0
DO I=1,LENF
SX=SX+X(I)
SY=SY÷Y(I)
SZ=SZ+Z(I)
ENDDO
c CENTRE OF GRAVITY COORDINATES
ASX=FLOAT(SX) /LENF
ASY=FLOAT(SY) /LENF
ASZ=FLOAT( SZ )/LENF
DO I=1,LENF
BX=(ASX-X(I))**2
BY=(ASY-Y(I))**2
BZ=(ASZ-Z(I))**2
AS2=AS2+BX+BY+BZ
ENDDO
AS2=AS2/LENF
asums2=asums2+ras2
c insertion of the native contact pairs
nt=0. d0
nct=0.
do 1400 i=2, lenf2
k-i+l
ii=ica(i-1)
iii=ica(i)
SX1=slx(ii,iii)+x(ι)
Sy1=sly(ii,iii)+y(i)
Sz1=slz(ii,iii)+z(i)
DO 1400 J=K, LENF1
c it is not counting the nearest1 down-the-chain neighbours, which may be
c usefull for some purposes ...............................................................................................
if(iabs(i-j) .eq.1) go to 1400
iR2=(X(J)-X(I))** 2+(Y(J)-Y(I))**2-(Z(J)-Z(I))**2
IF(iR2.GT.18) GO TO 1400
II=ICA(J-1)
III=ICA(J) Kx1=slX(II,III)+X(J)
KY1=S1Y(II,III)+Y(J)
KZ1=S1Z(II,III)+Z(J)
R11=(SX1-KX1)**2+(SY1-KY1)**2+(SZ1-KZ1)**2
IF(R11.EQ.2) then
nct=nct+inc(i,j)
nt=nt+1
end if
1400 CONTINUE
ant=ant+nt
anct=anct+nct
WRITE(6,8009) ITERM, R2, AS2, ENERG,nct,nt
8009 FORMAT(2X,2I5,FB.2,F10.4,3x,i3,2x,i3)
c
7777 CONTINUE
c
9000 CONTINUE
REWIND(UNIT=10)
WRITE(10,8000) LENF
DO I=1,LENF
WRITE(10, 8000) X(I), Y(I), Z(I)
ENDDO
c DO I=2,LENF1
c WRITE(6,8000) I,X(I), Y(I), Z(I), ICA(I), ICONF(ICA(I-1), ICA(I)) c ENDDO c ACCEPTANCE RATIOS FOR VARIOUS MOVES
FKINK=FLOAT(QKINK)/FLOAT(NCYCLE*PHOT)/AL4/100.
FWAVE=FLOAT(QWAVE)/FLOAT(NCYCLE*PHOT)/100.
FROT=FLOAT(QROT)/FLOAT(NCYCLE*PHOT)/100.
FEND=FLOAT(QEND)/FLOAT(NCYCLE*PHOT)/200.
etot=etot/ncycie
etot2=etot2/ncycle
cv=etot2-etot*etot
asumr2=asumr2/ncycle
asums2=asums2/ncycie
anct=anct/ncycie
ant=ant/ncycle
write(6,9942)etot, etot2, cv, asumr2, asums2, anct, ant
9942 format(1x, '<E>-',1F10.4,5X, '<E2>-',1Pd15.8,5X, 'CV-',1pd15.8,/, * 'mean-square-end to end vector=', 1pd15.8,/,
* '<S2>=',1pd15.8,/, 'native contacts=', 1pd15.8,/, * ' number of contacts=',1pd15.8,/)
write(6, 8012)
8012 format(1x,' fkink= fend = fwave= frot=')
WRITE(6,8002) FKINK, FEND, FWAVE,FROT
c
c DETAILED ANALYSIS OF THE CHAIN STRUCTURE
c write(16,181) atemp, etot, cv, asumr2, asums2, anct, ant
181 format(1x, 1f8.3, 2x,6 (1f8.3,2x))
WRITE (6, 8004)
8004 FORMAT(1X,//,8X,'VECTOR1 VECTOR2', 8x,' SIDE 1/2/3', 6x, 'HANDENES',/)
DO I=2,LENF1
Il=ICA(I-1)
JJ=ICA(I)
KK=ICA(I+1)
Icj=ICONF(II,JJ)
SID1=SIDGR1(II,JJ)
SID2=SIDGR2(II,JJ)
SID3=SIDGR3(II,JJ)
WX1=VX(II)
WY1=VY(II)
WZ1=VZ(II)
WX2=VX(JJ)
WY2=VY(JJ)
WZ2=VZ(JJ)
WX3=VX(KK)
WY3=VY(KK)
WZ3=VZ(KK)
R3=(WX1+WX2+WX3 )**2+(WΥ1+WY2**-WY3)**2+(WZ1+WZ2+WZ3)**2
WlX-(SIX(ii,jj))*INDGL(I)
Wly=(Sly(ii,jj))*INDGL(I)
Wlz=(Slz(ii,jj))*INDGL(I)
c specification of the non interactiong fee sites
W2X=stlx(sidl)*INDGL(I)
W2y=stly(sidl)*INDGL(I)
W2z=stiz(sidl)*INDGL(I)
W3X=stlx(sid2)*INDGL(I)
W3y=stly(sid2)*INDGL(I)
W3z=stlz(sid2)*INDGL(I)
W4X=Stlx(Sid3)*INDGL(I)
W4y=stly(sid3)*INDGL(I)
W4z=stlz(sid3)*INDGL(I)
IHX=(WY1*WZ2-WY2*WZ1)*INDGL(I)
IHY=(WX2*WZ1-WZ2*WX1)*INDGL(I)
IHZ=(WX1*WY2-WY1*WX2)*INDGL(I)
IH1=ISIGN(1,(IHX*W1X+IHY*WlY+IKZ*WlZ))
WRITE(6,8003) I,WX1,WY1,WZ1,WX2,WY2,WZ2,
* W1X,W1Y,W1Z,W2X,W2Y,W2Z,W3X,W3Y,W3Z,IH1,icj,r3 ENDDO
c CENTRE OF GRAVITY COORDINATES
ASX=FLOAT(SX)/LENF
ASY=FLOAT(SY)/LENF
ASZ=FLOAT(SZ )/LENT
XSHIFT=MID-ASX
YSHIFT=MID-ASY
ZSHIFT=MID-ASZ
IF((XSHIFT**2+YSHIFT**2+ZSHIFT**2).GT.30) THEN
c NORMALISATION
CALL REMOVE(XYZ,INDGL(1),X(1),Y(1),Z(1),13,13,13)
CALL REMOVE(XYZ,INDGL(LENF),X(LENF),Y(LENF),Z(LENF),13,13,13)
DO I=2 ,LENF1
II=ICA(I-1)
JJ=ICA(I)
SID1=SIDGR1(II,JJ)
SID2=SIDGR2(II,JJ)
SID3=SIDGR3(II,JJ)
CALL REMOVE(XYZ,INDGL(I) ,X(I) ,Y(I) ,Z(I) ,SID1, SID2,SID3)
ENDDO
DO I=1,LENF
X(I)=X(I)+XSHIFT
Y(I)=Y(I)+YSHIFT
Z(I)=Z(I)+ZSHIFT
ENDDO
sxd=sxd-xshift
syd=syd-yshift
szd=szd-zshift
CALL setin (XYZ,INDGL(1),X(1),Y(1),Z(1),13,13,13,1)
CALL setin(XYZ, INDGL(LENF),X(LENF),Y(LENF),Z(LENF),13,13,13,1)
DO I=2,LENF1
II=ICA(I-1)
JJ=ICA(I)
SID1=SIDGR1(II,JJ)
SID2=SIDGR2(II,JJ)
SID3=SIDGR3(II,JJ)
CALL setin(XYZ, INDGL(I),X(I),Y(I),Z (I),SID1,SID2,SID3,I)
ENDDO
END IF
c
7700 CONTINUE
c write(13,*)iterm,energ,sxd,syd,szd
do i-1, lenf
xt(i)=x(i)+sxd
yt(i)=y(i)+syd ENDDO
8003 FORMAT(1X,I4,X,3I3,X,3I3,2X,3I2,X,3I2,X,3I2,2X,I2,x,i3,x,i3) 8002 FORMAT(1X,//,5X,5F10.6)
write (6, 8010)
8010 format(1x,/,5x,50(1h-),/)
8000 FORMAT (3X, 514, 16)
do i=6,18,2
write(6,8005) i,iflip(i,1),iflip(i,2;,iflip(i,3),iflip(i,4), * iflip(i,5)
enddo
8005 format(1x,i5,5i8)
c
c test1 of occupancy - a direct one
lenfo7=lenf*7
Ienf23=Ienf2-NBGL
write(6,8006) lenf, lenfo7, ienf23,nbg1
8006 format(1x,/,5x, 'lenf 79-lerf lenf-2 -nbgl ngbl',4i6,/) isi=0
ione=0
ioc=0
do xx=1,max
do yy=1,max
do zz=1,max
point-xyz (xx,yy,zz)
if(point.ne.0) then
if(point. gt.0) then
isi=isi -1
else
if(point.eq.-1) icne=ione-1
ioc=ioc-1
endif
endif
enddo
enddo
enddo
c writing of the backbone and side groups coordinates - without the c central (inert) bead
c
c do i-2, lenf2
c ii=ica(i-1)
c iii=ica(i)
c SX1=stlLX(SIDGR1(II,III))+X(I)
c SY1=stlLY(SIDGR1(II,III))+Y(I)
c SZ1=stlLZ(SIDGR1(II,III))+Z(I)
c SX2=stlLX(SIDGR2(II,III))+X(I)
c SY2=stlLY(SIDGR2(II,III))+Y(I)
c SZ2=stlLZ(SIDGR2(II,III))+Z(I)
c SX3=stlLX(SIDGR3(II,III))+X(l)
c SY3=sSlLY(SIDGR3(II,III))+Y(l) c SZ3=stlLZ(SIDGR3(II,III))+Z(I)
c write(6,420) x(i),y(i),z(i),sx1,sy1,sz1,sx2,sy2,sz2,
c * sx3,sy3,sz3
c 420 format(5x,4(2x,3i4))
c enddo
ioc=ioc-3*(lenf2-nbgl)
ione=(ione-14)/3 -2
c there are 3 -1 per side chain that is not a glycine
c let us count the number of excess minus ones
write(6,8007) ione, ioc, isi
8007 format(1x,//,1X, 'L-GLY, occupancy and side groups ',316,//,
* 5x '***************** contact map *************' ,/)
do 400 i=2,lenf2
k=i+1
ii=ica(i-1)
iii=ica(i)
c s4=sidgr4(ii,iii)
c tx=tlx(s4)
c ty=tly(s4)
c tz=tlz(s4)
c ih1=ihan1(ii,iii)
c ih2=ihan2(ii,iii)
c ih3=ihan3(ii,iii)
SX1=slx(ii,iii)+x(i)
Sy1=siy(ii,iii)+y(i)
Sz1=slz(ii,iii)+z(i)
DO 400 J=K,LENF1
c it is not counting the nearest1 down-the-chain neighbours, which may be
c usefuil for some purposes ...............................................................................................
if(iabs(i-j).eq.1) go to 400
R2-(X(J)-X(I))**2+(Y(J)-Y(I))**2+(Z(J)-Z(I))**2
IF(R2.GT.18) GO TO 400
II=ICA(J-1)
III=ICA(J)
KX1=slX(II,III)+X(J)
KY1=SlY(II,III)+Y(J)
KZ1=SlZ(II,III)+Z(J)
R11=(SX1-KX1)**2+(SY1-KY1)**2+(SZ1-KZ1)**2
mult=0
IF(R11.EQ.2) MULT=MULT+1
IF(MULT.EQ.0) GO TO 400
IF(am(i,j) .gt. 0) THEN WrlTE(6,410) I,J,am(i,j)
elseIF(am(i,j).lt.0) THEN
WRITE(6,411) I,J,am(i,j)
else
WRITE(6,412) I,J,am(i,j)
ENDIF
400 CONTINUE
410 FORMAT(5X,13," and', 13,' residues are repulsive ',1f10.4)
411 FORMAT(3X,'**',13, ' and', 13,' are attractive ',1f10.4) 412 format(5x,i3, 'and',i3, 'are inert ',1f10.4)
CLOSE(UNIT=1)
CLOSE(UNIT=2)
CLOSE (UNIT=5)
CLOSE(UNIT=6)
CLOSE(UNIT=10)
stOP
END
function vaxran(iseed)
equivalence (iyf1,yf12)
data mask,mask2/x'3f000000',x'3f800000'/ iseed=iseed*69069 + 1
nseed=rshift(iseed,8)
if(iseed.It.0) then
iyf1= mask2+nseed
vaxran=yf12
else
iyf1=mask+nseed
vaxran=yf12-.5
endif
return
endd
c
c SIDE *3 ...GLYCINE................................................. c REMOVES THE CLUSTER (RESIDUE + SIDE GROUP)
SUBROUTINE REMOVE (XYZ, INDGL,JX,JY,JZ,ID1,ID2,ID3) INTEGER XYZ (150,150,150),STLX(13),STLY(13), STLZ(13) c FCC LATTICE VECTORS (AND 000)
DATA STLX /4*0,-1,1,-1,1,-1,1,-1,1,0/
DATA STLY /-1,1,-1,1,1,-1,4*0,-1,1,0/
DATA STLZ /1,-1,-1,1,2*0,1,-1,-1,1,3*0/
c
IF(INDGL.EQ.0) GO TO 88
IX=JX+STLX(ID1)
IY=JY+STLY(ID1)
IZ=JZ+STLZ(ID1)
XYZ(IX,IY,IZ)=0
IIX=JX+STLX(ID2)
IIY=JY+STLY(ID2)
IIZ=JZ+STLZ(ID2)
XYZ(IIX,ITY,IIZ)=0
IIIX=JX+STLX(ID3)
IIIY=JY+STLY(ID3)
IIIZ=JZ+STLZ(ID3)
XYZ(IIIX,IIIY,IIIZ)=0
LX=(IX+IIX+IIIX-JX)/2
LY=(IY+IIY+IIIY-JY)/2
LZ=(IZ+IIZ+IIIZ-JZ )/2
XYZ(LX,LY,LZ)=0
88 XYZ(JX,JY,JZ)=0
IXL=JX-1
XYZ(IXL,JY,JZ)=0
IXP=JX+1
XYZ(IXP,JY,JZ)=0
IYL=JY-1
XYZ(JX,IYL,JZ)=0
IYP=JY+1
XYZ(JX,IYP,JZ)=0
IZL=JZ-1
XYZ(JX,JY,IZL)=0
IZP=JZ+1
XYZ(JX,JY,IZP)=0
RETURN END
c ............................... SIDE *3 ..glycine............................................ c THIS SUBROUTINE SETS -INDEX TO THE EXCLUDED VOLUME ENVELOPE c AND INDEX-2,..... LENFI AT THE SIDE GROUP POSITION. BOTH THE c TERMINUSES ARE CODED -1 (IT IS USED IN ENERGY CALCULATIONS) c
c ....................................................................................
c ONLY THE FCC LATTICE VECTORS ARE ALLOWED TO INTERACT c subroutine setind.f
c includes check for handedness so that all interacting points c are left handed
SUBROUTINE SETIN (XYZ, INDGL,JX,JY,JZ,ID1,ID2,ID3,IND)
INTEGER XYZ(150,150,150),STLX(13),STLY(13),STLZ(13),INDGL C FCC LATTICE VECTORS (AND 000)
DATA STLX /4*0,-1,1,-1,1,-1,1,-1,1,0/
DATA STLY /-1,1,-1,1,1,-1,4*0,-1,1,0/
DATA STLZ /1,-1,-1,1,2*0,1,-1,-1,1,3*0/
if(indgl.eq.0) go to 88
IX=JX+STLX(ID1)
IY=JY+STLY( ID1)
IZ=JZ+STLZ(ID1)
XYZ(IX,IY,IZ)=-1
IIX=JX+STLX(ID2)
IIY=JY+STLY(ID2)
IIZ=JZ+STLZ(ID2)
XYZ(IIX,IIY,IIZ)=-1
IIIX=JX+STLX(ID3)
IIIY=JY+STLY(ID3)
IIIZ-JZ+STLZ(ID3)
XYZ(IIIX,IIIY,IIIZ)=-1
LX=(IX+IIX+IIIX-JX)/2
LY=(IY+IIY+IIIY-JY)/2
LZ=(IZ+IIZ+IIIZ-JZ)/2
XYZ(LX,LY,LZ)=ind
88 XYZ(JX,JY,JZ)=-IND
IXL=JX-1
XYZ(IXL,JY,JZ)=-IND
IXP=JX+1
XYZ(IXP,JY,JZ)=-IND
IYL=JY-1
XYZ(JX, IYL,JZ )=-IND
IYP=JY+1
XYZ(JX,IYP,JZ)=-IND
IZL=JZ-1
XYZ(JX,JY,IZL)=-IND
IZP=JZ+1
XYZ(JX,JY,IZP)=-IND
RETURN
END C
C SIDE *3 ...glycine............................................
C CHECK OF OCCUPANCY - ENTIRE CLUSTER (RESIDUE+SIDE GROUP)
FUNCTION LOOK(XYZ, INDGL,JX,JY,JZ,ID1,ID2,ID3)
LOGICAL LOOK
INTEGER XYZ(150, 150, 150), STLX(13),STLY(13),STLZ(13) C FCC LATTICE VECTORS (AND 000)
DATA STLX /4*0,-1,1,-1,1,-1,1,-1,1,0/
DATA STLY /-1,1,-1,1,1,-1,4*0,-1,1,0/
DATA STLZ /1,-1,-1,1,2*0,1,-1,-1,1,3*0/
LOOK-.FALSE.
C
IF(INDGL.EQ.0) GO TO 88
IX=JX+STLX(ID1)
IY=JY+STLY(ID1)
IZ=JZ+STLZ(ID1)
IF(XYZ(IX,IY,IZ).NE.0) THEN
LOOK=. TRUE.
RETURN ENDIF
IIX-JX+STLX(ID2)
IIY-JY+STLY(ID2)
IIZ-JZ+STLZ(ID2)
IF(XYZ (IIX, IIY,IIZ).NE.0) THEN- LOOK=.TRUE.
RETURN ENDIF
IIIX=JX+STLX(ID3)
IIIY=JY+STLY(ID3)
IIIZ=JZ+STLZ(ID3)
IF(XYZ (IIIX, IIIY,IIIZ).NE.C) THEN
LOOK=. TRUE.
RETURN ENDIF
LX=(IX+IIX+IIIX-JX)/2
LY=(IY+IIY+IIIY-JY)/2
LZ=(IZ+IIZ+IIIZ-JZ)/2
IF(XYZ(LX,LY,LZ).NE.O) THEN- LOOK-. TRUE.
RETURN ENDIF
C
88 IF(XYZ(JX,JY,JZ).NE.0) THEN- LOOK=.TRUE.
RETURN ENDIF
IXL=JX-1
IF(XYZ(IXL,JY,JZ).NE.O) THEN- LOOK=. TRUE.
RETURN ENDIF
IXP=JX+1
IF(XYZ(IXP,JY,JZ).NE.0) THEN
LOOK=.TRUE.
RETURN ENDIF
IYL=JY-1
IF(XYZ(JX,IYL,JZ).NE.0) THEN
LOOK=. TRUE.
RETURN
ENDIF IYP=JY+l
IF(XYZ(JX,IYP,JZ).NE.0) THEN
LOOK=. TRUE.
RETURN
ENDIF IZL=JZ-1
IF(XYZ(JX,JY,IZL).NE.0) THEN
LOOK=. TRUE .
RETURN
ENDIF IZP=JZ+1
IF(XYZ(JX,JY,IZP).NE.0) LOOK=.TRUE. RETURN END
C ..............................side 3 .........glycine .........................................
C THIS FUNCTION COMPUTES THE STRENGTH OF INTERACTIONS BETWEEN THE
C SIDE GROUPS - only the nearest neighbours r=1, for ak102gly.f c all interactions are at a distance 2
c
c program setind.f 5/12/89
FUNCTION ERG(XYZ, INDGL,AM,KSX,KSY,KSZ,J)
DIMENSION AM(150,150)
INTEGER XYZ(150, 150, 150), P1, P2, P3, P4, P5, P6, p7,p8, p9
integer p10,p11,p12
ERG=0.
IF(INDGL.EQ.0) RETURN
IX=KSX-1
JX=KSX+1
IY=KSY-1
JY=KSY+1
IZ=KSZ-1
JZ=KSZ+1
C
c vectors in the z plane
P1=XYZ(JX,jy,KSZ)
IF(P1.GT.0) ERG=ERG+AM(P1,J)
P2=XYZ(jX,iY,KSZ)
IF(P2.GT.0) ERG=ERG+AM(P2,J)
P3=XYZ(iX,jy,KSZ)
IF(P3.GT.0) ERG=ERG+AM(P3,J)
D4=XYZ(iX,iY,KSZ)
IF(p4.GT.0) ERG=ERG+AM(p4,J)
c vectors in the x plane
p5=XYZ(KSX,JY,jZ)
IF(p5.GT.0) ERG=ERG+AM(p5,J)
p6=XYZ(KSX,IY,jZ)
IF(p6.GT.0) ERG=ERG+AK(p6,J)
p7=XYZ(KSX,JY,iZ)
IF(p7.GT.0) ERG=ERG+AM(p7,J)
p8=XYZ(KSX,IY,iz)
IF(p8.GT.0) ERG=ERG+AM(p8,J)
C VECTORS IN THE Y PLANE
P9=XYZ(IX,KSY,JZ)
IF(P9.GT.0) ERG=ERG+AM(P9,J)
P10=XYZ(IX,KSY,IZ)
IF(PIO.GT.0) ERG=ERG+AM(P10,J)
P11=XYZ(JX,KSY,JZ)
IF(P11.GT.0) ERG=ERG+AM(F1I,J)
P12=XYZ(JX,KSY,IZ)
IF(P12.GT.0) ERG=ERG+AM(P12,J)
RETURN END C repulsion only to r2=5
C
C THIS FUNCTION COMPUTES THE STRENGTH OF REPULSIVE INTERACTIONS
c
FUNCTION EREPUL(XYZ, X,Y,Z,AREP)
INTEGER XYZ(150,150,150),X,Y,Z
DATA LO /-1/
l=0
IX=X-1
JX=X+1
IY=Y-1
JY=Y+1
IZ=Z-1
JZ=Z+1
c fcc lattice
IF(XYZ(IX,IY,Z) .LT.L0) I=I+1
IF(XYZ(IX,JY,Z) .LT.L0) 1=1+1
IF(XYZ(JX,IY,Z). LT.L0) I=I+1
IF(XYZ(JX,JY,Z). LT.L0) I=I+1
IF(XYZ(X,IY,IZ).LT.L0) I=I+1
IF(XYZ(X,IY,JZ).LT.L0) I=I+1
IF(XYZ(X,JY,IZ).LT.L0) I=I+1
IF(XYZ(X,JY,JZ).LT.L0) I=I+1
IF(XYZ(IX,Y,IZ).LT.L0) I=I+1
IF(XYZ(IX,Y,JZ).LT.L0) I=I+1
IF(XYZ(JX,Y,IZ).LT.L0) I=I+1
IF(XYZ(JX,Y,JZ).LT.L0) I=I+1
EREPUL=I*AREP
RETURN END
c
c HHYDROGEN BONDING AND "COOPERATIVITY" (BETA AND ALPHA MOTIFFS)
FUNCTION EHB(XYZ, ICA, PRODV, IX, IY, IZ, ID,AHB)
INTEGER XYZ(150,150,150),ICA(0:150),PRODV(24,24)
DATA LO /-1/
1-0
IXL=IX-3
IXP=IX+3
IYL=IY-3
IYP=IY+3
IZL=IZ-3
IZP=IZ+3
IC1=ICA(ID-l)
IC2=ICA(ID)
IF(XYZ(IXL,IY,IZ).LT.L0) THEN
IDD—XYZ( IXL, IY, IZ)
IN1=ICA(IDD-1)
IN2=ICA(IDD)
I=I+PRODV(IC1, IN1)+PRODV(IC1, IN2)+PRODV(IC2, IN1)+PRODV(IC2, IN2)
ENDIF
IF(XYZ(IXP,IY,IZ).LT.L0) THEN
IDD=-XYZ(IXP,IY,IZ)
IN1=ICA(IDD-1)
IN2=ICA(IDD)
I=1+PRODV(IC1, IN1)+PRODV(IC1, IN2)+PRODV( IC2 , IN1)-PRCDV(IC2, IN2)
ENDIF C
IF(XYZ(IX,IYL,IZ).LT.L0) THEN
IDD=-XYZ(IX, IYL, IZ)
IN1=ICA(IDD-1)
IN2=ICA(IDD)
I=I+PRODV(IC1, IN1)+PRODV(IC1, IN2)+PRODV(IC2,IN1)+PRCDV(IC2, IN2)
ENDIF C
IF(XYZ(IX,IYP,IZ).LT.L0) THEN
IDD=-XYZ(IX,IYP,IZ)
IN1=ICA(IDD-1)
IN2=ICA(IDD)
I=I+PRODV(IC1, IN1)+PRODV(IC1, IN2)+PRODV(IC2, IN1)-PRODV(IC2, IN2)
ENDIF C
IF(XYZ(IX,IY,IZL) .LT.L0) THEN
IDD=-XYZ(IX, IY, IZL)
IN1=ICA(IDD-1)
IN2=ICA(IDD)
I=I+PRODV(IC1, IN1)+PRODV(IC1, IN2)+PRODV(IC2, IN1)+PRODV(IC2, IN2)
ENDIF C
IF(XYZ(IX,IY,IZP).LT.L0) THEN
IDD=-XYZ(IX,IY,IZP)
INl=ICA(IDD-1)
IN2=ICA(IDD)
I=I+PRODV(ICl, IN1)+PRODV(ICl, IN2)+PRODV(IC2, IN1)+PRODV(IC2, IN2)
ENDIF
Figure imgf000109_0001
Figure imgf000110_0001
APPENDIX E SAMPLE TERTIARY INTERACTION TABLE
Figure imgf000111_0001
Figure imgf000112_0001
Figure imgf000113_0001
Figure imgf000114_0001
Figure imgf000115_0001
Figure imgf000116_0001

Claims

What is Claimed is:
1. Method of determining by a machine a three-dimensional structure of a protein or portion thereof including sidechains, the method comprising the steps of:
specifying a sequence of amino acid residues whose native tertiary structure is to be determined;
specifying local conformation preferences for respective residues of the sequence, and
representing tertiary interactions between all pairs of sidechains;
specifying a temperature;
automatically generating a representation of an unfolded chain of the residues in three
dimensions;
simulating in the machine folding of the chain and interactions between all pairs of
sidechains, in accordance with said conformation preferences and said temperature, and producing a representation of a corresponding native tertiary structure; and
displaying the representation of the tertiary structure.
2. The method of claim 1 where the step of displaying includes the step of presenting the tertiary structure as a three-dimensional
representation.
3. The method of claim 1 where the step of simulating includes the steps of stopping the
simulating operation at an intermediate stage, specifying another temperature, and resuming the simulating operation.
4. Method of determining a three- dimensional conformation of a globular protein utilizing Monte Carlo dynamics technique with
asymmetric Metropolis sampling criterion, the method comprising the steps of:
specifying a sequence of amino acid residues of the protein;
generating a three-dimensional
representation of an unfolded conformation consisting of an α-carbon backbone and sidechains in response to the specified sequence;
producing from the unfolded conformation, using said technique, successive likely conformations at a predetermined temperature according to the total energy of each conformation;
selecting from the successive likely conformations the lowest total-free-energy tertiary conformation which satisfies said criterion; and
determining the coordinates of the selected tertiary conformation for display.
5. The method of claim 4 where the step of producing includes the step of determining local conformational energetic preferences of the α- carbons.
6. The method of claim 5 where the step of producing includes the step of identifying spatially close pairs of sidechains in each conformation.
7. The method of claim 6 where the step of producing includes the step of simulating tertiary interactions between said spatially close pairs.
8. The method of claim 7 where the step of producing includes the step of determining the sum of the effective interaction contact energy between respective close pairs based on predetermined
frequency of contact between said pairs.
9. The method according to claim 8 where the step of determining the effective interaction contact energy includes the step of scaling said energy to a selected lowest level by referencing average interaction contact energies of non-polar residues to a hydrophobicity scale.
10. A computer-based model for
representing, in a three-dimensional cartesian coordinate system, a conformation of a protein or portion thereof, including the protein's α-carbon backbone and sidechains of finite surface area, as the protein folds from an unfolded sequence of amino acid residues to a folded tertiary structure, the model comprising:
a cubic arrangement of lattice sites disposed for framing the conformation;
the cubic arrangement being represented by unit vectors (±1,0,0), (0,±1,0), (0,0,±1), where the distance between adjacent lattice sites is unity; and where
each α-carbon occupies a lattice site located at a distance of 75 units from its adjacent α-carbon along a (±2,±1,0) vector or cyclic
permutation thereof.
11. The model of claim 10 where said cubic arrangement is a 24-nearest neighbor lattice.
12. The model of claim 11 where each α- carbon is represented as occupying a central cubic lattice site plus six adjacent cubic lattice sites defining a surface of interaction of finite size, and each sidechain is represented as being embedded in the lattice and occupying a selected number of lattice sites located relative to said central site, the number of sites occupied by the sidechain being proportional to the number of sites defining the surface of interaction.
13. A computer-based system for determining a three-dimensional structure of a protein or portion thereof including sidechains, the system comprising:
input means for specifying a sequence of amino acid residues whose native tertiary structure is to be determined, and for specifying a temperature and local conformation preferences for respective residues of the sequence;
first memory means for storing the specified sequence, temperature, and conformation preferences;
second memory means having a stored program with routines for performing Monte Carlo dynamics simulation with asymmetric Metropolis sampling criterion and for representing tertiary interactions between all pairs of the sidechains;
processing means coupled to the input, and first and second memory means, and responsive to the specified sequence, temperature, and conformation preferences for, under control of the stored program, generating a first set of coordinates representing a conformation of an unfolded chain of the residues in three dimensions, determining a total free
interaction energy from tertiary interactions between all pairs of the sidechains, simulating folding of the chain at the specified temperature and in accordance with said conformation preferences and total interaction energy, and producing a second set of coordinates representing a native tertiary structure; and
display means coupled to the processing means for displaying the second set of coordinates depicting the native tertiary structure in three dimensions.
14. An apparatus for determining a three- dimensional structure of a selected protein including a plurality of α-carbons comprising:
means for storing a representation of a selected sequence of amino acid residues of the protein and an initial starting temperature value;
means for generating a representation of a cubic arrangement of lattice sites, including means for positioning adjacent sites a unit distance from one another and means for positioning a plurality of α-carbons on selected lattice sites, each α-carbon located a distance on the order of √5 from an
adjacent α-carbon;
means for combining said generated representation of said cubic arrangement with said representation of said selected stored sequence;
means for producing, in response to said temperature and in accordance with said cubic
arrangement, a representation of one or more folded, three-dimensional protein structures; and
means for comparing said produced
representation of three-dimensional protein structure to a predetermined criterion and for selecting one of said produced representation for display only in response to a predetermined comparison result.
15. An apparatus as in claim 14 including means for interrupting said producing, for storing a new temperature value and re-initiating said
producing.
PCT/US1991/002786 1990-04-24 1991-04-23 System and method for determining three-dimensional structures of proteins WO1991016683A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US51391890A 1990-04-24 1990-04-24
US513,918 1990-04-24

Publications (1)

Publication Number Publication Date
WO1991016683A1 true WO1991016683A1 (en) 1991-10-31

Family

ID=24045109

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1991/002786 WO1991016683A1 (en) 1990-04-24 1991-04-23 System and method for determining three-dimensional structures of proteins

Country Status (5)

Country Link
JP (1) JPH05501324A (en)
AU (1) AU7883791A (en)
IE (1) IE911347A1 (en)
PT (1) PT97480A (en)
WO (1) WO1991016683A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000004380A1 (en) * 1998-07-15 2000-01-27 Jerini Bio Tools Gmbh Determination of ligands for proteins

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005234699A (en) * 2004-02-17 2005-09-02 Yokohama Tlo Co Ltd Apparatus and method for estimation of spatial structure of protein by cellular automaton
JP5734586B2 (en) * 2010-07-12 2015-06-17 公益財団法人野口研究所 Sugar chain structure recognition analysis method, sugar chain structure recognition analyzer and program

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4704692A (en) * 1986-09-02 1987-11-03 Ladner Robert C Computer based system and method for determining and displaying possible chemical structures for converting double- or multiple-chain polypeptides to single-chain polypeptides
US4853871A (en) * 1987-04-06 1989-08-01 Genex Corporation Computer-based method for designing stablized proteins
US4881175A (en) * 1986-09-02 1989-11-14 Genex Corporation Computer based system and method for determining and displaying possible chemical structures for converting double- or multiple-chain polypeptides to single-chain polypeptides
US4908773A (en) * 1987-04-06 1990-03-13 Genex Corporation Computer designed stabilized proteins and method for producing same
US4939666A (en) * 1987-09-02 1990-07-03 Genex Corporation Incremental macromolecule construction methods
US4985827A (en) * 1987-07-29 1991-01-15 Hitachi, Ltd. Computer for synchronized read and write of vector data

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4704692A (en) * 1986-09-02 1987-11-03 Ladner Robert C Computer based system and method for determining and displaying possible chemical structures for converting double- or multiple-chain polypeptides to single-chain polypeptides
US4881175A (en) * 1986-09-02 1989-11-14 Genex Corporation Computer based system and method for determining and displaying possible chemical structures for converting double- or multiple-chain polypeptides to single-chain polypeptides
US4853871A (en) * 1987-04-06 1989-08-01 Genex Corporation Computer-based method for designing stablized proteins
US4908773A (en) * 1987-04-06 1990-03-13 Genex Corporation Computer designed stabilized proteins and method for producing same
US4985827A (en) * 1987-07-29 1991-01-15 Hitachi, Ltd. Computer for synchronized read and write of vector data
US4939666A (en) * 1987-09-02 1990-07-03 Genex Corporation Incremental macromolecule construction methods

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000004380A1 (en) * 1998-07-15 2000-01-27 Jerini Bio Tools Gmbh Determination of ligands for proteins

Also Published As

Publication number Publication date
AU7883791A (en) 1991-11-11
IE911347A1 (en) 1991-11-06
JPH05501324A (en) 1993-03-11
PT97480A (en) 1993-05-31

Similar Documents

Publication Publication Date Title
US5265030A (en) System and method for determining three-dimensional structures of proteins
MacCallum Striped sheets and protein contact prediction
Westhead et al. Protein structural topology: Automated analysis and diagrammatic representation
Rose et al. Protein folding: predicting predicting
Yang et al. An integrated approach to the analysis and modeling of protein sequences and structures. III. A comparative study of sequence conservation in protein structural families using multiple structural alignments
Dandekar et al. Identifying the tertiary fold of small proteins with different topologies from sequence and secondary structure using the genetic algorithm and extended criteria specific for strand regions
JP2016539443A (en) System and method for using paired-end data in a directed acyclic structure
JP2002536301A (en) Protein modeling tools
CA2881934C (en) Systems and methods for sampling and analysis of polymer conformational dynamics
Minary et al. Conformational optimization with natural degrees of freedom: a novel stochastic chain closure algorithm
JP2010517195A (en) Methods, systems, algorithms and means for the description of possible structures of actual and theoretical proteins, and for the evaluation of actual and theoretical proteins with respect to folding, overall shape and structure motifs
US5600571A (en) Method for determining protein tertiary structure
Chitturi et al. Compact structure patterns in proteins
Kolinski et al. Monte Carlo studies of the thermodynamics and kinetics of reduced protein models: Application to small helical, β, and α/β proteins
duVERLE et al. CaMPDB: a resource for calpain and modulatory proteolysis
Carlacci et al. The loop problem in proteins: a Monte Carlo simulated annealing approach
MacCarthy et al. Advances in protein super-secondary structure prediction and application to protein structure prediction
WO1991016683A1 (en) System and method for determining three-dimensional structures of proteins
de Brevern An agnostic analysis of the human AlphaFold2 proteome using local protein conformations
Trosset et al. Flexible docking simulations: Scaled collective variable Monte Carlo minimization approach using Bezier splines, and comparison with a standard Monte Carlo algorithm
Kondov Protein structure prediction using distributed parallel particle swarm optimization
Zhou et al. Prediction of one-dimensional structural properties of proteins by integrated neural networks
US20220284987A1 (en) Prediction device, trained model generation device, prediction method, and trained model generation method
Ashraf et al. Computational analysis of non-coding RNAs in Alzheimer's disease
Biro Seven fundamental, unsolved questions in molecular biology: Cooperative storage and bi-directional transfer of biological information by nucleic acids and proteins: an alternative to “central dogma”

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU CA FI JP NO

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE

NENP Non-entry into the national phase

Ref country code: CA