The study of protein dynamics using Elastic Network Models (see Figs. 1-5 and Table I). We will develop a mixed coarse-grained method in which the ‘interesting’ or functional parts of proteins are modeled at a higher resolution than the remainder of the structure. With this approach, normal mode analysis can be performed to discern the important functional motions of large, biologically important molecules with high computational efficiency. This model will allow project investigators to focus on the details of the functionally most important parts of these molecules and to study the stability and dynamics associated with their function. The approach has potentially significant practical applications to drug design and to cell simulations. The method will also be applied to the study of large-scale motions in biological molecules, specifically domain swapping. We will study diphtheria toxin (DT), a protein that is subject to dimerization, and examine the motions involved in domain swapping in DT and several other systems.

We developed a mixed coarse-grained method where the ‘interesting’ or functional parts of proteins are modeled at a higher resolution than the remainder of the structure and applied this to triose phosphate isomerase (Fig. 1). Normal mode analysis was performed to discern the important functional motions. Among a number of new applications of the elastic network approach being developed is its use for the study of proteins having multiple binding sites (Fig. 2). This method was also applied to the study of large-scale motions in biological molecules, including domain swapping. We studied domain swapping in a number of proteins and developed new ways to identify domain swapping hinge sites (Fig. 3, Fig. 4 and Table I). Other applications included a general treatment of conformational transitions where parts of the structure are taken to be rigid (Fig. 5). Three papers were published (1-3) and two others are in draft manuscript form.
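As a rough illustration of how a hinge can be read off the slowest mode of a Gaussian Network Model (GNM), the sketch below builds a Kirchhoff (connectivity) matrix from a set of coordinates, approximates the slowest nonzero mode, and reports where that mode changes sign. It is not the code used in this project: the function names, the 7.0 Å cutoff, and the use of a simple power iteration (chosen only to keep the example dependency-free; a library eigensolver would normally be used) are all assumptions made for illustration.

```python
import math

def kirchhoff(coords, cutoff=7.0):
    """Build a GNM Kirchhoff matrix for a list of (x, y, z) points.
    The cutoff (in Angstroms) is an assumed, illustrative value."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma

def slowest_mode(gamma, iterations=2000):
    """Approximate the slowest nonzero mode by power iteration on (c*I - gamma),
    projecting out the uniform zero-eigenvalue mode at every step."""
    n = len(gamma)
    c = 2.0 * max(gamma[i][i] for i in range(n))  # bound on the largest eigenvalue
    v = [float(i) - (n - 1) / 2.0 for i in range(n)]  # orthogonal to the uniform mode
    for _ in range(iterations):
        w = [c * v[i] - sum(gamma[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def hinge_candidates(mode):
    """Return indices i where the mode changes sign between nodes i and i+1."""
    return [i for i in range(len(mode) - 1) if mode[i] * mode[i + 1] < 0.0]

# Toy example: a straight chain of 10 nodes spaced 3.8 Angstroms apart.
chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]
print(hinge_candidates(slowest_mode(kirchhoff(chain))))  # → [4]
```

For the straight toy chain the slowest mode changes sign once, at the midpoint, so the two halves move in opposite directions. In a real protein the coordinates would be the Cα positions taken from a PDB file.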

The elastic network models for describing the motions of proteins are finding broad application now by us and many others, from use in structure refinement to predicting protein conformational transitions. These studies will aid in developing methods for predicting conformational changes from a single structure.

Related Papers

1. Kloczkowski, A., Sen, T.Z. and Jernigan, R.L., Promiscuous vs. native protein function. Insights from studying collective motions in proteins by elastic network models, J. Biomol. Struct. Dyn., 22, 621-624, 2005.
2. Kim, M.K., Jernigan, R.L. and Chirikjian, G.S. Rigid-cluster models of conformational transitions in macromolecular machines and assemblies, Biophys. J., 2005, in press.
3. Kundu, S. and Jernigan, R.L., Molecular mechanism of domain swapping in proteins: an analysis of slower motions, Biophys. J., 86, 3846-3854, 2004.

Develop an extremely efficient transfer matrix method for attrition-free generation of lattice proteins on the square lattice in two dimensions and on the cubic lattice in three dimensions (see Fig. 6 and Table II). The proposed method is an extension of the transfer matrix method for generating and enumerating compact self-avoiding walks on lattices previously developed by the project’s investigators. In the original method, only the number of chain conformations was calculated. We will extend this method by incorporating the potentials of interaction between nearest-neighbor contacts on the lattice. We will use the hydrophobic-polar (HP) model for the detailed calculations. In the future this approach will be extended to encompass the full twenty-letter amino acid alphabet by using Miyazawa-Jernigan-type contact potentials. We will reformulate the transfer matrix method by applying the technique of direct products of matrices developed by Jernigan and Flory for the statistical mechanics of polymer chains. We will also develop coarse-graining for these protein lattice models.

In the original method, only the number of chain conformations was calculated. We have now extended this method by incorporating the potentials for interactions between the nearest-neighbor contacts on the lattice, to investigate the variation of simple potentials used for the Elastic Network Models (Fig. 6). This establishes an important new connection between the two which had not been previously planned. This was viewed as more important than the original planned applications. In addition we have developed the facility to factor such pairwise potentials into functional dependences on the hydrophobicities and charges of the individual residues (Table II), which is important for the planned applications of this lattice generation approach to realistic proteins. This will allow us to extend the transfer matrix approach in more general ways than had originally been anticipated. Two related papers are given below.
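To make concrete what is being counted and averaged, the following sketch enumerates every compact self-avoiding walk on a very small square lattice and computes a weighted average of hydrophobic contacts for an HP sequence. It uses brute-force depth-first search, not the transfer matrix method, so it is practical only for tiny lattices. The function names, the example sequence, and the choice to attach a weight h to each H-H contact while giving all other contacts weight 1 are illustrative assumptions.

```python
def compact_walks(width, height):
    """Yield every self-avoiding walk that visits all sites of a width x height
    square lattice exactly once (brute-force depth-first search)."""
    n_sites = width * height

    def extend(path, visited):
        if len(path) == n_sites:
            yield tuple(path)
            return
        x, y = path[-1]
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in visited:
                path.append((nx, ny))
                visited.add((nx, ny))
                yield from extend(path, visited)
                visited.remove((nx, ny))
                path.pop()

    for sx in range(width):
        for sy in range(height):
            yield from extend([(sx, sy)], {(sx, sy)})

def hh_contacts(walk, sequence):
    """Count non-bonded nearest-neighbor H-H pairs for an HP sequence on a walk."""
    index = {site: i for i, site in enumerate(walk)}
    count = 0
    for (x, y), i in index.items():
        for neighbor in ((x + 1, y), (x, y + 1)):  # visit each lattice edge once
            j = index.get(neighbor)
            if j is not None and abs(i - j) > 1 and sequence[i] == sequence[j] == "H":
                count += 1
    return count

def average_hh(width, height, sequence, h):
    """Average H-H contact count when each H-H contact carries statistical weight h."""
    z = total = 0.0
    for walk in compact_walks(width, height):
        n = hh_contacts(walk, sequence)
        weight = h ** n
        z += weight
        total += n * weight
    return total / z

print(len(list(compact_walks(3, 3))))  # → 40 directed compact walks
print(average_hh(3, 3, "HHPPHPPHH", h=2.0))
```

Increasing h can only raise the average number of H-H contacts, since the derivative of the average with respect to ln h is the variance of the contact count.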

The lattice transfer matrix method for efficiently generating large numbers of coarse-grained protein structures will lead to a way to make initial predictions of the protein structure family for new sequences.

Related Papers

1. Kloczkowski, A., Sen, T.Z. and Jernigan, R.L.: The transfer matrix method for lattice proteins - an application with cooperative interactions, Polymer, 45, 707-716, 2004.
2. Pokarowski, P., Kloczkowski, A., Jernigan, R.L., Kothari, N.S., Pokarowska, M. and Kolinski, A., Inferring ideal amino acid interaction forms from statistical protein contact potentials, Proteins: Struct. Funct. Bioinf., 59, 49-57, 2005.

Conduct an off-lattice study of the dependencies between protein shapes and their conformations (see Figs. 7-11). The goal is to generate libraries of possible three-dimensional protein structures using a minimal set of assumptions. We will confine the protein within a three-dimensional ellipsoid of revolution and generate all possible compact protein conformations within that shape. The generation of structures will not be fully random; rather, we will use a bias towards the secondary structures of α-helices and β-sheets. Biased generation of conformations is quite realistic, since proteins (due to their evolutionary origin) contain larger amounts of secondary structure than would result from random packing. We will study the dependence between the shape of the ellipsoid and the compact structures generated inside this shape. We will extend the transfer matrix method developed for lattice proteins to these off-lattice models of proteins contained within the ellipsoids. We will also study in a systematic way the interdependence between the structures and the shapes of the proteins, as well as their dynamics with Elastic Network Models.

The original goal was to generate libraries of possible three-dimensional protein structures using a minimal set of assumptions. Before pursuing this, we decided that two other aspects of protein structure were important to investigate. Packing density is a critical parameter for chain generation in compact spaces, so we investigated the relationship between amino acid packing density and sequence conservation (Fig. 7). In addition, we considered it important to establish a range of shapes before embarking on the generation of chain conformations, and thus we began an investigation of the shapes of a set of protein structures in ellipsoid, convex hull, and Delaunay tessellated representations (Fig. 8). This has been an extensive investigation of protein shapes that also included computations of surface areas and volumes (Fig. 9). In addition, we have exhaustively generated conformations within one size of ellipsoid (Fig. 10), as originally planned. The generation of structures was not fully random but relied on specific biases towards overall fractions of α-helices and β-sheets. Biased generation of conformations is realistic, since proteins (due to their evolutionary origin) contain larger amounts of secondary structure than would result from the packing of all possible random conformations. Our promising results are beginning to show how these sets of conformations can be used for predictions of structure (Fig. 11). One paper related to this is in Ref. (6).
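The idea of growing a chain inside a confining ellipsoid of revolution can be sketched as follows. This is a deliberately simplified, unbiased random growth with rejection; it does not include the secondary-structure biases or the exhaustive generation described above. The function names, the 3.8 Å virtual bond length, the minimum separation, and the ellipsoid semi-axes are assumptions chosen only for illustration.

```python
import math
import random

def inside_spheroid(point, a, c):
    """True if the point lies within an ellipsoid of revolution with
    equatorial semi-axis a and polar semi-axis c, centered at the origin."""
    x, y, z = point
    return (x * x + y * y) / (a * a) + (z * z) / (c * c) <= 1.0

def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)

def grow_chain(n, a, c, bond=3.8, min_sep=3.8, tries=200, seed=0):
    """Grow a chain of up to n points inside the ellipsoid. Each new point is one
    bond length from the previous one and at least min_sep from all earlier,
    non-adjacent points. Growth stops early if no valid placement is found."""
    rng = random.Random(seed)
    chain = [(0.0, 0.0, 0.0)]
    while len(chain) < n:
        for _ in range(tries):
            d = random_unit_vector(rng)
            cand = tuple(p + bond * u for p, u in zip(chain[-1], d))
            if inside_spheroid(cand, a, c) and all(
                math.dist(cand, q) >= min_sep for q in chain[:-1]
            ):
                chain.append(cand)
                break
        else:
            break  # stuck: return the partial chain
    return chain

chain = grow_chain(30, a=12.0, c=18.0)
print(len(chain), all(inside_spheroid(p, 12.0, 18.0) for p in chain))
```

A biased version would replace the uniform direction sampling with draws that favor helix-like or strand-like virtual bond geometries.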

The strong relationship between packing density and sequence conservation that we uncovered (Fig. 7) provides an important new way to consider sequence conservation in structures, not for separate residues but for clusters of residues, and points toward better approaches for combining sequence conservation information with a broad range of protein structural computations and predictions. The conformation generation within compact shapes can aid in structure prediction. We have already seen cases where the generation within an ellipsoid suggests native-like structures (Fig. 11). Because of the loose fit to the native structure within the ellipsoid it would appear that these results do not depend strongly upon the details of the confining shape.

During the next period, we will study the interdependence between the shape of the ellipsoid and the compact structures generated inside a shape. We will extend the transfer matrix method developed for lattice proteins to these off-lattice models of proteins contained within the ellipsoids and extend the off-lattice conformation generation within an ellipsoid for a set of 25 larger proteins and investigate better ways to select native-like conformations. We will also study in a systematic way the interdependence between the structures and the shapes of the proteins, as well as their dynamics with the Elastic Network Models. Also we will begin investigating how significantly the detailed roughness of the surface as given by the convex hull representation (see Fig. 8) may affect the conformations generated, in comparison with their generation within smooth ellipsoids. We will continue developing ways to select conformations from this set with various coarse-grained potentials (see Fig. 11), with the eventual aim of making structure predictions.

Related Paper

1. Liao, H., Yeh, W., Chiang, D., Jernigan, R.L. and Lustig, B. Protein sequence entropy is closely related to packing density and hydrophobicity. Prot. Eng. Des. Select, 18, 59-64, 2005.

Twelve other recent papers all relating to coarse-grained molecular models.

(For summaries see Figs. 12-15 and Tables III-V)

1. Sen, T.Z., Kloczkowski, A., Jernigan, R.L., Yan, C., Honavar, V., Ho, K.M., Wang, C.Z., Ihm, Y., Cao, H., Gu, X. and Dobbs, D., Predicting binding sites of hydrolase-inhibitor complexes by combining several methods, BMC Bioinformatics, 5, 205, 2004.

2. Miyazawa, S. and Jernigan, R.L., How effective for fold recognition is a potential of mean force that includes relative orientations between contacting residues in proteins? J. Chem. Phys., 122, 024901, 2005.

3. Yan, A. and Jernigan, R.L., How do sidechains orient globally in protein structures? Proteins, 2005, in press.

4. Cheng, H., Sen, T.Z., Kloczkowski, A., Margaritis, D. and Jernigan, R.L., Prediction of protein secondary structure by mining structural fragment database, Polymer, 2005, in press.

5. Kloczkowski, A., Sen, T.Z., and Sharaf, M.A., The largest eigenvalue method for stereo regular vinyl chains, Polymer, 2005, in press.

6. Sen, T.Z., Jernigan, R.L., Garnier, J., and Kloczkowski, A., GOR V server for protein secondary structure prediction, Bioinformatics, 2005, in press.

7. Plewczynski, D., Jaroszewski, L., Godzik, A., Kloczkowski, A., and Rychlewski, L., Molecular modeling of phosphorylation sites in proteins using database of local structure segments, J. Mol. Model., 2005, in press.

8. Plewczynski, D., Tkacz, A., Wyrwicz, L.S., Godzik, A., Kloczkowski, A. and Rychlewski, L., The Support Vector Machine classification of linear functional motifs in proteins, J. Mol. Model., 2005, in press.

9. Kloczkowski, A. and Kolinski, A., Theoretical models and simulations for polymer chains, In: J. E. Mark (Editor), Physical Properties of Polymers Handbook, 2nd Edition, New York, Springer Verlag, 2005, in press.

10. Kloczkowski, A. and Sen, T.Z., Magnetic, Piezoelectric, pyroelectric and ferroelectric properties of synthetic and biological polymers, In: J. E. Mark (Editor) Physical Properties of Polymers Handbook, 2nd Edition, New York, Springer Verlag, 2005, in press.

11. Sen, T.Z. and Jernigan, R.L., Optimizing cutoff distances and spring constants for the Gaussian Network Model of ATP-binding proteins, in “Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems,” CRC Press, in press.

12. Sen, T.Z., Sharaf, M.A., Mark, J.E., and Kloczkowski, A., Modeling the elastomeric properties of stereo-regular polypropylenes in nanocomposites with spherical fillers, Polymer, 2005, in press.

Table I. Comparison of Locations of Hinges in Proteins.
Two methods can be used to identify the location of hinges. Where a GNM mode of motion crosses the zero axis, the structure is divided into domains that move in opposite directions, and the crossing point marks a hinge. This measure works well for monitoring large changes in displacement, but another measure is better for following the relatively small-scale motions at hinge sites: the change in the sum of internal distances for each residue is an appropriate parameter for identifying the locations and motions of hinges during transitions (Hinsen, 1998; Hinsen et al., 1999).

| Protein name | Monomer (PDB) | Oligomer (PDB) | Hinge Location (Literature)* | Hinge Location (GNM)+ | Hinge Location (eq. in Table Heading) |
|---|---|---|---|---|---|
| Barnase | 1brn | 1yvs | 37-41 | 39 | 39 |
| Calbindin | 4icb | 1ht9 | 38-47 | 41 | 43 |
| Cro | 1orc | 1cro | 55 | 53 | 56 |
| Cyanovirin-N | 2ezm | 3ezm | 50-53 | 52 | 52 |
| Diphtheria toxin | 1mdt | 1ddt | 379-387 | 387 | 381 |
| Human prion | 1qlx | 1i4m | 188-198 | 192 | 195 |
| Protein L B1 domain | 1hz5 | 1jml | 52-55 | 47 | 53 |
| RNase A N-terminal | 5rsa | 1a2w | 15-22 | 24 | 22 |
| RNase A C-terminal | 5rsa | 1f0v | 112-115 | 108 | 112 |
| Phosphorylated N-Spo0A | 1qmp | 1dz3 | 103-109 | 106 | 105 |
| Suc1 | 1sce | 1puc | 85-91 | 84 | 87 |
| CksHs1 | 1dks | 1cks | 60-65 | 37 | 62 |
| IFN-b | 1rmi | 1ilk | 108-118 | 116 | 111 |

* Liu, Y., and D. Eisenberg. 2002. 3D domain swapping: as domains continue to swap. Protein Sci. 11:1285–1299.

+ Slowest mode of GNM.

(from Kundu, S. and Jernigan, R.L., Molecular mechanism of domain swapping in proteins: an analysis of slower motions, Biophys. J., 86, 3846-3854, 2004.)

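Table II. Singlet Approximations of Contact Energies.

| Matrix | Hp cor | Hp err | Hp.Dx cor | Hp.Dx err | Hp.pH cor | Hp.pH err | pH.Hp cor | pH.Hp err | Hp.Dx.pH cor | Hp.Dx.pH err |
|---|---|---|---|---|---|---|---|---|---|---|
| MJPL | 88 | 11 | 90 | 10 | 90 | 10 | 95 | 7 | 95 | 7 |
| MJ3h | 86 | 49 | 93 | 37 | 88 | 46 | 96 | 28 | 96 | 28 |
| TEl | 59 | 80 | 61 | 79 | 66 | 75 | 71 | 70 | 71 | 70 |
| B1 | 21 | 98 | 68 | 73 | 50 | 72 | 83 | 55 | 83 | 55 |
| VD | 29 | 96 | 47 | 88 | 45 | 90 | 55 | 83 | 59 | 80 |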

Table III. Predicting protein binding sites. Overall classification performance, averaged over the 7 proteins in the dataset, for Sensitivity+, Specificity+, overall Sensitivity, overall Specificity, and Correlation Coefficient. <>p denotes averaging over the total number of proteins and <>r over the total number of residues. Protein-protein interactions play a critical role in protein function. Identification of protein-protein interaction sites and the detection of specific amino acids that contribute to the specificity and the strength of protein interactions is an important problem with broad applications ranging from rational drug design to the analysis of metabolic and signal transduction networks. In order to increase the power of predictive methods for protein-protein interaction sites, we have developed a consensus methodology that combines four different methods: data mining using Support Vector Machines, threading through protein structures, prediction of conserved residues on the protein surface by analysis of phylogenetic trees, and the Conservatism of Conservatism method of Mirny and Shakhnovich. Results obtained for a dataset of hydrolase-inhibitor complexes demonstrate that the combination of all four methods yields improved predictions over the individual methods. This is illustrated below, where we compare the performance of the individual prediction methods with the consensus method; the gains are significant. (from Sen, T.Z., Kloczkowski, A., Jernigan, R.L., Yan, C., Honavar, V., Ho, K.-M., Wang, C.Z., Ihm, Y., Cao, H., Gu, X., Dobbs, D.L., Predicting binding sites of hydrolase-inhibitor complexes by combining several methods, BMC Bioinformatics, 5, 205, 2004.)

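| Method | <Sen+>p | <Spe+>p | <Spe>p | <Spe>r | <Sen>p | <Sen>r | <Cor>p | <Cor>r |
|---|---|---|---|---|---|---|---|---|
| Phylogeny | 0.39 | 0.71 | 0.90 | 0.89 | 0.91 | 0.89 | 0.43 | 0.37 |
| COC | 0.71 | 0.31 | 0.89 | 0.88 | 0.81 | 0.80 | 0.38 | 0.37 |
| SVM | 0.51 | 0.41 | 0.89 | 0.88 | 0.88 | 0.88 | 0.39 | 0.37 |
| Threading | 0.59 | 0.57 | 0.91 | 0.89 | 0.92 | 0.91 | 0.53 | 0.48 |
| Consensus | 0.70 | 0.56 | 0.92 | 0.91 | 0.90 | 0.89 | 0.56 | 0.55 |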
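The difference between averaging over proteins (<>p) and over residues (<>r) in Table III can be illustrated with a short sketch. The definitions used here are the standard textbook ones (sensitivity as TP/(TP+FN), specificity as TN/(TN+FP), and the Matthews form of the correlation coefficient); they are assumptions, and the exact definitions of the "+" variants in the cited paper may differ. The function names and the example counts are hypothetical.

```python
import math

def sensitivity(tp, tn, fp, fn):
    """Fraction of true interface residues that were predicted as interface."""
    return tp / (tp + fn)

def specificity(tp, tn, fp, fn):
    """Fraction of non-interface residues that were predicted as non-interface."""
    return tn / (tn + fp)

def correlation_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def per_protein_average(metric, counts):
    """<>p: compute the metric for each protein, then average over proteins."""
    return sum(metric(*c) for c in counts) / len(counts)

def pooled_over_residues(metric, counts):
    """<>r: pool the residue counts of all proteins, then compute the metric once."""
    totals = [sum(column) for column in zip(*counts)]
    return metric(*totals)

# Hypothetical (tp, tn, fp, fn) counts for two proteins of different size.
counts = [(8, 85, 5, 2), (1, 10, 0, 1)]
print(per_protein_average(sensitivity, counts))   # per-protein mean
print(pooled_over_residues(sensitivity, counts))  # pooled over residues
```

The two averages differ whenever proteins of different sizes have different error rates, which is why the table reports both.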

Table IV. Prediction of phosphorylation sites in proteins. The mean value and the standard deviation of the C score for the set of segments confirmed and not confirmed to be phosphorylated by PKA or PKC kinases. For each segment we calculate the combined sequence and structure probability score C

,

Here Sikj is the normalized sequence preference for the k-th type of amino acid at the i-th residue in a segment around a phosphorylation site (i=j/2+1). Qisj represents the corresponding normalized structural preference for the s-th structural state at the i-th residue in a segment. The Sikj and Qisj are computed separately for each type of phosphorylation process (in our case, for sites acted upon by PKA and PKC kinases). The C score gives the likelihood of phosphorylation of a given residue. We assume that a potential phosphorylation site should have a C score higher than a specified cut-off value C0. The cut-off values for different phosphorylation processes should be different.

In calculations (to remove noise) we only use segments having C scores larger than the cut-off value 0.1. We also show the minimum and the maximum values obtained for each dataset. A new bioinformatics tool for molecular modeling of local structure around phosphorylation sites in proteins has been developed. Our method is based on a library of short sequence and structure motifs. The basic structural elements to be predicted are local structure segments (LSSs). This enables us to avoid the problem of an inexact local description of structures, caused by either the diversity of the structural context or uncertainties in prediction methods. We have developed a library of LSSs and a profile-profile matching algorithm that predicts local structures of proteins from their sequence information. Our fragment library prediction method server (FRAGlib) is publicly available online at http://ffas.ljcrf.edu/Servers/frag.html. The algorithm has been successfully applied to the characterization of local structure around phosphorylation sites in proteins. Our computational predictions of sequence and structure preferences around phosphorylated residues have been confirmed by phosphorylation experiments for PKA and PKC kinases. The quality of the predictions has been evaluated by various independent statistical tests. We have observed significant improvement in the accuracy of prediction by incorporating structural information into the local description of the neighborhood of the phosphorylated site. Our results strongly suggest that sequence information should be supplemented by additional structural context information (predicted by our segment similarity method) for successful predictions of phosphorylation sites in proteins. The accuracy in prediction of phosphorylation sites by PKA and PKC kinases is shown below.
(from Plewczynski, D., Jaroszewski, L., Godzik, A., Kloczkowski, A., and Rychlewski, L., Molecular modeling of phosphorylation sites in proteins using database of local structure segments, Journal of Molecular Modeling, 2005, in press.)

 

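| Phosphorylation Type | Mean Value (confirmed) | Mean Value (not confirmed) | Standard Deviation (confirmed) | Standard Deviation (not confirmed) | Minimal/Maximal values (confirmed) | Minimal/Maximal values (not confirmed) |
|---|---|---|---|---|---|---|
| PKA | 0.26 | 0.154 | 0.032 | 0.074 | 0.144/0.338 | 0.1/0.331 |
| PKC | 0.205 | 0.142 | 0.028 | 0.049 | 0.111/0.295 | 0.1/0.295 |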


Table V.

Figure 1. Investigation of Mixed Coarse Graining for TIM. (A) B factors computed for triose phosphate isomerase at two different levels of coarse-graining. The solid TPH N curve is for a 1 point per residue uniformly grained model, and the MCG dashed curve is the same model except that atoms are included for the active site and substrates. Close correspondence can be seen between the two curves.

 

(B) Comparison between the modes of the mixed coarse-grained elastic network model and the uniform coarse-grained model, showing that the slowest modes correspond closely between the two representations. (O. Kurkcuoglu, P. Doruker and R.L. Jernigan, unpublished).


Figure 2

Figure 3. Domain swapping in diphtheria toxin. (A) shows the monomeric state (closed), (B) is the corresponding cartoon; (C) shows the monomer in the dimeric state (open), (D) is the corresponding cartoon; (E) shows the dimer in the dimeric state (two open monomers intertwined), and (F) is its corresponding cartoon. For these structures there is an axis of rotation perpendicular to the linking segment (shown as the z axis) about which a rotation takes place during the transition to the dimer, with an additional slight twist along the x axis. (from Kundu, S. and Jernigan, R.L., Molecular mechanism of domain swapping in proteins: an analysis of slower motions, Biophys. J., 86, 3846-3854, 2004.)

Figure 4 (plot axes: RMSD versus residue index)

Figure 5. Motions of the Chaperonin Protein GroEL with rigid domains. (A) Division of the GroEL subunit into three rigid parts based on the RMSD between two forms. (B) Whole assembled GroEL/GroES with one subunit shown in red. (C) The subunit, showing the apical domain at the top in yellow, the intermediate domain in the middle in purple, and the equatorial domain at the bottom in brown. (D) Side view and (E) top view of the GroEL assembly undergoing the computed transition between its two forms. (from Kim, M.K., Jernigan, R.L. and Chirikjian, G.S. Rigid-cluster models of conformational transitions in macromolecular machines and assemblies, Biophys. J., in press.)

Figure 6. The statistical average <Nh> as a function of h, the weight for hydrophobic contact pairs, for several different values of EHP (in units of RT), where <Nh> is the average number of H-H interacting pairs and EHP is the interaction energy for H-P contact pairs. These are all circuits (no ends) on the square lattice within rectangles of size 4×10. Hydrophobicity and polarity are defined by physical location in the space: all 2×(n-2) residues in the interior of the rectangle are hydrophobic, while all 2n+4 residues on the “surface” of the rectangle are polar. This figure shows characteristic sigmoidal shapes similar to the Zimm-Bragg helix-coil transition. The average value <Nh> (calculated with the use of statistical weights) is a measure of the extent of hydrophobic interactions. This shows that the hydrophobic affinity increases with h. The transfer matrix method enables the generation of all conformations within a rectangle of size 4×n. The details of the generation of conformations are provided in our paper. Three distinct non-bonded interactions are utilized depending on the types of nodes that are involved in the interaction: hydrophobic, polar, and mixed, with non-bonded energies EHH, EPP, and EHP, respectively. The statistical weights for these interactions are denoted h, p, and m, respectively.

   

A matrix formulation is used to count the non-bonded interactions. The main advantage of the method is the extremely efficient attrition-free generation and enumeration of compact conformations. We have shown for compact conformations that the growth of the chain in a piecewise way, cross-section by cross-section, is much more efficient than the traditional linear chain growth. We have extended the method by including information about the amino acid sequence, during the generation of conformations. We developed a Zimm-Bragg like theory of hydrophobic cluster formation by using the transfer matrix method. We have shown that the transfer matrix approach to the generation and averaging over chain conformations can be formulated as an algebraic problem. It is worth mentioning that the sigmoidal character of the plots is nearly universal and occurs for many values of these parameters. (from Kloczkowski, A., Sen, T.Z. and Jernigan, R.L.: The transfer matrix method for lattice proteins - an application with cooperative interactions, Polymer, 45, 707-716, 2004.)
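For orientation, the statistical weights h, p, and m named in the caption of Figure 6 can be written out under the assumption that they are the usual Boltzmann factors of the corresponding contact energies; the explicit definitions are those of the cited Polymer paper, and the form below is only the standard one. The symbols N_h, N_p, and N_m, introduced here for illustration, denote the numbers of H-H, P-P, and H-P non-bonded contacts in a conformation.

```latex
h = \exp\!\left(-\frac{E_{HH}}{RT}\right), \quad
p = \exp\!\left(-\frac{E_{PP}}{RT}\right), \quad
m = \exp\!\left(-\frac{E_{HP}}{RT}\right)

Z = \sum_{\text{conformations}} h^{N_h}\, p^{N_p}\, m^{N_m}, \qquad
\langle N_h \rangle
  = \frac{1}{Z} \sum_{\text{conformations}} N_h\, h^{N_h}\, p^{N_p}\, m^{N_m}
  = \frac{\partial \ln Z}{\partial \ln h}
```

Under this form a more favorable (more negative) H-H energy gives a larger h, which is consistent with the statement that the hydrophobic affinity increases with h.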

Figure 7. Relationship between sequence variability, expressed as sequence entropy (for each sequence position, the negative of the sum of p ln p over the 20 types of residues from a multiple sequence alignment), and inverse residue packing density. These results are shown for a set of 113 proteins. Note that two different regions are seen in the figure on the left: a nearly linear region for high packing densities and a more constant region for lower packing densities, except that at extremely low packing densities a broad range of values is seen. In the right figure, the data for the higher packing densities have been fit with a straight line. (from Liao, H., Yeh, W., Chiang, D., Jernigan, R.L. and Lustig, B., Protein sequence entropy is closely related to packing density and hydrophobicity. Prot. Eng. Des. Select, 18, 59-64, 2005.)
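The per-position sequence entropy referred to in the caption of Figure 7 can be computed from one column of a multiple sequence alignment as in the sketch below. The function name and the decision to ignore gap characters are assumptions made for illustration; the cited paper may treat gaps differently.

```python
import math
from collections import Counter

def sequence_entropy(column, gap_chars="-."):
    """Shannon entropy (natural log) of the residue types in one alignment column.
    Gap characters are ignored, which is an assumption of this sketch."""
    residues = [r for r in column.upper() if r not in gap_chars]
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log(n / total) for n in counts.values())

print(sequence_entropy("AAAA") == 0)  # → True (a fully conserved position)
print(sequence_entropy("ACDE"))       # four equally frequent residue types
```

A fully conserved position has zero entropy, and the maximum for 20 equally frequent residue types is ln 20.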

Figure 8. Sample of Protein Shapes Computed for Different Levels of Detail. Rows (A) and (C) are based on all heavy atoms and rows (B) and (D) are based on Cα atoms only. The structures in rows (A) and (B), left to right, are crambin (PDB: 1cnr), guanylate binding protein-1 (1dg3), topoisomerase I (1ecl), and heat stable enterotoxin (1etl); those in rows (C) and (D) are cytochrome P450Nor (1f24), germin (1fi2), cobalamin transporter (1nqe), and peptide F (octadecapeptide, 1pef). For small structures the differences between the two levels of granularity are extreme; see especially 1pef. In the coarser grained structures, there is some loss of surface roughness and an overall shrinking. We have also generated surfaces using Delaunay tessellations, and these closely resemble those shown here generated as convex hulls. An additional 65 proteins have also been studied; eventually all PDB structures will be used and the results placed on a web site. (R.L. Jernigan, A. Kloczkowski and D.T. Flatow, unpublished.)

Figure 9. Dependence of surfaces and volumes on the number of heavy atoms for different surface representations with a set of 72 proteins. The smallest ellipsoid enclosing each structure has the largest volume and the smallest surface area. The atom-based convex hull has both a larger surface area and a larger volume than the Cα-based residue surface area. The smoother surfaces of the residue level structures have an advantage of showing less variability than the atom-level structures. (D.T. Flatow, A. Kloczkowski and R.L. Jernigan, unpublished.)


Figure 10. Samples of protein conformations generated inside an ellipsoid. Each protein has 70 Cα atoms, with different biases for secondary structures. A total of 70,000 conformations was generated. Many of these are protein-like in appearance. The helix bias differed for each row: 0% helix in row (A), 20% helix in row (B), 40% helix in row (C), and 50% helix in row (D). Helices are shown as red cylinders and strands as blue arrows. (R.L. Jernigan, A. Kloczkowski and D.T. Flatow, unpublished.)

Figure 11

Figure 12

Figure 13

Figure 15