ECCC-3 Contents Page THEOCHEM Home Page Elsevier Chemistry Home Page

Multiple regression analysis of the beta-sheet propensity of amino acids

Tomoko Sotomatsu Niwa*, Akio Ogino

Research Laboratory
Nippon Shinyaku Co., Ltd.
Nishiohji Hachijo Minami-ku Kyoto, 601 Japan



Greek characters are replaced by Roman characters to avoid anchoring too many image files in the text.


Abstract

To study the nature of the beta-sheet propensities of naturally occurring amino acids, multiple regression analyses were performed with the use of various physicochemical parameters and experimentally determined physicochemical properties of amino acids. The beta-sheet propensities used for the analyses were free energy changes measured in a zinc-finger peptide and in the IgG-binding domain of protein G, taken from the literature. Correlation equations of high quality were obtained for all data sets. Steric parameters of the amino acid side chains and the nuclear magnetic resonance (NMR) chemical shift of the alpha carbon of amino acids were statistically highly significant in all cases. The addition of an indicator variable expressing the electronic effects of amino acids improved the correlations in some cases. The correlation equations show that the beta-sheet propensities are determined mainly by the electronic and steric effects of the amino acid side chains. The results also provide a basis for evaluating more precise beta-sheet propensity scales.


Introduction

The beta-sheet structure is one of the major secondary structures of proteins, and a beta-sheet propensity scale is indispensable to conformational studies of proteins [Nesloney, 1996]. The beta-sheet propensity is also important in drug design, because proteolytic enzymes such as aspartic and serine proteases bind both substrates and inhibitors in beta-sheet conformations. Despite this importance, the physical basis of beta-sheet propensities is not well understood.

The objective of this study was to elucidate the factors that determine the beta-sheet-forming propensities of amino acids. These propensities may be governed simultaneously by steric, electronic, and hydrophobic factors, so they are difficult to analyze by simple linear regression against a single parameter. We therefore used multiple regression analysis with various kinds of independent variables.

The beta-sheet propensity scale of Chou and Fasman (Pbeta), determined statistically from protein structures [Chou, 1974], is well known. However, the Pbeta values are averages derived from the known structures of many different proteins. Here we instead used experimentally determined free energy changes and analyzed the data from each source separately, in order to clarify the environmental and positional effects on the beta-sheet propensities.


Methods

The thermodynamic data used in the present analyses were taken from the literature [Kim, 1993; Minor, 1994a; Minor, 1994b; Smith, 1994]. Statistical analyses were performed with the StatView program [StatView, 1992] on a Macintosh PowerPC 8500. As independent variables in the regression analyses, we used various hydrophobic, steric, and electronic parameters of the amino acid side chains, together with experimentally determined properties such as the NMR chemical shifts and pKa values of amino acids. Indicator variables were applied where necessary. These parameters (except for the hydrophobic parameters) were taken from the literature [Verloop, 1987; Fauchere, 1988]. The hydrophobic parameters were those we determined previously from experimentally measured 1-octanol/water partition coefficients of oligopeptides [Niwa, 1995].
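The regression statistics reported with each equation below (n, s, r, F, and the 95% confidence intervals in parentheses) can be reproduced with ordinary least squares. The following Python sketch is an illustration only, not the StatView procedure used in this study. It assumes NumPy is available and that the caller supplies the two-sided 95% Student t value for n - k - 1 degrees of freedom (about 2.120 for 19 amino acids and two independent variables). The function name `multiple_regression` and the synthetic data are hypothetical and are not the values of Table 1.

```python
import numpy as np


def multiple_regression(X, y, t_crit):
    """Ordinary least-squares fit of y on the columns of X plus an intercept.

    Returns the coefficients (intercept last, as in the printed equations),
    the 95% confidence half-widths (t_crit * standard error), and the
    summary statistics n, s, r, and F.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([X, np.ones(n)])          # independent variables, then intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = n - k - 1                               # residual degrees of freedom
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    s = np.sqrt(ss_res / dof)                     # standard deviation of the residuals
    r = np.sqrt(1.0 - ss_res / ss_tot)            # multiple correlation coefficient
    F = ((ss_tot - ss_res) / k) / (ss_res / dof)  # regression MS / residual MS
    cov = s ** 2 * np.linalg.inv(A.T @ A)         # covariance matrix of the coefficients
    half_width = t_crit * np.sqrt(np.diag(cov))
    return coef, half_width, {"n": n, "s": s, "r": r, "F": F}


# Hypothetical data: 19 "amino acids", two steric-like descriptors, small noise.
rng = np.random.default_rng(0)
X_demo = rng.uniform(1.0, 3.0, size=(19, 2))
y_demo = -0.5 * X_demo[:, 0] - 0.04 * X_demo[:, 1] + 0.5 + rng.normal(0.0, 0.04, 19)
coef, half_width, stats = multiple_regression(X_demo, y_demo, t_crit=2.120)
```

With real data, X would hold, for example, the B1 and B5 values of Table 1 and y the corresponding delta.deltaG values.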


Results

Zinc-finger peptide

Propensities of residues in beta-sheets have been studied with various host-guest systems [Kim, 1993; Minor, 1994a; Minor, 1994b; Smith, 1994]. Kim and Berg measured the thermodynamic beta-sheet propensities of each of the twenty commonly occurring amino acids using a zinc-finger peptide host system in which amino acids were substituted into a guest site (residue 3), a solvent-exposed position in an antiparallel beta-sheet [Kim, 1993]. Because these peptides are unfolded in the absence of bound metal ions but folded in their presence, Kim and Berg assumed that the thermodynamics of metal binding fully reflect the peptide-folding energy. To determine these energies with high precision, they used a competitive cobalt(II)-binding assay. Regression analysis of the relative free energies, delta.deltaG, gave Eq. 1. Pro was excluded from the analyses throughout the present study. The delta.deltaG values and the parameters used in the analyses are listed in Table 1.

   delta.deltaG = - 0.492B1  - 0.037B5 + 0.483                     (1)
                   (0.111)    (0.013)   (0.172)

n = 19, s = 0.040, r = 0.949, F = 72.560, p < 0.0001

B1 and B5 are the STERIMOL parameters of Verloop [Verloop, 1987; Fauchere, 1988]; they express the minimum and maximum widths of the amino acid side chains, respectively. In Eq. 1 and the following equations, n is the number of amino acids (data points), s is the standard deviation of the residuals, r is the multiple correlation coefficient, F is the ratio of the regression mean square to the residual mean square, and p is the p-value, i.e., the probability of obtaining an F value at least this large if the null hypothesis (no dependence on the independent variables) were true. The figures in parentheses are the half-widths of the 95% confidence intervals of the coefficients. Because both coefficients are negative, Eq. 1 shows that sterically bulky amino acid side chains increase the beta-sheet propensities. Next, we applied the NMR chemical shift of the alpha carbon of amino acids, deltaHc [Fauchere, 1988], and obtained Eq. 2.

   delta.deltaG = -0.265deltaHc/10 - 0.126                         (2)
                  (0.074)           (0.089)

n = 19, s = 0.058, r = 0.879, F = 57.705, p < 0.0001
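Because F and r are computed from the same sums of squares, F can be recovered from r, n, and the number of independent variables k through F = (r^2 / k) / ((1 - r^2) / (n - k - 1)). The short Python sketch below applies this standard identity to the reported values of Eqs. 1 and 2 as a consistency check. Small differences from the printed F values are expected because r is rounded to three decimals. The function name `f_from_r` is hypothetical.

```python
def f_from_r(r, n, k):
    """F ratio implied by a multiple correlation coefficient r for n data points
    and k independent variables (the intercept is not counted in k)."""
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Eq. 1: two variables (B1, B5); Eq. 2: one variable (deltaHc/10).
f_eq1 = f_from_r(0.949, n=19, k=2)   # about 72.48; reported F = 72.560
f_eq2 = f_from_r(0.879, n=19, k=1)   # about 57.77; reported F = 57.705
```

The same check can be applied to Eqs. 3 - 8 by changing r and k.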

The deltaHc values were divided by 10 so that the coefficient of this term is comparable in magnitude to those of the other parameters. Eq. 2 shows that the higher the deltaHc/10, the lower the delta.deltaG and hence the higher the beta-sheet propensity.
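Dividing an independent variable by 10 multiplies its regression coefficient by 10 and leaves the fitted values, and therefore s and r, unchanged; only the scale on which the coefficient is printed changes. A minimal NumPy sketch illustrates this, using hypothetical chemical-shift and free-energy values that are not those of Table 1:

```python
import numpy as np

# Hypothetical alpha-carbon shifts (ppm) and free-energy values, for illustration only.
shift = np.array([52.0, 45.0, 61.0, 56.0, 58.0, 55.0])
ddg = np.array([-0.10, 0.05, -0.30, -0.20, -0.25, -0.15])

slope_raw, icpt_raw = np.polyfit(shift, ddg, 1)              # fit against deltaHc
slope_scaled, icpt_scaled = np.polyfit(shift / 10, ddg, 1)   # fit against deltaHc/10

# Up to floating-point rounding, the slope grows by a factor of 10 while the
# intercept and the fitted values stay the same.
fit_raw = np.polyval([slope_raw, icpt_raw], shift)
fit_scaled = np.polyval([slope_scaled, icpt_scaled], shift / 10)
```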

IgG-binding domain of protein G

Minor and Kim measured the beta-sheet propensity for the naturally occurring amino acids in a variant of the small, monomeric, beta-sheet-rich, IgG-binding domain from protein G [Minor, 1994b]. This protein exhibits a reversible two-state thermal denaturation transition. Amino-acid substitutions were made at a guest site (residue 44) on the solvent-exposed surface of the beta-sheet. The stability of each protein was measured by thermal unfolding as monitored by circular dichroism. The measured delta.deltaG values listed in Table 1 were analyzed as shown below.

   delta.deltaG(44) = 0.965B1 + 0.735I(ST) - 1.572                 (3)
                     (0.596)   (0.343)      (0.921)

n = 19, s = 0.214, r = 0.840, F = 19.140, p < 0.0001

I(ST) is an indicator variable which is one when the guest amino acid is Ser or Thr, and is zero for the other amino acids. Next, we applied the NMR chemical shift of the alpha carbon of amino acids [Fauchere, 1988], deltaHc, and formulated Eq. 4.

   delta.deltaG(44) = 0.508deltaHc/10 + 0.627I(ST) - 0.654         (4)
                     (0.261)           (0.326)      (0.306)

n = 19, s = 0.196, r = 0.867, F = 24.302, p < 0.0001

The results for the IgG-binding domain of protein G were similar to those obtained for the zinc-finger peptide, except for the I(ST) term. Note that, owing to differences in the methods used to measure the beta-sheet propensities, the delta.deltaG values of the IgG-binding domain of protein G are opposite in sign to those of the zinc-finger peptide.
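The indicator variables in this paper, I(ST) here and I(anion), I(STC), and I(aro) introduced below, are simple 0/1 columns keyed on the guest residue. A minimal Python sketch, assuming one-letter amino acid codes and the 19-residue set without Pro; the names `INDICATOR_SETS` and `indicator_column` are hypothetical:

```python
# Residues defining each indicator variable, as stated in the text.
INDICATOR_SETS = {
    "I(ST)": {"S", "T"},          # Ser, Thr
    "I(anion)": {"D", "E"},       # Asp, Glu
    "I(STC)": {"S", "T", "C"},    # Ser, Thr, Cys
    "I(aro)": {"F", "Y", "W"},    # Phe, Tyr, Trp
}

# The 19 naturally occurring amino acids used in the regressions (Pro excluded).
RESIDUES = list("ACDEFGHIKLMNQRSTVWY")


def indicator_column(name, residues):
    """Return 1 for residues in the indicator's set and 0 otherwise."""
    members = INDICATOR_SETS[name]
    return [1 if res in members else 0 for res in residues]


columns = {name: indicator_column(name, RESIDUES) for name in INDICATOR_SETS}
```

Each column can then be appended to the matrix of independent variables alongside B1 or deltaHc/10.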

Minor and Kim also measured the beta-sheet propensities in the IgG-binding domain of protein G with amino acid substitutions at a guest site (residue 53) in the central strand, which is bordered on both sides by other beta-sheet strands [Minor, 1994a]. The neighboring environment of the guest site was modified to minimize the potential for artifactual interactions. Statistical analysis gave Eq. 5, where I(anion) is an indicator variable that is one when the amino acid at position 53 is Glu or Asp and zero for the other amino acids; I(anion) expresses the effect of an anionic charge on the amino acid side chain.

   delta.deltaG(53) = 1.276deltaHc/10 - 0.734I(anion) - 1.046      (5)
                     (0.300)           (0.376)         (0.370)

n = 19, s = 0.236, r = 0.933, F = 53.488, p < 0.0001

As in the case of Eq. 4, deltaHc/10 was statistically highly significant. It should be noted that anionic side chains decreased the beta-sheet propensities in this case.

Smith and co-workers also studied the beta-sheet propensities of the naturally occurring amino acids in the B1 domain of streptococcal IgG-binding protein G [Smith, 1994]. Amino acid substitutions were made at the same guest site (residue 53). The thermal stability of each variant was measured to determine the beta-sheet-forming propensities. The resulting delta.deltaG(53)* values for beta-sheet formation were analyzed as shown below.

   delta.deltaG(53)* = -1.429deltaHc/10 + 0.779I(anion) + 1.003    (6)
                       (0.397)           (0.496)         (0.487)

n = 19, s = 0.311, r = 0.908, F = 37.699, p < 0.0001

As in Eq. 5, deltaHc/10 and I(anion) were statistically significant. Due to the differences in the methods, the delta.deltaG(53) values obtained by Minor and Kim are opposite in sign to the delta.deltaG(53)* values by Smith and co-workers.
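Since the two residue-53 data sets are stated to differ in sign, negating every coefficient of Eq. 5 puts it on the same convention as Eq. 6. The Python sketch below does this using only the printed coefficients. It assumes, as the text states, that the scales differ only in sign, and it ignores any difference in reference state or measurement conditions; the dictionary and function names are hypothetical.

```python
# Coefficients of Eqs. 5 and 6 (residue 53 of protein G), as printed above.
EQ5_MINOR_KIM = {"deltaHc/10": 1.276, "I(anion)": -0.734, "constant": -1.046}
EQ6_SMITH = {"deltaHc/10": -1.429, "I(anion)": 0.779, "constant": 1.003}


def flip_sign(equation):
    """Negate every coefficient, i.e. re-express the equation for -delta.deltaG."""
    return {term: -c for term, c in equation.items()}


eq5_on_smith_scale = flip_sign(EQ5_MINOR_KIM)
for term, c in EQ6_SMITH.items():
    print(f"{term}: Eq. 5 (sign-flipped) {eq5_on_smith_scale[term]:+.3f}, Eq. 6 {c:+.3f}")
```

After the flip, every term of Eq. 5 has the same sign as the corresponding term of Eq. 6, and the magnitudes are similar.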

Correlations between B1 and deltaHc/10

In the above analyses, both B1 and deltaHc/10 were statistically highly significant. We then examined the correlations between B1 and deltaHc/10, and obtained Eqs. 7 and 8.

   deltaHc/10 = 1.857B1 - 1.721                                    (7)
               (0.577)   (0.896)

n = 19, s = 0.211, r = 0.855, F = 46.128, p < 0.0001

   deltaHc/10 = 1.818B1 + 0.360I(STC) + 0.381I(aro) - 1.778        (8)
                (0.313)   (0.154)       (0.153)       (0.486)

n = 19, s = 0.112, r = 0.966, F = 68.893, p < 0.0001

As shown in Eqs. 7 and 8, B1 and deltaHc/10 are highly correlated. Since B1 mainly expresses the effect of branching at the beta carbon of the amino acid side chains (see Table 1), beta-branching increases the deltaHc/10 values. I(STC) is an indicator variable that is one when the amino acid is Ser, Thr, or Cys and zero for the other amino acids; it expresses the inductive electronic effect of the side-chain heteroatoms on the alpha carbon. I(aro) is an indicator variable that is one when the amino acid is Phe, Tyr, or Trp and zero for the other amino acids; aromatic residues also increase the deltaHc/10 values. Table 2 lists the correlations among the parameters used in Eqs. 1 - 6; the parameters are highly collinear.
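The collinearity between B1 and deltaHc/10 can be quantified with the variance inflation factor, VIF = 1 / (1 - r^2), a standard regression diagnostic that was not part of the original analysis. Applied to the r values of Eqs. 7 and 8, the Python sketch below shows how much the variance of the deltaHc/10 coefficient would be inflated if it were placed in one equation with B1 alone, or with B1, I(STC), and I(aro). This is an illustration under that hypothetical model, not an additional result of the study; the function name is hypothetical.

```python
def vif_from_r(r):
    """Variance inflation factor 1 / (1 - r**2): the factor by which the variance of a
    regression coefficient grows when its variable has correlation (or multiple
    correlation) r with the other independent variables in the same equation."""
    return 1.0 / (1.0 - r * r)


vif_b1_dhc = vif_from_r(0.855)    # Eq. 7: deltaHc/10 against B1 alone
vif_dhc_all = vif_from_r(0.966)   # Eq. 8: deltaHc/10 against B1, I(STC), and I(aro)
print(round(vif_b1_dhc, 2), round(vif_dhc_all, 2))   # 3.72 14.96
```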


Discussion

Statistical quality

As shown in Table 1, the ranges of the free energy differences were small, except for Pro, which has no hydrogen atom on its backbone amide nitrogen. Nevertheless, we were able to formulate correlation equations of high quality: the correlation coefficients, r, were greater than 0.80 in all cases. Multiple regression thus proved to be more suitable than simple regression for analyzing the beta-sheet propensity.

Contributing Factors

Hydrophobic effect

We previously evaluated the hydrophobic parameters of amino acids using experimentally determined 1-octanol/water partition coefficients of oligopeptides [Niwa, 1995]. These parameters are free from conformational factors such as beta-turns and have been shown to work well in rationalizing the thermal stability of proteins [Niwa, 1995]. Our hydrophobic parameters, paib, were statistically insignificant throughout the present analyses. The hydrophobic factor is therefore unimportant for the beta-sheet preference at the guest sites studied here.

Steric effect

Compared with the alpha-helix, the beta-sheet structure is less crowded. Molecular graphics studies of the IgG-binding domain of protein G (PDB code: 2GB1) revealed no critical steric interactions among the side chains around the guest site. Additionally, the guest site of the zinc-finger peptide is located near the terminus (residue 3), and in the IgG-binding domain of protein G the neighboring environment of the guest site was modified to minimize the potential for artifactual interactions. Steric interactions between amino acid side chains are therefore unimportant here. Instead, as shown in Eqs. 1 and 3, bulky and beta-branched side chains increased the beta-sheet propensities. It therefore appears that bulky and beta-branched side chains restrict the access of water and of protein groups to the backbone, thereby enhancing the hydrogen-bonding interactions of the backbone amide groups or the electrostatic interactions between main chains.

Electronic effect

The deltaHc/10 values can be regarded roughly as a measure of the inductive effect of the amino acid side chains, since the shielding of the alpha-carbon nucleus depends on the density of the circulating electrons. Eqs. 2, 4, 5, and 6 thus show that the inductive effect largely determines the beta-sheet propensities. The significance of the I(ST) term also points to the importance of the inductive effect.

Recently, Avbelj and Moult [Avbelj, 1995] reported that main-chain electrostatics are important in determining the conformational preferences of amino acids. Maccallum and co-workers proposed that Coulombic interactions between partially charged main-chain atoms that are not hydrogen-bonded to each other influence the conformations of antiparallel beta-sheets [Maccallum, 1995a; Maccallum, 1995b]. The significance of the deltaHc/10 term in Eqs. 2, 4, 5, and 6 is consistent with both models. Of course, our results do not rule out hydrogen-bonding interactions between main-chain amide groups.

In Eqs. 5 and 6 the indicator variable I(anion) was highly significant, whereas it was not significant for residue 44 (Eqs. 3 and 4). This could be explained in two ways. First, residue 44 is located on the solvent-exposed surface of an edge strand of the beta-sheet, so anionic side chains there are well solvated and their charge effects are weakened. In contrast, residue 53 is in the central strand, bordered on both sides by other strands; it is not fully solvated and still interacts with protein groups. Second, electrostatic interactions with nearby amino acids may decrease the beta-sheet propensities at residue 53. Because electrostatic interactions are long-range, we could not at present identify the amino acids that interact with residue 53.

Because B1 is highly correlated with deltaHc/10 (Eqs. 7 and 8), the steric effect unfortunately cannot be separated from the electronic effect. The steric parameters B1 and B5 gave the best equation for the zinc-finger peptide, whereas deltaHc/10 gave the best equations for the IgG-binding domain of protein G. It is thus more reasonable to conclude that steric and electronic effects operate cooperatively to stabilize beta-sheet structures.

Environment effects

Either B1 or deltaHc/10 was significant in all cases, and B1 is highly correlated with deltaHc/10. B1 and deltaHc/10 can therefore be understood to express the intrinsic beta-sheet preferences of amino acids. We compared the coefficients of B1 and deltaHc/10 in Eqs. 1 - 6. The absolute values of the coefficients for the zinc-finger peptide were much smaller than those for the IgG-binding domain of protein G. Moreover, these coefficients changed with the guest position in the IgG-binding domain of protein G. The beta-sheet propensities were thus shown to depend strongly on the host peptide or protein and on the position of the guest amino acid.
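The comparison of coefficient magnitudes can be made concrete by dividing the absolute deltaHc/10 coefficients of Eqs. 4, 5, and 6 by that of Eq. 2. Absolute values are used because the sign conventions differ between the data sets. A short Python sketch using only the coefficients printed above (the dictionary name is hypothetical):

```python
# deltaHc/10 coefficients as printed in Eqs. 2, 4, 5, and 6.
DELTAHC_COEF = {
    "Eq. 2": -0.265,   # zinc-finger peptide, residue 3
    "Eq. 4": 0.508,    # protein G, residue 44
    "Eq. 5": 1.276,    # protein G, residue 53 (Minor and Kim)
    "Eq. 6": -1.429,   # protein G, residue 53 (Smith and co-workers)
}

reference = abs(DELTAHC_COEF["Eq. 2"])
ratios = {eq: round(abs(c) / reference, 2) for eq, c in DELTAHC_COEF.items()}
for eq, ratio in ratios.items():
    print(f"{eq}: {ratio}")
```

The ratios are 1.92, 4.82, and 5.39 for Eqs. 4, 5, and 6, so the deltaHc/10 coefficients for protein G are roughly 2 to 5 times that for the zinc-finger peptide, and they more than double between residue 44 and residue 53.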

Evaluation of propensity scales

The Chou-Fasman probability values are averaged over all possible beta-sheet environments, such as central and edge positions as well as partially and fully hydrogen-bonded positions. In fact, we failed to formulate statistically significant correlation equations for these probability values. This is easily understood when Eqs. 3 - 6 are compared, since the significant terms and their coefficients change with the guest position. To obtain beta-sheet propensity scales from a statistical survey of experimental structural data, the selection of data is critically important; only central, fully hydrogen-bonded beta-strands should be selected. Another way to evaluate beta-sheet propensity scales is to prepare good model proteins. The IgG-binding domain of protein G appears to be an appropriate model, provided that further amino acid mutations are made to elucidate the electronic contributions expressed by the I(anion) term in Eqs. 5 and 6.


Summary

In spite of the difficulties described above, we were able to formulate correlation equations of high quality through the use of multiple regression analysis and to clarify the basis of the beta-sheet propensities. Such analyses are expected to be useful for studying other structural properties of proteins.


References

F. Avbelj, J. Moult, Biochemistry, 34 (1995) 755.

P. Y. Chou, G. D. Fasman, Biochemistry, 13 (1974) 211.

J.-L. Fauchere, M. Charton, L. B. Kier, A. Verloop, V. Pliska, Int. J. Pept. Protein Res., 32 (1988) 269.

C. A. Kim, J. M. Berg, Nature, 362 (1993) 267.

P. H. Maccallum, R. Poet, E. J. Milner-White, J. Mol. Biol., 248 (1995a) 361.

P. H. Maccallum, R. Poet, E. J. Milner-White, J. Mol. Biol., 248 (1995b) 374.

D. L. Minor Jr, P. S. Kim, Nature, 367 (1994a) 660.

D. L. Minor Jr, P. S. Kim, Nature, 371 (1994b) 264.

C. L. Nesloney, J. W. Kelly, Bioorg. Med. Chem., 4 (1996) 739.

T. S. Niwa, A. Ogino, ECCC-2, paper 10 (accepted for publication in THEOCHEM).

C. K. Smith, J. M. Withka, L. Regan, Biochemistry, 33 (1994) 5510.

StatView, Abacus Concepts, Inc., Berkeley, 1992.

A. Verloop, J. Tipker, Pharmacochem. Libr., 10 (QSAR Drug Des. Toxicol.) (1987) 97.