Leonid Kirnarsky, Oleg Shats, and Simon Sherman
Eppley Cancer Institute, University of Nebraska Medical Center,
Omaha, NE 68198-6805
E-Mail: lkirnars@mail.unmc.edu
Comprehensive computational experiments were performed to evaluate
the efficiency of the newly proposed COMBINE procedure for protein
structure calculations from NMR data. This procedure is intended to
combine the merits of the previously developed FiSiNOE method with those
of the DIANA program, which is widely used for NMR structure calculations.
A new version of the FiSiNOE program, FiSiNOE-3, was developed to
determine local conformations of proteins consistent with short-range
NMR data (intraresidue and sequential distance constraints and coupling
constants). For each residue, the program determines the allowed
ranges of the φ, ψ, and χ1 torsion angles consistent with the NMR data.
The benchmark calculations were carried out on three proteins:
bovine pancreatic trypsin inhibitor, crambin, and avian pancreatic
polypeptide. The results obtained with the COMBINE protocol were
compared with those obtained by the STANDARD run of the DIANA program.
The COMBINE procedure significantly narrowed the ranges of the
dihedral angle constraints before the structure calculations, which
in turn yielded more stereospecific assignments. For all three
proteins, the use of the COMBINE procedure almost doubled the number
of unambiguously assigned βCH2 groups in comparison with STANDARD. The
computational experiments showed that using the allowed ranges for
torsion angles obtained by the COMBINE procedure as input data for the
DIANA program provides higher precision and accuracy of the 3D protein
structures reproduced from NMR constraints. The COMBINE procedure
may be incorporated into any protocol that takes as input the allowed
ranges of torsion angles consistent with a given set of NMR constraints.
Since the COMBINE procedure proved to be effective, reliable, and robust,
it may be recommended for general use in 3D structure determination
of proteins and peptides from NMR data.
Keywords: protein structure determination, NMR, refinement.
The quality of any computational method may be characterized by the precision and accuracy of the protein structures reconstructed by the method. Precision is a measure of the variation within the reproduced structures, while accuracy is a measure of the closeness of the reproduced structures to the "true" structure (the "gold standard"). Benchmark calculations have shown that each of the computational methods introduces its own systematic bias [14]. Taking into account the accuracy of experimental data provided by NMR, the attainable accuracy of protein structure calculations was shown to vary from 1 to 2 Å, while the precision within a family of calculated structures can reach 0.4-0.7 Å [14]. Both precision and accuracy depend on the method used, on the precision and accuracy of the input data, and on the size of the molecule [14, 15].
It was shown previously that the quality of the calculated structures can be improved by increasing the number of stereospecific assignments for prochiral groups of protons, in particular for β-methylene protons [16-18]. The stereospecific assignments can be obtained by matching the experimental spin-spin coupling constants and the intraresidue and sequential NOEs with those calculated for allowed conformations, which are provided either by a systematic conformational search [16, 18] or by analysis of crystallographic databases of known protein structures [18, 19].
The crystallographic information on high-resolution protein structures derived from structural databases was shown to be beneficial for protein structure calculations from NMR data [18-20]. Indeed, the problem of NMR structure determination falls into the category of ill-posed mathematical problems [19, 20], since the number of variables (coordinates of atoms) to be determined is usually larger than the number of experimental constraints derived from the NMR data. The way to decisively improve this situation is to use additional information that is not contained directly in the NMR data. The Brookhaven Protein Data Bank (PDB) [21] is an excellent source of such information: it accumulates the physically realized conformations of many proteins, which may be used in NMR structure calculations.
The previously developed FiSiNOE approach [20, 22, 23] attempts to combine
the data from the PDB with the NMR measurements. The PDB information is used
to provide an a priori probability distribution for protein
conformations. The probable values of the torsion angles for any amino acid
residue in the protein sequence can then be determined from the intensities
of the NOE cross-peaks and the coupling constants [20, 22, 23]. It was
shown that the interproton distances measured from NOE data and the coupling
constants, in conjunction with structural information derived from the PDB,
provide refinement of the φ and ψ angles, determination of χ1
conformations, and stereospecific 1H assignments of β-protons [20].
Using methods of cluster analysis for the (φ, ψ, χ1) angle distributions
observed in high-resolution protein structures from the PDB, it was shown
that these distributions are well clustered [24, 25]. For every cluster,
the first statistical moments, i.e. the mathematical expectations of the
φ, ψ, and χ1 angles, and their standard deviations were estimated.
The statistical data of the 16 clusters are
used as a priori data in a new version of the FiSiNOE approach, the
FiSiNOE-3 program [26].
Recently, we have shown that combined use of the FiSiNOE approach with the
systematic scanning of sterically allowed conformational space, implemented in
the COMBINE procedure, results in a more precise determination of angular
constraints and in an increased number of stereospecific assignments of
β-protons [27]. The systematic scanning is performed by the HABAS
program [16], which is supplemental to the DIANA program [17], widely used
for NMR structure calculations. Although they use the same input data
(short-range NMR data), FiSiNOE and HABAS are based on fundamentally
different principles. FiSiNOE is a knowledge-based approach that estimates
mathematical expectations and standard deviations for the φ, ψ, and χ1
angles by combining a previously known distribution function for these
dihedral angles with a measured set of experimental data, whereas HABAS aims
to obtain stereospecific 1H NMR assignments for a pair of β-methylene
protons by systematic scanning of the (φ, ψ, χ1) space. As a by-product of
the scanning, HABAS also determines sterically allowed intervals for the
angles consistent with a given set of NMR data. The COMBINE procedure seeks
the angular intervals that satisfy both the probabilistic and the steric
conditions.
In this paper, the efficiency of the COMBINE protocol for 3D protein structure determination from simulated NMR data was evaluated by benchmark calculations on three proteins: bovine pancreatic trypsin inhibitor (BPTI), crambin, and avian pancreatic polypeptide (APPT). The results obtained with the COMBINE protocol were compared with those of the standard protocol of the DIANA program [17, 28].
The FiSiNOE-3 program. FiSiNOE-3 uses, as a priori data, a set
of 16 clusters for the φ, ψ, and χ1 angles [25] and, as input data,
intraresidue and sequential distance constraints and coupling constants
(short-range NMR data). For each residue, the program determines the subset
of clusters for the φ, ψ, and χ1 angles consistent with the NMR restraints.
Posterior probabilities, mathematical expectations, and standard deviations
of the angles are estimated for each (φ, ψ, χ1) cluster, as are the overall
ranges of these angles. The allowed range of a torsion angle is treated as a
confidence interval, determined as the product of the half-width (an input
parameter of FiSiNOE-3) and the standard deviation, σ, of the angle about its
mathematical expectation. The lowest limit among the confidence intervals of
the angle within the subset of (φ, ψ, χ1) clusters consistent with the
NMR restraints is taken as the lower limit of the overall range of the angle.
Analogously, the uppermost limit is taken as the upper limit of the overall
range. The FiSiNOE-3 program is written in C++ and compiled under the IRIX 5.3
and Solaris 2.5 operating systems. The program is available through the
World Wide Web
(http://www.unmc.edu/Eppley/ECCC3/fisinoe3.htm).
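The construction of the overall range described above can be illustrated with a short Python sketch. It is not the FiSiNOE-3 code (which is written in C++). It assumes that the confidence interval of a cluster is the mathematical expectation plus or minus the half-width times σ, and it ignores the periodicity of the angles. The names Cluster, confidence_interval, and overall_range, as well as the numerical values, are hypothetical.

```python
from typing import List, Tuple

# Hypothetical cluster summary for one torsion angle: (mean, sigma) in degrees.
Cluster = Tuple[float, float]


def confidence_interval(mean: float, sigma: float, half_width: float) -> Tuple[float, float]:
    """Assumed form of the interval: mean +/- half_width * sigma."""
    delta = half_width * sigma
    return mean - delta, mean + delta


def overall_range(clusters: List[Cluster], half_width: float) -> Tuple[float, float]:
    """Lowest lower limit and uppermost upper limit over the retained clusters."""
    if not clusters:
        raise ValueError("no cluster is consistent with the restraints")
    intervals = [confidence_interval(mean, sigma, half_width) for mean, sigma in clusters]
    lower = min(low for low, _ in intervals)
    upper = max(high for _, high in intervals)
    return lower, upper


# Made-up example: two clusters retained for the phi angle of one residue.
phi_clusters = [(-120.0, 10.0), (-65.0, 5.0)]
phi_range = overall_range(phi_clusters, half_width=3.0)
print(phi_range)  # (-150.0, -50.0)
```

With a half-width of 3, the two made-up clusters give the intervals (-150, -90) and (-80, -50), so the overall range spans from -150 to -50 degrees, including the gap between the clusters.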
The STANDARD protocol. The DIANA program with the REDAC strategy, as implemented in SYBYL 6.2 [29], was used in the calculations following the STANDARD protocol. The angle constraints and stereospecific assignments for β-methylene protons were determined by the HABAS program. The DIANA calculations were performed with the standard selection of minimization levels and parameters [17, 29]. Computational experiments with 50 and with 250 randomly generated structures were performed using the STANDARD protocol.
The COMBINE protocol. For each amino acid residue in the
protein sequence, a set of sterically allowed conformations and overall ranges
for the φ, ψ, and χ1 angles (angular constraints), consistent with
its short-range NMR data, was determined by the FiSiNOE-3 program. These
angular constraints, along with the short-range NMR data, were used as input
to HABAS. The updated angular constraints and the stereospecific assignments
of β-methylene protons determined by HABAS were then used for protein
structure calculations by the DIANA program. The calculations were carried
out with 50 randomly generated structures. The half-width of the allowed
angular intervals determined by the FiSiNOE-3 program was set to 3σ.
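The idea of scanning only inside previously narrowed angular ranges can be pictured with the following Python sketch. It is a toy illustration and not the HABAS algorithm: the grid step, the function names grid and scan, and the predicate toy_check are all assumptions, and the predicate merely stands in for the steric and NMR-consistency tests that the real program performs. Angle periodicity is ignored.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Range = Tuple[float, float]
ANGLES = ("phi", "psi", "chi1")


def grid(bounds: Range, step: float) -> List[float]:
    """Evenly spaced grid points from the lower bound up to the upper bound."""
    lower, upper = bounds
    count = int((upper - lower) // step)
    return [lower + i * step for i in range(count + 1)]


def scan(
    ranges: Dict[str, Range],
    is_consistent: Callable[[float, float, float], bool],
    step: float = 10.0,
) -> Optional[Dict[str, Range]]:
    """Scan the (phi, psi, chi1) grid inside the input ranges and return the
    updated range of each angle over the grid points that pass the check."""
    axes = [grid(ranges[name], step) for name in ANGLES]
    kept = [point for point in product(*axes) if is_consistent(*point)]
    if not kept:
        return None
    return {
        name: (min(p[i] for p in kept), max(p[i] for p in kept))
        for i, name in enumerate(ANGLES)
    }


# Hypothetical input ranges (degrees) and a toy consistency check.
input_ranges = {"phi": (-150.0, -50.0), "psi": (100.0, 180.0), "chi1": (-90.0, -30.0)}


def toy_check(phi: float, psi: float, chi1: float) -> bool:
    return phi <= -90.0 and chi1 >= -70.0


updated = scan(input_ranges, toy_check)
print(updated)
```

In this toy case the φ and χ1 ranges shrink while the ψ range is unchanged, which mirrors the role of the updated angular constraints passed on to the structure calculation.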
Estimation of quality of the structure determination. In each of the
computational experiments, the family of 20 structures with the lowest
values of the target function was selected for further consideration. The
selected structures were superimposed on the crystal structure from which the
NMR data were simulated, to determine the root mean square deviation (RMSD)
between the coordinates of corresponding atoms. The mean squared error (MSE)
of each structure was estimated as the square of the corresponding RMSD. The
overall precision within each structural family was determined from the RMSDs
between the 20 structures and their average structure. To calculate
the average structure, the 20 structures belonging to the same structural
family were superimposed by minimizing atomic RMSDs and then averaged.
Accuracies were estimated as atomic RMSDs between the average structure of each
family and the "gold standard." Precision, accuracy, and MSE were
estimated for both the backbone atoms (Cα, C', and N) and all heavy
atoms.
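The quality measures defined above can be summarized in a minimal Python sketch. It assumes that all structures have already been superimposed (the superposition step itself is not shown) and that each structure is a list of (x, y, z) coordinates in the same atom order. The function names and the two-atom toy data are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple

Atom = Tuple[float, float, float]
Structure = List[Atom]


def rmsd(a: Structure, b: Structure) -> float:
    """Root mean square deviation between corresponding atoms (no fitting)."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal length")
    total = sum((p - q) ** 2 for atom_a, atom_b in zip(a, b) for p, q in zip(atom_a, atom_b))
    return math.sqrt(total / len(a))


def average_structure(family: List[Structure]) -> Structure:
    """Coordinate-wise mean of pre-superimposed structures."""
    n = len(family)
    n_atoms = len(family[0])
    return [
        (
            sum(s[i][0] for s in family) / n,
            sum(s[i][1] for s in family) / n,
            sum(s[i][2] for s in family) / n,
        )
        for i in range(n_atoms)
    ]


def assess(family: List[Structure], reference: Structure) -> Dict[str, object]:
    """Precision (mean and SD of RMSDs to the average), accuracy, and MSEs."""
    avg = average_structure(family)
    to_avg = [rmsd(s, avg) for s in family]
    return {
        "precision_mean": mean(to_avg),
        "precision_sd": stdev(to_avg) if len(to_avg) > 1 else 0.0,
        "accuracy": rmsd(avg, reference),
        "mse": [rmsd(s, reference) ** 2 for s in family],
    }


# Toy data: a two-atom reference and two structures displaced along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
family = [
    [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],
    [(0.0, 0.0, -0.1), (1.0, 0.0, -0.1)],
]
result = assess(family, reference)
print(result)
```

In the toy data the two structures are displaced symmetrically, so the average coincides with the reference (perfect accuracy) while the precision is about 0.1, which shows that the two measures are independent.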
All calculations were performed on a Silicon Graphics Indigo2 workstation with an R4400 processor.
Table 1. Stereospecific assignments of β-methylene protons obtained by
the STANDARD and COMBINE procedures.
____________________________________________________________________
           Number of   Number of        Stereospecific assignments
Protein    residues    βCH2 groups      __________________________
                       (without Pro)    STANDARD      COMBINE
____________________________________________________________________
APPT          36            23          12 (52%)      22 (96%)
Crambin       46            19           7 (37%)      16 (84%)
BPTI          58            36          19 (53%)      31 (86%)
____________________________________________________________________
As can be seen in Table 1, the numbers of βCH2 groups unambiguously
assigned with the COMBINE procedure were significantly greater than
those assigned by STANDARD. For all three proteins, the use of
the COMBINE procedure almost doubled the number of unambiguously assigned
βCH2 groups.
The precision and accuracy of the structure calculations obtained with the COMBINE procedure, in comparison with the STANDARD run of DIANA, were assessed for the 20 selected structures with the lowest values of the target function in each computational experiment. The calculations were carried out for 50 and 250 starting structures produced by STANDARD (STD50 and STD250, respectively) and for 50 structures produced by COMBINE (COMB50). The atomic RMSDs between the average of those structures and the original crystal structure were used as a measure of the accuracy of structure determination, while the RMSDs between the average and the members of the structural family served as a measure of precision. These data are presented in Tables 2 and 3.
In Tables 2 and 3, the accuracy and precision of the 20 best structures selected by the lowest values of the target function determined by the COMBINE procedure are compared with those determined by STANDARD.
Table 2. Accuracy of the structures reproduced by STANDARD and COMBINE protocols
_____________________________________________________________________________
           RMSD (Å) for all heavy atoms      RMSD (Å) for backbone atoms
Protein    ____________________________      ____________________________
           STD50     STD250    COMB50        STD50     STD250    COMB50
_____________________________________________________________________________
APPT       1.00      1.02      0.91          0.69      0.70      0.65
Crambin    0.70      0.63      0.63          0.50      0.51      0.46
BPTI       0.93      0.92      0.89          0.51      0.55      0.45
_____________________________________________________________________________
Table 3. Overall precision of structures reproduced by STANDARD and COMBINE protocols: average and standard deviation (in parentheses) of the RMSDs (Å) between accepted conformers and the average structure.
_________________________________________________________________________________________________
           All heavy atoms                               Backbone atoms
Protein    _______________________________________       _______________________________________
           STD50         STD250        COMB50            STD50         STD250        COMB50
_________________________________________________________________________________________________
APPT       0.46 (0.07)   0.46 (0.04)   0.44 (0.06)       0.18 (0.04)   0.15 (0.05)   0.15 (0.05)
Crambin    0.60 (0.13)   0.39 (0.08)   0.35 (0.06)       0.41 (0.12)   0.26 (0.08)   0.24 (0.07)
BPTI       0.48 (0.06)   0.44 (0.05)   0.45 (0.04)       0.23 (0.04)   0.19 (0.03)   0.18 (0.05)
_________________________________________________________________________________________________
For the same 50 starting structures, the overall quality of the structures produced by the COMBINE procedure was always better than that of the structures produced by the STANDARD protocol. The precision of the STANDARD calculations starting with 250 structures (about five times longer computations) was close to that obtained by COMBINE with 50 starting structures, whereas the accuracy of the COMBINE calculations was better for all three proteins, with the single exception of the heavy atoms of crambin, where STD250 and COMB50 gave the same RMSD (0.63). Interestingly, the five-fold increase in the number of starting structures did not guarantee better accuracy (see Table 2: all heavy atoms for APPT and backbone atoms for all three proteins). Thus, in all calculations the accuracy provided by the COMBINE procedure was at least as good as that provided by STANDARD starting with either 50 or 250 structures.
For the STANDARD protocol, the precision and accuracy were dependent upon the number of randomly chosen trial structures. An increase in the number of trial structures improved the precision to some extent but sometimes decreased the accuracy. For the same number of randomly chosen trial structures, the precision of the 20 accepted structures determined by COMBINE was always better than that obtained by STANDARD. To attain the precision provided by COMBINE, the number of trial structures to be generated using the STANDARD protocol would be, on average, five times greater.
The narrowing of the overall ranges of the dihedral angles provided by COMBINE did not introduce an additional bias, since the accuracy of the structures determined by the COMBINE procedure was never worse than that obtained by STANDARD (that is, the RMSDs from the crystal structure were never higher). Moreover, in contrast to the precision, the higher accuracy achieved by COMBINE cannot be reached by STANDARD even by increasing the number of randomly chosen trial structures. The observed improvement in accuracy suggests that COMBINE estimates the overall ranges of the dihedral angles not only more precisely but also more accurately than the STANDARD protocol.
To better evaluate the quality of the structure determination, the distribution of the mean squared error values for the individual structures obtained by the different procedures was analyzed. The MSE of each structure was estimated as the square of the corresponding RMSD between the calculated structure and the "gold standard." According to the MSE values, all structures were divided into four categories: excellent, very good, good, and fair. For all heavy atoms, structures with MSE values of less than 0.5 were classified as excellent, those within the interval 0.5-0.8 as very good, those within the interval 0.8-1.25 as good, and those with MSE greater than 1.25 as fair. For the backbone atoms, the corresponding categories were defined by the following MSE values: excellent, less than 0.2; very good, 0.2-0.3; good, 0.3-0.55; and fair, greater than 0.55. The numbers of structures determined by the STANDARD and COMBINE procedures that fell into these categories are shown in Table 4.
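The classification rule can be written as a small Python sketch. The thresholds are those quoted in the text. The treatment of a value lying exactly on a boundary is not specified there, so assigning it to the lower-quality category is an assumption, as are the names THRESHOLDS, categorize, and tally and the example RMSD values.

```python
from collections import Counter
from typing import Iterable

# Exclusive upper bounds of each category, as quoted in the text (MSE = RMSD squared).
THRESHOLDS = {
    "heavy": [(0.5, "excellent"), (0.8, "very good"), (1.25, "good")],
    "backbone": [(0.2, "excellent"), (0.3, "very good"), (0.55, "good")],
}


def categorize(mse: float, atom_set: str) -> str:
    """Return the quality category of one structure from its MSE.

    Assumption: a value exactly on a boundary goes to the lower-quality category.
    """
    for upper, label in THRESHOLDS[atom_set]:
        if mse < upper:
            return label
    return "fair"


def tally(rmsds: Iterable[float], atom_set: str) -> Counter:
    """Count structures per category, computing each MSE as the squared RMSD."""
    return Counter(categorize(r * r, atom_set) for r in rmsds)


# Made-up heavy-atom RMSDs of four structures from the reference structure.
example_rmsds = [0.6, 0.8, 1.0, 1.2]
counts = tally(example_rmsds, "heavy")
print(counts)
```

The four example RMSDs correspond to MSE values of about 0.36, 0.64, 1.00, and 1.44, so each of the four categories receives one structure.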
Table 4. Number of the structures categorized by values of MSE
________________________________________________________________________________
              All heavy atoms                  Backbone atoms
Quality of    ___________________________      ___________________________
structure     STD50    STD250   COMB50         STD50    STD250   COMB50
________________________________________________________________________________
APPT
Excellent       0        0        0              0        0        0
Very good       0        0        6              0        0        0
Good           16       12       14             16       15       19
Fair            4        8        0              4        5        1
Crambin
Excellent       1        5       10              1        2        5
Very good       7       15       10              3        3        7
Good           11        0        0             13       15        8
Fair            1        0        0              3        0        0
BPTI
Excellent       0        0        0              0        0        1
Very good       1        0        7              7        6       19
Good           19       20       13             13       14        0
Fair            0        0        0              0        0        0
________________________________________________________________________________
Total
Excellent       1        5       10              1        2        6
Very good       8       15       23             10        9       26
Good           46       32       27             42       44       27
Fair            5        8        0              7        5        1
________________________________________________________________________________
As seen in Table 4, the largest share of the structures determined by both procedures falls in the good category. However, for both all heavy atoms and backbone atoms, the distribution of the structures produced by the COMBINE procedure was significantly shifted toward the very good and excellent categories. In contrast, STANDARD produced considerably fewer very good and excellent structures, and the fair structures were produced almost exclusively by STANDARD.
It is likely that a careful narrowing of the overall ranges of the dihedral angles, provided by COMBINE, helps the minimization of the target function to avoid some "false" local minima. In fact, we found that the values of the target function are poorly correlated with the corresponding values of the MSE: the structures with the lowest values of the target function are not necessarily the ones closest to the X-ray structure, and vice versa. Thus the values of the target function per se cannot serve as an ideal measure of the quality of a structure. On the other hand, our calculations clearly showed that STANDARD produces a larger number of structures with poor MSE than the COMBINE procedure does. It is likely that COMBINE rejects the structures with poor MSE values (i.e. the structures corresponding to the "false" local minima of the target function) better than STANDARD does.
Recently, relative populations of the torsional angles derived from PDB were used to define a conformational database potential [19] implemented as a new term in the simulated annealing refinement by X-PLOR [10]. The knowledge-based distribution for dihedral angles in proteins was noted as a useful addition for the refinement of NMR structures that provides improvements in the physicochemical reasonableness of dihedral angles and the overall packing, but increases the precision only slightly [19]. At the same time, our computational experiments showed improved accuracy and precision of structure calculations when the NMR data were used in conjunction with the probability density function for the torsional angles derived from PDB.
The COMBINE approach and the method of the conformational database potential are based on a common idea: to restrict the sampling of dihedral angles during structure calculations to those values that are known to be physically realizable. However, the two approaches implement this idea differently. In contrast to the conformational database potential, COMBINE uses the previously clustered mixture distribution from the conformational database and selects only those conformational clusters that are consistent with the given set of NMR data. This allows COMBINE to decrease the informational "noise" and increase the "signal/noise" ratio. COMBINE acts directly on the allowed ranges for torsion angles and restricts these ranges before structure determination, thereby increasing the number of stereospecific assignments of β-protons. By increasing the accuracy and precision of the input data, COMBINE provides improved precision and accuracy of the output data, i.e. better quality of NMR protein structures.
(1) Wüthrich, K. NMR of Proteins and Nucleic Acids; John
Wiley and Sons: New York, 1986.
(2) Clore, G.M.; Gronenborn, A.M. CRC Crit. Rev. Biochem. Mol. Biol.
1989, 24, 479-564.
(3) Clore, G.M.; Gronenborn, A.M. Science 1991, 252,
1390-1399.
(4) Clore, G.M.; Gronenborn, A.M. Protein Sci. 1994,
3, 372-390.
(5) Wagner, G. J. Biomol. NMR 1993, 3, 375-385.
(6) Havel, T.F.; Wüthrich, K. J. Mol. Biol. 1985,
182, 281-294.
(7) Crippen, G.; Havel, T.F. Distance Geometry and Molecular
Conformation; Research Studies Press: Taunton, Somerset, England, 1988.
(8) Braun, W.; Go, N. J. Mol. Biol. 1985, 186, 611-626.
(9) Brünger, A.T.; Clore, G.M.; Gronenborn, A.M.; Karplus, M. Proc.
Natl. Acad. Sci. USA 1986, 83, 3801-3805.
(10) Brünger, A.T. X-PLOR, Version 3.1. A system for X-ray
crystallography and NMR; Yale University Press: New Haven and London,
1993.
(11) Altman, R.B.; Jardetzky, O. Methods in Enzymology, Nuclear magnetic
resonance, Part B: Structure and mechanism (Oppenheimer, N.J., & James,
T.L., Eds.), Academic Press: New York, 1989, 177, 218-246.
(12) Borgias, B.A.; James, T.L. J. Magn. Reson. 1988,
79, 493-512.
(13) James, T.L. Curr. Opinion Struct. Biol. 1994, 4,
275-284.
(14) Zhao, D.; Jardetzky, O. J. Mol. Biol. 1994, 239,
601-607.
(15) Liu, Y.; Zhao, D.; Altman, R.; Jardetzky, O. J. Biomol. NMR
1992, 2, 373-378.
(16) Güntert, P.; Wüthrich, K. J. Am. Chem. Soc. 1989,
111, 3997-4004.
(17) Güntert, P.; Braun, W.; Wüthrich, K. J. Mol. Biol.
1991, 217, 517-530.
(18) Nilges, M.; Clore, G.M.; Gronenborn, A.M. Biopolymers 1990,
29, 813-822.
(19) Kuszewski, J.; Gronenborn, A.M.; Clore, G.M. Protein Sci.
1996, 5, 1067-1080.
(20) Sherman, S.A.; Johnson, M.E. Progress in Biophysics and Molecular
Biology. 1993, 59, 285-339.
(21) Bernstein, F.C.; Koetzle, T.F.; Williams, G.J.B.; Meyers, Jr., E.F.;
Brice, M.D.; Rodgers, J.R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. J.
Molec. Biol. 1977, 112, 535-542.
(22) Sherman, S.A.; Andrianov, A.M.; Akhrem, A.A. J. Biomolec. Struct.
Dyn. 1987, 4, 869-884.
(23) Sherman, S.A.; Johnson, M.E. J. Magn. Reson. 1992,
96, 457-472.
(24) Sclove, S.L.; Sherman, S.A. In: 1994 Proceedings of the
Biopharmaceutical Section, American Statistical Association: Alexandria,
VA, 1995, 399-404.
(25) Sclove, S.; Sherman, S. Third
Electronic Computational Chemistry Conference (ECCC-3),
Nov. 1 - Nov. 30, 1996. Submitted to J. Mol. Struct.: THEOCHEM.
(26) Shats, O.; Sherman, S. Third Electronic Computational Chemistry
Conference (ECCC-3),
Nov. 1 - Nov. 30, 1996.
(http://www.unmc.edu/Eppley/ECCC3/fisinoe3.htm).
(27) Sherman, S.; Sclove, S.; Kirnarsky, L.; Tomchin, I.; Shats, O. J. Mol.
Struct.: THEOCHEM 1996, 368, 153-162.
(28) Güntert, P.; Wüthrich, K. J. Biomol. NMR 1991,
1, 447-456.
(29) SYBYL Molecular Modeling, version 6.2. Tripos Associates, Inc.:
St. Louis, MO, 1995.
(30) Blundell, T.L.; Pitts, J.E.; Tickle, I.J.; Wood, S.P.; Wu, C.-W. Proc.
Natl. Acad. Sci. USA 1981, 78, 4175-4182.
(31) Hendrickson, W.A.; Teeter, M.M. Nature 1981, 290,
107-113.
(32) Marquart, M.; Walter, J.; Deisenhofer, J.; Bode, W.; Huber, R. Acta
Crystallogr. 1983, B39, 480-490.
(33) Pardi, A.; Billeter, M.; Wüthrich, K. J. Mol. Biol.
1984, 180, 741-751.
Phone: (402) 559-7809