Quality assessment of a homology model

Let us have a look at the output we get from the Swiss Model server for the BchI modeling project.
What does all that tell us?
To assess the quality of the homology model, the server provides several scores as well as graphical plots from ANOLEA (mean force potential), GROMOS (empirical force field energy) and QMEAN (Qualitative Model Energy ANalysis).
GROMOS is a molecular dynamics simulation package, but it can also be used to estimate the energy of a model. ANOLEA uses an atomic empirical mean force potential (Melo et al.) to assess the packing quality of structures. Let's start with the scores shown above (QMEAN4 global scores). Clicking the question mark tells us that the QMEAN4 scoring function (see the original paper, Benkert et al., 2008, and a more recent paper, Benkert et al., 2012) is a linear combination of four structural descriptors:
-The local geometry is analyzed by a torsion angle potential over three consecutive amino acids.
-Two distance-dependent interaction potentials are used to assess long-range interactions: the first works at the residue level and is based on Cβ atoms only, the second is an all-atom potential.
-A solvation energy is calculated to investigate the burial status (accessibility to water) of the residues.
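
To make the idea of a "linear combination of four structural descriptors" concrete, here is a minimal Python sketch. The weights and the term values are made-up placeholders chosen only for illustration; the real QMEAN4 weights were fitted by the authors and are not reproduced here. The names `ASSUMED_WEIGHTS` and `linear_combination` are hypothetical as well.

```python
from typing import Dict

# ASSUMPTION: equal placeholder weights, NOT the fitted QMEAN4 weights.
ASSUMED_WEIGHTS: Dict[str, float] = {
    "torsion": 0.25,    # torsion angle potential over three residues
    "cbeta": 0.25,      # residue-level (C-beta) interaction potential
    "all_atom": 0.25,   # all-atom interaction potential
    "solvation": 0.25,  # solvation (burial) potential
}


def linear_combination(terms: Dict[str, float],
                       weights: Dict[str, float] = ASSUMED_WEIGHTS) -> float:
    """Return the weighted sum of the four descriptor values."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing descriptor(s): {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


# Made-up descriptor values: only the torsion term deviates from zero.
example_terms = {"torsion": -2.0, "cbeta": 0.0, "all_atom": 0.0, "solvation": 0.0}
combined = linear_combination(example_terms)
print(combined)
```

The sketch shows why a single poor term is enough to pull the combined score away from the ideal value, even when the other three terms look fine.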

In the paper about QMEAN it is stated that "QMEAN shows a statistically significant improvement over nearly all quality measures describing the ability of the scoring function to identify the native structure and to discriminate good from bad models". For our model the values of the score components look like this:

My Image
My Image

The Z-score is -1.94, and the plot informs us that the score for high-resolution X-ray structures is, on average, around 0. We can see from the plot above that, while the values for Cβ interactions, all-atom interactions and solvation energy are rather close to zero, the value of the torsion energy is -2.2, which pulls our overall score away from the ideal value. We can also get the PDB coordinates of the modeled structure (there is a "save pdb" link under the image) or an image of the structure showing the regions with high error values:
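
As a reminder of what a Z-score expresses, the following Python sketch computes one in the generic way: the model's score minus the mean of a reference set, divided by the standard deviation of that set. The reference values below are invented for illustration, and the exact reference set and normalization used by the server are not reproduced here; `z_score` is a hypothetical helper name.

```python
from statistics import mean, pstdev
from typing import Sequence


def z_score(model_score: float, reference_scores: Sequence[float]) -> float:
    """Number of standard deviations between a model score and the reference mean."""
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores)  # population standard deviation
    if sigma == 0:
        raise ValueError("reference scores have zero spread")
    return (model_score - mu) / sigma


# ASSUMPTION: made-up scores standing in for high-resolution X-ray structures.
reference = [0.70, 0.75, 0.80, 0.85, 0.90]
z = z_score(0.65, reference)
print(round(z, 2))
```

A structure that scores exactly like the average reference structure gets a Z-score of 0, and a negative value means the model scores below that average, which is how the -1.94 reported for our model should be read.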

My Image

The figure above is colored according to error values - low-error regions are blue and high-error regions are red. We may recognize the largest red region as the one where the extra 7-residue insertion in the sequence is located. It was built by the server according to our alignment, but apparently the energy functions do not approve of the result - the structure in this region needs to be modified somehow. Since we do not have any experimental data to improve the structure, we could, for example, try to find another server specialized in building this type of model. There are different possibilities: this region could be rearranged into a β-hairpin, or it could include a short α-helix - a secondary structure prediction may give some indication of that.
It could also be tested experimentally: if we had both the R. capsulatus BchI and SyncChlI proteins expressed and nicely purified, we could do some CD spectroscopic measurements to compare the secondary structure content of the two proteins. For example, a higher percentage of β-structure in SyncChlI would indicate that this region is a β-hairpin. One could also try to run some molecular dynamics simulations on the protein and see if this region converges to some other structure. The resulting QMEAN score can always be checked at the QMEAN server.

Another option for verifying the modeling we did here is to use a different sequence for modeling, preferably one without the 7-residue insertion found in SyncChlI. Looking at the sequence alignment we made earlier, I would suggest the third sequence from the top. Then we can check whether this region still has high energy values.

Of course, the best option would be to crystallize the protein and determine its structure - it would immediately reveal the differences. I have actually seen small-angle X-ray scattering (SAXS) data for SyncChlI. SAXS provides low-resolution structural models. Normally people try to fit available X-ray structures or a homology model into the low-resolution molecular shape obtained from SAXS, and if the structural model does not fit well, it is possible to modify it to fit the SAXS shape. The SAXS shape for SyncChlI indicates that there are differences between this structure and that of R. capsulatus BchI. We also have some small crystals of the protein, but they need optimization before we can collect X-ray data on them. At some point in the future we may find out.

The plots also show a more detailed energy analysis along the sequence, which reveals the high-energy amino acid residues:

My Image

It can be seen that ANOLEA, QMEAN and GROMOS all agree on the higher-energy regions, among which is the one located approximately between residues 70 and 100. On the next page we will discuss the geometry factors and their relation to model quality, and how we can use the SwissPDB Viewer for assessing homology models.
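
The kind of agreement between the three per-residue profiles described above can also be checked programmatically. The Python sketch below flags residues where all profiles exceed a threshold and then reports contiguous stretches of flagged residues. All numbers are invented, the function names are hypothetical, and the sketch assumes that each profile has been brought to a common convention in which higher values mean less favorable energy.

```python
from typing import List, Sequence, Tuple


def consensus_flags(profiles: Sequence[Sequence[float]],
                    threshold: float = 0.0) -> List[bool]:
    """Flag a residue only if every profile is above the threshold there."""
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("all profiles must cover the same residues")
    return [all(v > threshold for v in column) for column in zip(*profiles)]


def flagged_regions(flags: Sequence[bool],
                    min_length: int = 3) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges of consecutive flagged residues."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, flagged in enumerate(flags, start=1):
        if flagged:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a region that runs to the end of the sequence.
    if start is not None and len(flags) - start + 1 >= min_length:
        regions.append((start, len(flags)))
    return regions


# ASSUMPTION: made-up per-residue values for a 10-residue toy sequence.
anolea = [-1, -1, 2, 3, 2, 1, -1, -1, 1, -1]
gromos = [-2, -1, 1, 2, 3, 1, -1, -2, -1, -1]
qmean = [-1, 1, 1, 1, 2, 2, -1, -1, 1, -1]

regions = flagged_regions(consensus_flags([anolea, gromos, qmean]))
print(regions)
```

Requiring agreement between all three methods is a deliberately strict choice; a looser rule, such as two out of three, would flag more residues.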