Introduction to Homology Modeling

The term "homology modeling", also called comparative modeling or template-based modeling (TBM), refers to building a 3D model of a protein from the known experimental structure of a homologous protein (the template). Structural information is of great assistance in studying protein function, dynamics, and interactions with ligands and other proteins. Although a homology model is in effect a "low-resolution" structure, it often contains sufficient information about the spatial arrangement of important residues to guide the design of new experiments, such as site-directed mutagenesis, and it may even be used for ligand docking and for the design of new ligands or inhibitors in structure-based drug discovery. In some cases, modeling is combined with electron microscopy or small-angle X-ray scattering (SAXS) data to generate low-resolution models of a protein complex.

Experimental elucidation of a protein structure is often delayed by difficulties in obtaining a sufficient amount of material (cloning, expression and purification of milligram quantities of the protein) and by difficulties associated with crystallization. Even the crystallographic part of the project may become a source of problems. In this context, it is not surprising that methods for predicting protein structure have attracted much interest. Among these methods, homology modeling usually provides the most reliable results. The method rests on the observation that two proteins belonging to the same family, and therefore sharing similar amino acid sequences, have similar three-dimensional structures. This works because three-dimensional structure is conserved within a family to a much higher degree than amino acid sequence.

As discussed in the previous chapter on sequence alignment, before starting a homology modeling project, we need to learn as much as we can about the protein. A good starting point is always the UniProt database, the CATH database or the InterPro database. These databases provide a broad range of information about a protein: its domain content, important functional aspects, links to publications, etc.

After the preliminary research, building a homology model involves the following general steps:

  • identification of a template - a related protein with a known experimental structure of good resolution;
  • amino acid sequence alignment;
  • alignment correction;
  • backbone generation;
  • generation of loops;
  • side chain generation & optimization;
  • ab initio loop building in regions for which the template has no structure;
  • overall model optimization;
  • model verification using quality criteria.

Modeling software usually generates its own sequence alignment, which we need to check against our own alignment to make sure that there are no substantial differences. The modeling process includes backbone generation, building of missing parts (usually loops), generation of side chains for residues that differ between the target and the template, optimization of the new side chain conformations and subsequent energy minimization of the entire model. The modeling server also outputs an assessment of model quality, which should be carefully examined.
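The check of the software's alignment against our own can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each alignment is given as two gapped strings of equal length (target row, template row) with "-" as the gap character, and the function names `partner_map` and `differing_target_positions` are hypothetical helpers, not part of any modeling package.

```python
def partner_map(target_row, template_row, gap="-"):
    """Map each target residue index to the template residue index in the
    same alignment column (None if the target residue faces a gap)."""
    if len(target_row) != len(template_row):
        raise ValueError("alignment rows must have equal length")
    mapping = {}
    t_idx = 0  # running index into the ungapped target sequence
    p_idx = 0  # running index into the ungapped template sequence
    for t_char, p_char in zip(target_row, template_row):
        if t_char != gap:
            mapping[t_idx] = p_idx if p_char != gap else None
            t_idx += 1
        if p_char != gap:
            p_idx += 1
    return mapping


def differing_target_positions(own, server):
    """Return 0-based target positions whose template partner differs
    between our own alignment and the one produced by the software."""
    own_map = partner_map(*own)
    server_map = partner_map(*server)
    if own_map.keys() != server_map.keys():
        raise ValueError("the two alignments contain different target sequences")
    return [i for i in sorted(own_map) if own_map[i] != server_map[i]]


# Toy example (made-up sequences): the gap is placed one column apart.
own_alignment = ("MKT-AYI", "MKTLAYI")
server_alignment = ("MK-TAYI", "MKTLAYI")
print(differing_target_positions(own_alignment, server_alignment))
```

Positions reported by such a comparison are the ones worth inspecting by eye, for example to see whether a gap has been shifted into a secondary structure element.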

There are a number of different servers that can be used for homology modeling, and they often use different algorithms and different modeling philosophies (see the links provided by the Expasy server). In our tutorial we will use the SWISS-MODEL site at the Expasy server, which is automatic, relatively fast and provides a clear model quality assessment. Servers that use other methods may take days (or even weeks!) to return results, although it can sometimes be an advantage to use several servers and compare their output.

For any modeling project, the higher the sequence identity between the target and the template, the better the expected quality of the model. One also needs to remember that a model is most reliable in the most conserved regions, while the conformations of surface loops are often variable. When choosing a template, it is also important to remember that the quality of the model cannot be better than the quality of the X-ray structure used for modeling. When several potential templates are found in the PDB, we will always choose the best structure, that is, the structure with the highest sequence identity, the best resolution (the highest resolution, i.e. the lowest value in Å, of the X-ray data used to build the structure) and the best refinement parameters (structure quality will be discussed later).
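The template selection criteria can be written down as a simple sort. The sketch below is an assumption about how to combine the criteria, not a rule from any server: it ranks by sequence identity first, then by resolution (lower value in Å is better), then by R-free as one example of a refinement parameter. The `Template` class, the `rank_templates` function, the template names and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str          # hypothetical label, not a real PDB entry
    identity: float    # sequence identity to the target, in percent
    resolution: float  # X-ray resolution in Å (lower value = better)
    r_free: float      # example refinement parameter (lower = better)


def rank_templates(templates):
    """Sort candidates: highest identity first, then best resolution,
    then lowest R-free. The strict ordering is an assumption."""
    return sorted(templates, key=lambda t: (-t.identity, t.resolution, t.r_free))


# Made-up candidates for illustration only.
candidates = [
    Template("tmplA", identity=62.0, resolution=2.8, r_free=0.27),
    Template("tmplB", identity=62.0, resolution=1.6, r_free=0.21),
    Template("tmplC", identity=45.0, resolution=1.2, r_free=0.18),
]

for t in rank_templates(candidates):
    print(t.name, t.identity, t.resolution, t.r_free)
```

A strict lexicographic ordering like this is a simplification. When two candidates differ only slightly in identity but strongly in resolution, it is worth weighing the criteria by hand.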

Error sources in homology modeling
The earlier we become aware of possible errors, the better we can eliminate them and handle our modeling project in a proper way. Errors to avoid include the following:

• Incorrect sequence alignment.
• Incorrect choice of template - this may happen especially for multidomain proteins.
• Incorrectly built loop regions - loops are usually built automatically by the server, and different servers use different methods for building loops. If loop conformation is important for the project, one could run the modeling on different servers and then compare the models.
• Errors made by the person doing the modeling - errors of this type can take any form and are difficult to predict in advance. Knowledge of the basic principles of protein structure is important for minimizing them.
• Errors which may be present in the template - these are difficult to eliminate, and a model can hardly be better than its template.

Based on general observations, we may conclude that modeling can be straightforward but can also be rather challenging, for example, if the sequence identity is low and we need to use 2-3 different templates to model different domains of the protein.
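For the multi-template case, one possible way to decide which template to use for which domain is to compare sequence identities domain by domain. The sketch below assumes that each domain of the target has already been aligned separately to each candidate template, and it defines percent identity over aligned (non-gap) columns only, which is one of several definitions in use. The function names, domain labels, template names and sequences are all hypothetical.

```python
def percent_identity(target_row, template_row, gap="-"):
    """Percent of aligned (non-gap) columns holding identical residues."""
    if len(target_row) != len(template_row):
        raise ValueError("alignment rows must have equal length")
    aligned = [(a, b) for a, b in zip(target_row, template_row)
               if a != gap and b != gap]
    if not aligned:
        return 0.0
    matches = sum(a == b for a, b in aligned)
    return 100.0 * matches / len(aligned)


def best_template_per_domain(domain_alignments):
    """domain_alignments: {domain: {template: (target_row, template_row)}}.
    Returns {domain: (best_template, identity_percent)}."""
    choice = {}
    for domain, per_template in domain_alignments.items():
        scores = {name: percent_identity(*rows)
                  for name, rows in per_template.items()}
        best = max(scores, key=scores.get)
        choice[domain] = (best, scores[best])
    return choice


# Made-up two-domain target with two candidate templates.
alignments = {
    "N-terminal": {
        "tmplA": ("MKTAYI", "MKTAYL"),
        "tmplB": ("MKTAYI", "MRSAYL"),
    },
    "C-terminal": {
        "tmplA": ("GDLV-K", "GESVQK"),
        "tmplB": ("GDLVK", "GDLIK"),
    },
}

for domain, (name, identity) in best_template_per_domain(alignments).items():
    print(f"{domain}: {name} ({identity:.1f}% identity)")
```

Identity is only one criterion here. The resolution and refinement quality of each template, discussed above, should still be taken into account before the final choice.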