Introduction to Homology Modeling

The term "homology modeling", also called comparative modeling or sometimes template-based modeling (TBM), refers to modeling a protein 3D structure using a known experimental structure of a homologous protein (the template). Structural information is of great assistance in the study of protein function, dynamics, and interactions with ligands and other proteins. The "low-resolution" structure provided by homology modeling often contains sufficient information about the spatial arrangement of important residues in the protein and may guide the design of new experiments, for example site-directed mutagenesis. Even within the pharmaceutical industry, homology modeling can be valuable in structure-based drug discovery and drug design.

Experimental elucidation of a protein structure is often delayed by difficulties in obtaining a sufficient amount of material (cloning, expression and purification of milligram quantities of the protein) and by difficulties associated with crystallization. Even the crystallographic part of the project may become a source of problems. In this context, it is not surprising that methods for predicting protein structure have gained much interest. Among these methods, homology modeling usually provides the most reliable result. The method is based on the observation that two proteins belonging to the same family (and sharing similar amino acid sequences) will have similar three-dimensional structures. In fact, the three-dimensional structure within a protein family is conserved to a much higher degree than the sequence.

The steps required in homology modeling are the following:

◦ template identification;
◦ amino acid sequence alignment;
◦ alignment correction;
◦ backbone generation;
◦ generation of loops;
◦ side chain generation and optimization;
◦ ab initio loop building;
◦ overall model optimization;
◦ model verification (quality criteria, model quality).

After finding a template, it is essential to make a multiple sequence alignment that includes your sequence, the sequence of the template and some other sequences of proteins from the same family. This will give an overview of the general features of the protein family, the degree of conservation, the presence and location of consensus sequence motifs, etc. It is also very desirable to make a secondary structure prediction, discussed in the tutorial on sequence alignment. Most importantly, insertions and deletions should be positioned correctly (outside regions of secondary structure), and conserved residues, for example active site residues, should be aligned with each other.

When the sequence analysis is done and the alignment is corrected accordingly, we may proceed to the modeling. Modeling software will most probably use its own sequence alignment, which must be checked against your own alignment to make sure that there are no substantial differences. The steps followed by the software include backbone generation, building missing parts (e.g. loops), generation of side chains, optimization of side chain conformations, and energy minimization of the model. The server usually also outputs an assessment of model quality.

There are several servers that may be used for modeling; here we will use the SWISS-MODEL site, which is relatively fast and provides a nice model quality assessment. Some other servers (like Phyre, I-TASSER or ROBETTA), which may use more sophisticated algorithms, can take days (or even weeks!) to return the results of a modeling request. In complicated cases it may be an advantage to use different servers and compare their outputs. Of course, the higher the sequence identity between the target sequence and the template, the better the expected quality of the model will be.
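Two of the checks mentioned above, the target-template sequence identity and the placement of insertions and deletions relative to secondary structure, are easy to script once a pairwise alignment is available. The following Python sketch is only an illustration: the sequences, the secondary structure string and both function names are made up for this example, and it assumes that identity is counted over columns where both sequences have a residue (other definitions of percent identity exist and give somewhat different numbers).

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_target.upper(), aligned_template.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def indels_in_secondary_structure(
    aligned_target: str, aligned_template: str, template_ss: str
) -> list[int]:
    """Return alignment columns (0-based) where a gap falls inside a helix or strand.

    template_ss holds one character per template residue (gaps excluded),
    using 'H' for helix, 'E' for strand and anything else for coil.
    """
    n_residues = len(aligned_template.replace("-", ""))
    if len(template_ss) != n_residues:
        raise ValueError("template_ss must have one character per template residue")
    flagged = []
    res_index = 0  # index of the next template residue to be consumed
    for col, (t, p) in enumerate(zip(aligned_target, aligned_template)):
        if t == "-" and p == "-":
            continue  # empty column, nothing to check
        if p == "-":
            # Insertion in the target: it sits between two template residues.
            before = template_ss[res_index - 1] if res_index > 0 else "C"
            after = template_ss[res_index] if res_index < n_residues else "C"
            if before in "HE" and before == after:
                flagged.append(col)
        else:
            # Deletion in the target opposite a structured template residue.
            if t == "-" and template_ss[res_index] in "HE":
                flagged.append(col)
            res_index += 1
    return flagged


# Hypothetical example data (not taken from any real protein).
target = "MKTAYIAK-QRQISFVKSHFSRQ"
template = "MKSAYLAKGQRQ--FVRSHFTRQ"
template_ss = "CHHHHHHHCCCCEEEEEECCC"

print(f"identity: {percent_identity(target, template):.1f}%")
print("gap columns inside helices/strands:",
      indels_in_secondary_structure(target, template, template_ss))
```

In this made-up alignment all gaps fall in coil regions, so no columns are flagged. A flagged column does not necessarily mean the alignment is wrong, but it marks a place worth inspecting by hand before submitting the modeling request.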

As with sequence alignment, it is important to keep in mind that, depending on the degree of sequence conservation, modeling may be straightforward but may also be rather challenging, for example if we need to use 2-3 different templates to model different domains of our protein. The question then is how to put these different domains together into one structure. In some cases one could combine modeling with electron microscopy (EM) or small-angle X-ray scattering (SAXS) methods, which can provide a low-resolution overall shape of the protein; SAXS does so for the protein in solution. Subsequently, the models of the different domains may be docked into the EM density or the SAXS envelope.

In this chapter we discuss some of the essential issues related to the technique of homology modeling.


The tutorial describes a relatively easy case of modeling. To follow it you will need some knowledge of sequence alignment (see the sequence alignment tutorial for details). In addition, some basic understanding of protein structure is always needed before starting a homology modeling project.