Introduction to Homology Modeling

The term "homology modeling", also called comparative modeling or template-based modeling (TBM), refers to modeling a protein's 3D structure using a known, experimentally determined structure of a homologous protein as a template. A protein structure is of great assistance in studying protein function, dynamics, and interactions with ligands and other proteins, and in the pharmaceutical industry it supports structure-based drug discovery and drug design. Homology modeling can provide molecular biologists and biochemists with "low-resolution" structures that contain sufficient information about the spatial arrangement of important residues in the protein and may guide the design of new experiments. For example, the design of site-directed mutagenesis experiments can be considerably improved when such "low-resolution" model structures are available.

Experimental elucidation of a protein structure is often delayed by difficulties in obtaining a sufficient amount of protein (cloning, expression and purification of milligram quantities) and by difficulties with crystallization; even the crystallographic work itself may become a source of problems. In this context, it is not surprising that methods for predicting protein structure have gained much interest. Among these methods, homology modeling usually provides the most reliable results. It is based on the observation that two proteins belonging to the same family and sharing similar amino acid sequences will have similar three-dimensional structures. In other words, the problem is reduced to finding a modeling template. The steps in homology modeling are the following:

  • template identification;
  • amino acid sequence alignment;
  • alignment correction;
  • backbone generation;
  • generation of loops;
  • side chain generation and optimization;
  • ab initio loop building;
  • overall model optimization;
  • model verification (quality criteria, model quality).

After finding a template, and before starting the modeling project, you must make a multiple sequence alignment that includes your sequence, the sequence of the template, and some other sequences of proteins belonging to the same family. This will give you an overview of the general features of the protein family, the degree of conservation, the consensus sequence motifs, etc. It is also very desirable to make a secondary structure prediction, discussed in the tutorial on sequence alignment. Most importantly, the positions of insertions and deletions should be correct, as should the alignment of conserved important residues, for example active site residues.

When the sequence analysis is done and the alignment is corrected accordingly, we may proceed to the modeling. The modeling software will thread your sequence onto the template structure, thus creating a preliminary model of your protein (backbone generation). After that it will try to build missing parts, generate side chains for replaced residues, optimize side chain conformations, etc. In the last step the overall model is optimized, followed by verification of model quality.

There are several servers which may be used for modeling, but here I will stick to one of them, the SWISS-MODEL site. For learning purposes I think it is better to start with this server, using Swiss-PdbViewer as a graphics interface, than to go directly to automatic modeling servers such as Phyre, I-TASSER or Robetta.
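To make the alignment check described above more concrete, here is a minimal Python sketch of how one might inspect a multiple sequence alignment for conserved columns and for the positions of gaps (insertions and deletions) in the target sequence. The sequences, the names "target", "template", "homolog1" and "homolog2", and the helper functions are all made up for illustration; a real project would read the alignment produced by an alignment program. The conservation score used here (fraction of sequences carrying the most common residue in a column) is one simple choice among many.

```python
from collections import Counter


def column_conservation(alignment):
    """For each alignment column, return the fraction of sequences that
    carry the most common non-gap residue (a simple, assumed score)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment.values()):
        residues = [r for r in column if r != "-"]
        if not residues:
            # Column consisting only of gaps
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / n_seqs)
    return scores


def gap_columns(alignment, name):
    """Return 0-based column indices where the named sequence has a gap."""
    return [i for i, r in enumerate(alignment[name]) if r == "-"]


# Toy, invented alignment (not real protein sequences)
toy_alignment = {
    "target":   "MKT-AYIAKQR",
    "template": "MKTLAYVAKQR",
    "homolog1": "MRTLAYIAKQK",
    "homolog2": "MKT-AYIGKQR",
}

conservation = column_conservation(toy_alignment)
fully_conserved = [i for i, s in enumerate(conservation) if s == 1.0]
target_gaps = gap_columns(toy_alignment, "target")

print("Fully conserved columns:", fully_conserved)
print("Gap columns in target:", target_gaps)
```

Columns reported as fully conserved are candidates for functionally important residues, and the gap columns show where the target has a deletion relative to the other sequences. Both lists are worth checking by eye against what is known about the protein family before the alignment is used for modeling.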

Depending on the degree of sequence conservation, modeling may be easy or not so easy. Sometimes it may be rather challenging, for example if you need to use 2-3 different templates to model different domains of your protein. The question then is how to put them together into one structure. In complicated cases I would recommend using different servers and comparing the results obtained from them. There are also so-called meta-servers, which run several different methods in parallel.
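The degree of sequence conservation between target and template is commonly summarized as percent identity over a pairwise alignment. The following Python sketch shows one way to compute it. The function name and the toy sequences are invented for illustration, and the choice to count only columns where both sequences have a residue is an assumption; other definitions of percent identity use a different denominator and give somewhat different numbers.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned sequences, counted over
    columns where neither sequence has a gap (one of several possible
    definitions)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue
    aligned_pairs = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if a != "-" and b != "-"
    ]
    if not aligned_pairs:
        return 0.0
    identical = sum(1 for a, b in aligned_pairs if a == b)
    return 100.0 * identical / len(aligned_pairs)


# Toy, invented target/template pair (not real protein sequences)
target_seq = "MKT-AYIAKQR"
template_seq = "MKTLAYVAKQR"

identity = percent_identity(target_seq, template_seq)
print(f"Target/template identity: {identity:.1f}%")
```

When several candidate templates are available, or when different domains need different templates, computing this value for each target/template pair gives a quick first comparison before looking at the structures themselves.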

In this chapter I will discuss some of the essential issues related to the technique of homology modeling.

The tutorial describes a relatively easy case of modeling. To follow it you will need some knowledge of sequence alignment (see the sequence alignment tutorial for details). In addition, one always needs some basic understanding of protein structure before attempting homology modeling.