To work with the SWISS-MODEL server, we first need to create an account, before starting template identification and modeling. After you provide an e-mail address, a password will be sent to that address. The sequence of the protein to be modeled (or just its UniProt entry code) can be fetched as we did in the sequence alignment tutorial. We then paste it into the template identification window and wait for the server to run the BLAST search. The search is run against the sequences of known protein structures in ExPDB, the SWISS-MODEL template library. ExPDB is derived from PDB entries, after excluding predicted structures and structures containing only C-alpha atoms. In ExPDB, coordinate files containing two or more chains (usually distinguished by the chain identifier that follows each amino acid name in the PDB file) are split into separate files, one per chain (chains are usually denoted A, B, C, etc.). For example, PDB entry 1cpc contains two chains, A and B, so ExPDB has two entries corresponding to this structure: 1cpcA and 1cpcB. For details on the PDB coordinate file content, please check the related page. Please keep in mind that the BLAST run may take some time, depending on how busy the server is.
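The chain splitting described above can be illustrated with a minimal Python sketch. It is not the procedure used to build ExPDB, only an assumption of how such a split could be done: it reads the chain identifier from column 22 of each ATOM or HETATM record and names each group with the PDB code followed by the chain letter. The helper names `atom_line` and `split_by_chain` are hypothetical, and the records are toy placeholders without coordinates, not the real contents of entry 1cpc.

```python
from collections import defaultdict


def atom_line(serial, atom_name, res_name, chain_id, res_seq):
    """Build a truncated toy ATOM record (no coordinates) in PDB column layout."""
    return f"ATOM  {serial:>5} {atom_name:<4} {res_name:>3} {chain_id}{res_seq:>4}"


def split_by_chain(pdb_text, entry_id):
    """Group ATOM/HETATM records by the chain identifier in column 22."""
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        # Keeping HETATM records is a choice made for this sketch only.
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chains[line[21]].append(line)  # column 22 is index 21
    # Name each group as in the text: PDB code followed by the chain letter.
    return {f"{entry_id}{chain}": "\n".join(records)
            for chain, records in chains.items()}


# Toy records only: these are NOT the real contents of PDB entry 1cpc.
toy_pdb = "\n".join([
    atom_line(1, " CA ", "MET", "A", 1),
    atom_line(2, " CA ", "LYS", "A", 2),
    atom_line(3, " CA ", "MET", "B", 1),
])

for name, records in split_by_chain(toy_pdb, "1cpc").items():
    print(name, len(records.splitlines()), "records")
```

For the toy input, the function returns two groups named 1cpcA and 1cpcB, mirroring the naming in the example above.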
As mentioned earlier, it is essential to have an idea of the complexity of a homology modeling project before starting the modeling. This can be done by making and analyzing a multiple sequence alignment of your protein with some homologues, including the amino acid sequence (or sequences) of the modeling template identified by the server. As a rule of thumb, a sequence identity above 50% means a relatively straightforward modeling project, while anything below that requires careful planning. This is only a general rule, however, and it does not mean that careful analysis can be skipped. The alignment should include a group of homologous sequences from different organisms, including the template sequence (or sequences, if there is more than one template). As discussed in the sequence part, try to choose 3-4 bacterial sequences and an equal number of eukaryotic ones. The alignment will show whether the protein being analyzed has any large insertions or deletions compared to the template. Insertions (amino acid segments that are not present in the template) are regions for which the structure is not known and which may need to be modeled separately. These are also the regions that are likely to contain the most errors. Usually the server will attempt to model these regions automatically. Loops may be modeled relatively easily, but modeling larger regions is not straightforward. There are 5 modeling alternatives available at the SWISS-MODEL server.
It is also possible to use the “Target-Template Alignment” mode at the server, which lets us start with our own manually adjusted sequence alignment.
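The identity rule of thumb and the search for insertions can be checked on a pairwise target-template alignment with a short Python sketch. The function names `percent_identity` and `find_insertions` are hypothetical, and the sequences in the usage lines are invented for illustration. Note that percent identity can be defined with different denominators; here it is computed over the columns where both sequences have a residue, which is an assumption and may differ from the value the server reports.

```python
def percent_identity(target, template):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(a, b) for a, b in zip(target, template) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    matches = sum(a == b for a, b in aligned)
    return 100.0 * matches / len(aligned)


def find_insertions(target, template, min_length=1):
    """Return 1-based (start, end) target residue ranges aligned to template gaps."""
    insertions = []
    pos = 0           # residue number in the ungapped target sequence
    run_start = None  # first residue of the insertion currently being read
    for a, b in zip(target, template):
        if a != "-":
            pos += 1
        in_insertion = a != "-" and b == "-"
        if in_insertion and run_start is None:
            run_start = pos
        elif not in_insertion and run_start is not None:
            # The run ended on the previous target residue.
            run_end = pos - 1 if a != "-" else pos
            if run_end - run_start + 1 >= min_length:
                insertions.append((run_start, run_end))
            run_start = None
    if run_start is not None and pos - run_start + 1 >= min_length:
        insertions.append((run_start, pos))
    return insertions


# Invented toy alignment: the target has three residues absent from the template.
target_aln = "MKTAYIAKQR"
template_aln = "MK---IAKQR"
print(percent_identity(target_aln, template_aln))
print(find_insertions(target_aln, template_aln))
```

Raising `min_length` restricts the report to longer insertions, which are the ones that are hard to model as simple loops.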
Error sources in homology modeling
The earlier we become aware of possible errors, the better we can eliminate them and handle our modeling project in a proper way. Errors to avoid include the following:
1- Incorrect sequence alignment - among the most devastating errors in homology modeling.
2- Incorrect choice of template - may happen, especially for multi-domain proteins.
3- Incorrectly built loop regions - loops are usually built automatically by the server. If the correct loop conformation is important for the project, one could do the modeling with different servers and then compare the resulting models.
4- Errors made by the person doing the modeling - errors of this type can be of any kind and are difficult to predict in advance. Knowledge of the basic principles of protein structure is important for minimizing them.
5- Errors which may be present in the template - difficult to eliminate. A model can hardly be better than the template.
Step by step modeling
In this example, we will make a model for the enzyme magnesium chelatase subunit BchI from the cyanobacterium Synechocystis (SWISSPROT entry P51634). The results of the BLAST search are shown below. Please note that the results will be saved in your personal account and can be viewed after logging in to the account.
The best choice in the list is R. capsulatus BchI (PDB ID 1g8p). The other proteins near the top of the list are of very low resolution (7.5 Å) and originate from electron microscopy (EM) modeling. They are essentially the same protein (1g8p), slightly modified to fit the EM model. Further down the list there are other structures. Their sequence identity with our protein is around 20%, and none of them is a magnesium chelatase. However, they are interesting because they are members of the family of AAA ATPases, proteins that are involved in a large number of biochemical processes in organisms. One could have a closer look at them at some later stage, since they may shed additional light on the mechanism of magnesium chelatase.
By clicking the arrow on the right, you will get the alignment of the target sequence with the template:
We may choose to build a model; however, we first need to check the sequence alignment to ensure that everything is correct. The CHLI_SYNY3 sequence was actually included in the sequence alignment exercise we made earlier, and if we compare the alignments we can see that the largest gap is slightly shifted to the left here. This is probably OK, and we can proceed with the modeling. We can always come back to this if we are not satisfied with the model for some reason. The modeling results output looks like this:
Here we should start analyzing the quality of the model. You may notice that the QMEAN value is satisfactory, but if we check the local quality plot we will immediately notice that the source of a potential problem (low similarity values) is located at the beginning of domain 1, which is essentially the position of the insertion shown in the alignment above. The graphical model on the right (not visible in the images above) shows this region in red. Clicking on the question mark next to QMEAN will open a page explaining its meaning. In the model quality part we will have a closer look at these problems.
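Reading problem regions off the local quality plot can also be done programmatically if the per-residue scores are available as a list of numbers. The sketch below is only an illustration under stated assumptions: the function name `low_quality_regions` is hypothetical, the scores are invented and do not come from the model discussed here, and the cut-off of 0.6 and the minimum run length of 3 residues are assumed values that should be adjusted to the score actually used.

```python
def low_quality_regions(scores, threshold=0.6, min_length=3):
    """Return 1-based (start, end) residue ranges whose scores all fall below threshold."""
    regions = []
    start = None  # first residue of the low-scoring run currently being read
    for i, score in enumerate(scores, start=1):
        if score < threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run covers residues start .. i-1, i.e. i - start residues.
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


# Invented per-residue scores, for illustration only.
toy_scores = [0.8, 0.4, 0.3, 0.5, 0.9, 0.2, 0.9, 0.5, 0.4, 0.3]
print(low_quality_regions(toy_scores))
```

The resulting residue ranges can then be compared with the positions of insertions in the target-template alignment, since these often coincide, as in the example above.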
What would have happened if we had chosen the 7.5 Å resolution structure for modeling? The answer is in the image below, which shows considerably lower model quality, with problems at both the beginning and the end of the structure, as seen on the local quality plot: