To work with the SWISS-MODEL server, we first need to create an account. After you provide an e-mail address, a password is sent to that address. Modeling and other results are saved in “My Workspace” for later use and can be accessed again for a week or so by logging in to “My Workspace” (the link is found in the Modeling menu).
The sequence of the protein to be modeled (or just its UniProt entry code) can be fetched as we did in the sequence alignment tutorial. Then we paste it into the template identification window and wait for the server to run the BLAST search. The search is run against the sequences of known protein structures in ExPDB, the SWISS-MODEL template library. ExPDB is derived from PDB entries, after excluding predicted (theoretical) structures and structures containing only C-alpha atoms. In PDB coordinate files that contain two or more chains (subunits), the chains are usually distinguished by the so-called chain identifier (discussed in the PDB section). In ExPDB these entries are split into two or more files, depending on the number of chains (usually denoted A, B, C, etc.). For example, PDB entry 1cpc contains two chains, A and B. In ExPDB there will be two entries corresponding to this structure: 1cpcA and 1cpcB. Please keep in mind that the BLAST run may take some time, depending on how busy the server is.
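The chain-splitting convention described above can be illustrated with a few lines of Python. This is only a sketch of the naming rule as stated in the text (PDB ID followed by the chain identifier); the function name `expdb_entry_names` is hypothetical and is not part of any SWISS-MODEL tool.

```python
def expdb_entry_names(pdb_id, chain_ids):
    """Return one ExPDB-style entry name per chain (PDB ID + chain identifier).

    Hypothetical helper: it only mirrors the naming rule described in the
    text and does not query the template library.
    """
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4:
        raise ValueError("expected a four-character PDB ID")
    return [f"{pdb_id}{chain}" for chain in chain_ids]


# PDB entry 1cpc has two chains, A and B.
names = expdb_entry_names("1cpc", ["A", "B"])
print(names)  # ['1cpcA', '1cpcB']
```

A template reported by the server as, for example, 1cpcB therefore refers to chain B of PDB entry 1cpc.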
As mentioned earlier, it is essential to have an idea of the complexity of a homology modeling project before starting the actual modeling. This can be done by making and analyzing a multiple sequence alignment of your protein with some homologues, including the amino acid sequence (or sequences) of the modeling template after it has been identified by the server. As a rule of thumb, a sequence identity above 50% means relatively straightforward modeling, while anything below that requires careful examination of the alignment and further planning. As usual, it is an advantage to include in the alignment a group of homologous sequences from different organisms. One could try to choose 3-4 bacterial and an equal number of eukaryotic sequences. This is especially important if both bacterial and eukaryotic proteins are found among the potential templates. Any large insertions and deletions in the alignment need to be carefully analyzed and compared to the sequence of the template. Insertions (amino acid segments that are not present in the template) are regions for which no structure is available in the template, and they may need to be modeled separately. These regions may also contain alignment errors. If they are short loop regions, the server will attempt to model them automatically. Loops may be relatively easy to model, but modeling larger regions is not straightforward.
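The two checks described above, the percentage identity and the location of insertions, can be sketched in Python for a pairwise target-template alignment. This is a minimal illustration under stated assumptions: the two sequences are toy examples (not real BchI data), gaps are written as "-", and identity is computed over columns where both sequences have a residue, which is only one of several definitions in use. The function names are hypothetical.

```python
def percent_identity(target, template, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(target, template) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def insertions(target, template, gap="-"):
    """Return (start_column, length) for target segments aligned to template gaps.

    Columns are 0-based positions in the alignment, not residue numbers.
    """
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    segments = []
    start = None
    for i, (a, b) in enumerate(zip(target, template)):
        inserted = a != gap and b == gap
        if inserted and start is None:
            start = i
        elif not inserted and start is not None:
            segments.append((start, i - start))
            start = None
    if start is not None:
        segments.append((start, len(target) - start))
    return segments


# Toy alignment (assumed data): the target has a two-residue insertion.
target_aln = "MKTAYIAKQRQISFVK"
template_aln = "MKTA--AKQREISFLK"

identity = percent_identity(target_aln, template_aln)
print(f"identity: {identity:.1f}%")
print("insertions:", insertions(target_aln, template_aln))
# Rule of thumb from the text: above 50% is relatively straightforward.
print("straightforward" if identity > 50 else "examine the alignment carefully")
```

Long segments reported by `insertions` are the regions that have no counterpart in the template and deserve the closest inspection.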
In cases when there are larger insertions and deletions it is possible to use the “Target-Template Alignment” mode of the server, which lets us submit our own manually adjusted sequence alignment.
Step by step modeling
In this example, we will make a model of the enzyme magnesium chelatase subunit BchI from the cyanobacterium Synechocystis (SWISSPROT entry P51634). The results of the BLAST search are shown below. Please note that the results will be saved in your personal account and can be viewed after logging in to the account.
The best choice in the list is R. capsulatus BchI (PDB ID 1g8p). The other proteins in the list have very low resolution (7.5 Å) and originate from electron microscopy (EM) modeling. They are essentially the same protein (1g8p), slightly modified to fit the EM model. Further down the list are other structures, whose sequence identity with our protein is around 20%. They could be of interest since they are members of the AAA+ ATPase family. One could have a closer look at them at some later stage, if the modeling is not satisfactory. They may also shed additional light on the mechanism of magnesium chelatase.
By clicking the down-arrow on the right, we can see the alignment of the target sequence with the template:
Here we may choose to build a model, but we first need to examine the sequence alignment to ensure that there are no misplaced insertions and deletions. The CHLI_SYNY3 sequence was actually included in the sequence alignment exercise we did earlier, and if we compare the alignments we can see that the largest gap is slightly shifted to the left here. This is probably acceptable and we can proceed with the modeling. We can always come back to this if we are not satisfied with the model for some reason. The modeling results look like this:
Here we should start analyzing the quality of the model. You may notice that the QMEAN value is satisfactory. If we check the local quality plot, we will immediately notice that the source of a potential problem (low similarity values) is located at the beginning of domain 1, which is essentially the position of the insertion shown in the alignment above. The graphics model on the right (not visible in the images above) shows this region in red. Clicking on the question mark next to QMEAN opens a page explaining its meaning. In the model quality part we will have a closer look at these problems.
What would have happened if we had chosen the 7.5 Å resolution structure for modeling? Below you can see that the model quality is considerably lower, with a much more negative QMEAN (-6.48 vs the acceptable -2.65 for the first model). This is due to problems at both the beginning and the end of the structure, as can be seen in the local quality plot. In addition, the comparison plot (first on the right) shows that the structure (the red star) is outside the range found for experimental PDB structures.
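The comparison of the two models can be written down as a small Python sketch. It uses only the two QMEAN values quoted above and assumes that a score closer to zero indicates a better model; the labels and the function name `rank_by_qmean` are hypothetical.

```python
def rank_by_qmean(scores):
    """Sort (label, QMEAN) pairs from best to worst, assuming closer to zero is better."""
    return sorted(scores.items(), key=lambda item: abs(item[1]))


# QMEAN values quoted in the text; the labels are illustrative only.
qmean_scores = {
    "model from 1g8p": -2.65,
    "model from 7.5 A EM structure": -6.48,
}

ranking = rank_by_qmean(qmean_scores)
for label, score in ranking:
    print(f"{label}: QMEAN {score}")
```

The global score is only a first filter. The local quality plot is still needed to see where in the structure the problems are located.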