Sequence alignment tutorial Part 2: BchI-BchD alignment


Now that we have an idea of how to make a simple sequence alignment and how to analyze it (for example by coloring according to percentage identity or coloring only hydrophobic residues), we can look at a more demanding case with some insertions and deletions. This is going to be the second subunit of magnesium chelatase, called BchD, which is almost twice the size of BchI. The task now is to compare the amino acid sequences of the two subunits and find out whether they contain homologous parts or domains. We will also learn how to use secondary structure information in sequence alignment.

For the BchI-BchD sequence alignment it is important to make sure that the BchD sequences included in the alignment are really BchD and not something else. I think the problem is that some proteins, which probably belong to the less well characterized Ni-chelatase, are annotated as Mg-chelatase. When I try to make a sequence alignment that includes some of these proteins, I get rather chaotic results, with some sequences having only 15% identity to R. capsulatus BchD. BchD, like BchI, is highly conserved, so 15% seems very low to me. However, I never had time to investigate the problem more closely. To get an idea of the conservation pattern within BchD, you may make a separate alignment of BchD sequences, similar to the alignment of BchI we did previously. We also need to check the InterPro database (which we discussed earlier) to get an idea of, for example, the domain content of BchD and the presence of domains of known function or structure (apart from the domains homologous to BchI).
You will notice that the N-terminus of the protein includes an AAA+ domain, which is apparently the one homologous to BchI. You may also notice that there is a von Willebrand type domain at the C-terminus of BchD. This domain has very interesting properties, and when we discovered its presence in magnesium chelatase (many years ago) we were very excited. It gave us a lot of clues about the possible functional mechanism of the enzyme, which we discussed in a paper we published on the structure of BchI (Fodje et al., 2000). A later publication describing the complex between subunits BchI and BchD largely confirmed our initial hypothesis (Lundqvist et al., 2010). Once we have an idea of the protein, we are ready for the sequence alignment.
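A figure such as the 15% identity mentioned above can be checked directly from any pair of aligned sequences. The following is only a minimal sketch: the function name `percent_identity` is hypothetical, and it assumes one particular convention, namely that identity is counted over columns where both sequences have a residue. Alignment programs may use a different denominator, so the numbers will not necessarily match their reports exactly.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of an alignment.

    Assumption: columns with a gap ('-') in either sequence are ignored,
    so the denominator is the number of residue-residue columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1

    # Avoid division by zero if the two rows never overlap
    return 100.0 * identical / compared if compared else 0.0


# Toy rows (not real BchD sequences), only to show the call
print(percent_identity("ACDE-F", "ACDQGF"))
```

A sequence annotated as BchD that scores far below the other BchD sequences against R. capsulatus BchD would be a candidate for closer inspection before it goes into the alignment.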

For the alignment I used 3 sequences of BchD and 3 of BchI from 3 different organisms. We will need to fetch the two groups of sequences separately, since the BLAST search we used earlier will primarily identify either BchI or BchD sequences (within the limits of the server). We will need to paste the sequences (don't forget the FASTA format) into the ClustalW window of the EBI server and run the alignment with the default parameters (don't forget to change the output order to "input" instead of the default "align"). Below I have pasted the N-terminal part of the alignment, which includes the BchI sequence:

ID-alignment

From this alignment the homology between BchI and the N-terminal part of BchD is very clear. You may also notice that there are insertions and deletions in both BchD and BchI. The question is whether their placement is correct or, in other words, whether we can trust this alignment and use it at a later stage, for example in homology modeling. We could get some clues by doing secondary structure prediction on Rhodobacter capsulatus BchD. Secondary structure prediction is rather reliable and may be used to guide sequence alignment. Here is the result of the secondary structure prediction using the Jpred server, which provides a so-called consensus prediction (using different prediction methods) and a nice output. In this figure I simply pasted by hand the predicted strands (sticks) and helices (arrows) onto the alignment shown above. Let's have a look at the figure:

alignment-sec2

The results look convincing: the gaps do not interfere with the positions of the secondary structure elements. But are there other ways to verify that the gaps are positioned correctly? We could compare the secondary structure prediction for BchD with the secondary structure of BchI (from the X-ray model). If we assume that the fold and the overall 3D structure of the two proteins are conserved, we should get more or less similar positions for the secondary structure elements. You may see the results in the figure below:

structure-based

My former PhD student, Joakim Lundqvist, originally made this alignment for a publication. The blue helices and strands are from the Jpred prediction on BchD, while the actual secondary structure of BchI is shown in green and red. Red shows parts that are present in BchI but deleted in R. capsulatus BchD. This is an interesting difference between R. capsulatus BchD and BchD from other organisms, which appear to be more BchI-like. Although the predicted positions of helices and strands agree well with the positions of helices and strands in the BchI 3D structure, you may notice that the predicted length is different in most cases. Also, a beta-hairpin in the middle of helix alpha2* (marked on the figure) was predicted by Jpred to be a helix. Interestingly, if we check the structure of BchI we find that the hairpin is actually inserted into a helix, something which does not happen very often. It is also interesting to note that helix alpha5* was not predicted at all. However, to be absolutely sure that this prediction is incorrect we would need the 3D structure of this domain in BchD, which we unfortunately don't have.
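The check described above, that gaps should not interrupt predicted secondary structure elements, was done by eye on the figures, but it can also be sketched in code. The example below is only an illustration under stated assumptions: the function name `gaps_inside_elements` is hypothetical, and the prediction is assumed to be available as a string with one character per residue ('H' for helix, 'E' for strand, '-' for coil), which is a common text representation but not necessarily the exact Jpred output layout.

```python
def gaps_inside_elements(aligned_seq: str, sec_struct: str):
    """Report gap runs that fall inside a predicted helix or strand.

    aligned_seq: one row of the alignment, with '-' for gaps.
    sec_struct:  one character per residue of the UNGAPPED sequence
                 ('H' helix, 'E' strand, '-' coil); assumed format.
    Returns a list of (first_column, last_column, element_type),
    with 0-based alignment columns.
    """
    n_residues = sum(1 for c in aligned_seq if c != "-")
    if n_residues != len(sec_struct):
        raise ValueError("prediction length must match ungapped sequence")

    problems = []
    residues_seen = 0
    col = 0
    while col < len(aligned_seq):
        if aligned_seq[col] != "-":
            residues_seen += 1
            col += 1
            continue
        # Walk to the end of this run of gap characters
        start = col
        while col < len(aligned_seq) and aligned_seq[col] == "-":
            col += 1
        # Leading and trailing gaps have no residue on one side; skip them
        if 0 < residues_seen < len(sec_struct):
            before = sec_struct[residues_seen - 1]
            after = sec_struct[residues_seen]
            # Same non-coil state on both sides: the gap splits an element
            if before == after and before in "HE":
                problems.append((start, col - 1, before))
    return problems


# Toy example (not real BchD data): the first gap splits a helix
print(gaps_inside_elements("AC--DEFG-HIK", "HHHH--EEE"))
```

One limitation of this sketch is that two separate elements of the same type that directly follow each other, with no coil residue between them, would be treated as a single element.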

Although verification of the position of secondary structure elements is always useful, for example if we plan to do some mutational studies, there is not always a crystallographic structure waiting to be used for such verification. In some cases we may use our biological knowledge of the protein to make sure that the alignment is correct. For example, in the case of the BchD-BchI alignment, we could check the position of the residues known to be conserved in AAA+ proteins. In the alignment above, the residues characteristic of the AAA+ family (to which both BchI and BchD belong) are marked under the alignment. Among these are the Walker A and B motifs, sensor-1 and sensor-2 (S-1, S-2) and the so-called Arg-finger. You may check whether these residues are aligned properly in the alignment of BchD sequences above. Note, however, that the Walker motif is not conserved in R. capsulatus BchD, which means that this protein, in contrast to BchI, is not able to hydrolyze ATP.
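Looking for a conserved motif can also be done with a simple pattern search. The sketch below uses the classical Walker A (P-loop) consensus, G-x(4)-G-K-[S/T], as a regular expression. The function name `find_walker_a` is hypothetical, and the strict consensus is an assumption: real AAA+ proteins may deviate from it, so a missing hit is a reason to look at the alignment by eye rather than a final verdict.

```python
import re

# Classical Walker A (P-loop) consensus: G, any four residues, G, K, then S or T
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_walker_a(aligned_seq: str):
    """Return (1-based residue position, matched segment) for each hit.

    Gaps are removed first, so positions refer to the ungapped sequence.
    """
    ungapped = aligned_seq.replace("-", "").upper()
    return [(m.start() + 1, m.group()) for m in WALKER_A.finditer(ungapped)]


# Toy sequences (not real BchI/BchD): one with the motif, one without
print(find_walker_a("MAG-PPGTGKSTA"))
print(find_walker_a("MAG-PPGTGAATA"))
```

Running this over each row of the alignment and comparing the reported positions with the alignment columns gives a quick indication of whether the motif is present and whether it lines up between sequences.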

Here we conclude the sequence alignment part. Just a few last notes on the structural information that may be extracted from a multiple sequence alignment, useful to keep in mind when making and analyzing one. For example, the positions of insertions and deletions suggest loop regions, invariant Gly and Pro residues are often associated with beta-turns, and conserved segments of hydrophobic residues suggest buried beta-strands.
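Two of these rules of thumb, gap positions and invariant Gly/Pro residues, can be read off an alignment column by column. The following is a minimal sketch with a hypothetical function name, `column_features`. It only lists candidate columns; whether a given column really corresponds to a loop or a turn still has to be judged against the structure or the prediction.

```python
def column_features(alignment):
    """Scan alignment columns for gaps and invariant Gly/Pro.

    alignment: list of equal-length aligned sequences ('-' for gaps).
    Returns (invariant_gly_pro_columns, gapped_columns), 0-based.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    invariant_gly_pro = []
    gapped = []
    for col, residues in enumerate(zip(*alignment)):
        column = {r.upper() for r in residues}
        if "-" in column:
            gapped.append(col)  # insertion/deletion: possible loop region
        elif column == {"G"} or column == {"P"}:
            invariant_gly_pro.append(col)  # possible turn position
    return invariant_gly_pro, gapped


# Toy alignment (not real data)
rows = ["AGP-L", "CGPKL", "AGA-L"]
print(column_features(rows))
```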

Sequence alignment and substitution matrices