Sequence alignment tutorial 2

Now that we know how to make a simple sequence alignment and how to analyze it (for example by coloring according to percentage identity or by coloring only the hydrophobic residues), we can look at a more demanding case with insertions and deletions. This is the second subunit of magnesium chelatase, called BchD, which is almost twice the size of BchI. The task is to compare the amino acid sequences of the two subunits and find out whether they contain homologous parts or domains. We will also learn how to use secondary structure information in sequence alignment.

For the BchI-BchD sequence alignment it is important to make sure that the BchD sequences included in the alignment really are BchD and not something else. Because of the automatic annotation procedures used in genome projects, some proteins that probably belong to the less well characterized Ni-chelatase are annotated as Mg-chelatase. If we include Ni-chelatase proteins in the alignment, we will get rather chaotic results, with some sequences having only 15% identity to R. capsulatus BchD.
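
One way to spot such outliers is to compute the percentage identity of every aligned sequence against a trusted reference. The following is a minimal sketch, not a replacement for a proper annotation check: the sequence names, the toy sequences and the 20% cutoff are all assumptions made for illustration, and identity is counted only over columns where both sequences have a residue.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def flag_suspect_sequences(alignment: dict, reference: str,
                           cutoff: float = 20.0) -> list:
    """Names of sequences whose identity to the reference is below cutoff.

    The cutoff is an illustrative assumption, not a published threshold.
    """
    ref = alignment[reference]
    return [name for name, seq in alignment.items()
            if name != reference and percent_identity(ref, seq) < cutoff]


# Toy aligned sequences (hypothetical, not real BchD data).
toy_alignment = {
    "reference": "MKTAL",
    "close_homolog": "MKTAV",
    "doubtful_entry": "GHWPV",
}
suspects = flag_suspect_sequences(toy_alignment, "reference")
```

Sequences returned by flag_suspect_sequences are only candidates for a closer look; whether they really are Ni-chelatase proteins has to be decided from the annotation and the literature.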

To get an idea of the conservation pattern within BchD, we can first make a separate alignment of BchD sequences only. The alignment shown below was made by fetching the sequences from UniProt and pasting them, in FASTA format, into the input window of MAFFT on the EBI server. “OUTPUT FORMAT” was set to “ClustalW”, “ORDER” to “Input”, and “Gap open penalty” to 2.0 to avoid having many small gaps. The image was colored using Jalview.

[Image: MAFFT alignment of BchD sequences, colored in Jalview]
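
Since the server expects FASTA input and the output order was set to follow the input, it can help to check the pasted text and fix the record order before submitting. The sketch below assumes the sequences have already been downloaded as FASTA text; the function names and the toy records are hypothetical.

```python
def parse_fasta(text: str) -> list:
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_fasta(records: list, width: int = 60) -> str:
    """Write (header, sequence) tuples back as FASTA, in the order given."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy input (hypothetical records, not real UniProt entries).
raw = ">seqA toy\nMKT\nAL\n\n>seqB toy\nGHW\n"
records = parse_fasta(raw)
# Put the sequences in the order we want to see them in the alignment.
ordered = sorted(records, key=lambda rec: rec[0], reverse=True)
```

The text returned by to_fasta(ordered) can then be pasted into the alignment window, and with the order set to “Input” the alignment rows will appear in that same order.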

We may notice from the alignment above that the N-terminal part of these proteins is generally not well conserved: there are several large insertions and deletions in this region. The rest of the sequence, however, appears to be well conserved. We could also check the InterPro database to get an idea of the domain content of BchD. The analysis will show that the N-terminal part of BchD is an AAA+ domain, which should be homologous to BchI (this is what we need to verify). There is also a von Willebrand type domain at the C-terminus of BchD. This domain has very interesting properties, and when we discovered its presence in magnesium chelatase (many years ago) we were very excited. It gave us a lot of clues about the possible functional mechanism of the enzyme. It was discussed in a paper we published on the structure of BchI (Fodje et al., 2000). A later publication describing the complex between the BchI and BchD subunits largely confirmed our initial hypothesis (Lundqvist et al., 2010). It is always useful to check the literature to get an idea about the protein before proceeding to sequence alignment and homology modeling.
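
The impression that one region is less conserved than another can be made a little more concrete by scoring each alignment column. The sketch below uses a deliberately simple score, the fraction of sequences that share the most common residue in a column, with gaps counted as non-matching. This is an assumption chosen for illustration and is not the scheme Jalview uses for its coloring; the toy alignment is hypothetical.

```python
from collections import Counter


def column_conservation(seqs: list) -> list:
    """Per-column fraction of sequences sharing the most common residue.

    Gaps never count as the shared residue, so gappy columns score low.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*seqs):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def window_mean(scores: list, start: int, end: int) -> float:
    """Mean score over alignment columns start..end-1 (0-based)."""
    if end <= start:
        raise ValueError("window must contain at least one column")
    return sum(scores[start:end]) / (end - start)


# Toy alignment: a variable, gappy start followed by a conserved stretch.
toy = ["A-KLV", "G-KLV", "TPKLV"]
scores = column_conservation(toy)
n_terminal = window_mean(scores, 0, 2)
remainder = window_mean(scores, 2, 5)
```

Comparing window means along a real BchD alignment gives a rough numerical counterpart to what the coloring shows by eye.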

For the alignment shown below, three BchD and three BchI sequences from three different organisms were used. The alignment was made with Clustal Omega (don't forget to change the output order to "input" instead of the default "align"), with NUMBER of COMBINED ITERATIONS and MAX HMM ITERATIONS both set to 3. Below is the N-terminal part of the alignment, which includes the BchI sequences:

[Image: N-terminal part of the Clustal Omega alignment of BchD and BchI sequences]

You may notice that there are insertions and deletions in both BchD and BchI. The question is whether their placement is correct. In other words, can we trust this alignment and use it at a later stage, for example in homology modeling? We can compare the secondary structure of BchI (taken from the X-ray model) with the positions of the insertions and deletions, to make sure that they do not disrupt secondary structure elements. Interestingly, if we check the structure of BchI we will find that a hairpin is inserted into one of the helices, something that does not happen very often. Try to find this!
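
The check described above can be sketched in a few lines. The example assumes that the secondary structure of the reference is available as a string with one letter per residue (H for helix, E for strand, anything else for coil), for instance derived from the X-ray model; the function name, the toy sequences and the secondary structure string are all hypothetical.

```python
def indels_in_elements(aligned_ref: str, aligned_other: str, ss: str) -> list:
    """Report alignment columns where an indel falls inside a helix or strand.

    aligned_ref   -- reference sequence as it appears in the alignment
    aligned_other -- the sequence aligned against it (same length)
    ss            -- secondary structure of the ungapped reference
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    if len(aligned_ref.replace("-", "")) != len(ss):
        raise ValueError("ss must have one character per reference residue")
    hits = []
    ref_pos = 0  # number of reference residues seen so far
    for col, (r, o) in enumerate(zip(aligned_ref, aligned_other)):
        if r == "-" and o == "-":
            continue
        if r == "-":
            # Insertion relative to the reference: it is inside an element
            # when the residues on both sides belong to the same helix/strand.
            if (0 < ref_pos < len(ss) and ss[ref_pos - 1] == ss[ref_pos]
                    and ss[ref_pos] in "HE"):
                hits.append((col, "insertion", ss[ref_pos]))
            continue
        if o == "-" and ss[ref_pos] in "HE":
            # Deletion relative to the reference that removes a helix or
            # strand residue.
            hits.append((col, "deletion", ss[ref_pos]))
        ref_pos += 1
    return hits


# Toy example (hypothetical sequences and secondary structure).
ref_aln = "MKA--LLGTVIV"
other_aln = "MKAGSLLGT-IV"
ref_ss = "CHHHHCCEEC"  # one letter per residue of the ungapped reference
hits = indels_in_elements(ref_aln, other_aln, ref_ss)
```

A reported column is a reason to look more closely, not proof of an alignment error: the hairpin inserted into a BchI helix is a case where an insertion inside a secondary structure element is real.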

Although verifying the position of secondary structure elements is always useful, for example if we plan to do mutational studies, a crystallographic structure is not always available for such verification. In that case we may use secondary structure prediction to find the positions of helices and strands. In some cases we may need to use conserved sequence motifs to make sure that the alignment is correct. In AAA+ proteins these conserved motifs include the Walker A and B motifs, sensor-1 and sensor-2 (S-1, S-2) and the so-called Arg-finger. You may also notice that the Walker motif is not conserved in R. capsulatus BchD, which means that this protein, in contrast to BchI, is not able to hydrolyze ATP.
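
Motifs like these can also be searched for directly in the unaligned sequences and then compared with their positions in the alignment. The sketch below uses the commonly quoted Walker A (P-loop) consensus [AG]-x(4)-G-K-[ST]. The Walker B pattern, four hydrophobic residues followed by Asp-Glu, is a loose approximation assumed here for illustration, and the toy sequences are hypothetical rather than real BchI or BchD data.

```python
import re

# Walker A (P-loop) consensus: [AG]-x(4)-G-K-[ST]
WALKER_A = re.compile(r"[AG].{4}GK[ST]")
# Walker B, loosely approximated as four hydrophobic residues followed by DE.
# The choice of hydrophobic residues here is an assumption.
WALKER_B = re.compile(r"[ILVMFA]{4}DE")


def find_motif(pattern, sequence: str) -> list:
    """Return (1-based start position, matched text) for each motif hit."""
    return [(m.start() + 1, m.group())
            for m in pattern.finditer(sequence.upper())]


# Toy sequences: one with both motifs, one with the Walker A lysine lost.
toy_intact = "MSTLLGPPGTGKSTAARLVVIDEAN"
toy_degenerate = "MSTLLGPPGTDRSTAARLVVIDEAN"

walker_a_hits = find_motif(WALKER_A, toy_intact)
walker_b_hits = find_motif(WALKER_B, toy_intact)
degenerate_hits = find_motif(WALKER_A, toy_degenerate)
```

A sequence with no Walker A hit, like the second toy sequence, is the kind of case worth inspecting in the alignment, since a pattern match or its absence is only a hint and not a functional assay.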