As mentioned earlier, amino acid sequence alignment can be rather simple to perform, but it may also need extra attention, for example when the proteins have diverged considerably and contain a large number of insertions and deletions, or when multidomain proteins are involved, especially if not all domains are present in one of the proteins being compared. This may happen, for example, during homology modeling of a multidomain protein. Information on the tertiary structure is of course of great help in obtaining a correct alignment.
In this first tutorial we will look at an easy way of making a sequence alignment. We will focus on the tools available at the Expasy and EBI servers, although there are of course many others. We start with a protein of highly conserved sequence: subunit BchI of magnesium chelatase. BchI is one of the three subunits of magnesium chelatase, which assemble to catalyze the first committed step in chlorophyll biosynthesis, the insertion of Mg2+ into protoporphyrin IX. In the second tutorial we will look into a slightly more complicated case, the second subunit of magnesium chelatase, BchD, one domain of which is homologous to BchI. The alignments we make here will be used later in the homology modeling tutorial. It is essential to remember at this stage that a proper sequence alignment is central to creating a correct homology model.
To make the alignment we first need to find the sequence of the protein we are interested in. For that we can use the UniProtKB database, which is maintained by the UniProt consortium, a collaboration between EMBL-EBI, PIR and the SIB Swiss Institute of Bioinformatics. To start, simply type the name of the protein (BchI) into the UniProt search window, and you will be presented with a list of BchI sequences from different organisms:
I am showing just the first few sequences; the list contained a total of 295 sequences when I did the search. In the list, click on BCHI_RHOCB (entry P26239), which is subunit BchI from Rhodobacter capsulatus. The page that opens is almost like a tutorial on magnesium chelatase: you will find there information on the biological function (photosynthesis, magnesium chelatase activity), the type of ligand/substrate it binds (ATP), the catalytic function (ATP hydrolysis), Protein Data Bank (PDB) entries, if available, links to published works, links to entries related to this particular protein in other databases, and of course the amino acid sequence of the protein. One of the links, which I find very useful, is the one to the InterPro database. It provides plenty of information about the protein and the family to which it belongs:
For sequence alignment we first need to retrieve the sequences of BchI from different organisms. Normally the sequence is presented in the following format:
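For readers who prefer to retrieve sequences from a script rather than the browser, here is a minimal Python sketch. It assumes that the format meant above is FASTA (a header line starting with ">" followed by one or more lines of residues), which is the format UniProt offers for download. The URL pattern for the UniProt REST service is an assumption that should be checked against the current UniProt documentation, the function names are my own, and the toy record is invented purely to illustrate the layout; it is not a real BchI sequence.

```python
from urllib.request import urlopen

# Assumed URL pattern of the UniProt REST service; verify before relying on it.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession):
    """Build the (assumed) download URL for one UniProtKB entry."""
    return UNIPROT_FASTA_URL.format(accession=accession)


def fetch_fasta(accession, timeout=30):
    """Download the FASTA text of one entry; needs network access."""
    with urlopen(fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy records: headers and residues are invented for illustration only.
EXAMPLE = """>toy_entry_1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>toy_entry_2 another hypothetical protein
MSDNE
"""

if __name__ == "__main__":
    for header, sequence in parse_fasta(EXAMPLE):
        print(header, len(sequence))
```

With network access, `parse_fasta(fetch_fasta("P26239"))` should return the BchI entry from Rhodobacter capsulatus used in this tutorial, provided the assumed URL pattern is still valid.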
To make an alignment we need to choose some additional sequences (and sometimes also a server on which to make the alignment). Normally one needs to spend a few minutes thinking about which sequences to include in the alignment. Then we can run a BLAST search (on the top right in the above figure). The results are shown in the image below: