The first step in sequence analysis is to place the protein we are interested in within a specific frame: the protein family to which it belongs. Within a family, proteins perform similar functions and share conserved, easily recognizable sequence features and conserved three-dimensional structures. An alignment helps us reveal these characteristic features, find the conserved and variable regions in the sequence, identify functionally essential residues, and extract information on the secondary and tertiary structure. Later in the chapter, we will discuss the techniques used for performing this type of analysis.
To find related sequences, we first need to run a database search. The software will compare our sequence (the query sequence) to all other sequences in the database by using a so-called "local alignment" (see image below) and, based on specific criteria, will output several amino acid sequences related to our protein. Local alignment compares small segments of the query sequence to other sequences in the database to find matching amino acid segments. If we are happy with the search results and have a list of proteins to start the analysis, we use a different type of sequence alignment, the so-called global alignment. In this case, we compare the entire query sequence either to one other sequence, using pair-wise alignment, or to a group of sequences, using multiple sequence alignment (MSA). MSA constitutes the basis of any sequence analysis and provides much more information than the alignment of just two sequences. This type of alignment is used extensively in secondary and tertiary structure prediction and modeling, in identifying protein families and domains, and in identifying conserved residues essential for function.
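To make the difference between local and global alignment more concrete, here is a minimal Python sketch of the two classic dynamic-programming scoring schemes: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. This is an illustration only, not how any particular database search tool works internally. The function names, the toy sequences, and the simple scoring values (+1 for a match, -1 for a mismatch, -2 per gap position) are my own assumptions; real tools typically use amino acid substitution matrices and more elaborate gap penalties.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: align ALL of a against ALL of b.

    The scoring values are illustrative assumptions, not a real
    substitution matrix.
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score: best-matching pair of segments of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor of 0 lets an alignment restart anywhere,
            # which is what makes the alignment "local".
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    query = "MKTAYIAKQR"        # hypothetical query sequence
    subject = "GGGGTAYIAKGGGG"  # hypothetical database sequence
    print("local: ", local_score(query, subject))
    print("global:", global_score(query, subject))
```

With these toy sequences, the local score rewards the shared segment TAYIAK and ignores the unrelated flanking residues, whereas the global score is penalized for forcing the unrelated ends into the alignment. This is why a local method suits database searches, while a global method makes sense once we already know that the sequences are related along their whole length.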
There are many sequence alignment and analysis tools on the web. To guide you through the jungle, I will provide some examples that show how I carry out this analysis. It is then up to you to decide whether to follow my examples or use other tools.