An evolutionary unit of 100 million years was adopted, resulting in the PAM (Percentage Accepted Mutations / 100 million years) matrix. One PAM corresponds to amino acid substitutions in, on average, 1% of all positions. However, 100 PAM does not mean that every amino acid in the sequence differs from the original sequence, since, as noted above, a position can mutate more than once and many residues will have mutated back to their original type.
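To illustrate why 100 PAM leaves part of the sequence unchanged, here is a minimal sketch that raises a PAM1-like matrix to the 100th power. It assumes a toy model in which every residue changes with probability 1% and the change is spread evenly over the other 19 amino acids; this uniform matrix and the helper names (`uniform_pam1`, `fraction_unchanged`) are hypothetical and are not Dayhoff's data.

```python
def uniform_pam1(n_states=20, change=0.01):
    """Toy PAM1-like matrix (assumption: uniform substitutions, not Dayhoff's counts)."""
    off_diagonal = change / (n_states - 1)
    return [[1.0 - change if i == j else off_diagonal for j in range(n_states)]
            for i in range(n_states)]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = m
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


def fraction_unchanged(pam_distance):
    """Expected fraction of positions showing their original residue after `pam_distance` PAM."""
    m = mat_pow(uniform_pam1(), pam_distance)
    # All residues are equally frequent in this toy model, so the mean of the
    # diagonal is the expected fraction of identical positions.
    return sum(m[i][i] for i in range(len(m))) / len(m)


if __name__ == "__main__":
    for distance in (1, 100, 256):
        print(distance, round(fraction_unchanged(distance), 3))
```

In this toy model, more than a third of the positions still show their original residue at 100 PAM, because repeated and reverse substitutions hit the same positions. The real PAM matrices give different values, since substitution rates differ between amino acids.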
Further analysis showed that, although 80% of all amino acids will have been substituted at 256 PAM, 48% of Trp, 41% of Cys, and 20% of His residues will be conserved. On the other hand, only 7% of Ser residues will remain (discussed in Barton, J, 1996). This conservation pattern presumably results from a combination of the structural and functional constraints mentioned above. For example, tryptophan has a large side chain, and if it is positioned within the structure's core, its replacement by another amino acid may destabilize the protein. In addition, the other highly conserved residues, cysteine and histidine, are often involved in specific functions such as proton abstraction (His), metal binding (His and Cys), or disulfide bridge formation (Cys).
After the construction of the mutation probability matrix, Dayhoff et al. defined the score $S_{i,j}$ of two aligned residues $i$ and $j$ according to the following equation:

$$S_{i,j} = \log \frac{M_{ij}}{p_i}$$

in which $M_{ij}$ is the probability of these two residues being aligned (from the matrix above), and $p_i$ is the probability of these two amino acids being aligned by chance.
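The definition of the score in terms of Mij and pi can be sketched numerically as follows. The function name `log_odds_score`, the input probabilities, and the scaling are assumptions made for illustration: the sketch uses a base-10 logarithm multiplied by 10, a convention commonly used when PAM scores are reported, and the probabilities are invented values rather than entries from Dayhoff's matrix.

```python
import math


def log_odds_score(m_ij, p_i, scale=10.0):
    """Log-odds score: scale * log10(M_ij / p_i).

    m_ij  -- probability of the two residues being aligned (hypothetical value)
    p_i   -- probability of the alignment arising by chance (hypothetical value)
    scale -- assumed scaling factor; 10 * log10 is a common reporting convention
    """
    if m_ij <= 0 or p_i <= 0:
        raise ValueError("probabilities must be positive")
    return scale * math.log10(m_ij / p_i)


if __name__ == "__main__":
    # Pair observed twice as often as expected by chance: positive score.
    print(round(log_odds_score(0.02, 0.01), 2))
    # Pair observed ten times less often than expected by chance: negative score.
    print(round(log_odds_score(0.005, 0.05), 2))
```

A ratio above 1 (the pair is aligned more often than chance predicts) gives a positive score, and a ratio below 1 gives a negative score. Because the scores are logarithms, the scores of individual aligned pairs can be added along an alignment, which corresponds to multiplying the underlying odds ratios.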
This type of scoring gives higher numbers to the alignment of a pair of similar amino acids (e.g., Leu and Ile) and lower numbers when the amino acids are different (e.g., Ile and Asp). Using this definition, the probability matrix (shown in the image above) can then be used to derive the so-called log-odds scoring matrix: