Protein Folds and Protein Fold Classification

Fold assignment usually follows the assignment of secondary structure. Fold analysis may reveal evolutionary relationships that are sometimes difficult to detect at the sequence level. It may also lead to a better understanding of a protein's mechanism of function, its activity and its biological role. Studying the relationship between the amino acid sequence and the fold may also give deeper insight into the fundamental principles of protein structure, and may aid, for example, the design of new proteins with a pre-defined structure and activity.

The relationship between the amino acid sequence and the three-dimensional structure of a protein is not unique: a large number of modifications in the sequence can be tolerated within a protein family and will still result in a similar 3D structure. The higher degree of conservation of the three-dimensional structure, compared to sequence conservation, is a prerequisite for the function of a protein (structure is function!). In other words, the constraints that Nature has placed on the three-dimensional structure during evolution are much tighter than those placed on the amino acid sequence. Special techniques are used to compare 3D structures and to judge the degree of similarity between them. Some discussion of this subject may be found in the homology modeling chapter.
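
The contrast between sequence similarity and structural similarity can be made concrete with two simple measures. The sketch below is only illustrative and is not the method used by any particular database: `sequence_identity` and `rmsd` are hypothetical helper names, the sequences are assumed to be already aligned (with `-` marking gaps), and the coordinates are assumed to be equivalent atoms (for example Cα atoms) of two structures that have already been superposed. Real structure comparison tools also perform the superposition and the atom pairing, which is the harder part of the problem.

```python
import math


def sequence_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired atoms.

    Assumes the two coordinate lists are already superposed and that
    atom i in coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Invented toy data, not taken from any real protein.
    aligned_a = "MKV-LAGT"
    aligned_b = "MRVQLSGT"
    ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    ca_b = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
    print("identity (%):", sequence_identity(aligned_a, aligned_b))
    print("RMSD:", rmsd(ca_a, ca_b))
```

A low sequence identity combined with a small RMSD over equivalent atoms is the pattern described above: the sequences have diverged while the 3D structure has been conserved.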

A protein fold is defined by the arrangement of the secondary structure elements relative to each other in space. Some folds have already been mentioned in the previous section on protein motifs. The 4-helix bundle and the TIM barrel, mentioned earlier, are two very common protein folds. The amino acid sequences of proteins that adopt one of these folds do not need to be evolutionarily related. Another example of a protein fold is the coenzyme-binding domain of some dehydrogenases, which adopts the so-called Rossmann fold, named after Michael G. Rossmann, a protein crystallographer who solved one of the very first structures with this type of fold. It is also the only protein fold named after the person who was first to discover it:

Rossmann fold illustration
alcohol dehydrogenase domain

The figure shows a schematic presentation of the Rossmann fold on the left and the nucleotide-binding domain of liver alcohol dehydrogenase on the right. Notice the central parallel β-sheet (shown in yellow), flanked by α-helices on both sides of its plane.

There are of course many more types of protein folds, but how many in total? Given the huge number of amino acid sequences, one would expect a large number of different folds. In reality the number of folds is limited: Nature has re-used the same fold again and again to perform totally new functions. To check the statistics on protein folds, we can simply go to the Protein Data Bank (PDB) and click the PDB Statistics link in the upper right corner. This brings us to a page where, among other things, the following two options are shown:

• Folds As Defined By SCOP
• Topologies As Defined By CATH

SCOP and CATH are generally accepted as the two main authorities in the world of fold classification. According to SCOP, there are 1393 different folds. Notice also in the graph that the last year in which a new fold was identified was 2008:

Number of protein folds as defined by SCOP

The next graph shows the folds identified by the CATH database, a total of 1282 folds:

Number of protein folds as defined by CATH

The two databases evidently define and classify folds in slightly different ways, which results in different total numbers of folds. It is also interesting to note that in recent years essentially no new folds have been discovered. Have we reached the limit? There is probably still a chance that some new folds will be discovered.

Since many proteins contain several domains with different folds, one could ask: what is actually being classified by these databases? The answer is the "simplest" folding unit of a protein, sometimes also called the "independent" folding unit: a domain. Knowing the fold of the different domains in a protein molecule is important in many cases. For example, in homology modeling we need to have a clear idea about the number of domains in a protein and the type of folds they have.