Protein Folds and Fold Classification

Detailed analysis of a protein's fold is often required to reveal its evolutionary history, which can be difficult to detect from the amino acid sequence alone. Studying the relationship between amino acid sequence and fold may also yield deeper insight into the fundamental principles of protein structure and may aid, for example, in the design of new proteins with a predefined structure and activity.

For fold assignment we first need to assign the secondary structure, a task for which a number of computer programs are available. PDB entries also contain a detailed description of the secondary structure of the protein, giving the number and type of the first and last residue of each helix and β-strand. They also indicate which strands belong to each β-sheet in the structure. A protein fold can then be defined by the arrangement of the secondary structure elements of the molecule relative to each other. Some folds have already been mentioned in the previous section on protein motifs, such as the helix bundle and the TIM barrel. Another example of a protein fold is shown in the image below. It was named the Rossmann fold after Michael G. Rossmann, a protein crystallographer who solved the structure of lactate dehydrogenase, the first structure found to contain this domain type. It is also the only protein fold named after the person who discovered it.

Schematic Rossmann fold (left) and Rossmann type of fold (right)

The left panel of the image shows a schematic representation of the Rossmann fold, which consists of a parallel six-stranded β-sheet flanked by α-helices. The right panel shows the nucleotide-binding domain of liver alcohol dehydrogenase. Note the central parallel β-sheet (in yellow), flanked by α-helices on both sides of its plane.
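The secondary structure description stored in a PDB entry can be read with a few lines of code. The sketch below is a minimal illustration, not a full parser: it assumes the fixed-column layout of the HELIX and SHEET records in the legacy PDB file format, and the function names (`parse_secondary_structure`, `sheet_orientation`) and the example records are hypothetical, made up for this illustration rather than taken from a real entry. It collects the first and last residue of each helix and strand, groups strands by sheet, and uses the strand "sense" field (0 for the first strand, 1 for parallel, -1 for antiparallel relative to the previous strand) to check whether a sheet is parallel, as in the Rossmann fold.

```python
from collections import defaultdict

# Made-up records in legacy PDB fixed-column format (illustration only).
EXAMPLE_RECORDS = [
    "HELIX    1  H1 GLY A  110  GLY A  122  1",
    "SHEET    1   A 3 THR A 101  ARG A 105  0",
    "SHEET    2   A 3 ILE A 130  THR A 134  1",
    "SHEET    3   A 3 VAL A 160  LEU A 164  1",
]


def parse_secondary_structure(pdb_lines):
    """Return (helices, sheets) read from HELIX and SHEET records.

    Each residue is a tuple (residue name, chain ID, residue number).
    Sheets are returned as a dict mapping sheet ID to a list of strands.
    """
    helices = []
    sheets = defaultdict(list)
    for line in pdb_lines:
        record = line[:6]
        if record == "HELIX ":
            helices.append({
                "helix_id": line[11:14].strip(),
                "start": (line[15:18].strip(), line[19], int(line[21:25])),
                "end": (line[27:30].strip(), line[31], int(line[33:37])),
            })
        elif record == "SHEET ":
            sheet_id = line[11:14].strip()
            sheets[sheet_id].append({
                "strand": int(line[7:10]),
                "start": (line[17:20].strip(), line[21], int(line[22:26])),
                "end": (line[28:31].strip(), line[32], int(line[33:37])),
                # 0 = first strand, 1 = parallel, -1 = antiparallel
                "sense": int(line[38:40]),
            })
    return helices, dict(sheets)


def sheet_orientation(senses):
    """Classify a sheet from its strand sense values, given in strand order."""
    pair_senses = senses[1:]  # the first strand has no preceding strand
    if not pair_senses:
        return "single strand"
    if all(s == 1 for s in pair_senses):
        return "parallel"
    if all(s == -1 for s in pair_senses):
        return "antiparallel"
    return "mixed"


if __name__ == "__main__":
    helices, sheets = parse_secondary_structure(EXAMPLE_RECORDS)
    for sheet_id, strands in sheets.items():
        senses = [s["sense"] for s in strands]
        print(sheet_id, len(strands), sheet_orientation(senses))
```

For a real entry, the lines of the PDB file would be passed in place of `EXAMPLE_RECORDS`. Note that this only recovers the annotated elements and the strand orientation within each sheet. Assigning a fold additionally requires the three-dimensional arrangement of these elements relative to each other.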

There are, of course, other types of protein folds, but how many in total? Given the large number of amino acid sequences in databases like UniProt, one would expect a high number. In reality the number is limited: it appears that nature has reused the same folds again and again to perform new functions. Two databases, SCOP and CATH, have analyzed protein folds and classified them accordingly. The total number of folds counted by the two databases differs slightly:

• Folds as defined by SCOP (1393)
• Folds (Topologies) as defined by CATH (1375)

It is interesting to note that, as the graph below shows (folds according to SCOP), the most recent new fold was identified in 2008. Is there a maximum number of possible folds? Is there any chance that a new fold will be discovered? These questions are hard to answer.

Number of protein folds as defined by SCOP

The next graph shows the folds identified by the CATH database, a total of 1282 folds:

Number of protein folds as defined by CATH

Since many proteins contain several domains with different folds, one could ask: what part of the structure is classified by these databases? The answer is the "simplest" folding unit of a protein, sometimes also called the "independent" folding unit: a domain. Knowing the fold of each domain in a protein molecule is important in many cases. For example, in homology modeling it is essential to have a clear idea of the number of domains in a protein and their folds. The next section discusses the domain concept.