Protein Domains & Fold Classification

Detailed analysis of a protein’s fold can reveal its function and evolutionary history, which may be difficult to detect from the amino acid sequence alone. Studying the relationship between amino acid sequence and fold may also provide deeper insights into the fundamental principles of protein structure and may aid in the design of new proteins with predefined structure and activity. Fold assignment first requires assigning the secondary structure, which is usually done computationally, and many programs are available for this task. PDB entries also typically contain a detailed description of the protein’s secondary structure, including the residue name and sequence number of the first and last residues of each helix and β-strand.
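In legacy PDB-format files, these secondary structure descriptions are stored as fixed-column HELIX and SHEET records. The Python sketch below pulls out the first and last residue of each helix and strand by slicing those columns. It assumes the column layout of the wwPDB PDB File Format v3.3 specification. The function name `parse_ss_records` and the two example lines are illustrative. Note that newer or very large entries may only be distributed as mmCIF, which this sketch does not handle.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SSElement:
    kind: str        # "helix" or "strand"
    chain: str       # chain identifier of the first residue
    start_res: str   # residue name of the first residue, e.g. "GLY"
    start_num: int   # sequence number of the first residue
    end_res: str     # residue name of the last residue
    end_num: int     # sequence number of the last residue


def parse_ss_records(lines: List[str]) -> List[SSElement]:
    """Extract helices and strands from PDB-format HELIX/SHEET records.

    The spec gives 1-based column numbers; the slices below are the
    equivalent 0-based Python ranges. Insertion codes are ignored.
    """
    elements = []
    for line in lines:
        record = line[0:6]
        if record == "HELIX ":
            elements.append(SSElement(
                kind="helix",
                chain=line[19],              # column 20
                start_res=line[15:18],       # columns 16-18
                start_num=int(line[21:25]),  # columns 22-25
                end_res=line[27:30],         # columns 28-30
                end_num=int(line[33:37]),    # columns 34-37
            ))
        elif record == "SHEET ":
            elements.append(SSElement(
                kind="strand",
                chain=line[21],              # column 22
                start_res=line[17:20],       # columns 18-20
                start_num=int(line[22:26]),  # columns 23-26
                end_res=line[28:31],         # columns 29-31
                end_num=int(line[33:37]),    # columns 34-37
            ))
        # all other record types (ATOM, HETATM, ...) are skipped
    return elements


# Illustrative records, not taken from a specific PDB entry
example = [
    "HELIX    1  HA GLY A   86  GLY A   94  1",
    "SHEET    1   A 5 THR A 107  ARG A 110  0",
]
for el in parse_ss_records(example):
    # residue count assumes contiguous numbering without insertion codes
    length = el.end_num - el.start_num + 1
    print(el.kind, el.chain, el.start_res, el.start_num, el.end_res, el.end_num, length)
```

Running it prints one line per element, for example `helix A GLY 86 GLY 94 9`. Fixed-column slicing is used instead of splitting on whitespace because PDB fields are defined by position and can run together.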

Domains in proteins
As shown in the images below, some proteins contain only a single domain, while others contain several. Some domains have a clearly defined function associated with them, such as the Rossmann fold domain (also called the coenzyme-binding domain; see Proteopedia for the history and details of the Rossmann fold), discussed earlier. Such domains often “carry” their function with them when they are incorporated into different proteins during evolution. Other domains, such as the 4-helix bundle, are probably present mainly to contribute to stability.

Below are examples of a one-domain protein (hemoglobin, on the left) and a 3-domain protein (pyruvate kinase, on the right).

The domains in pyruvate kinase are well separated from each other and have different folds. The top domain in the figure above is a β-sheet domain, while the other two are of the alpha/beta type (see the respective Proteopedia page for details).

In most organisms the functional unit of both proteins is a tetramer (four subunits). For hemoglobin this means 4 subunits (and therefore 4 domains) per functional unit, whereas the pyruvate kinase tetramer contains 12 domains (4 subunits × 3 domains each). The quaternary structures of the two proteins are shown below (hemoglobin on the left, pyruvate kinase on the right).

Defining a domain
A domain may be characterized by the following:
1- A spatially separated unit of the protein structure.
2- Often has sequence and/or structural resemblance to other protein structures or domains.
3- Often has a specific function associated with it.

Fold classification databases give detailed information on the domain content of each protein and the fold of each domain. CATH (Class, Architecture, Topology, Homologous superfamily) and SCOP (Structural Classification of Proteins) follow broadly similar procedures; described using CATH’s levels, the steps are:

  • Assignment of secondary structure
  • Assignment of domains
  • Assignment of a Class to each domain (based on secondary structure content: mainly alpha, mainly beta, alpha/beta, or few secondary structures)
  • Assignment of Architecture (the overall arrangement of secondary structure elements in space, regardless of how they are connected)
  • Assignment of Topology (fold group: same arrangement and connectivity of secondary structure elements; the amino acid sequences need not be homologous and a common evolutionary origin is not required; this level roughly corresponds to the SCOP Fold)
  • Assignment of Homologous superfamily (a group of domains that appear to share a common evolutionary origin, i.e. are homologous, even in the absence of significant sequence similarity)
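CATH writes a domain’s position in this hierarchy as a four-part code, C.A.T.H, with one number per level. The Python sketch below splits such a code into its levels and reports the deepest level at which two domains agree. Sharing the first three numbers places two domains in the same topology (fold) group, and sharing all four places them in the same homologous superfamily. The helper names (`CathCode`, `deepest_shared_level`) and the example codes are hypothetical, chosen only to illustrate the comparison.

```python
from typing import NamedTuple

# Top-level CATH classes (the first number of a CATH code)
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}

LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


class CathCode(NamedTuple):
    c: int  # Class
    a: int  # Architecture
    t: int  # Topology
    h: int  # Homologous superfamily

    @classmethod
    def parse(cls, code: str) -> "CathCode":
        parts = code.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected 4 dot-separated numbers, got {code!r}")
        return cls(*(int(p) for p in parts))


def deepest_shared_level(code1: str, code2: str) -> str:
    """Return the deepest CATH level at which two domains agree, or 'none'.

    The levels are nested, so comparison stops at the first mismatch.
    """
    a, b = CathCode.parse(code1), CathCode.parse(code2)
    shared = "none"
    for level, x, y in zip(LEVELS, a, b):
        if x != y:
            break
        shared = level
    return shared


# Hypothetical codes, for illustration only
d1, d2, d3 = "3.40.50.100", "3.40.50.200", "3.30.70.100"
print(CATH_CLASSES[CathCode.parse(d1).c])  # Alpha Beta
print(deepest_shared_level(d1, d2))        # Topology (same fold, different superfamily)
print(deepest_shared_level(d1, d3))        # Class
```

Because each number is only meaningful within its parent level, the loop stops at the first mismatch rather than comparing the levels independently.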

We can look at an example of CATH classification.
Searching CATH for PDB ID 1T5A (the tetrameric pyruvate kinase structure above) returns the following results for its 3 domains:

CATH search results

The identifier 1t5aA01 refers to chain A (the PDB entry contains 4 chains) and domain number 01 (of 3). Clicking on the first domain shows its classification (image below): Class: Alpha Beta, Architecture: 2-Layer Sandwich, Topology: Pyruvate Kinase C-terminal domain. This information is highly valuable in homology modelling, especially when different domains must be modelled using different templates, the so-called multi-template homology modelling (discussed in more detail in the homology modelling tutorials).
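Domain identifiers such as 1t5aA01 pack three pieces of information into seven characters: the four-character PDB ID, the chain identifier and a two-digit domain number. The Python sketch below splits such identifiers and counts how many domains each chain contributes. This is the same bookkeeping that gives 12 domains for the pyruvate kinase tetramer. The sketch assumes the seven-character layout just described. The function names are hypothetical, and the list of IDs in the usage example is constructed for illustration (four chains with three domains each), not copied from CATH output.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class DomainId(NamedTuple):
    pdb_id: str  # 4-character PDB identifier, e.g. "1t5a"
    chain: str   # chain identifier, e.g. "A"
    domain: int  # domain number within the chain, e.g. 1 for "01"


def parse_domain_id(domain_id: str) -> DomainId:
    """Split a 7-character CATH-style domain ID such as '1t5aA01'."""
    if len(domain_id) != 7 or not domain_id[5:].isdigit():
        raise ValueError(f"unexpected domain ID format: {domain_id!r}")
    # PDB IDs are case-insensitive, so normalise to lower case
    return DomainId(domain_id[:4].lower(), domain_id[4], int(domain_id[5:]))


def domains_per_chain(domain_ids: Iterable[str]) -> Counter:
    """Count how many domains each chain contributes."""
    return Counter(parse_domain_id(d).chain for d in domain_ids)


# Illustrative list: 4 chains x 3 domains, mirroring the tetramer described above
ids = [f"1t5a{chain}0{n}" for chain in "ABCD" for n in (1, 2, 3)]
print(parse_domain_id("1t5aA01"))  # DomainId(pdb_id='1t5a', chain='A', domain=1)
counts = domains_per_chain(ids)
print(dict(counts))                # {'A': 3, 'B': 3, 'C': 3, 'D': 3}
print(sum(counts.values()))        # 12
```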


We will dive deeper into the CATH database later when we discuss the second homology modelling project (coming soon).
In the next section we will look at the PDB and PDBsum protein databases, both essential for protein structure analysis, for example when planning a homology modelling project.