Protein Domains and Domain Classification

Many proteins contain only a single domain, while others have several. Some domains have a clearly defined function, such as the Rossmann-fold domain (also called the coenzyme-binding domain) discussed earlier. Such domains often “carry” their function with them when they are inserted into different proteins during evolution. A domain may be characterized by the following:

1- It is a spatially separated unit of the protein structure.
2- It often shows sequence and/or structural resemblance to another protein structure or domain.
3- It often has a specific function associated with it.

The easiest way to characterize the fold of a protein domain is to look it up in a domain classification database. The procedure followed by such databases, for example CATH or SCOP, includes:

1- Assignment of secondary structure
2- Assignment of domains
3- Assignment of a structural class to each domain (three main structural classes: alpha, beta and alpha/beta)
4- Assignment of fold (called Architecture in the CATH database)
5- Assignment of topology (homologous superfamily)
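
In CATH, the outcome of the classification steps above is recorded as a dotted four-number code, one number per level of the Class, Architecture, Topology, Homologous superfamily hierarchy. The sketch below splits such a code into its levels. The helper names (parse_cath_code, CathCode) are hypothetical, the class names in CATH_CLASSES are an assumption that should be checked against the CATH site, and the example code 3.40.50.720 is the one CATH uses for the NAD(P)-binding Rossmann-like domain.

```python
from typing import NamedTuple

# Names of the top-level CATH classes (assumed; verify against the CATH site).
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


class CathCode(NamedTuple):
    """One number per level of the C-A-T-H hierarchy."""
    class_: int
    architecture: int
    topology: int
    superfamily: int

    def level(self, depth: int) -> str:
        """Return the code truncated to its first `depth` levels, e.g. '3.40'."""
        return ".".join(str(n) for n in self[:depth])


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted code such as '3.40.50.720' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a four-level CATH code: {code!r}")
    return CathCode(*(int(p) for p in parts))


code = parse_cath_code("3.40.50.720")
print("Class:       ", code.level(1), CATH_CLASSES[code.class_])
print("Architecture:", code.level(2))
print("Topology:    ", code.level(3))
print("Superfamily: ", code.level(4))
```

Two domains that share the first three numbers have the same topology (fold), even if they belong to different homologous superfamilies.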

Secondary structure is usually assigned automatically by software. Protein structure visualization programs such as Chimera and PyMOL include this function, and most PDB files carry secondary structure definitions in the HELIX and SHEET records near the beginning of the file.
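
As a minimal sketch of how these definitions can be read, the code below extracts the chain and residue range of each helix and strand from the fixed-column HELIX and SHEET records of the PDB format. The function name read_secondary_structure is hypothetical, and the two sample records are illustrative lines in PDB format, not taken from any structure discussed here.

```python
# Illustrative records in PDB fixed-column format (not from a real entry).
SAMPLE_PDB_HEADER = """\
TITLE     ILLUSTRATIVE RECORDS ONLY
HELIX    1  HA GLY A   86  GLY A   94  1
SHEET    1   A 5 THR A 107  ARG A 110  0
"""


def read_secondary_structure(pdb_text):
    """Return (kind, chain, first_residue, last_residue) for each element."""
    elements = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "HELIX":
            # Columns 20, 22-25 and 34-37: chain, start and end residue numbers.
            elements.append(("helix", line[19], int(line[21:25]), int(line[33:37])))
        elif record == "SHEET":
            # Columns 22, 23-26 and 34-37: chain, start and end residue numbers.
            elements.append(("strand", line[21], int(line[22:26]), int(line[33:37])))
    return elements


for kind, chain, start, end in read_secondary_structure(SAMPLE_PDB_HEADER):
    print(f"{kind:6s} chain {chain} residues {start}-{end}")
```

Slicing by column rather than splitting on whitespace matters here, because neighbouring fields in PDB records can run together without a separating space.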

Be aware that CATH and SCOP use slightly different terminology for fold assignment and describe their entries differently. CATH follows the Class-Architecture-Topology-Homologous superfamily classification scheme. At the time of writing, the CATH database lists 53 million protein domains classified into 2,737 superfamilies. As an example, the figure below shows two proteins: one with a single domain (hemoglobin) and one with 3 domains (pyruvate kinase). A subunit of hemoglobin consists of a single α-helical domain. The heme molecule (in stick representation) is also visible, bound within a pocket created by the α-helices:

hemoglobin subunit
pyruvate kinase subunit

The functional units of both proteins consist of 4 subunits; in other words, they are arranged into a quaternary structure. For hemoglobin this gives 4 domains in the functional unit (4 subunits × 1 domain), while for pyruvate kinase it gives 12 (4 subunits × 3 domains). The domains in pyruvate kinase are well separated from each other. The top domain of the pyruvate kinase subunit in the figure is built from β-sheets, while the other two domains contain a mixture of helices and strands. For illustration, the figure below shows the quaternary structure of hemoglobin (left) and pyruvate kinase (right):

haemoglobin oligomer
pyruvate-kinase oligomer

In pyruvate kinase the domains are easy to tell apart, but in many proteins it is difficult to separate them visually without prior knowledge. In such cases a database search helps. As an example, searching CATH with the PDB ID 1E0T returns the following result for the 3 domains:

CATH search

The ID 1e0tA01 denotes chain A (there are 4 chains, one for each subunit) and domain number 01. As mentioned above, each chain contains 3 domains in total: 01, 02 and 03. Clicking on one of the IDs, for example the first one for domain 1, shows its classification (image below): Class: Alpha Beta, Architecture: 2-Layer Sandwich, Topology: Pyruvate Kinase. This information is highly valuable in homology modeling, especially when different domains must be modeled using different templates, the so-called multi-template homology modeling.

CATH classification of domain 1e0tA01
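
Domain identifiers such as 1e0tA01 are easy to take apart programmatically, which is convenient when each domain has to be paired with its own modeling template. The sketch below assumes the layout described above: a four-character PDB ID, a one-character chain ID and a two-digit domain number. The function name parse_domain_id is hypothetical, and the chain letters A to D for the four subunits are an assumption made for illustration.

```python
import re
from collections import defaultdict

# Four-character PDB ID + one-character chain + two-digit domain number.
DOMAIN_ID = re.compile(
    r"^(?P<pdb>[0-9][a-z0-9]{3})(?P<chain>[A-Za-z0-9])(?P<domain>\d{2})$"
)


def parse_domain_id(domain_id):
    """Split an ID such as '1e0tA01' into (pdb_id, chain, domain_number)."""
    match = DOMAIN_ID.match(domain_id)
    if match is None:
        raise ValueError(f"unexpected domain ID format: {domain_id!r}")
    return match["pdb"], match["chain"], int(match["domain"])


# Hypothetical ID list: 4 chains (assumed to be A-D) with 3 domains each.
domain_ids = [f"1e0t{chain}{n:02d}" for chain in "ABCD" for n in (1, 2, 3)]

domains_per_chain = defaultdict(list)
for domain_id in domain_ids:
    pdb_id, chain, number = parse_domain_id(domain_id)
    domains_per_chain[chain].append(number)

total_domains = sum(len(numbers) for numbers in domains_per_chain.values())
print(dict(domains_per_chain))
print("domains in the functional unit:", total_domains)
```

Grouping by chain reproduces the count given earlier: 4 subunits with 3 domains each, 12 domains in the functional unit.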

In the next section we will give a short overview of some protein databases that we will use later in the homology modeling project.