In the previous section, I discussed the definition of a domain and some examples of domains and folds. Here we continue the discussion about fold classification. A detailed analysis of the fold of a protein can provide deeper insights into its function and evolutionary history, which may often be difficult to achieve based on amino acid sequence analysis alone. Studying the relationships between the amino acid sequence and the fold may provide insights into the principles of protein structure and function. In addition, it may aid in the design of new proteins with predefined structure and activity.
The first step in classification is defining the secondary structure of the protein or protein domains. This is routinely done when a new structure is deposited with the Protein Data Bank (PDB), and also by the graphics programs used to visualize protein structures. All PDB entries contain a detailed description of the secondary structure of the protein, including the names of the first and last amino acid residues of each helix and β-strand. The second requirement of classification is the definition of the domains in a protein. A domain is the primary "unit" of classification. As mentioned, the same domain type, such as the nucleotide-binding Rossmann fold domain, may be found in many unrelated proteins. For this reason, it is not meaningful to base the classification on the whole protein if it contains more than one domain.
The PDB coordinate file does not provide information on the domain content of a protein, but it usually provides links to databases where this information can be found.
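To make the secondary structure description mentioned above more concrete, here is a minimal Python sketch that reads the first and last residue of each helix and β-strand from the HELIX and SHEET records of a legacy PDB-format file. The column positions follow the fixed-width PDB format; the names `Element`, `parse_secondary_structure`, and `SAMPLE_RECORDS` are my own and purely illustrative, and the sample records are made-up examples rather than lines from a specific entry. Files in the newer mmCIF format store the same information differently and would need a different parser.

```python
from typing import Iterable, List, NamedTuple


class Element(NamedTuple):
    """One secondary structure element and its terminal residues."""
    kind: str        # "helix" or "strand"
    chain: str       # chain identifier of the first residue
    start_name: str  # three-letter name of the first residue
    start_num: int   # sequence number of the first residue
    end_name: str    # three-letter name of the last residue
    end_num: int     # sequence number of the last residue


def parse_secondary_structure(lines: Iterable[str]) -> List[Element]:
    """Collect helices and strands from HELIX and SHEET records.

    Slices are 0-based, so PDB columns 16-18 become line[15:18], etc.
    """
    elements = []
    for line in lines:
        if line.startswith("HELIX"):
            elements.append(Element(
                "helix",
                line[19],                # column 20: chain of first residue
                line[15:18].strip(),     # columns 16-18: first residue name
                int(line[21:25]),        # columns 22-25: first residue number
                line[27:30].strip(),     # columns 28-30: last residue name
                int(line[33:37]),        # columns 34-37: last residue number
            ))
        elif line.startswith("SHEET"):
            elements.append(Element(
                "strand",
                line[21],                # column 22: chain of first residue
                line[17:20].strip(),     # columns 18-20: first residue name
                int(line[22:26]),        # columns 23-26: first residue number
                line[28:31].strip(),     # columns 29-31: last residue name
                int(line[33:37]),        # columns 34-37: last residue number
            ))
    return elements


# Illustrative records only (not taken from a particular PDB entry).
SAMPLE_RECORDS = [
    "REMARK   1 illustrative input",
    "HELIX    1  HA GLY A   86  GLY A   94  1",
    "SHEET    1   A 5 THR A 107  ARG A 110  0",
]

for element in parse_secondary_structure(SAMPLE_RECORDS):
    print(element)
```

For a real entry, the same function could be given the lines of a downloaded file, for example `parse_secondary_structure(open("1pkn.pdb"))`, where the file name is an assumption about where the file was saved.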
CATH (C-Class, A-Architecture, T-Topology, H-Homologous superfamily) is the primary database on domains and domain classification. CATH gives information on the domain content of each protein and a detailed description of each domain. The assignment procedure includes:
• Assignment of a Class to each domain. Class essentially refers to the secondary structure content (alpha, beta, or alpha/beta proteins).
• Assignment of Architecture (the arrangement of secondary structure elements in space, irrespective of connectivity). The amino acid sequences within a particular architecture class are not necessarily homologous - a common evolutionary origin is not required.
• Assignment of Topology. Topology is what I have been referring to as the fold. Here, the connectivity between secondary structure elements is considered. Proteins with a similar fold do not need to have a common evolutionary origin.
• Assignment of Homologous superfamily. A homologous superfamily defines a group of proteins that appear to be homologous (have a common evolutionary origin), even without significant sequence similarity.
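The four levels listed above are reflected in the numeric codes CATH assigns to domains, where each dot-separated number corresponds to one level. The following Python sketch, with hypothetical helper names (`parse_cath_code`, `describe`, `deepest_shared_level`), splits such a code into its levels and reports the deepest level two domains share. The example codes are quoted from memory (3.40.50.720 for the NAD(P)-binding Rossmann-like domain, 3.40.50.300 for the P-loop NTPases, 1.10.490.10 for the globins) and should be checked against the CATH database before being relied upon.

```python
from typing import Dict, Optional, Tuple

# The four CATH levels, from most general to most specific.
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def parse_cath_code(code: str) -> Tuple[int, ...]:
    """Turn a code such as '3.40.50.720' into (3, 40, 50, 720)."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return tuple(int(p) for p in parts)


def describe(code: str) -> Dict[str, str]:
    """Map each level name to the prefix of the code that identifies it."""
    numbers = parse_cath_code(code)
    return {
        level: ".".join(str(n) for n in numbers[: i + 1])
        for i, level in enumerate(LEVELS)
    }


def deepest_shared_level(code_a: str, code_b: str) -> Optional[str]:
    """Return the most specific level two domains share, or None."""
    shared = None
    for level, a, b in zip(LEVELS, parse_cath_code(code_a), parse_cath_code(code_b)):
        if a != b:
            break
        shared = level
    return shared


print(describe("3.40.50.720"))
print(deepest_shared_level("3.40.50.720", "3.40.50.300"))
print(deepest_shared_level("3.40.50.720", "1.10.490.10"))
```

Under these assumed codes, the two nucleotide-binding domains share Class, Architecture, and Topology but belong to different homologous superfamilies, which illustrates the point that a similar fold does not by itself imply a common evolutionary origin.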
We will look at some examples of CATH classification to explain these definitions.
More about domains - multidomain proteins
As noted earlier, some proteins contain a single domain, while others may have several domains. Below are examples of a one-domain protein (hemoglobin 1IT2, on the left) and a 3-domain protein (pyruvate kinase 1PKN, on the right):