Introduction to Protein Structure: Structural Levels, Domains, Motifs and Folds

Overview
To understand the basic principles of protein three-dimensional structure and their potential applications in academia and industry, we first need to look at the big picture by defining the four levels of protein structure. These levels depend on one another and together create a complex network of interactions among the hundreds to thousands of atoms in a protein, often also involving solvent molecules, various ligands and metal atoms.

Primary, secondary, and tertiary structure

The first basic level is the amino acid sequence. The 20 most common amino acids found in proteins are joined together into a polypeptide chain during protein synthesis, which is catalyzed by the ribosome. The amino acid sequence to a large extent defines the secondary structure (α-helices and β-sheets) and the tertiary structure of proteins, although in many cases we also need to consider the effect of the local environment on structure stabilization. The most obvious examples are membrane proteins, large parts of which are embedded in the hydrophobic environment of lipid membranes. Outside the membrane these proteins lose their native structure and form large aggregates. For this reason, detergents are needed during purification to keep these proteins in a soluble state.
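As a small illustration of the primary structure as data, the sketch below treats a polypeptide chain as a string of one-letter amino acid codes and checks it against the 20 standard residues. The function names and the example sequences are made up for illustration and are not taken from any particular library or protein.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def nonstandard_residues(sequence):
    """Return (position, letter) pairs for residues that are not among
    the 20 standard amino acids. Positions are 1-based, as is usual
    for residue numbering."""
    return [
        (position, residue)
        for position, residue in enumerate(sequence.upper(), start=1)
        if residue not in STANDARD_AMINO_ACIDS
    ]


def composition(sequence):
    """Count how often each residue type occurs in the sequence."""
    return Counter(sequence.upper())


# Hypothetical sequences, used only to exercise the functions.
print(nonstandard_residues("MKTAYIAK"))
print(nonstandard_residues("MKXA"))
print(composition("MKKA"))
```

Real sequence files may contain additional letters (for example X for an unknown residue), which is why the check reports such positions instead of rejecting the sequence outright.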

The arrangement of the secondary structure elements in space defines the tertiary structure. The tertiary structure is characterized by a specific fold. The currently known three-dimensional protein structures have been classified into around 1300 unique folds. In a later chapter we will discuss some examples of these folds and the CATH database, where folds and domains are classified and the relationships between fold and the evolutionary origin of proteins are revealed.

Domains, folds & motifs

A domain is defined as an independent folding unit: it can often be cloned, expressed and purified separately from the other domains of a multi-domain protein and will still form the same type of structure. It may even show activity (e.g. small-molecule ligand binding or interactions with other proteins) similar to the activity it has within the original protein. A fold is assigned to each protein domain. While some proteins consist of a single domain, others contain two or more domains with different folds. Domains with a similar fold may show little similarity at the sequence level and may or may not be related to each other functionally and evolutionarily; nevertheless, a fold can still be used to trace the evolutionary origin of a protein. This also means that similarity of folds cannot always be detected just by comparing amino acid sequences. More on this can be found in the presentation of the CATH domain classification database.

In addition to domain conservation, there are other types of conserved structural elements in proteins, called structural motifs. These are smaller structural units which may be present in different domains that are not necessarily evolutionarily related. Examples of such motifs include the helix-turn-helix motif, β-hairpins and the Greek key motif. These motifs are not considered to be independent folding units; rather, they usually form part of a larger three-dimensional structure. A more detailed discussion and examples can be found on the page on domains, folds and motifs.

Quaternary structure and oligomers

Figure: The oligomeric structure of Mg-chelatase

The next structural level is the quaternary structure (also called oligomeric structure). These structures are built up of two or more polypeptide chains (subunits). An oligomer may consist of identical subunits (homo-oligomer) or of different protein molecules (hetero-oligomer). An oligomer is stabilized by interactions between its subunits, such as hydrophobic interactions, hydrogen bonds and salt bridges. In multi-subunit enzymes, the subunits often contribute jointly to the formation of the active site or of other types of ligand binding sites. An oligomeric complex may also interact with other proteins, forming a so-called transient complex.
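The distinction between homo- and hetero-oligomers can be sketched in a few lines of code. The example below is only an illustration under simple assumptions: each subunit is represented by its amino acid sequence, and two subunits count as "the same type" only if their sequences are identical. The function name, chain identifiers and sequences are hypothetical.

```python
def classify_oligomer(chains):
    """Classify an assembly from its subunit sequences.

    chains: dict mapping a chain identifier to the amino acid
            sequence (one-letter codes) of that subunit.
    """
    if len(chains) < 2:
        # A single polypeptide chain has no quaternary structure.
        return "monomer"
    distinct_sequences = set(chains.values())
    if len(distinct_sequences) == 1:
        return "homo-oligomer"
    return "hetero-oligomer"


# Hypothetical toy assemblies.
print(classify_oligomer({"A": "MKTAYIAK"}))
print(classify_oligomer({"A": "MKTAYIAK", "B": "MKTAYIAK"}))
print(classify_oligomer({"A": "MKTAYIAK", "B": "MSLLEQWK"}))
```

In practice subunits of the same type may differ slightly in sequence in a deposited structure (for example through missing terminal residues), so a real analysis would need a more tolerant comparison than exact string equality.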

Oligomeric structures perform a wide range of functions in cells and often work as molecular machines which use ATP as a source of energy. The image shows an example of such an oligomeric structure, that of the enzyme Mg-chelatase.

This enzyme is involved in the biosynthesis of chlorophyll. It catalyzes the insertion of magnesium into protoporphyrin IX, which is the first committed step in chlorophyll biosynthesis. About 20 more catalytic reactions are required before the chlorophyll molecule is synthesized.

Mg-chelatase has three different subunits. In bacteriochlorophyll synthesis they are named BchI, BchD and BchH. BchI and BchD build a large (about 600 kDa) two-ring complex, shown in the image. The bottom ring is built up of BchI and the top ring of BchD. BchI belongs to the so-called AAA+ family of ATPases and uses the energy of ATP to drive the Mg-chelatase reaction, which makes the complex an example of a molecular machine. In a later section I will discuss some further examples of molecular machines.

Conservation of sequence & structure

Since large variations in the amino acid sequence within a protein family may still yield very similar three-dimensional structures, we say that structure is more conserved than sequence. This can be understood by considering function: binding of a certain ligand, specificity of interactions with other proteins, structural dynamics and so on all depend on the three-dimensional structure. Therefore, determining the structure of a protein of unknown function and comparing it to known structures in a database may reveal structural homologues and ultimately help to identify its function. The principle of structure conservation also allows us to rely on modelling (which we will discuss in a later chapter) to obtain reliable predictions of the structures of proteins for which no experimental three-dimensional structure has been determined.
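Sequence variation within a family is commonly quantified as percent sequence identity. The sketch below computes it for two sequences that have already been aligned, under two assumptions that are not from the text above: gaps are written as "-", and columns containing a gap are excluded from the count (one of several conventions in use). The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment.

    Both inputs must be rows of the same alignment and therefore
    have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns where both sequences have a residue
    identical = 0  # compared columns where the residues match
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue
        compared += 1
        if residue_a == residue_b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned toy sequences: 3 of 4 columns match.
print(percent_identity("ACDE", "ACDF"))
```

A low value from such a comparison does not by itself rule out a similar fold, which is why structure comparison can reveal relationships that sequence comparison misses.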