The big picture
To understand the basic principles of protein three-dimensional structure and its potential uses in academia and industry, we first need to look at the big picture by defining the four levels of protein structure. The structural levels depend on one another, together creating an extremely complex network of interactions between hundreds or thousands of protein atoms, solvent molecules and, often, various ligands and metal atoms.
Primary, secondary and tertiary structure
The first basic level is the amino acid sequence. The 20 most common amino acids found in proteins are joined together into a polypeptide chain during protein synthesis, with the formation of each peptide bond catalyzed by the ribosome. The amino acid sequence to a large extent defines the secondary (α-helices and β-sheets) and tertiary structure of proteins, although in many cases we also need to consider the effect of the local environment on structure stabilization. The most obvious example is membrane proteins, a considerable part of which is embedded in the hydrophobic environment of lipid membranes. Outside the membrane these proteins lose their native structure and form large aggregates. For this reason, detergents are needed during purification to keep these proteins in a soluble state.
The arrangement of the secondary structure elements in space defines the tertiary structure. The tertiary structure is characterized by a specific fold. The currently known three-dimensional protein structures have been classified into around 1300 unique folds. In a later chapter we will discuss some examples of these folds and the databases where we can find information on fold classification, protein evolutionary origin, the presence of conserved sequence motifs, ligand binding sites, etc.
Motifs, folds and domains
A fold is assigned to each domain in a protein. While some proteins consist of a single domain, others may contain two or more domains. A domain is defined as an independent folding unit because it can often be cloned, expressed and purified separately from the other domains of a multi-domain protein, and it will still form the same type of structure and show the same activity (e.g. small molecule ligand binding, interaction with other proteins, etc.) that it normally shows within the original protein. At the sequence level, domains of the same fold may or may not be related to each other functionally and evolutionarily; however, a fold can still be used to trace the evolutionary origin of a protein. This also means that similarity of folds cannot always be detected by just comparing amino acid sequences.
In addition to domain conservation, there are also other types of conserved structural elements in proteins, called structural motifs. These are smaller structural units which may be present within different and not necessarily evolutionarily related domains. Examples of such motifs include the helix-turn-helix motif, β-hairpins, the Greek key motif, etc. These motifs are not considered to be independent folding units; rather, they usually form part of the three-dimensional structure of a larger domain.
Quaternary structure and oligomers
The next structural level is the quaternary structure (also called oligomeric structure). These structures are built up from two or more polypeptide chains (subunits). An oligomer may consist of subunits of the same type (homo-oligomer) or be built up from different protein molecules (hetero-oligomer).
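The distinction between homo- and hetero-oligomers can be sketched in a few lines of Python. This is only an illustration: the function name `classify_oligomer` is hypothetical, the example sequences are made up, and treating "same type of subunit" as "identical amino acid sequence" is a simplifying assumption.

```python
def classify_oligomer(chain_sequences):
    """Classify an assembly from the amino acid sequences of its chains.

    Assumption: two chains count as the same subunit type only if their
    sequences are identical (a simplification for illustration).
    """
    if len(chain_sequences) < 2:
        # A single polypeptide chain has no quaternary structure.
        return "monomer"
    distinct_subunits = set(chain_sequences)
    if len(distinct_subunits) == 1:
        return "homo-oligomer"
    return "hetero-oligomer"


# Made-up toy sequences, not real proteins.
print(classify_oligomer(["MKTAYIAK", "MKTAYIAK"]))  # two identical chains
print(classify_oligomer(["MKTAYIAK", "MSLEQWRG"]))  # two different chains
```

In practice, chains of the same subunit type may differ slightly (for example through disordered residues missing from a model), so a real classification would compare chains more tolerantly than by exact string equality.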
The image shows the complex of two of the subunits of the enzyme magnesium chelatase (for details see text). The structure was obtained using single-particle reconstruction from cryo-electron microscopy (cryo-EM) images of the complex. Where appropriate, the available X-ray structure of subunit BchI of the enzyme was docked into the EM density (shown in ribbon representation). Other domains were homology-modelled based on known structures of other proteins. Later, in the sequence alignment and homology modelling tutorials, we will analyze the individual subunits of the enzyme complex.
The model shown in the image was published in Lundqvist et al., Structure 2010.
A book by Anders Liljas and co-authors on structural biology is highly recommended if you wish to learn much more on this subject.
Jöns Jacob Berzelius (b. 1779), probably one of the most famous Swedish scientists, coined the word "protein".
An oligomer is stabilized by interactions between its subunits, such as hydrophobic interactions, hydrogen bonds, salt bridges, etc. In multi-subunit enzymes, the subunits often contribute jointly to the formation of the active site or other types of ligand binding sites. An oligomeric complex may also interact with other proteins, forming so-called transient complexes. Molecular machines often function as oligomeric structures and form transient complexes with other proteins. One such example is the class of AAA+ ATPases, known to be involved in many cellular processes. The image above, showing an oligomeric complex of two subunits of the protein magnesium chelatase (Mg-chelatase), provides an example of subunit arrangement in the AAA+ family of proteins. In this case two rings, each containing six molecules of one of the two subunits of the enzyme, are packed against each other in a 600 kDa oligomeric complex. In a later chapter we will have a closer look at the sequence and structure of the subunits of this exciting enzyme.
Conservation of sequence and structure
Since large variations in the sequence within a protein family may still yield very similar three-dimensional structures, we say that structure has a higher degree of conservation than sequence. This can be understood if we take function into account: the binding of a certain ligand, the specificity of interactions with other proteins, structural dynamics, etc. all depend on the three-dimensional structure. This is why determining the structure of a protein of unknown function may help in revealing its function. The principle of structure conservation also allows us to rely on the method of homology modelling (which we will discuss in a later chapter) to obtain reliable predictions of the structures of proteins for which no experimental three-dimensional structure has been determined.
An interesting example of structure conservation was provided by the anaerobic cobalt chelatase, an enzyme active in vitamin B12 synthesis (Schubert et al., 1999). Although the function of the protein was already known, only the determination of its three-dimensional structure revealed its similarity to the structure of ferrochelatase (Al-Karadaghi et al., 1997), an enzyme active in heme biosynthesis. The similarity could not be detected from the sequences because the sequence identity between the two proteins is only 11%, a number much smaller than the so-called "homology threshold" (around 20-25%, as will be discussed later), normally considered to be an indicator of common evolutionary origin of proteins. The similarity between the two structures strongly suggested a common evolutionary origin of the two proteins and a similar mechanism of the enzymatic reaction.
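To make the notion of sequence identity concrete, here is a minimal Python sketch of how a percent identity can be computed from two already aligned sequences. The function names, the toy sequences and the choice of denominator (aligned positions without gaps) are assumptions made for illustration; other conventions exist, for example dividing by the length of the shorter sequence, and they give somewhat different numbers. The 25% cutoff simply takes the upper end of the approximate threshold range quoted above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned positions where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and
    therefore have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip positions with a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def above_homology_threshold(identity, threshold=25.0):
    """True if the identity reaches the (approximate, assumed) threshold."""
    return identity >= threshold


# Made-up toy alignment, not real protein sequences.
identity = percent_identity("MKT-AYIAKQR", "MRTLAY-GKQW")
print(round(identity, 1), above_homology_threshold(identity))

# A value like the 11% quoted in the text falls below the threshold, so
# sequence comparison alone would not flag a relationship.
print(above_homology_threshold(11.0))
```

A low identity does not rule out a common origin, as the chelatase example shows; it only means that sequence comparison alone is not enough and structural information is needed.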