The big picture
To understand the basic principles of protein three-dimensional structure and its potential applications in academia and industry, we first need to look at the big picture by defining the four levels of protein structure. These structural levels are mutually dependent, together creating an extremely complex network of interactions between hundreds to thousands of protein atoms, solvent molecules and, often, various ligands and metal ions. The first, most basic level is the amino acid sequence (the primary structure). The 20 most common amino acids found in proteins are joined into a polypeptide chain during protein synthesis, which essentially consists of the ribosome-catalyzed formation of peptide bonds. The amino acid sequence largely defines the secondary (α-helices and β-sheets) and tertiary structure of a protein, although we also need to consider the effect of the local environment on structure stabilization. The most obvious example is membrane proteins, a large proportion of which is embedded in the hydrophobic environment of the lipid membrane. Outside the membrane these proteins lose their native structure and form large aggregates. For this reason, detergents are needed during purification to keep these proteins soluble; otherwise they aggregate and precipitate.
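To make the link between sequence and membrane embedding a little more concrete, the sketch below scans a sequence for stretches of hydrophobic residues, the kind of segment that typically spans a lipid bilayer. This technique is not described in the text itself: it uses the Kyte-Doolittle hydropathy scale with the commonly cited 19-residue window and 1.6 cutoff, and the function names and toy sequence are illustrative assumptions. A hydropathy scan only flags candidate segments; it does not predict the three-dimensional structure.

```python
# Kyte-Doolittle hydropathy scale: positive values = hydrophobic residues.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every run of `window` consecutive residues.

    Non-standard letters (X, B, Z, ...) are not in the scale and raise KeyError.
    """
    values = [KD[aa] for aa in seq.upper()]
    if window > len(values):
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(seq: str, window: int = 19, cutoff: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean hydropathy exceeds `cutoff`."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, h in enumerate(profile) if h > cutoff]


# Hypothetical toy sequence: a leucine stretch flanked by charged aspartates.
toy = "D" * 20 + "L" * 20 + "D" * 20
print(candidate_tm_windows(toy))
```

For this toy sequence the script prints the window start positions 16 through 27, i.e. only the windows dominated by the leucine stretch pass the cutoff.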
The arrangement of secondary structure elements in space defines the tertiary structure, which is characterized by a specific fold. The currently known three-dimensional protein structures have been classified into around 1300 unique folds. In the corresponding chapter we will discuss some examples of these folds and the databases where we can find information on sequence and structure, evolutionary origin, conserved sequence motifs, ligand binding sites, etc.
A fold is usually assigned to a protein domain, an independent folding unit of a protein. It is called “independent” because it can often be cloned, expressed and purified separately from the other domains of a multi-domain protein and still form the same type of structure and show the same activity (e.g. binding of a small-molecule ligand or interaction with other proteins) as within the original protein. Domains with the same fold may or may not be related to each other functionally or evolutionarily. Some proteins consist of a single domain, while others contain two or more domains.
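Because a domain corresponds to a stretch of the polypeptide chain, the first practical step in expressing one separately is cutting its residue range out of the full-length sequence. The sketch below does this under stated assumptions: boundaries are given as 1-based, inclusive residue numbers (the convention used in most sequence databases), and the domain names, boundaries and sequence in the example are hypothetical. In practice boundaries are taken from structural or domain-database annotations and are sometimes extended by a few residues at each end.

```python
def extract_domain(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"invalid boundaries {start}-{end} for a chain of {len(seq)} residues")
    # Python slices are 0-based and exclude the end index, hence seq[start - 1:end].
    return seq[start - 1:end]


def split_into_domains(seq: str, boundaries: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Map each domain name to its sub-sequence, given (start, end) boundaries."""
    return {name: extract_domain(seq, s, e) for name, (s, e) in boundaries.items()}


# Hypothetical two-domain chain and boundaries, for illustration only.
chain = "MKTAYIAKQR"
print(split_into_domains(chain, {"domain_1": (1, 4), "domain_2": (5, 10)}))
# → {'domain_1': 'MKTA', 'domain_2': 'YIAKQR'}
```

The explicit 1-based to 0-based conversion is the main point: off-by-one errors in domain boundaries are easy to make when moving between database numbering and code.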
Although a conserved fold is a general characteristic of a domain, there are also other types of conserved structural elements, called structural motifs. These are smaller structural units that may be present in different, not necessarily evolutionarily related, domains. Examples include the helix-turn-helix motif, the β-hairpin and the Greek key motif. Structural motifs are not considered a separate level of protein structure.
The next structural level is the quaternary structure (such structures are often called oligomeric structures), which is built up of two or more polypeptide chains (subunits). An oligomer may consist of identical subunits (homo-oligomer) or of different protein molecules (hetero-oligomer).
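The homo-/hetero-oligomer distinction can be written as a simple rule on the chain composition of an assembly: if all chains are of one type, it is a homo-oligomer; otherwise it is a hetero-oligomer. The sketch below assumes each chain is labelled with an identifier for its protein (a hypothetical input format chosen for illustration) and also reports the stoichiometry. Hemoglobin, with two α and two β chains, serves as a familiar hetero-oligomeric example.

```python
from collections import Counter

# Names for common subunit counts; other sizes fall back to "n-mer".
OLIGOMER_NAMES = {1: "monomer", 2: "dimer", 3: "trimer", 4: "tetramer",
                  5: "pentamer", 6: "hexamer", 7: "heptamer", 8: "octamer",
                  12: "dodecamer"}


def describe_oligomer(subunits: list[str]) -> str:
    """Describe an assembly given one identifier per polypeptide chain."""
    n = len(subunits)
    if n == 0:
        raise ValueError("an assembly needs at least one chain")
    size = OLIGOMER_NAMES.get(n, f"{n}-mer")
    if n == 1:
        return size
    kinds = Counter(subunits)
    prefix = "homo" if len(kinds) == 1 else "hetero"
    # Stoichiometry such as α2β2: each chain type followed by its copy number.
    stoich = "".join(f"{name}{count if count > 1 else ''}"
                     for name, count in sorted(kinds.items()))
    return f"{prefix}-{size} ({stoich})"


print(describe_oligomer(["α", "α", "β", "β"]))  # → hetero-tetramer (α2β2)
print(describe_oligomer(["A"] * 6))              # → homo-hexamer (A6)
```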
The image shows the complex of two of the subunits of the enzyme magnesium chelatase (for details see the text). The structure was obtained by single-particle reconstruction from cryo-electron microscopy (cryo-EM) images of the complex. Where appropriate, the available X-ray structure of subunit BchI of the enzyme was docked into the EM density (shown in ribbon representation); other domains were homology-modeled based on known structures of other proteins. Published in Lundqvist et al., Structure 2010. Later, in the sequence alignment and homology modelling tutorials, we will work with the individual subunits of the enzyme complex.
A book on structural biology by Anders Liljas and co-authors is highly recommended if you wish to learn more about this subject.
Jöns Jacob Berzelius (b. 1779), probably one of the most famous Swedish scientists, coined the word “protein”.
An oligomer is stabilized by interactions between its subunits, such as hydrophobic interactions, hydrogen bonds and salt bridges. The subunits of an oligomer may jointly form an active site (or sites) or a ligand binding site, and may also interact with other proteins, forming so-called transient complexes. Molecular machines often function as oligomers. One example is the AAA+ ATPase family, whose members are involved in a large number of cellular processes. The image above shows an oligomeric complex of magnesium chelatase (Mg-chelatase), a member of the AAA+ family. In this 600 kDa complex, two rings, each containing six subunits, are packed against each other. In a later chapter we will take a closer look at the sequence and structure of the subunits of this exciting enzyme.
Since large variations in sequence within a protein family may still yield very similar three-dimensional structures, we say that structure is more conserved than sequence (see the discussion of sequences and sequence alignment). This can be understood by considering function: ligand binding, the specificity of interactions with other proteins, structural dynamics, etc. all depend on the three-dimensional structure. This is also why determining the structure of a protein of unknown function may help reveal its function. An interesting example of structure conservation is provided by the anaerobic cobalt chelatase, an enzyme active in vitamin B12 synthesis (Schubert et al., 1999). Although the function of this protein was already known, only the determination of its three-dimensional structure revealed its similarity to the structure of ferrochelatase (Al-Karadaghi et al., 1997), an enzyme active in heme biosynthesis. The reason is that the sequence identity between the two proteins is only 11%, much lower than the so-called “homology threshold” (around 20-25%, as discussed in the sequence alignment section) normally considered to indicate a common evolutionary origin.
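Identity figures such as the 11% quoted above come from comparing aligned sequences position by position. The sketch below shows one common way of computing percent identity from a pair of already aligned sequences and reading the result against the ~20-25% rule of thumb. Counting only gap-free columns is an assumption on our part (other conventions divide by the alignment length or the length of the shorter sequence and give different numbers), and the cutoffs are the approximate values from the text, not a strict criterion; in practice the threshold also depends on alignment length.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Only columns where neither sequence has a gap are counted.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a.upper(), aln_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def homology_hint(identity: float, lower: float = 20.0, upper: float = 25.0) -> str:
    """Crude reading of an identity value against the ~20-25% rule of thumb."""
    if identity >= upper:
        return "above threshold: common origin likely"
    if identity >= lower:
        return "borderline"
    return "below threshold: sequence alone cannot establish common origin"


print(percent_identity("ACDEFGHIKL", "ACDKFGHIRL"))  # → 80.0
print(homology_hint(11.0))  # value for cobalt chelatase vs ferrochelatase
# → below threshold: sequence alone cannot establish common origin
```

A result below the threshold, as for the two chelatases, does not rule out a shared origin; it only means that sequence comparison alone cannot demonstrate it, which is exactly where structure determination helped in that case.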