Chapter I – Amino Acids & Sequence Alignment

20 Common Amino Acids: Structure, Classification & Properties

We analyze the twenty most common amino acids, their structure, classification, and properties, focusing on their roles in protein structure and function.

Overview
Each of the 20 most common amino acids has distinct chemical properties and a specific role in protein structure and function. Depending on the tendency of the side chains to interact with water (polar environment), amino acids can be divided into three categories:

  • Those with charged side chains.
  • Those with polar side chains.
  • Those with hydrophobic side chains.

Polar Amino Acids

Some amino acids are unambiguously polar, while the classification of others is debated. Serine (Ser, S), threonine (Thr, T), and tyrosine (Tyr, Y) are regarded as polar because they carry a hydroxyl (-OH) group. This group can form hydrogen bonds with other polar groups, acting either as a donor or as an acceptor. A table illustrating the donors and acceptors in polar and charged amino acids is available on the FoldIt site.

Many of these amino acids are often found at functional sites; tyrosine, for example, is involved in metal binding in many enzymes. Asparagine (Asn, N) and glutamine (Gln, Q) also belong to the group of polar amino acids and can donate or accept a hydrogen bond. Histidine (His, H), on the other hand, can be either polar or charged, depending on the environment and pH. Its side chain has two –NH groups and a pKa value of around 6. When both groups are protonated, the side chain has a charge of +1. Within protein molecules, the pKa can be modulated by the environment, allowing the side chain to donate a proton and become neutral or accept a proton and become charged. This ability makes histidine valuable in the active sites of enzymes when the chemical reaction requires proton abstraction or donation.
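The pH dependence of the histidine charge can be illustrated with the Henderson-Hasselbalch relation. The sketch below is only an approximation: it assumes a single titratable group with the pKa of about 6 quoted above, ignores the environmental shifts of the pKa that occur inside proteins, and the function name `fraction_protonated` is made up for this example.

```python
def fraction_protonated(ph: float, pka: float = 6.0) -> float:
    """Fraction of side chains in the protonated (+1) form at a given pH.

    Henderson-Hasselbalch: [HA+] / ([HA+] + [A]) = 1 / (1 + 10**(pH - pKa)).
    The default pKa of 6.0 is the approximate value for free histidine;
    in a folded protein the effective pKa can differ.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    for ph in (5.0, 6.0, 7.4):
        print(f"pH {ph}: {fraction_protonated(ph):.3f} protonated")
```

At a pH equal to the pKa, half of the side chains are charged; at pH 7.4, only a few percent are, which is why a small shift in the local pKa is enough to switch histidine between its neutral and charged forms.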

The aromatic amino acids tryptophan (Trp, W) and tyrosine (Tyr, Y), and the non-aromatic methionine (Met, M), are sometimes referred to as amphipathic because they exhibit both polar and nonpolar characteristics. These residues are frequently located near the interface between a protein and the solvent. The side chains of histidine and tyrosine, along with the hydrophobic phenylalanine and tryptophan, can also form weak hydrogen bonds of the OH−π and CH−O types, utilizing the electron clouds of their ring structures (for a discussion of these types of hydrogen bonds, see Scheiner et al., 2002). Aromatic residues are also often found within the core of a protein structure, with their side chains packed against each other, and they are highly conserved within protein families, with Trp having the highest conservation rate.

Hydrophobic Amino Acids

The hydrophobic amino acids include alanine (Ala, A), valine (Val, V), leucine (Leu, L), isoleucine (Ile, I), proline (Pro, P), phenylalanine (Phe, F), and cysteine (Cys, C). These residues typically form the hydrophobic core of proteins, which is isolated from the polar solvent. The side chains within the core are tightly packed and engage in van der Waals interactions, which are essential for stabilizing the globular structure of macromolecules. Additionally, Cys residues contribute to the stabilization of the three-dimensional structure by forming disulfide (S-S) bridges, which may connect different secondary structure elements or different subunits in a complex. Another critical function of Cys is metal binding, sometimes in enzyme active sites and sometimes within structure-stabilizing metal centers. The classification of Cys within the hydrophobic group is, however, a matter of some disagreement. For example, some studies indicate that it is found in the interior because of its high reactivity. Thus, Fiser et al., 1996, in their analysis of amino acid distribution in proteins, write, “Cys is the most highly conserved residue in this data set and the rarest on the surface, probably because it has the most reactive side chain.”

The image below shows the distribution of buried/exposed fractions of amino acids in protein molecules, suggesting that Cys is rather exposed.

Buried/exposed fraction of amino acids within protein molecules.
The vertical axis shows the fraction of highly buried residues, while the horizontal axis shows the amino acid names in one-letter code. Image from the 1996 tutorial by J.E. Wampler.

Fiser et al., 1996, discuss the relationship between surface accessibility and the conservation of amino acids.

Charged Amino Acids

At physiological pH (around 7.4), each of the charged amino acids carries a single charge in its side chain. There are four of these amino acids. The two basic ones, lysine (Lys, K) and arginine (Arg, R), carry a positive charge, while the two acidic ones, aspartate (Asp, D) and glutamate (Glu, E), carry a negative charge. Closely located positively and negatively charged side chains often interact to form a salt bridge. Such bridges often play a role in stabilizing the three-dimensional structures of proteins, especially in thermophilic organisms (microorganisms that live at elevated temperatures, reaching up to 80-90 °C or even higher). Another function of the negatively charged carboxylate groups of Asp and Glu is the binding of positively charged metal ions. Metalloproteins and the role of metal centers in protein function represent a fascinating field of structural biology research.
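As a rough illustration of the charge bookkeeping described above, the following sketch counts the side-chain charges in a one-letter sequence. It assumes that every Lys and Arg contributes +1 and every Asp and Glu contributes -1, treats histidine as neutral, and ignores the chain termini and any pKa shifts; the function name and the example sequence are hypothetical.

```python
POSITIVE = frozenset("KR")  # lysine, arginine: +1 at physiological pH
NEGATIVE = frozenset("DE")  # aspartate, glutamate: -1 at physiological pH


def net_side_chain_charge(sequence: str) -> int:
    """Approximate net side-chain charge of a one-letter sequence near pH 7.4.

    Simplifying assumptions: His is counted as neutral, and the N- and
    C-terminal charges are ignored.
    """
    seq = sequence.upper()
    positive = sum(1 for residue in seq if residue in POSITIVE)
    negative = sum(1 for residue in seq if residue in NEGATIVE)
    return positive - negative


if __name__ == "__main__":
    example = "MKRDEAGK"  # made-up sequence, for illustration only
    print(example, net_side_chain_charge(example))
```

A count like this says nothing about which residues actually pair up in salt bridges; that requires the three-dimensional structure.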

Glycine & Proline

Glycine (Gly, G), one of the standard amino acids, has only a hydrogen atom as its side chain and is often located on the surface of proteins in loop or coil regions (areas without defined secondary structure), offering high flexibility to the polypeptide chain. This flexibility is essential for sharp polypeptide turns in loop structures. Proline, though considered hydrophobic, is also found on the surface, likely due to its presence in turn and loop regions. Unlike Gly, which grants the polypeptide chain significant flexibility, Pro provides rigidity by imposing specific torsion angles on this segment of the structure (Morgan & Rubenstein, 2013). This rigidity arises because its side chain forms a covalent bond with the main chain, which constrains the phi angle of the polypeptide at this position (see the section on the Ramachandran plot). Pro is sometimes referred to as a helix breaker since it is often located at the end of α-helices. The roles of Gly and Pro in protein folding have also been discussed by Krieger et al., 2005.
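The phi angle mentioned above is a dihedral (torsion) angle defined by four consecutive backbone atoms: the carbonyl carbon of the preceding residue, then N, Cα, and C of the residue itself. A minimal sketch of how such an angle can be computed from Cartesian coordinates is given below; the helper names and the test coordinates are made up for illustration and are not taken from any real structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    For phi, pass C(i-1), N(i), CA(i), C(i) in that order.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)          # unit vector along the axis
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


if __name__ == "__main__":
    # Four artificial points forming a right-angle twist.
    print(dihedral_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                       (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Applied to the backbone atoms of a real structure, a function of this kind would show the narrow range of phi values available to Pro compared with the broad range available to Gly.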

The Three-Letter and One-Letter Codes for Common Amino Acids

Charged amino acids (side chains often form salt bridges); mnemonic hints for remembering the one-letter codes are given in parentheses:

  • Arginine – Arg – R
  • Lysine – Lys – K
  • Aspartic acid – Asp – D (AsparDic acid – hard t sound like D)
  • Glutamic acid – Glu – E (GluEtamic acid – feels like it is almost an E there)

Polar amino acids (form hydrogen bonds as proton donors or acceptors):

  • Glutamine – Gln – Q (Qlutamine – soft G)
  • Asparagine – Asn – N (AsparagiNe – clear N)
  • Histidine – His – H
  • Serine – Ser – S
  • Threonine – Thr – T
  • Tyrosine – Tyr – Y
  • Cysteine – Cys – C

Amphipathic amino acids (often found at the surface of proteins or lipid membranes, sometimes also classified as polar):

  • Tryptophan – Trp – W (the largest side chain and the largest letter)
  • Tyrosine – Tyr – Y
  • Methionine – Met – M (may function as a ligand to metal ions)

Hydrophobic amino acids (generally buried inside the protein core):

  • Alanine – Ala – A
  • Isoleucine – Ile – I
  • Leucine – Leu – L
  • Methionine – Met – M
  • Phenylalanine – Phe – F
  • Valine – Val – V
  • Proline – Pro – P
  • Glycine – Gly – G
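
For working with sequences programmatically, the codes and groups listed above can be kept in simple lookup tables. The sketch below mirrors the lists in this section, including the residues that appear in more than one group; the names `THREE_LETTER`, `GROUPS`, `to_three_letter`, and `groups_of` are chosen here for illustration only.

```python
# One-letter to three-letter codes for the 20 common amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Groups as listed in this section; Tyr and Met appear in two groups.
GROUPS = {
    "charged": "RKDE",
    "polar": "QNHSTYC",
    "amphipathic": "WYM",
    "hydrophobic": "AILMFVPG",
}


def to_three_letter(sequence: str) -> str:
    """Convert a one-letter sequence to hyphen-separated three-letter codes."""
    return "-".join(THREE_LETTER[residue] for residue in sequence.upper())


def groups_of(residue: str) -> list[str]:
    """Return every group (in the order listed above) containing the residue."""
    code = residue.upper()
    return [name for name, members in GROUPS.items() if code in members]


if __name__ == "__main__":
    print(to_three_letter("MKW"))
    print(groups_of("Y"))
```

Returning a list of groups rather than a single label reflects the point made in the text that the classification of some residues is not clear-cut.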

Water & Hydrogen Bonds

A few words about hydrogen bonds: for a hydrogen bond to form, two electronegative atoms (for example, in an α-helix, the amide nitrogen and the carbonyl oxygen) must interact with the same hydrogen. The hydrogen is covalently attached to one of the atoms (called the hydrogen bond donor) and interacts with the other (the hydrogen bond acceptor). In proteins, all groups capable of forming H-bonds (both main chain and side chain) are usually hydrogen-bonded to each other or to water molecules (Derewenda et al., 1995). Due to their electronic structure, water molecules can accept two hydrogen bonds and donate two, thus simultaneously engaging in four hydrogen bonds. Water molecules can also stabilize protein structures by forming hydrogen bonds with main chain or side chain groups, and can even mediate interactions between different protein groups by linking them to each other through hydrogen bonds. Additionally, water is often involved in ligand binding to proteins, mediating ligand interactions with polar or charged side chains or main chain atoms. The energy of a hydrogen bond typically ranges from 2 to 10 kcal/mol and depends on the distance between the donor and the acceptor as well as on the angle between them. For more detailed information on hydrogen bonds, see Hubbard & Haider, 2010.
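
The distance and angle dependence described above is often turned into a simple geometric test when analyzing structures. The sketch below is one such test, not a standard definition: the cutoffs (donor-acceptor distance of at most 3.5 Å and a donor-hydrogen-acceptor angle of at least 120°) are illustrative assumptions, different programs use different values, and the function names and coordinates are made up.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, measured at point b."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding errors
    return math.degrees(math.acos(cosine))


def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_distance=3.5, min_angle=120.0):
    """Geometric hydrogen-bond test with assumed, adjustable cutoffs.

    max_distance: donor-acceptor distance limit in angstroms (assumption).
    min_angle: minimum donor-hydrogen-acceptor angle in degrees (assumption).
    """
    close_enough = math.dist(donor, acceptor) <= max_distance
    linear_enough = angle_deg(donor, hydrogen, acceptor) >= min_angle
    return close_enough and linear_enough


if __name__ == "__main__":
    # Artificial, perfectly linear N-H...O arrangement (coordinates in angstroms).
    print(is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)))
```

A test of this kind only classifies a contact as a possible hydrogen bond; it does not estimate the bond energy.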