X-ray Crystallography & Protein Structure Determination

Historical perspective
Although X-ray crystallography found its way into the study of the atomic and molecular structure of matter, the world had to wait an additional 45 years before the first X-ray structure of a protein was determined. This was the structure of myoglobin, determined by John Kendrew, who shared the 1962 Nobel Prize in Chemistry with Max Perutz. Since then, many other protein X-ray structures have led to Nobel Prizes. Probably the most spectacular was the structure of the ribosome, for which Ada Yonath, Venki Ramakrishnan, and Thomas A. Steitz received the Nobel Prize in Chemistry in 2009.

The structural revolution of the 1990s and the years around 2000 resulted in an explosion in the number of X-ray structures in the Protein Data Bank (PDB). The growth of PDB content has been impressive, with the current number of structures approaching 200 000. One of the direct consequences of this growth was the creation of AlphaFold, an AI system developed by DeepMind to predict protein structures from their amino acid sequences. The latest release of the AlphaFold database includes more than 200 million predicted structures, covering nearly all cataloged proteins known to science. This will undoubtedly boost structural biology to unseen levels and dramatically increase our understanding of biological processes. It also means that knowledge of protein structures, and the ability to use and analyze structural information in everyday work in the life sciences, are becoming even more important!

An outline of X-ray crystallography

X-ray diffraction
Max von Laue demonstrated that X-rays are electromagnetic waves, of the same nature as visible light or radio waves. The only difference is the very short wavelength of X-rays, which is around 1-1.5 Å (1 Ångström is 10⁻¹⁰ meters). For comparison, the wavelength of visible light is between 400 and 700 nm (1 nm is 10 Å).
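To get a feeling for these length scales, the unit conversion can be sketched in a few lines of Python. The function names are only illustrative, and the 1.5 Å and 400 nm values are simply the ones quoted above.

```python
ANGSTROM_PER_NM = 10.0  # 1 nm = 10 Å, as stated in the text


def angstrom_to_nm(value_angstrom):
    """Convert a length from Ångström to nanometers."""
    return value_angstrom / ANGSTROM_PER_NM


def nm_to_angstrom(value_nm):
    """Convert a length from nanometers to Ångström."""
    return value_nm * ANGSTROM_PER_NM


xray_wavelength_nm = angstrom_to_nm(1.5)        # a typical X-ray wavelength
violet_light_angstrom = nm_to_angstrom(400.0)   # short end of visible light

# How many times longer is the shortest visible wavelength than a 1.5 Å X-ray?
ratio = violet_light_angstrom / 1.5
print(f"1.5 Å = {xray_wavelength_nm} nm; visible light is ~{ratio:.0f}x longer")
```

Even the shortest visible wavelength is thousands of times longer than the X-rays used in crystallography, which is why visible light cannot resolve individual atoms.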

X-ray diffraction is caused by the interaction of electromagnetic waves with the atoms inside the crystal, particularly with their electrons. The waves are scattered by the electrons; each electron becomes a miniature X-ray source. The scattered waves from all the electrons within an atom add up to give the diffracted wave from that atom, the waves from all atoms add up in turn, and so on. When scattered waves are added, they may either reinforce each other or cancel each other out (in optics, this process is called interference). As shown in the figure below, the X-ray detector registers those that are reinforced. Interestingly, we do not need X-rays to observe interference; we can, for example, go to a nearby lake, throw two stones into the water, and then watch how the waves from the two stones either reinforce or weaken each other. There are many demonstrations of wave addition on the web.
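Wave addition is easy to try out numerically. The following is a minimal sketch, not a model of a real crystal: it assumes two waves of the same wavelength, represents each one as a complex number (amplitude and phase), and adds them. The function name combined_amplitude is just an illustrative choice.

```python
import cmath
import math


def combined_amplitude(amplitude_1, amplitude_2, phase_shift_rad):
    """Amplitude of the sum of two waves of equal wavelength.

    Each wave is written as a complex number A * exp(i * phase);
    the second wave is shifted by phase_shift_rad relative to the first.
    """
    wave_1 = amplitude_1 * cmath.exp(0j)
    wave_2 = amplitude_2 * cmath.exp(1j * phase_shift_rad)
    return abs(wave_1 + wave_2)


in_phase = combined_amplitude(1.0, 1.0, 0.0)               # crests meet crests
out_of_phase = combined_amplitude(1.0, 1.0, math.pi)       # crests meet troughs
quarter_shift = combined_amplitude(1.0, 1.0, math.pi / 2)  # somewhere in between

print(f"in phase: {in_phase:.3f}")
print(f"out of phase: {out_of_phase:.3f}")
print(f"quarter-wave shift: {quarter_shift:.3f}")
```

Two waves in phase give twice the amplitude, while two waves shifted by half a wavelength cancel. The detector only sees the directions in which the waves from the whole crystal reinforce each other.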

X-ray data collection, electron density calculation, and model building
X-rays may be generated using various laboratory X-ray sources or at synchrotrons, where very high intensity and highly focused X-rays can be generated. Several synchrotrons worldwide have stations adapted for collecting X-ray data from protein crystals. Depending on the type of crystal (crystals may have different cell dimensions and symmetry groups), different strategies for data collection are followed. Usually, the crystal is rotated in the X-ray beam one degree at a time and exposed to X-rays for a short period (seconds or even less) until a complete data set is collected. The total data collection time depends on the intensity of the X-ray source, the size of the crystal, and how well it diffracts. The data are subsequently processed using specific software (a process in crystallography called "data reduction").
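As a rough illustration of how the rotation step and exposure time add up, here is a small sketch. The numbers (180° total rotation, 0.5 s per image) are hypothetical; the rotation range actually needed depends on the crystal symmetry, and the estimate ignores detector read-out and other overheads.

```python
import math


def estimate_data_collection(total_rotation_deg, step_deg=1.0, exposure_s=1.0):
    """Return (number of images, total exposure time in seconds).

    Assumes one image per rotation step and a fixed exposure per image;
    read-out time and crystal handling are not included.
    """
    # The small tolerance guards against floating-point round-off in the division.
    n_images = math.ceil(total_rotation_deg / step_deg - 1e-9)
    return n_images, n_images * exposure_s


# Hypothetical example: 180 degrees in 1-degree steps, 0.5 s per image.
n_images, total_time_s = estimate_data_collection(180.0, step_deg=1.0, exposure_s=0.5)
print(f"{n_images} images, {total_time_s:.0f} s of exposure")
```

With a weaker laboratory source, the exposure per image would be longer and the total time would grow accordingly.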

Each spot on the image below is a diffracted X-ray beam, which emerged from the crystal and was registered by the X-ray detector. Thousands of diffraction spots must be collected from a protein crystal to get a complete data set. The intensities of the diffraction spots are extracted during data processing and used to calculate an electron density map of the molecules inside the crystal. The electron density, in turn, will tell us where the atoms are located, information that is used to build a model of the molecule. The image on the right shows a side chain of tryptophan built into its electron density.
Figure: X-ray diffraction by a protein crystal.

Figure: A side chain of Trp built into a high-resolution electron density map.
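The step from diffraction data to electron density is a Fourier summation. The toy example below is a sketch under strong assumptions: a one-dimensional "crystal" with two point-like atoms at made-up positions, and diffracted waves whose amplitudes and phases are both known. In a real experiment only the intensities are measured and the phases must be obtained by other means, so this is not how crystallographic software works in practice. All names and values are illustrative.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """Diffracted waves F(h) for a 1D unit cell.

    atoms is a list of (fractional position, number of electrons).
    """
    return {
        h: sum(z * cmath.exp(2j * math.pi * h * x) for x, z in atoms)
        for h in range(-h_max, h_max + 1)
    }


def electron_density(factors, n_points):
    """Fourier summation rho(x) = sum_h F(h) exp(-2*pi*i*h*x) on a regular grid.

    The result is the density up to a constant scale factor.
    """
    density = []
    for k in range(n_points):
        x = k / n_points
        rho = sum(f * cmath.exp(-2j * math.pi * h * x) for h, f in factors.items())
        density.append(rho.real)
    return density


# Hypothetical 1D "molecule": a 6-electron atom at x = 0.2, an 8-electron atom at x = 0.7.
atoms = [(0.2, 6), (0.7, 8)]
factors = structure_factors(atoms, h_max=10)

# What the detector would record: intensities, i.e. squared amplitudes.
intensities = {h: abs(f) ** 2 for h, f in factors.items()}

density = electron_density(factors, n_points=100)
peak_index = max(range(len(density)), key=density.__getitem__)
peak_position = peak_index / len(density)
print(f"highest density peak at x = {peak_position}")
```

The density peaks fall at the atomic positions, with the heavier atom giving the higher peak. Including more diffraction spots (a larger h_max, which corresponds to higher resolution) makes the peaks sharper, which is why a high-resolution map shows a side chain so clearly.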

What will happen in the AlphaFold age?

I expect that much old biochemical data will be reinterpreted, and that many oligomeric complexes will become easier to reconstruct from SAXS and EM data. In the near future, many discoveries will be made by studying this massive volume of new structures. Does this mean the end of X-ray crystallography? The answer is: not yet. Many applications, such as drug discovery, require much higher structural accuracy than AlphaFold can currently provide. For instance, AlphaFold cannot provide the accurate predictions of protein-ligand interactions needed in drug discovery projects. I also expect that the dynamic behavior of proteins and protein complexes will still need experimental study.

In any case, at this very moment in the autumn of 2022, we are entering a new era of biological research. And I think that structural biology has many exciting surprises in store for us in the coming years!