A new method for x-ray crystallography expands the technique to identify the structures of molecules that don’t easily form large, symmetrical nanocrystals.

Single-crystal X-ray crystallography is a proven tool for identifying the architecture of unknown molecular structures: by firing an X-ray beam at a crystallized sample of a material and recording how it diffracts, the structure of the material can be deduced. But the limitations of crystallography are right there in the name: what’s to be done with a material whose crystals are too small to study, insufficiently symmetrical, or prone to breaking down under radiation?

Supported by high-performance computing resources at the National Energy Research Scientific Computing Center (NERSC), scientists at Lawrence Berkeley National Laboratory (Berkeley Lab) have debuted a new form of X-ray crystallography that gets around some of these limits, broadening the range of materials and processes that can be investigated using crystallography. The team’s results were published in January 2022 in the journal Nature.

“We work with materials that only form nanocrystals, so it’s very hard to grow them any bigger than about five microns — and to do single-crystal X-ray crystallography, that’s just too small,” said University of Connecticut graduate student researcher Elyse Schriber, a primary author on the paper. “Because the size of the crystal is directly proportional to the amount of signal you get on the detector, if you have a really small crystal and you don’t have a very bright light source, you’re kind of stuck.”

Or, you were stuck. Small-molecule serial femtosecond X-ray crystallography (smSFX), originated by Schriber and her team, uses short, intense bursts from an X-ray free-electron laser (XFEL) to determine the structures of nanocrystals not well suited to single-crystal X-ray crystallography: molecules that are very small, don’t crystallize easily, and are sensitive to radiation.

Collecting data in a flash

smSFX begins with an XFEL, a specialized tool that’s essentially a cross between an X-ray microscope and a laser. For one femtosecond — 1/1,000,000,000,000,000, or one quadrillionth, of a second — the XFEL fires a beam of X-ray light at a sample of crystals suspended in a liquid. The blast of energy is so bright and so brief that, although radiation from the beam will destroy the sample, scientists are able to capture the diffraction pattern, which shows up as a pattern of dark spots on an image called a diffractogram, while the sample is still intact.

“Radiation will tend to damage crystals, especially crystals of interest to materials scientists. But if you can deliver the photons fast enough, they’ll diffract before the damage starts showing,” said Berkeley Lab research scientist Aaron Brewster, a member of the Molecular Biophysics and Integrated Bioimaging (MBIB) Division and an author on the paper. This phenomenon is called “diffraction before destruction.”

In addition to diffraction before destruction, the XFEL beam is so narrow — about one micron, or 1/1,000th of a millimeter — that it’s able to capture separate diffraction patterns for individual crystals in a sample, rather than the jumble of overlapping patterns that results when a wider beam exposes many crystals at once, a technique known as powder diffraction. The more focused beam captures more detail, producing less noise in the data and a more precise picture of the molecule’s structure.

“They’ve been doing this for 100 years. You can put a powder in an X-ray beam, but then you get rings instead of spots,” said MBIB senior computer scientist Nick Sauter, a co-author on the paper. “And the rings are actually overlapping each other, and for a complex molecular structure, you can’t disentangle that. But it is possible to disentangle it if you do it one nanocrystal at a time, which was what we were able to do, and there’s a fairly simple mathematical framework for analyzing that.”
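To make Sauter’s point concrete, here is a toy numerical sketch (the wavelength and the cubic unit cells are assumed purely for illustration; none of it is drawn from the paper): for a cubic crystal, every reflection (h, k, l) that shares the same value of h² + k² + l² falls on the same powder ring, and a larger unit cell squeezes the same set of rings into a much narrower range of angles, which is why they overlap.

```python
import itertools
import math

wavelength = 1.3  # X-ray wavelength in angstroms (assumed for illustration)

def ring_angles(a, max_index=3):
    """Map each powder-ring angle (2-theta, in degrees) to the reflections on it."""
    rings = {}
    for h, k, l in itertools.product(range(max_index + 1), repeat=3):
        s = h * h + k * k + l * l
        if s == 0:
            continue
        d = a / math.sqrt(s)              # d-spacing of the (h, k, l) planes
        sin_theta = wavelength / (2 * d)  # Bragg's law: lambda = 2 d sin(theta)
        if sin_theta > 1:
            continue                      # reflection is not accessible
        two_theta = 2 * math.degrees(math.asin(sin_theta))
        rings.setdefault(round(two_theta, 2), []).append((h, k, l))
    return rings

for a in (5.0, 20.0):  # a small vs. a large cubic unit cell, in angstroms
    rings = ring_angles(a)
    busiest = max(len(refls) for refls in rings.values())
    print(f"a = {a:4.1f} A: {len(rings)} rings packed below {max(rings):5.1f} deg; "
          f"the busiest ring carries {busiest} distinct reflections")
```

Running this shows the same number of rings crammed into roughly a quarter of the angular range when the cell edge grows from 5 to 20 angstroms, with multiple distinct reflections stacked on a single ring.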

The framework in question turned out to be an algorithm developed by Brewster in 2014 to analyze amyloid peptides. The unit cells, or repeating crystal structures, found in these peptides echo those of small molecules in that they may consist of just a few atoms, and therefore yield similarly sparse diffraction patterns that are harder to use for deducing molecular structure. Brewster’s algorithm used graph theory to help fill in the gaps — and by applying it to small-molecule samples, the team found that they had just the tool they needed.
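As a rough illustration of that idea (a toy sketch under invented assumptions, not the published algorithm): spots that belong to the same lattice are pairwise consistent, meaning their difference vectors are integer combinations of the lattice basis, so they form one large connected component in a "consistency graph," while stray spots are left isolated. The 2-D basis and the simulated spots below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
basis = np.array([[1.0, 0.2],
                  [0.1, 1.5]])  # assumed 2-D lattice basis (rows are basis vectors)

# Simulate a sparse pattern: 8 genuine lattice spots (with a little measurement
# noise) plus 3 stray spots that belong to no lattice.
hk = rng.integers(-4, 5, size=(8, 2))
spots = hk @ basis + rng.normal(scale=0.005, size=(8, 2))
spots = np.vstack([spots, rng.uniform(-5.0, 5.0, size=(3, 2))])

def consistent(p, q, tol=0.03):
    """True if p - q is (nearly) an integer combination of the basis vectors."""
    frac = np.linalg.solve(basis.T, p - q)  # coordinates of p - q in the basis
    return bool(np.all(np.abs(frac - np.round(frac)) < tol))

# Build the consistency graph and pull out its largest connected component.
n = len(spots)
adj = {i: {j for j in range(n) if j != i and consistent(spots[i], spots[j])}
       for i in range(n)}

def component(start):
    seen, stack = set(), [start]
    while stack:
        i = stack.pop()
        if i not in seen:
            seen.add(i)
            stack.extend(adj[i] - seen)
    return seen

best = max((component(i) for i in range(n)), key=len)
print(f"largest mutually consistent set: {len(best)} of {n} spots "
      f"(the rest are treated as strays)")
```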

Collaborating for speed and capacity

The initial experimental attempts using smSFX confirmed the previously known structure of mithrene, or silver benzeneselenolate — and then went even further, bringing the unknown structures of thiorene and tethrene to light as well. Data collection took place at the Linac Coherent Light Source (LCLS) at SLAC National Accelerator Laboratory, with data transferred automatically and in real time to the Cori XC40 supercomputer at NERSC using ESnet, the U.S. Department of Energy’s dedicated ultra-high-speed network for science.

“Using ESnet to connect LCLS and NERSC and help enable new scientific discoveries illustrates the amazing capabilities of the national laboratory facilities,” said Eli Dart, acting group lead for ESnet’s Science Engagement Team. “We all know the old saying ‘the whole is greater than the sum of its parts,’ but this is a real-world example that shows the power of the superfacility model.”

This collaboration demonstrated the superfacility model, in which high-performance networks seamlessly link experimental facilities with computing resources like those at NERSC. It yielded initial analysis in as little as ten minutes, a record for XFEL experiments; determining a molecular structure from raw experimental data can otherwise take much longer because of the large amount of computing power required.
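As a rough picture of the kind of automation this involves, here is a minimal sketch, assuming hypothetical file paths, a hypothetical NERSC host, and a plain rsync transfer (the team’s actual pipeline is far more sophisticated): watch the beamline’s output directory and ship each newly written run to the computing center as soon as it appears, with ESnet carrying the traffic.

```python
import subprocess
import time
from pathlib import Path

DATA_DIR = Path("/lcls/experiment/output")           # hypothetical beamline path
REMOTE = "user@dtn01.nersc.gov:/scratch/smsfx/raw/"  # hypothetical NERSC target

seen = set()
while True:
    for run in sorted(DATA_DIR.glob("run_*.h5")):
        if run not in seen:
            # Ship the new run over the wide-area network; --partial lets an
            # interrupted transfer resume where it left off.
            subprocess.run(["rsync", "-a", "--partial", str(run), REMOTE],
                           check=True)
            seen.add(run)
    time.sleep(10)  # poll every 10 seconds for newly written runs
```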

Elyse Schriber runs the accelerator while her team looks on.

“We’ve been collaborating with NERSC for years,” said Brewster, “and that collaboration has allowed us to transfer the data from LCLS to NERSC and to process it on the Cori supercomputer, so we can determine the structure in real time. In fact, one of the three samples that are in the paper, we actually solved live during one of the beam times. And it was because of this collaboration that we were able to process the data that quickly and get an initial structure — and it was really proof that it was working.”

Rapid analysis makes it possible for scientists to confirm mid-experiment that the data they’re collecting is valid and not corrupted, and to fine-tune their experimental methods if necessary — and to use precious beam time efficiently.

“We want to be able to get immediate feedback because if we’re not solving the atomic structure, we want to change the experiment — we want to change the rate at which the crystals are flowing or something,” said Sauter. “So we want feedback in 10 minutes. And we have to have a facility that has large enough computing available to be able to handle that.”

In addition to speed of analysis, this superfacility collaboration offers something that will only become more essential as XFELs and their detectors become more powerful: the capacity to store and process massive, and ever-increasing, amounts of data.

“It’s amazing to have NERSC’s capability, because for the average XFEL experiment we collect about eight to ten terabytes of data, and that’s a lot to store,” said Schriber. “LCLS does have storage facilities … but NERSC is kind of unparalleled in the computing power you get.”

And the volume is only going to grow: in the next few years, improvements in accelerator technology and detector resolution are expected to increase the amount of data collected by XFEL detectors by a factor of 400. Fortunately, comparable advances in computing capability should be able to absorb the expected tidal wave of data: exascale computers able to perform one quintillion (1,000,000,000,000,000,000) calculations per second are expected to come online in 2022.

“What’s unique to XFELs is the increasing intensity of their beams; [in the future] these pulses we’ve described will be delivered at a much higher rate,” said NERSC application performance specialist Johannes Blaschke, another co-author on the paper. “Some of these experiments were done at 120 Hz, which was already such a high rate that the local computer infrastructure was unable to keep up with the analysis. And by 2025, they expect to be using kilohertz data collection, so you can imagine that’s double the detector resolution, and I’m increasing the rate at which I’m collecting images by a factor of ten — so then you just get this massive amount of data.”
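A quick back-of-the-envelope calculation shows how those two factors compound. The per-experiment volume comes from Schriber’s figure above; reading “double the detector resolution” as twice the pixels along each axis, and assuming data volume scales linearly with both pixel count and collection rate, are simplifications made purely for illustration.

```python
per_experiment_tb = 10  # roughly 8-10 TB per experiment today (Schriber, above)
rate_factor = 10        # 120 Hz today -> kilohertz collection, per the quote
pixel_factor = 2 ** 2   # doubling resolution along each axis quadruples pixels

scale = rate_factor * pixel_factor
print(f"combined scaling: ~{scale}x")
print(f"a {per_experiment_tb} TB experiment would become ~{per_experiment_tb * scale} TB")
```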

A bright future

Now that smSFX works, what happens next?

“Right now we’re reaching out to other fields and trying to find other kinds of samples that are interesting. We’re kind of trying to help people start to consider this as something they could do regularly at XFELs,” said Brewster. Currently, he and his team are writing rapid access proposals, which grant short periods of beam time so that collaborators can test whether smSFX might be appropriate for their own work.

Some scientists are already using the technique in other experiments; Sauter worked with another team to take advantage of LCLS’s ability to study samples at room temperature and pressure, using it to probe the mechanism by which plants perform photosynthesis. By taking tiny snapshots of the water network at various points during the reaction, the team was able to see how water molecules moved and to observe changes in the proteins’ hydrogen-bonding networks.

“When you understand the protein so well that you start asking questions about how it would catalyze a chemical reaction, and how that happens over picoseconds and microseconds and seconds, that’s where these light sources and this type of data analysis come into play,” he said.

In the meantime, he and other scientists, including Dan Paley, a Berkeley Lab project scientist whose work was critical to solving the chemical structures described above, are working to streamline and improve the technique. “We’re trying to get higher-resolution data — essentially, we’re trying to match what you could do in a single crystal area,” said Brewster. “And also speed: we’re trying to get full data sets in under an hour.”

Someday, the algorithms used to read the diffractograms and piece together the molecules’ structures may also be enhanced by artificial intelligence. “Instead of trying to brute-force it or to use the algorithms that exist, we’re going to try and teach a computer to read these kinds of powder tracings,” said Brewster. “Additionally, we want to do things long-term with structure prediction, based on function.”

Going forward, smSFX may be a powerful technique for investigating previously invisible molecular structures and processes. But the journey is just beginning.

“We’re just trying to push the method and see how far we can actually take it,” said Schriber. “We believe that anything you do at a synchrotron, you could probably do at an XFEL. It’s just developing the right tools and methods to make it happen.”

For related information about this research, see this Berkeley Lab news release: Crystallography for the Misfit Crystals.


About Computing Sciences at Berkeley Lab

High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.
