As people around the world marveled in July at the most detailed images of the cosmos snapped by the James Webb Space Telescope, biologists got their first glimpse of a different set of images, one that could help revolutionize life sciences research.
The images are the predicted 3-D shapes of more than 200 million proteins, rendered by an artificial intelligence system called AlphaFold. “You can think of it as covering the entire protein universe,” Demis Hassabis said at a news briefing on July 26. Hassabis is the cofounder and CEO of DeepMind, the London-based company that created the system. Combining several deep-learning techniques, the computer program is trained to predict protein shapes by recognizing patterns in structures that have already been solved through decades of experimental work using electron microscopy and other methods.
The AI’s first splash came in 2021, with predictions for 350,000 protein structures, including nearly all known human proteins. DeepMind partnered with the European Bioinformatics Institute of the European Molecular Biology Laboratory to make the structures available in a public database.
July’s massive new release expanded the library to cover “virtually every organism on the planet that has had its genome sequenced,” Hassabis said. “You can look up a single 3-D structure of a protein almost as easily as doing a Google search.”
These are predictions, not actual structures. Yet researchers have used some of the 2021 predictions to develop potential new malaria vaccines, to better understand Parkinson’s disease, to work out how to protect honeybee health, to gain insight into human evolution and more. DeepMind has also focused AlphaFold on neglected tropical diseases, including Chagas disease and leishmaniasis, which can be debilitating or fatal if left untreated.
The release of the massive dataset has been greeted with excitement by many scientists. But others worry that researchers will take the predicted structures as the true shapes of proteins. There are still things AlphaFold can’t do, and wasn’t designed to do, that need to be tackled before the protein cosmos comes fully into focus.
Having the new catalog open to everyone is “a huge benefit,” says Julie Forman-Kay, a protein biophysicist at the Hospital for Sick Children and the University of Toronto. In many cases, AlphaFold and RoseTTAFold, another AI researchers are excited about, predict shapes that match well with protein profiles from experiments. But she cautions, “not across the board.”
Predictions are more accurate for some proteins than for others. Erroneous predictions could leave some scientists thinking they understand how a protein works when, in fact, they don’t. Painstaking experiments remain crucial to understanding how proteins fold, Forman-Kay says. “There’s this sense now that people don’t have to do experimental structure determination, which is not true.”
Progress is slow
Proteins begin as long chains of amino acids and fold into curlicues and other 3-D shapes. Some resemble the tight corkscrew ringlets of a 1980s perm or the pleats of an accordion. Others could be mistaken for a child’s spiraling scribbles.
A protein’s architecture is more than just aesthetics; it can determine how that protein functions. For example, proteins called enzymes need a pocket where they can capture small molecules and carry out chemical reactions. And proteins that work in a protein complex, two or more proteins interacting like parts of a machine, need the right shapes to snap into formation with their partners.
Knowing the folds, coils and loops of a protein’s shape can help scientists decipher how, for example, a mutation alters that shape to cause disease. That knowledge can also help researchers make better vaccines and drugs.
For years, scientists have bombarded protein crystals with X-rays, flash-frozen cells and examined them under high-powered electron microscopes, and used other methods to uncover the secrets of protein shapes. Such experimental methods take “a lot of personnel time, a lot of work and a lot of money. So it’s been slow,” says Tamir Gonen, a membrane biophysicist and Howard Hughes Medical Institute investigator at the David Geffen School of Medicine at UCLA.
Such painstaking and expensive experimental work has revealed the 3-D structures of more than 194,000 proteins, their data files stored in the Protein Data Bank, which is supported by a consortium of research organizations. But the accelerating pace at which geneticists are deciphering DNA’s instructions for making proteins has far outstripped structural biologists’ ability to keep up, says systems biologist Nazim Bouatta of Harvard Medical School. “The question for structural biologists has been, how do we close the gap?” he says.
For many researchers, the dream has been to have computer programs that could examine the DNA of a gene and predict how the protein it encodes would fold into a 3-D shape.
Here comes AlphaFold
Over many decades, scientists have advanced toward that AI goal. But “up until two years ago we were really far from any good solution,” says John Moult, a computational biologist at the University of Maryland’s Rockville campus.
Moult is one of the organizers of a competition: the Critical Assessment of Protein Structure Prediction, or CASP. Organizers give competitors a set of proteins for their algorithms to fold, then compare the machines’ predictions against experimentally determined structures. Most AIs failed to come close to the actual shapes of the proteins.
Then in 2020, AlphaFold showed up in a big way, predicting the structures of 90 percent of the test proteins with high accuracy, including two-thirds with accuracy rivaling experimental methods.
Deciphering the structures of single proteins had been the core of the CASP competition since its inception in 1994. With AlphaFold’s performance, “suddenly, that was essentially done,” Moult says.
Since AlphaFold’s 2021 release, more than half a million scientists have accessed its database, Hassabis said at the news briefing. Some researchers, for example, have used AlphaFold’s predictions to help them get closer to completing a massive biological puzzle: the nuclear pore complex. Nuclear pores are key portals that allow molecules in and out of cell nuclei. Without the pores, cells would not function properly. Each pore is huge, relatively speaking, composed of about 1,000 pieces of 30 or so different proteins. Researchers had previously managed to place about 30 percent of the pieces in the puzzle.
That puzzle is now about 60 percent complete, after researchers combined AlphaFold’s predictions with experimental techniques to understand how the pieces fit together, they reported in the June 10 Science.
Now that AlphaFold has pretty much solved how to fold individual proteins, this year’s CASP teams are turning to the next challenges: predicting the structures of RNA molecules and modeling how proteins interact with each other and with other molecules.
For these tasks, Moult says, deep-learning AI methods look promising but have not yet delivered the goods.
Where AI fails
Being able to model protein interactions would be a great advantage because most proteins do not work in isolation. They work with other proteins or other molecules in cells. But AlphaFold’s accuracy in predicting how the shapes of two proteins might change when the proteins interact is “nowhere near” that of its spot-on projections for single proteins, says Forman-Kay, the University of Toronto protein biophysicist. That’s something AlphaFold’s creators acknowledge as well.
The AI was trained to fold proteins by examining the contours of known structures. And far fewer multiprotein complexes than single proteins have been solved experimentally.
Forman-Kay studies proteins that refuse to be confined to any particular shape. These intrinsically disordered proteins are typically as floppy as wet noodles (SN: 2/9/13, p. 26). Some will fold into defined shapes when they interact with other proteins or molecules. And they can fold into new shapes when paired with different proteins or molecules to perform different tasks.
AlphaFold’s predicted shapes reach a high confidence level for about 60 percent of the wiggly proteins that Forman-Kay and colleagues examined, the team reported in a preliminary study posted in February at bioRxiv.org. Often the program depicts the shapeshifters as long corkscrews called alpha helices.
Forman-Kay’s group compared AlphaFold’s predictions for three disordered proteins with experimental data. The structure that the AI assigned to a protein called alpha-synuclein resembles the shape that the protein takes when it interacts with lipids, the team found. But that’s not how the protein looks all the time.
For another protein, called eukaryotic translation initiation factor 4E-binding protein 2, AlphaFold predicted a mishmash of the two shapes the protein takes when working with two different partners. That Frankenstein structure, which doesn’t exist in actual organisms, could mislead researchers about how the protein works, Forman-Kay and colleagues say.
AlphaFold may also be a little too rigid in its predictions. A static “structure doesn’t tell you everything about how a protein works,” says Jane Dyson, a structural biologist at the Scripps Research Institute in La Jolla, Calif. Even single proteins with generally well-defined structures aren’t frozen in space. Enzymes, for example, undergo small shape changes when shepherding chemical reactions.
If you ask AlphaFold to predict the structure of an enzyme, it will show a fixed image that may closely resemble what scientists have determined through X-ray crystallography, Dyson says. “But [it will] not show you any of the subtleties that are changing as the different partners” interact with the enzyme.
“The dynamics are what Mr. AlphaFold can’t give you,” Dyson says.
A revolution in the making
The computer renderings do give biologists a head start on solving problems such as how a drug might interact with a protein. But scientists should remember one thing: “These are models,” not experimentally deciphered structures, says Gonen, at UCLA.
He uses AlphaFold’s predictions to help make sense of experimental data, but he worries that researchers will accept the AI’s predictions as gospel. If that happens, “the danger is that it will become harder and harder and harder to justify why you need to solve the experimental structure.” That could lead to reduced funding, talent and other resources for the types of experiments needed to check the computer’s work and break new ground, he says.
Harvard Medical School’s Bouatta is more optimistic. He thinks researchers probably don’t need to invest experimental resources in the types of proteins that AlphaFold does a good job of predicting, which should help structural biologists triage where to put their time and money.
“There are proteins for which AlphaFold is still struggling,” Bouatta agrees. Researchers should spend their capital there, he says. “Maybe if we generate more [experimental] data for those challenging proteins, we could use them for retraining another AI system” that could make even better predictions.
He and his colleagues have already reverse engineered AlphaFold to produce a version called OpenFold that researchers can train to solve other problems, such as those gnarly but important protein complexes.
The vast amounts of DNA data generated by the Human Genome Project have made possible a wide range of biological discoveries and opened up new fields of research (SN: 2/12/22, p. 22). Having structural information on 200 million proteins could be similarly revolutionary, Bouatta says.
In the future, thanks to AlphaFold and its AI kin, he says, “we don’t even know what kind of questions we’re going to ask.”