It was not until 1957 that scientists gained special access to the molecular third dimension.
After 22 years of grueling experimentation, John Kendrew of the University of Cambridge finally uncovered the 3D structure of a protein. It was the twisted blueprint of myoglobin, a stringy chain of 154 amino acids that helps supply our muscles with oxygen. As revolutionary as this discovery was, it didn’t exactly open the floodgates of protein architecture. Over the next decade, fewer than a dozen more structures would be identified.
Fast-forward to today, 65 years after that Nobel Prize-winning discovery.
On Thursday, Google’s sister company, DeepMind, announced that it had successfully used artificial intelligence to predict the 3D structure of nearly every cataloged protein known to science. That’s over 200 million proteins found in plants, bacteria, animals, humans – just about anything you can imagine.
“Basically, you can imagine it covers the entire protein universe,” Demis Hassabis, founder and CEO of DeepMind, told reporters this week.
That’s thanks to AlphaFold, DeepMind’s revolutionary AI system, whose open-source database lets scientists around the world incorporate it into their research at will and for free. Since AlphaFold’s official launch last July – when it had pinpointed only about 350,000 3D protein structures – the program has made a noticeable mark on the research landscape.
“More than 500,000 researchers and biologists have used the database to view over 2 million structures,” Hassabis said. “And these predictive structures have helped scientists make brilliant new discoveries.”
In April, for example, scientists at Yale University turned to AlphaFold’s database to help develop a new, highly effective malaria vaccine. And in July last year, scientists at the University of Portsmouth used the system to engineer an enzyme to combat single-use plastic pollution.
“This has moved us a year ahead of where we were, if not two,” John McGeehan, director of Portsmouth’s Center for Enzyme Innovation and the researcher behind the second study, told the New York Times.
These ventures are just a small sample of AlphaFold’s ultimate reach.
“In the past year alone, there have been over a thousand scientific articles on a wide range of research topics using AlphaFold structures; I’ve never seen anything like it,” Sameer Velankar, a DeepMind fellow and team leader at the European Molecular Biology Laboratory’s Protein Data Bank, said in a statement.
Others who have used the database, according to Hassabis, include those trying to improve our understanding of Parkinson’s disease, people hoping to protect bee health, and even some who want to gain valuable insight into human evolution.
“AlphaFold is already changing the way we think about the survival of molecules in the fossil record, and I see it soon becoming a fundamental tool for researchers working not only in evolutionary biology, but also in archeology and other paleo-sciences,” Beatrice Demarchi, an associate professor at the University of Turin, whose team recently used the system in a study of an ancient egg controversy, said in a press release.
In the coming years, DeepMind also intends to partner with teams at the Drugs for Neglected Diseases initiative and the World Health Organization, with the goal of finding cures for little-studied but widespread tropical diseases such as Chagas disease and leishmaniasis.
“It will make many researchers around the world think about what experiments they could do,” Ewan Birney, a DeepMind fellow and deputy director of EMBL, told reporters. “And think about what’s going on in the organisms and systems they’re studying.”
Locks and keys
So why do so many scientific advances depend on this treasure trove of 3D protein modeling? Let’s explain.
Suppose you are trying to make a key that fits perfectly into a lock, but you have no way to see the structure of that lock. All you have is the knowledge that the lock exists, some data about its materials, and maybe numerical information about how big each ridge is and where those ridges should be.
Developing this key might not be impossible, but it would be quite difficult. Keys must be precise or they won’t work. So before you begin, you would probably do your best to model several different dummy locks with whatever information you have, so that you can make your key.
In this analogy, the lock is a protein and the key is a small molecule that binds to this protein.
For scientists, whether they are doctors trying to create new drugs or botanists dissecting the anatomy of plants to make fertilizers, the interaction between certain molecules and proteins is crucial.
With drugs, for example, the specific way a drug molecule binds to a protein could be the deciding factor in whether it works. This interaction becomes complicated because even though proteins are just strings of amino acids, they are not flat or straight. They inevitably bend, twist and sometimes get tangled up in themselves, like the headphone wires in your pocket.
In fact, a protein’s unique folds dictate how it functions—and even the slightest misfolding in the human body can lead to disease.
But going back to small-molecule drugs, sometimes pieces of the folded protein are blocked from binding to the drug. They might be folded in a strange way that makes them inaccessible, for example. Details like this are vital information for scientists trying to make their drug molecule stick.
“I think it’s true that almost every drug that has come on the market in the last few years has been designed in part through knowledge of protein structures,” Janet Thornton, a researcher at EMBL, said during the conference.
Because of this, researchers typically spend an incredible amount of time and effort decoding the folded, 3D structure of the protein they’re working with, much as you’d begin your key-making journey by putting together a lock mold. If you know the exact structure, it becomes much easier to tell where and how a molecule would bind to a given protein, and how that binding might affect the folding of the protein in response.
But this undertaking is not simple. Or cheap.
“The cost of solving a new, unique structure is on the order of $100,000,” said Steve Darnell, a structural and computational biologist at the University of Wisconsin and a researcher at the bioinformatics company DNAStar.
This is because the solution usually comes from extremely complicated laboratory experiments.
Kendrew, for example, used a technique called X-ray crystallography. Basically, this method requires you to take solid crystals of the protein of interest, put them in a beam of X-rays, and observe the pattern the beam produces. That pattern essentially encodes the positions of thousands of atoms inside the crystal. Only then can you use the pattern to figure out the structure of the protein.
There is also a newer technique known as cryo-electron microscopy. This is similar to X-ray crystallography, except that the protein sample is directly bombarded with electrons instead of a beam of X-rays. And although it is considered to have much higher resolution than the other technique, it cannot penetrate everything.
Furthermore, in the realm of technology, some have attempted to model protein folding structures digitally. But early attempts, such as several in the ’80s and ’90s, weren’t great. Laboratory methods are also, as you can imagine, tedious and difficult.
Over the years, such barriers have given rise to what is called the “protein folding problem.” Quite simply, scientists don’t know how proteins fold, and they’ve faced significant hurdles in overcoming that problem.
AlphaFold’s AI could be a game changer.
Solving the ‘folding problem’
In short, AlphaFold was trained by DeepMind engineers to predict protein structures without the need for a laboratory. No crystals, no electrons, no $100,000 experiments.
According to the company’s website, getting AlphaFold to where it is today began with exposing the system to 100,000 known protein folding structures. Then, as time went on, it began to learn how to decode the rest.
It really is that simple. (Well, except for the talent that went into coding the AI.)
“It takes, I don’t know, at least $20,000 and a lot of time to crystallize the protein,” Birney said. “That means experimentalists have to choose what to do – AlphaFold hasn’t had to make choices yet.”
This thoroughness is one of AlphaFold’s most fascinating features. It means that scientists have more freedom to guess and check, follow a hunch or instinct, and cast a wide net in their research when it comes to protein structures. They won’t have to worry about costs or deadlines.
“Models also come with prediction error,” said Jan Kosinski, a DeepMind fellow and structural modeler at EMBL in Hamburg, Germany. “And usually — actually in many cases — the error is really small. So we call it almost atomic precision.”
The DeepMind team also says that it has performed a wide range of risk assessments to make sure that using AlphaFold is safe and ethical. Members of the team suggested that AI, more generally, could carry biosecurity risks that we hadn’t thought to assess before, especially as such technology continues to permeate the medical space.
But as the future unfolds, the DeepMind team says AlphaFold will fluidly adapt and address such concerns on a case-by-case basis. So far, it seems to be working, with a universe of protein models stretching far beyond that modest portrait of myoglobin.
“Just two years ago,” Birney said, “we just didn’t realize it was feasible.”
Correction at 6:45 a.m. PT: Janet Thornton’s last name and title have been fixed.