In reality, the nickname Hal refers to a different kind of killer: not of humans, but of bacteria.
In February 2020, more than five decades after the science fiction film introduced the world to perhaps the first great AI villain, a team of researchers at the Massachusetts Institute of Technology used artificial intelligence to discover an antibiotic capable of killing E. coli, which hospitalizes thousands of people a year, as well as an antibiotic-resistant strain of Acinetobacter baumannii, another common cause of bacterial infections. And taking a page from 2001, they named it halicin, after HAL 9000.
The discovery of halicin shows just how rapid AI-assisted drug discovery can be. Scientists trained their AI model on approximately 2,500 molecules (1,700 of which were FDA-approved drugs and 800 of which were natural products). Once the researchers had trained the model to recognize which molecules could kill E. coli, the team ran 6,000 compounds through the system, including existing drugs, failed drugs, natural products, and a variety of other compounds.
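The train-then-screen workflow described above can be sketched in plain Python. This is a toy under stated assumptions: the compound names, the four binary "substructure" features, and the labels are all invented, and the model is a simple logistic regression, not the deep neural network the MIT team actually used. It shows only the shape of the process: fit on labeled examples, then rank a library the model has never seen.

```python
import math

# Hypothetical training data (invented): each molecule is a tuple of binary
# features standing in for the presence/absence of substructures, plus a label:
# 1 = inhibited E. coli growth in the lab, 0 = did not.
TRAIN = [
    ((1, 0, 1, 0), 1),
    ((1, 0, 0, 0), 1),
    ((0, 0, 1, 0), 1),
    ((0, 1, 0, 1), 0),
    ((0, 1, 0, 0), 0),
    ((0, 0, 0, 1), 0),
]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(data, epochs=2000, lr=0.1):
    """Fit logistic-regression weights with plain batch gradient descent."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(n):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b

def score(model, x):
    """Predicted probability that a molecule with features x is active."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical screening library: compounds the model never saw in training.
LIBRARY = {
    "compound_A": (1, 0, 1, 0),
    "compound_B": (0, 1, 0, 1),
    "compound_C": (1, 1, 0, 0),
}

model = train_logistic(TRAIN)
# Rank the library by predicted activity; only the top of the list
# would move on to laboratory testing.
ranked = sorted(LIBRARY, key=lambda name: score(model, LIBRARY[name]), reverse=True)
print(ranked)
```

In this toy the training data are cleanly separable, so the ranking is easy; in real screening work, top-ranked compounds are a shortlist that still has to be tested experimentally.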
The system found halicin in a fraction of the time that traditional methods would take, said Bowen Lou, an assistant professor at the University of Connecticut’s School of Business who studies how AI is changing the pharmaceutical industry. “Not only can halicin kill many species of antibiotic-resistant bacteria, it is also structurally distinct from prior antibiotics,” he said in an email. “This discovery is groundbreaking because antibiotic-resistant ‘superbugs’ are a major public health issue that traditional methods have largely failed to address.”
“The idea that you can look at the structures of a small molecule and predict its properties is a very old idea. The way people thought of it is, if you can identify some structures within the molecule, some functional groups, and so on, you can sort of say, ‘What does it do?’” said Regina Barzilay, a distinguished professor of AI and health with MIT’s School of Engineering and co-author of a May 2023 study that identified another potential antibiotic candidate by building upon the methods used in the initial halicin study.
Prior to the use of AI, the challenge of discovering these structures and identifying a drug’s potential use was primarily one of speed, efficiency, and cost. Past analyses show that, between the early 1990s and the late 2000s, the typical drug discovery and development process took 12 years or more. In the case of halicin, the MIT team used AI that can test more than 100 million chemical compounds over the course of only a few days. “It became clear that molecular science is really a good place to apply machine learning and to use new technology,” Barzilay said.
With at least 700,000 deaths every year attributed to drug-resistant diseases — a number projected to grow to 10 million deaths annually by 2050 — the need for speed is great, especially given that the rate of drug advancements has stalled in recent decades. Since 1987, the year scientists identified the last successful antibiotic class used in treating patients, the world has entered what scientists call the “discovery void.”
Crucially, AI can analyze vast amounts of medical data, and, as the discovery of halicin suggests, it can meaningfully accelerate the drug discovery process. This new technology continues to spur significant advancements in the medical field and holds the potential to improve patient outcomes and facilitate more precise treatment methods. It could also lower costs, which would be vital for antibiotic development, given that at least some of the industry’s stagnation is due not to the inability to identify new drugs, but to a lack of market interest and incentive.
“The fact that 90 percent of drugs fail in the clinic tells us that there’s room for improvement. It’s a really complex system. This is exactly what machine learning is made for: really complex systems,” Chris Gibson, the co-founder and CEO of biotech company Recursion, told Vox of recent breakthroughs in the drug discovery space. “It doesn’t mean getting rid of the role people play in many ways, but it augments and turns our scientists into super scientists to have these tools to go faster and to explore more broadly.”
To be clear, the AI programs researchers use for drug discovery differ greatly from science fiction’s AI creations. These advances in pharmaceutical development do not mean robot doctors will run the medical field any time in the near future. But halicin and other recent breakthroughs represent the ability of AI to transform the pharmaceutical industry and bridge more than three decades of the antibiotic discovery void.
The history of AI in medicine
Language models and image generators like OpenAI’s ChatGPT, Google’s Bard, and Midjourney introduced many to the concept of AI when they launched widely in late 2022 and early 2023. But scientists have been using AI — of a sort — for decades.
In 1965, researchers at Stanford University attempted to use a computer program to identify chemical compounds. Considered the “first application of AI to a problem of scientific reasoning,” the DENDRAL project paved the way for future uses of the technology in the scientific community.
Almost a decade later, Stanford scientists led further developments in medical AI when they created MYCIN, a computer system that helped health care workers diagnose bloodborne bacterial infections. This rules-based system posed a series of questions about symptoms, medical history, test results, and other factors, then generated a response reporting the likelihood of a particular diagnosis.
However, rules-based systems are rigid, which leaves them without the flexibility needed to keep pace with the ever-changing medical field. (Rules-based systems do not learn new information unless someone manually changes the rules of the program.)
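A rules-based system like MYCIN can be illustrated with a few hand-written if-then rules. The sketch below is a loose, hypothetical illustration: the findings, organism names, and numbers are made up and carry no clinical meaning, though the way it accumulates evidence mimics MYCIN's certainty-factor formula for two positive factors, CF = CF1 + CF2 × (1 − CF1).

```python
# Hypothetical, made-up rules for illustration only (not clinical knowledge).
# Each rule: (set of required findings, conclusion, certainty factor in [0, 1]).
RULES = [
    ({"gram_negative", "rod_shaped", "anaerobic"}, "organism_X", 0.6),
    ({"gram_negative", "rod_shaped"}, "organism_Y", 0.4),
    ({"hospital_acquired"}, "organism_Y", 0.3),
]

def combine(cf_old: float, cf_new: float) -> float:
    """Combine two positive certainty factors, in the style of MYCIN."""
    return cf_old + cf_new * (1 - cf_old)

def diagnose(findings: set) -> dict:
    """Fire every rule whose conditions are all present; accumulate certainty."""
    beliefs: dict = {}
    for conditions, conclusion, cf in RULES:
        if conditions <= findings:  # every required finding was observed
            beliefs[conclusion] = combine(beliefs.get(conclusion, 0.0), cf)
    return beliefs

patient = {"gram_negative", "rod_shaped", "hospital_acquired"}
print(diagnose(patient))
```

Nothing in `diagnose` ever modifies `RULES`; the only way to change what the system concludes is for a person to edit the rule table, which is the rigidity described above.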
But, depending on your definition of AI, these early technologies don’t even qualify as such. “Many parties who you might talk to in our industry will say that AI existed for many decades or at least for over a decade,” said Alex Zhavoronkov, the CEO of Hong Kong-based AI drug discovery company Insilico Medicine. “They will be right if you define any form of machine learning to be AI.” (An MIT article describes machine learning as a subfield of AI, which it defines broadly as a machine’s ability to “imitate intelligent human behavior.”)
Zhavoronkov has a narrower definition of AI drug discovery, saying it refers specifically to the application of deep learning and generative learning in the drug discovery space. Deep learning is a type of machine learning in which artificial neural networks (loosely inspired by, but not exactly like, the neurons in our brain) allow a machine to learn patterns directly from data with little human intervention.
The “deep learning revolution” — a time when development and use of the technology exploded — took off around 2014, Zhavoronkov said.
Throughout the 2000s, pharmaceutical giants and plucky startups saw an opportunity to accelerate the drug development process. Between 2008 and 2015, many AI drug discovery companies launched, including Evaxion, Exscientia, Recursion, BenevolentAI, and Insilico Medicine. The industry grew even further in the late 2010s when Big Pharma started backing some of these new startups.
“It is worth noting that earlier generations of IT only achieved limited success in drug discovery,” Lou said. “Recent advances in AI have brought about a significant shift in this landscape. AI, with its powerful algorithms and data-driven approaches, has the potential to revolutionize the process of discovering new drugs.”
According to a number of experts who spoke to Vox, the cataloging of biological and chemical information has aided recent drug breakthroughs.
In 2018, DeepMind, a Google-owned AI research laboratory, developed AlphaFold, a neural network that predicts a protein’s three-dimensional structure from its sequence of amino acids. “In my opinion, the most fundamental game-changer [in medical AI] is DeepMind’s AlphaFold, which has now predicted the structure for essentially all proteins known to us and fundamentally advanced our understanding of biology,” Swarat Chaudhuri, a professor of computer science at the University of Texas at Austin, told Vox in an email. “The findings from AlphaFold are already having a massive impact on drug and vaccine development.”
Scientists have also been cataloging compounds, or molecules, in chemical libraries, such as the widely used Enamine REAL Space, which contains 36 billion novel molecules. Drug development and pharmaceutical companies order molecules from Enamine REAL Space and then evaluate whether those molecules have the desired effect on the protein being studied (whose structure is known thanks to AlphaFold and other similar software).
Knowing the structure of these proteins and having access to a molecule library are instrumental in determining the potential usefulness of a drug candidate. In the case of halicin, the researchers found a successful antibiotic contender in a library of only 6,000 compounds.
With billions of data points, the potential for new drug discoveries is massive, and new advances could continue to accelerate the process. On August 8, Recursion announced that, in partnership with Nvidia, it had predicted how the 36 billion molecules in the Enamine REAL Space library interact with approximately 80,000 pockets, or protein binding sites, across more than 15,000 proteins. In all, Recursion evaluated around 2.8 quadrillion drug-target pairs, the first step toward identifying new drugs.
“Think of this as locks and keys,” said Gibson. The candidate molecules are the keys, and the protein pockets are the locks. “The idea of a drug in many cases is like a key. You find a very specific key that fits a very specific lock.” Recursion’s work means all molecules in that library (not just those structurally similar to known useful compounds) can be considered and tested.
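The headline figure follows from multiplying the two counts reported above:

```latex
\underbrace{3.6 \times 10^{10}}_{\text{molecules}}
\times
\underbrace{8 \times 10^{4}}_{\text{binding pockets}}
= 2.88 \times 10^{15} \ \text{molecule--pocket pairs}
```

That is just under 2.9 quadrillion, in line with the roughly 2.8 quadrillion pairs cited, given that both inputs are rounded.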
Recursion’s development makes it easier to know which keys will fit which locks, but its predictions are not perfect. “It becomes a data layer upon which we can do fast searches,” Gibson said. “Just like a Google search result isn’t always exactly what you’re looking for, but they summarize the top searches ... and usually what you want is in one of the top five. It’s the same kind of thing here. We can take 2.8 quadrillion parameters and basically say, ‘If you want an inhibitor of this particular protein, here’s the molecules you might start with.’”
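Gibson's search-engine comparison maps onto a simple top-k query over precomputed scores. The sketch below assumes predictions are stored as one score per (molecule, pocket) pair; the identifiers and scores are invented, and this is not how Recursion's actual system is built, only an illustration of returning the best few keys for one lock.

```python
import heapq
from collections import defaultdict

# Hypothetical precomputed predictions: (molecule_id, pocket_id) -> predicted
# binding score, where higher means the "key" is more likely to fit the "lock".
PREDICTIONS = {
    ("mol_001", "pocket_A"): 0.91,
    ("mol_002", "pocket_A"): 0.15,
    ("mol_003", "pocket_A"): 0.78,
    ("mol_004", "pocket_A"): 0.66,
    ("mol_001", "pocket_B"): 0.05,
    ("mol_005", "pocket_B"): 0.88,
}

def build_index(predictions):
    """Group (score, molecule) entries by pocket so a query touches one pocket."""
    index = defaultdict(list)
    for (molecule, pocket), s in predictions.items():
        index[pocket].append((s, molecule))
    return index

def top_candidates(index, pocket, k=5):
    """Return up to k molecules with the highest predicted score for a pocket."""
    return [molecule for _, molecule in heapq.nlargest(k, index.get(pocket, []))]

index = build_index(PREDICTIONS)
print(top_candidates(index, "pocket_A", k=3))
```

Grouping by pocket first means a query scans only one pocket's candidates rather than every pair; at the scale of quadrillions of pairs, a real system would need far more specialized storage and indexing.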
According to Gibson, Recursion’s lab currently conducts as many experiments every 15 minutes as he did in his entire five-year PhD program. “It’s taking almost like an artisanal and bespoke science of the old days and turning it into an industrialized science,” he said. “It’s almost like when making automobiles went from handmade automobiles — every one a little bit custom — to the assembly line.”
Biological tools, robotic automation, and improved computation, among other technological advances, have all played a role in advancing the field of drug discovery and development, Gibson said. AI is one critical contributor among these improvements.
What can’t AI do?
AI systems cannot yet accomplish every part of the drug development process alone, particularly in the late stages. Extending his automobile-manufacturing analogy, Gibson says the drug discovery space is currently where the Ford Motor Company was when founder Henry Ford quipped, “Any customer can have a car painted any color that he wants so long as it’s black.”
“We’re in the early days where there’s less flexibility in some of these datasets, but they’re built in a standardized way that allows machine learning to take off,” Gibson said.
In June, Insilico Medicine began clinical trials for what Zhavoronkov told CNBC was “the first fully generative AI drug to reach human clinical trials.” The drug, INS018_055, aims to treat idiopathic pulmonary fibrosis, a chronic lung disease, and reportedly relies on both an AI-discovered target and an AI-generated design. Insilico even uses robots in its target discovery lab to develop its small-molecule drug candidates.
Drug candidates typically require at minimum six or seven years to pass through all the necessary human trials, said Zhavoronkov, and the first truly AI-generated drug candidates only started popping up about four years ago. “That is the reason why we haven’t seen AI-generated drugs on the market,” he said. “Many of those drugs that are true AI drugs, they were created just a couple years ago, so they didn’t have the time to get into the human clinical trials. We are I think the first one with a true generative AI drug.”
Still, according to Gibson, we are far from removing people entirely from the drug discovery process. “What’s important to know is that [machine learning and AI] is an incredible tool and, used well, it can help us with many steps in the process. The idea that somebody hits ‘enter’ on an AI algorithm and a drug just pops out, I believe that is a fallacy,” said Gibson. “I am confident no such technology exists today.”
According to Chaudhuri, questions of reliability are a major limiting factor in expanding the role of AI in the medical field. “To deploy AI systems in safety-critical domains, for example, real-time decision-making in health care, you need to trust them,” he said. “But how do you trust a system like GPT-4, which gives you reasonable-sounding answers one minute and complete lies in the next?”
And the truth is, it’s not clear yet when people should put their full trust in the machine’s decision-making processes, Barzilay said. No one needs a person to double-check that the products recommended to them by an Amazon algorithm align with their shopping needs, but medical decisions carry far more weight.
Even with current limitations, however, experts see a great deal of promise in medical AI. “Today, we are paying several billion dollars for each molecule that goes into a drug. It’s unsustainable,” Barzilay said. “There are still a lot of diseases for which we don’t have good drugs, or even for diseases for which we do have approved drugs but we have a whole bunch of side effects.” For example, she said, the breast cancer treatment drug Tamoxifen, while often necessary, comes with a host of harmful side effects, including brain problems.
“What [Tamoxifen] shows to us is we’re nowhere close to where we want to be because of how we develop drugs,” Barzilay said. She believes AI, however, can change the process for the better: “I really think that machine learning should be part of each one of these processes. And I hope and believe that in five or 10 years, drug discovery will be different.”