A big part of the reason was that the scenario just felt silly. What did these folks think would happen? Was some company going to build Skynet and manufacture Terminator robots to slaughter anyone who stood in its way? It felt like a sci-fi fantasy, not a real problem.
This is a misperception that frustrates a lot of AI researchers. Nate Soares, who runs the Machine Intelligence Research Institute, which focuses on AI safety, has argued that a better analogy than the Terminator is the “Sorcerer’s Apprentice” scene in Fantasia. The problem isn’t that AI will suddenly decide we all need to die; the problem is that we might give it vague or incomplete instructions, and that it will follow those orders in ways we didn’t intend.
“The problem Mickey faces when he enchants a broom to help him fill a cauldron isn’t that the broom rebels or acquires a will of its own, but that it fulfills the task it was given all too well,” Soares stated in an interview last year. “He wants the cauldron full, and overflowing the workshop is a great way to be extra confident that the cauldron is full (and stays full). Mickey successfully ‘aimed’ his AI system but things still went poorly for Mickey.”
What made me start to agree that AI poses a major risk was not Soares’s specific metaphor but the observation behind it: badly directed automated systems can lead to massive unintended consequences. I don’t think AI has malevolent motives, or is really capable of having them. I do think humans are bad at precisely stating what they want, and that when that failing is combined with a powerful enough technology, it could lead to very bad outcomes.
So what does that look like at scale? Computer scientist Paul Christiano, who works on AI safety at OpenAI, has a nice recent post laying out two broad scenarios. And I do mean “broad”: As a journalist, I’ve found it hard to get AI safety researchers to lay out a single concrete path through which they think AI could lead to doom, mostly because no one thinks any one particular scenario has a very high likelihood. The claim, rather, is that the odds of all possible catastrophically bad scenarios when added up represent a significant risk. But this makes it somewhat difficult to visualize the kind of outcome they fear.
Christiano’s categories make it a little bit easier, though. He divides potential disasters into two broad categories:
- Going out with a whimper
- Going out with a bang
Going out with a whimper
Human institutions are already better at maximizing easy-to-measure outcomes than hard-to-measure ones. It’s easier to increase standardized math test scores than it is to increase students’ actual math knowledge. It’s easier to cut reported robberies than it is to prevent actual robberies.
Machine-learning algorithms share this flaw, and exaggerate it in some ways. They are incredibly good at figuring out through trial and error how to achieve a human-specified quantitative goal. But humans aren’t always good at specifying those goals, and AIs are not good at distinguishing between reasonable interpretations of human instructions and unreasonable interpretations.
Suppose you’re the CEO of, say, Nike, and you have a new AI that can recommend approaches for maximizing your profits. If it recommends a new sneaker that will likely be incredibly popular, that’s great! If it instead recommends large-scale money laundering and targeted assassinations of Reebok customers to strike fear in the consumer markets — that’s less great!
That’s a dramatic example (and it’s mine, not Christiano’s). As Christiano notes, we might be able to avoid these kinds of outcomes early on by just changing code to guard against obvious bad results. “For a while we will be able to overcome these problems by recognizing them … and imposing ad-hoc restrictions that avoid manipulation or abuse,” he writes. “But as the system becomes more complex, that job itself becomes too challenging for human reasoning to solve directly.”
And by that point, there might not be agreement among humans that things are going awry, and without such agreement, it’s doubtful regulations and controls could be imposed: “As this world goes off the rails, there may not be any discrete point where consensus recognizes that things have gone off the rails.”
Humans may just accept ceding more and more decision-making authority to algorithms until much of human life appears to consist of humans implementing the recommendations of AI systems they believe to be smarter. That’s particularly true if lobbyists for AI-development companies like Google and Facebook fight aggressively against any regulations that limit the role of AI.
Even this might sound sci-fi-ish. But ask yourself: Do you trust your ability to find the cheapest flight when you have to go from New York to San Francisco? Or do you trust algorithms at Kayak or Hipmunk? I, personally, tend to trust Hipmunk.
If you had the ability to pay for an AI stock adviser, like the one Betterment already sells, would you trust it over your own ability to pick stocks? Probably, right? What about when you’re a college senior nervous about career choices, and a sophisticated AI recommends possible job paths? At what point do you jump off the train? Or do you just keep trusting the algorithms?
Going out with a bang
Even so, Christiano’s first scenario doesn’t precisely envision human extinction. It envisions human irrelevance, as we become agents of machines we created.
His second scenario is somewhat bloodier. Often, he notes, the best way to achieve a given goal is to obtain influence over other people who can help you achieve that goal. If you are trying to launch a startup, you need to influence investors to give you money and engineers to come work for you. If you’re trying to pass a law, you need to influence advocacy groups and members of Congress.
That means that machine-learning algorithms will probably, over time, produce programs that are extremely good at influencing people. And it’s dangerous to have machines that are extremely good at influencing people.
“Early in the trajectory, influence-seeking systems mostly acquire influence by making themselves useful and looking as innocuous as possible,” Christiano writes. “They may provide useful services in the economy in order to make money for them and their owners, make apparently-reasonable policy recommendations in order to be more widely consulted for advice, try to help people feel happy, etc.”
But eventually, the algorithms’ incentives to expand influence might start to overtake their incentives to achieve the specified goal. That, in turn, makes the AI system worse at achieving its intended goal, which increases the odds of some terrible failure. And if a failure occurs, well:
“An unrecoverable catastrophe would probably occur during some period of heightened vulnerability — a conflict between states, a natural disaster, a serious cyberattack, etc. — since that would be the first moment that recovery is impossible and would create local shocks that could precipitate catastrophe. The catastrophe might look like a rapidly cascading series of automation failures: A few automated systems go off the rails in response to some local shock. As those systems go off the rails, the local shock is compounded into a larger disturbance; more and more automated systems move further from their training distribution and start failing. Realistically this would probably be compounded by widespread human failures in response to fear and breakdown of existing incentive systems.”
Human reliance on these systems, combined with the systems failing, leads to a massive societal breakdown. And in the wake of the breakdown, there are still machines that are great at persuading and influencing people to do what they want, machines that got everyone into this catastrophe and yet are still giving advice that some of us will listen to.
Both of Christiano’s scenarios are, of course, fairly theoretical. But we can already see what increased reliance on AI looks like in our daily lives. It seems likely that services like Hipmunk or Google Maps, commonly used algorithms that humans tend to trust more than they trust other humans, will multiply. And as Christiano suggests, that could have some major unintended consequences.