OpenAI, the maker of ChatGPT, believes it’s on the cusp of transforming our world with powerful AI systems. At minimum, it thinks these will fundamentally change how we work and live. At maximum, they could make our world unrecognizable overnight.
To make this go well, instead of catastrophically badly, OpenAI has created what it calls the superalignment team, which tries to understand how to make superhuman AI do what we want, instead of doing its own thing.
The team head is Jan Leike, a machine learning researcher who worked at Google’s DeepMind before joining OpenAI. His team is in a race against time: The goal is to figure out how to align powerful AI systems before unaligned powerful AI systems get developed. (An AI system is “aligned” if it’s trying to do the things that humans want, and “unaligned” if it’s trying to do other things outside our control. A big, unanswered question is how well we can tell what our AI systems are trying to do at all.)
“I think alignment is tractable,” Leike told Rob Wiblin on the 80,000 Hours podcast this August. “I think we can actually make a lot of progress if we focus on it and put effort into it. … Honestly, it really feels like we have a real angle of attack on the problem that we can actually iterate on, we can actually build towards. And I think it’s pretty likely going to work, actually. And that’s really, really wild, and it’s really exciting. It’s like we have this hard problem that we’ve been talking about for years and years and years, and now we have a real shot at actually solving it.”
The basic approach is to develop techniques that work to align systems slightly more powerful than the ones we have today, safely build those systems, and then use them to align their successors.
Many people justifiably don’t want to gamble the fate of the world on the success of OpenAI’s internal alignment research team (I don’t, myself, want to take that gamble). But even if one would like to see technical alignment research accompanied by much stronger external oversight, governance, auditing, and measures to prevent the deployment of potentially dangerous systems, technical work on making AI systems safe will certainly be a huge element of any solution to this pressing challenge.
Sometimes, progress on the technical side can open up new options for political and governance solutions. And I think it’s to their immense credit that Leike’s team openly admits the insane stakes of the work they’re doing, and that they are willing to explain how they intend to do it. Their candor means that other researchers can evaluate their approach and figure out whether it will get us to safe superintelligences and, if not, what will go wrong.