In the past five years, machine learning has come a long way. You might have noticed that Siri, Alexa, and Google Assistant are way better than they used to be, or that automatic translation on websites, while still fairly spotty, is hugely improved from where it was a few years ago.
But many still don’t quite grasp how far we’ve come, and how fast. Recently, two images made the rounds that underscore the huge advances machine learning has made — and show why we’re in for a new age of mischief and online fakery.
The first was put together by Ian Goodfellow, the director of machine learning at Apple’s Special Projects Group and a leader in the field. He looked over machine-learning papers published on the online open-access repository arXiv over the past five years, and found examples of machine learning-generated faces from each year. Each of the faces below was generated by an AI. Starting with the faces on the left, from 2014, you can see how dramatically AI capabilities have improved:
In 2014, we’d just started on the task of using modern machine-learning techniques to have AIs generate faces. The faces they generated looked grainy, like something you might see on a low-quality surveillance camera. And they looked generic, like an average of lots of human faces, not like a real person.
In less than five years, all of that changed. Today’s AI-generated faces are full-color, detailed images. They are expressive. They’re not an average of all human faces; they resemble people of specific ages and ethnicities. Looking at the woman above on the far right, I can vividly imagine a conversation with her. It’s surreal to realize she doesn’t exist.
How did we come so far so fast? Machine learning has seen a flood of new researchers and larger research budgets, which have driven rapid innovation. And a new technique invented in 2014 made a huge difference.
How AI learned photorealism, explained
Let’s start with a quick primer on how machine learning can generate images like these. Image generators like these often use a technique called a generative adversarial network (GAN). Ian Goodfellow, who compiled the above chart, invented the technique in 2014.
Here’s the idea: Imagine that an AI is trying to generate pictures of people. When it does unusually well, you want to tell it that it did unusually well (so it’ll try similar techniques next time). When it does unusually badly, you want to tell it that it did unusually badly (so it will correct whatever it was doing wrong). Your AI will need lots of practice — it may need to draw millions of pictures — to draw photorealistic humans. So you don’t want to sit at your computer giving feedback on each individual picture, because that’s totally unworkable.
Instead, you train a second AI to look at the first AI’s pictures of people, and guess whether they’re an AI-generated picture or a real picture. That’s the “adversarial” part of “generative adversarial network”: The two AIs are “adversaries,” with one trying to trick the other and the second one trying to guess when it’s being tricked.
When the two AIs start out, they’re both bad at their jobs. The first AI knows almost nothing about how people look, and the second AI knows almost nothing about how to tell AI-generated pictures apart from real ones. Over time, both get better at their jobs. (Much of the trick here, if you’re a researcher, is making sure that they get better at the same rate, so that they can keep learning from each other.) Eventually, the first AI is making pictures that look real even to humans.
That’s a GAN, one of the most powerful new machine learning technologies.
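For readers who want to see the back-and-forth in code, here is a deliberately tiny sketch in Python. It is not how real face generators are built. The simplifications are all assumptions made for illustration: the "pictures" are single numbers drawn from a bell curve centered on 4, both AIs are one-line formulas rather than neural networks, and the names `train`, `sigmoid`, and `real_mean` are hypothetical. The generator update uses the common "non-saturating" form of the GAN loss, which is one choice among several.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function: maps any number into (0, 1)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train(steps=2000, batch=64, lr=0.01, real_mean=4.0, seed=0):
    """Toy 1-D GAN (illustrative assumption: samples are single numbers).

    Generator:      G(z) = a * z + b           turns random noise into a fake sample
    Discriminator:  D(x) = sigmoid(w * x + c)  guesses the probability that x is real
    Returns the learned parameters (a, b, w, c).
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # the generator starts out knowing nothing about the real data
    w, c = 0.0, 0.0  # the discriminator starts out guessing 50/50 on everything

    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: one gradient-descent update on
        # -[log D(real) + log(1 - D(fake))], i.e. get better at spotting fakes.
        gw = gc = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            gw += -(1.0 - d) * x
            gc += -(1.0 - d)
        for x in fake:
            d = sigmoid(w * x + c)
            gw += d * x
            gc += d
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Generator step: one gradient-descent update on -log D(fake),
        # i.e. get better at fooling the freshly updated discriminator.
        ga = gb = 0.0
        for z in noise:
            d = sigmoid(w * (a * z + b) + c)
            dx = -(1.0 - d) * w  # gradient of the loss with respect to the fake sample
            ga += dx * z
            gb += dx
        a -= lr * ga / batch
        b -= lr * gb / batch

    return a, b, w, c


if __name__ == "__main__":
    a, b, w, c = train()
    print(f"generator: G(z) = {a:.2f} * z + {b:.2f}")
    print(f"discriminator: D(x) = sigmoid({w:.2f} * x + {c:.2f})")
```

Each pass through the loop is one round of the contest: the discriminator gets a little better at telling real numbers from fakes, then the generator adjusts its output to look more real to the discriminator. Even in a toy like this the two sides can end up chasing each other rather than settling down, which hints at why keeping them balanced is such a large part of the research.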
Combining generative adversarial techniques with other recent advances lets you do even more. A video that started making the rounds recently shows just how much. It’s from a new working paper by Egor Zakharov and others at the Samsung AI lab in Moscow, published on arXiv this week, which shows how modern AIs can create a fairly realistic fake video from a single image:
The talking, smiling Mona Lisa here looks pretty creepy, because the AI had only one image of her to work from. In the same paper, the authors demonstrate that the technique makes more convincing fake videos when it has a few more images to work from, but the most striking example from their demonstration video remains this one, where the Mona Lisa is brought to life.
And, yes, techniques like these are already being used to make mischief, from people impersonating journalists on Twitter with AI-generated profile pictures to online tools for generating fake profiles yourself. But what’s really fascinating is what lies ahead. People rightly wonder: If we’ve come this far in the last five years, where will we be five years from now? How many more things considered the exclusive domain of humans will turn out to be easy for AI? One thing is for certain: Almost no one in 2014 saw this coming.