Last week, a 31-year-old construction worker took a few psychedelics and thought it might be fun to use AI image generator Midjourney to create a photorealistic image of Pope Francis wearing a big white Balenciaga-style puffer jacket. A lot of people who saw it thought it was fun, too, so they spread it around social media. Most of them probably had no idea that it wasn’t real.
Now, the Pope having that drip isn’t the worst or most dangerous deployment of photorealistic AI-generated art, in which new images are created from text prompts. But it is an example of just how good this technology is becoming, to the point that it can even trick people who are usually more discerning about spreading misinformation online. You might even call it a turning point in the war against mis- and disinformation, a war that, frankly, the people fighting it were already losing simply because social media exists. Now we have to deal with the prospect that even the people fighting that war may inadvertently help spread the disinformation they’re trying to combat. And then what?
It’s not just Coat Pope. In the last two weeks, we’ve seen several ominous AI-image stories. We had Trump’s fake arrest and attempted escape from the long AI-generated arm of the law, which was capped by a set of poorly rendered fingers. We had Levi’s announcing it would “supplement” its human models with AI-generated ones in the name of diversity (hiring more diverse human models was apparently not an option). Microsoft unleashed its Bing Image Creator in its new AI-powered Bing and Edge browser, and Midjourney, known for its photorealistic images, released its latest version.
Finally, there’s the news that AI image generators are getting better at drawing hands, which had been one of the tell-tale signs to detect if an image is fake. Even as convincing as Coat Pope appeared, a close look at his right hand would have revealed its AI origins. But soon, we may not even have that. Levi’s will be able to use AI models to show off its gloves, while the rest of us might be thrown into a new world where we have absolutely no idea what we can trust — one that’s even worse than the world we currently inhabit.
“We’ve had this issue with text and misinformation on social platforms. People are conditioned to be skeptical with text,” said Ari Lightman, a professor of digital media and marketing at Carnegie Mellon University. “An image ... adds some legitimacy in the user’s mind. An image or video creates more resonance. I don’t think our blinders are up yet.”
In just a few short years, AI-generated images have come a long way. In a more innocent time (2015), Google released “DeepDream,” which used Google’s artificial neural network programs — that is, artificial intelligence that’s been trained to learn in a way that mimics a human brain’s neural networks — to recognize patterns in images and make new images from them. You’d feed it an image, and it would spit back something that resembled it but with a bunch of new images woven in, often things approximating eyeballs and fish and dogs. It wasn’t meant to create images so much as to show, visually, how the artificial neural networks detected patterns. The results looked like a cross between a Magic Eye drawing and my junior year of college. Not particularly useful in practice, but pretty cool (or creepy) to look at.
These programs got better and better, training on billions of images that were usually scraped from the internet without their original creators’ knowledge or permission. In 2021, OpenAI released DALL-E, which could make photorealistic images from text prompts. It was a “breakthrough,” says Yilun Du, a PhD student at MIT’s Computer Science and Artificial Intelligence Laboratory who studies generative models. Soon, not only was photorealistic AI-generated art shockingly good, but it was also very much available. OpenAI’s DALL-E 2, Stability AI’s Stable Diffusion, and Midjourney were all released to the general public in the second half of 2022.
The expected ethical concerns followed, from copyright issues, to allegations of racist or sexist bias, to the possibility that these programs could put a lot of artists out of work, to what we’ve seen more recently: convincing deepfakes used to spread disinformation. And while the images are very good, they still aren’t perfect. But given how quickly this technology has advanced so far, it’s safe to assume that we’ll soon be hitting a point where AI-generated images and real images are nearly impossible to tell apart.
Take Nick St. Pierre’s work, for example. St. Pierre, a New York-based 30-year-old who works in product design, has spent the last few months showing off his super-realistic AI art creations and explaining how he got them. He may not have the artistic skills to compose these images on his own, but he has developed a skill for getting them out of Midjourney, which he says he uses because he thinks it’s the best one out there. St. Pierre says he dedicated the month of January to 12-hour days of working in Midjourney. Now he can create something like this in just about two hours.
“When you see a digital image on the internet and it’s AI generated, it can be cool, but it doesn’t, like, shock you,” St. Pierre said. “But when you see an image that’s so realistic and you’re like, ‘wow, this is a beautiful image’ and then you realize it’s AI? It makes you question your entire reality.”
But St. Pierre doesn’t usually put real people in his work (his rendering of Brad Pitt and John Oliver as female Gucci models from the ’90s is an exception, though few people would look at either and think they were actually Brad Pitt or John Oliver). He also thinks social media companies will continue to develop better tools to detect and moderate problematic content like AI-generated deepfakes.
“I’m not as concerned about it as a lot of people are,” he said. “But I do see the obvious dangers, especially in the Facebook world.”
Du, from MIT, thinks we’re at least a few years away from AI being able to produce images and videos that flood our world with fake information. It’s worth noting that, as realistic as St. Pierre’s images are, they’re also the end product of hours and hours of training. Coat Pope was made by someone who said he’d been playing around with Midjourney since last November. So these aren’t yet images that anyone can just spin up with no prior experience. Lightman, from Carnegie Mellon, says the question now is whether we’ll be ready for that possibility.
Of course, a lot of this depends on the companies that make these programs, the platforms that host them, and the people who create the images all acting responsibly and doing everything possible to prevent this from happening.
There are plenty of signs that they won’t. Bing Image Creator won’t generate an image of a real person, but Midjourney — the source of both Coat Pope and Fugitive Trump — clearly does (it has since banned the creators of both images from the platform but did not respond to a request for comment). They all have their own rules for what is or isn’t allowed. Sometimes, there aren’t any rules at all. Stable Diffusion is open source, so anyone with any motives can build their own thing on top of it.
Social media platforms have struggled for years to figure out what to do about the disinformation campaigns that run wild through them, or if and how they should curb the spread of misinformation. They don’t seem very well-equipped to deal with deepfakes either. Expecting all of humanity to do the right thing and not try to trick people or use AI images for malicious purposes is impossibly naive.
And while it’s better than nothing that many leaders of the AI movement signed a letter from an effective altruism-linked nonprofit urging a six-month moratorium on developing more advanced AI models, the letter isn’t legally binding. Nor has everyone in the industry signed it.
This all assumes that most people care a lot about not being duped by deepfakes or other lies on the internet. If the past several years have taught us anything, it’s that, while a lot of people think fake news is a real issue, they often don’t care or don’t know how to check that what they’re consuming is real — especially when that information conforms to their beliefs. And there are people who are happy enough to take what they see at face value because they don’t have the time or perhaps the knowledge to question everything. As long as it comes from a trusted source, they will assume it’s true. Which is why it’s important that those trusted sources are able to do the work of vetting the information they distribute.
But there are also people who do care and see the potential damage that deepfakes that are indistinguishable from reality pose. The race is on to come up with some kind of solution to this problem before AI-generated images get good enough for it to be one. We don’t yet know who will win, but we have a pretty good idea of what we stand to lose.
Until then, if you see an image of Pope Francis strolling around Rome in Gucci jeans on Twitter, you might want to think twice before you hit retweet.
A version of this story was first published in the Vox technology newsletter. Sign up here so you don’t miss the next one!