In the past month, not one but two pieces of AI-generated content featuring the fashion brand Balenciaga went viral. The much bigger deal was the photo of Pope Francis in a white puffer coat (and absolutely dripping in swag) that a lot of people thought was real. But I’d argue the more interesting one was a video that imagined Harry Potter if it were a Balenciaga campaign in the late ’80s or early ’90s.
The clip, which is just under a minute and features mostly zoom-ins of recognizable characters and a deep house backbeat, isn’t really all that interesting in itself, unless you happen to be both a big Harry Potter person and a major fashion stan. Unlike the photo of Balenciaga Pope, the point isn’t to be like, “Haha, you got fooled by AI!” Instead, what’s interesting to me is the question of just how long we, as a society, have before AI-powered video becomes most of what we think of as visual entertainment.
To find out, I asked the clip’s creator, a YouTuber, photographer, and AI hobbyist who goes by the username Demon Flying Fox and lives in Berlin. (He asked to be referred to by his handle to avoid conflating his photography business and his work with AI.) On where the concept came from, he says, “I was brainstorming random video ideas, and it’s helpful when there’s a big surprising contrast. Harry Potter has been spoofed so many times, so it’s evergreen, and Balenciaga is the most memorable company because of its marketing and aesthetics.”
More notable than the concept itself, however, was the fact that the clip only took him about two days to create using the AI tools Midjourney, ElevenLabs, and D-ID, and that he’s only been playing around with AI for a few months. Thanks in part to the success of Balenciaga Harry Potter, he’s now able to earn a full income through YouTube ads and Patreon subscribers.
One possible takeaway from all of this is that the future of AI-generated media is thrilling and possibly even mind-opening, allowing us to “greatly increase the raw material of plausible worlds the mind can imagine inhabiting and, through them, the kinds of futures we perceive as possible,” as my colleague Oshan Jarow argues. Another viable takeaway is that AI could have potentially devastating consequences for art, sidelining subjective human experiences and encouraging the culture industry to only invest in safe, lowest-common-denominator drivel that can be endlessly focus-grouped into maximum profit. “In general, the more a movie seems created by consensus — as many big franchise flicks designed for maximum box office earnings are — the less good it is,” writes my other colleague Alissa Wilkinson. “It’s designed to please many and challenge few, not for any philosophical reason but because that’s what makes a lot of money.”
Both of these futures are already happening in tandem. We’re already at the point where anyone can spend a few months, days, or hours experimenting with AI and creating something like Balenciaga Harry Potter, circumventing (or, depending on your perspective, stealing) the creative labor of actors, graphic artists, and filmmakers in the process. But how far are we from a world in which those months, days, and hours are reduced to seconds? How far are we, really, from typing a few words into an AI and it spitting out an entire feature-length film?
So I tried it. Specifically, I tried to follow Demon Flying Fox’s method for AI video creation to see how easy it was for someone with extremely limited experience with either video editing or AI to replicate something like Balenciaga Harry Potter.
Not wanting to totally rip off his idea but still honor the spirit of it, I chose a different highly recognizable visual style — the films of Wes Anderson — and another hugely popular fantasy series, The Lord of the Rings. Using Demon Flying Fox’s preferred AI tools (for which I’ll be noting the costs not as an advertisement but for full context), I started by asking ChatGPT to write me a scene from a film adaptation of LOTR if it were directed by Wes Anderson. Here’s the first bit of the full scene it sent me:
INT. BAG END - DAY
The camera pans across the cozy interior of Bag End, a hobbit hole in the Shire. We see Frodo (ELIJAH WOOD), sitting on a comfortable armchair, holding a small ring in his hand. Gandalf (BILL MURRAY), an old wizard, enters the room.
Gandalf: Good morning, Frodo! How are you feeling today?
Frodo: (looking up from the ring) Oh, hello Gandalf. I’m fine, I guess. Just a bit worried about this ring.
Gandalf: Ah yes, the ring. It’s a dangerous thing, Frodo. But don’t worry, I have a plan.
The camera zooms in on Gandalf’s face, and we see a mischievous twinkle in his eye.
Gandalf (continued): We need to take this ring to Mordor, and throw it into the fires of Mount Doom. And I have just the team to help us.
The camera cuts to a montage of Gandalf gathering a group of characters, each with their unique personality and skill set. We see Legolas (OWEN WILSON), the elf, Aragorn (JASON SCHWARTZMAN), the human, Gimli (ADRIEN BRODY), the dwarf, and several other characters.
It’s pretty bad as far as screenwriting goes, but the fact that it offered suggestions for which actors would play which characters was an unexpected delight (although at 6-foot-1, Adrien Brody is much too tall to play a dwarf, and apparently AI hasn’t heard we’re not casting Bill Murray in anything these days).
Next, I used Midjourney (annual subscription cost for basic plan: $96) to create portraits of each character in the scene. This is where it gets complicated, and where some of Demon Flying Fox’s artfulness makes itself apparent. I started with the most basic of prompts — “Gandalf the Grey if he were filmed in a Wes Anderson movie,” for instance, which gave me this:
Nice-looking, sure, but I didn’t want a perfect square shot. From watching his tutorial on creating AI avatars, I learned that if you want to change the aspect ratio of Midjourney images, you have to include “--ar 3:2” in the prompt, and that it helps to include “full body” if you don’t want super close-ups.
After I interviewed Demon Flying Fox, however, he mentioned a couple of other keywords that might be helpful. Although he wouldn’t say exactly what his prompts were for creating Balenciaga Harry Potter, he recommended including the term “cinematic,” as well as adding specific dates for reference. The prompt that landed me my final Frodo was this: “Frodo Baggins, portrait, full body, cinematic, film still, in the style of a Wes Anderson live-action movie circa 2008 --ar 3:2.”
For other characters, it helped to add the time of day, which direction they were facing, and any props to include. Here’s what got me my final Legolas: “Owen Wilson as Legolas the elf, portrait, full body, cinematic, holding a bow and arrow, symmetrical, facing forward, film still, exterior shot, daytime, in the style of a Wes Anderson live-action movie circa 2008 --ar 3:2.”
I repeated these steps for all the other characters mentioned in the scene (I also added the other three hobbits in the fellowship, along with Brad Pitt as Boromir, which felt apt for an Anderson adaptation). I particularly enjoyed the results of the prompt in which I cast Tony Revolori as Peregrin Took:
Next, I created voices for the two speaking characters in the scene, Frodo and Gandalf, using ElevenLabs (prices start at $5 per month), which clones a sample of an existing voice that you can then make say whatever you want (no need for me to explain all the ways this particular tool could be misused, but I digress). I needed clips with zero background noise in which you could clearly hear the speaker, so for Gandalf, I found a clip of a young Ian McKellen delivering the “Tomorrow, and tomorrow, and tomorrow” speech from Macbeth that worked well, although the AI randomly got rid of his English accent. I typed Gandalf’s lines into the prompt and generated the fake Ian McKellen saying what I wanted him to say, then repeated the process for Elijah Wood as Frodo.
Then it was time to animate each character and make it appear as though they were actually speaking. To do so, I uploaded each character image from Midjourney into D-ID AI (pricing starts at $4.99 per month), where you can either type out a script for each character to say or upload an existing sound bite. I did the latter for Frodo and Gandalf, and for the other characters who didn’t have speaking roles but still needed to look, y’know, alive, I inserted a series of “pauses” into their speech box. The result was basically just the characters blinking and moving their heads around a bit.
Once I had all my clips, I edited them together in CapCut (free), because as far as I’m aware, there isn’t currently an AI that takes a bunch of clips and then splices them into something that makes sense. CapCut is by far the most intuitive (but still pretty serious) video editor I’ve used, and the full edit took me about two hours. I added a backing track from CapCut’s library labeled “Wes Anderson-esque Unique Suspenseful Orchestra” (unclear whether it was AI- or human-generated), and voila!
Behold, the final video:
Fair warning: It’s really bad. Like, bad in a way that makes me pretty confident that the world of on-demand bizarro fanfic is far away from being something that we actually need to worry about. It also took significantly more effort than simply typing some words into a box and getting a fully real-seeming cinematic scene, and I still used a considerable amount of my own (again, limited) artistic instinct to make certain judgment calls, so it’s not as if the whole thing was a robot’s doing.
It’s possible, however, that we’re not far away from a robot being able to make “Wes Anderson’s The Lord of the Rings” or something much better. It’s not improbable, for instance, that the tools provided by companies like Midjourney, ElevenLabs, and D-ID could all be integrated into a single system. The startup Runway is also a leader in the text-to-video race, where prompts like “a shot following a hiker through jungle brush” or “a cow at a birthday party” can generate corresponding video clips. While the clips shared by the company so far have been short and quite pixelated, The Verge called the prospect of Runway’s text-to-video AI “intoxicating — promising both new creative opportunities and new threats for misinformation.” The company plans to roll out beta access to a small group of testers this week.
There’s also ModelScope, which is free to use and promises the same thing, but when I tried the prompt “Frodo Baggins in a Wes Anderson movie” it presented me with maybe the most horrific gif I’ve ever seen. As to why there’s a fake Shutterstock logo on it, I could not even begin to guess.
While this was a fun experiment and I’m genuinely looking forward to seeing some truly weird AI-generated fanfic content from people who live on the internet, it’s also impossible to talk about without considering the ramifications of a world in which anyone can summon convincing videos of whatever they want. We don’t know what will happen to the value of creative labor, or to the impossible-to-quantify worth of the human hand in art, the thing that opens us up to ideas of which AI can only ever provide a simulacrum. We don’t know what will happen to people whose livelihoods, both financially and psychically, depend on creating art for others that can easily be replicated by these tools.
But we have a pretty good guess. Illustrators are already furious with AI tools that have stolen, mimicked, and devalued their work. “There’s already a negative bias towards the creative industry. Something like this reinforces an argument that what we do is easy and we shouldn’t be able to earn the money we command,” one artist told the Guardian. The Writers Guild is currently pushing to ban AI-generated work in its next contract, underlining the need to safeguard artists from potentially career-destroying tools not only by evolving cultural norms, but with policy.
It’s going to be a wild few months, and hopefully we’ll get to see more Balenciaga Harry Potters — fun, inventive videos meant for little else than silliness — than creepily realistic images of public figures wearing expensive puffer jackets that send the entire media apparatus into an absolute tailspin.
This column was first published in the Vox Culture newsletter.