Like “Avengers” director Joe Russo, I’m increasingly convinced that fully AI-generated movies and TV shows will be possible within our lifetimes.
In recent months, OpenAI’s ultra-realistic text-to-speech engine has offered glimpses of this brave new frontier. But Meta’s announcement today brought our AI-generated-content future into especially sharp relief.
This morning, the tech giant unveiled Emu Video, an evolution of its image generation tool, Emu. Given a caption (e.g., “A dog running across a grassy knoll”), an image, or a photo paired with a description, Emu Video can generate a four-second animated clip.
Those clips can be edited with a complementary AI model, Emu Edit, which was also announced today. Users describe the changes they want in natural language, such as “the same clip, but in slow motion,” and Emu Edit reflects them in a newly generated video.
Video generation technology isn’t new. Meta has experimented with it before, as has Google, and startups like Runway are building entire businesses on it.
But Emu Video’s 512×512, 16-frames-per-second clips are among the most convincing I’ve seen, to the point where my untrained eye has trouble distinguishing them from the real thing.
Some of them, at least. Emu Video seems to excel at animating simple, mostly static scenes, such as waterfalls and city-skyline timelapses, rendered in styles like cubism, anime, “paper cut craft” and steampunk. One clip of the Eiffel Tower at dawn “as a painting,” with its reflection in the Seine, reminded me of an American Greetings e-card.
Even in Emu Video’s best work, AI-generated weirdness creeps in: skateboards that glide parallel to the ground, toes that curl behind feet, legs that blend into each other. Objects often appear and fade from view with no explanation, like the birds above the Eiffel Tower in that clip.
After browsing a lot of Emu Video’s output, I noticed a common thread: the subjects of the clips don’t do much of anything. Emu Video doesn’t seem to grasp action verbs, perhaps a limitation of the model’s architecture.
In one Emu Video clip, a cute anthropomorphized raccoon holds a guitar but never strums it, even though the caption contains the word “strum.” In another, two unicorns “play” chess by staring at a board without ever moving the pieces.
Clearly, there’s work to be done. Still, Emu Video’s simpler b-roll wouldn’t look out of place in a movie or TV show today, and the ethical implications of that terrify me.
Deepfakes aside, I worry about animators and artists who make a living creating the sort of scenes AI like Emu Video can now approximate. Meta and its generative AI rivals will likely argue that Emu Video, which Meta CEO Mark Zuckerberg says is being integrated into Facebook and Instagram (hopefully with better toxicity filters than Meta’s AI stickers), enhances rather than replaces human artists. I think that’s optimistic at best, dishonest at worst, especially when money is involved.
Earlier this year, Netflix showed a three-minute animated short with AI-generated backgrounds. The company claimed the tech could solve anime’s labor shortage, while ignoring the low pay and harsh working conditions that are driving artists away from the industry.
In a similar controversy, the studio behind the credits sequence of Marvel’s “Secret Invasion” admitted to using AI, mostly Midjourney, to generate much of the sequence’s artwork. Series director Ali Selim argued that the use of AI fit the show’s paranoid themes, but most artists and fans disagreed.
Actors could be next. The use of AI-generated digital likenesses was a major sticking point in the recent SAG-AFTRA strike. Studios ultimately agreed to pay actors for those likenesses, but might they reconsider as the tech improves? It seems likely.
Worse, AI like Emu Video is trained on the work of artists, photographers, and filmmakers without their consent or compensation. In a whitepaper accompanying Emu Video, Meta says only that the model was trained on 34 million “video-text pairs” ranging from five to 60 seconds in length, not where those videos came from, their copyright statuses, or whether Meta licensed them.
(A Meta spokesperson emailed TechCrunch that Emu was trained on “data from licensed partners.”)
There have been intermittent attempts to set industry-wide standards that would let artists “opt out” of training or be paid for AI-generated works that draw on theirs. But if Emu Video is any indication, technology will soon outpace ethics. Maybe it already has.