Google has quietly tapped its vast collection of YouTube videos to feed its artificial intelligence engines, using footage posted by millions of users to train cutting-edge products such as Gemini and the Veo 3 video generator.
The company has amassed more than 20 billion videos, though it insists it draws from only a portion of this giant trove when training its models, honoring certain deals with creators and media companies along the way.
A spokesperson for YouTube said, “We’ve always used YouTube content to make our products better, and this hasn’t changed with the advent of AI,” while emphasizing the platform’s efforts to let creators protect their identity from unwanted use in AI.
Still, the scale of the training effort stands out. Even if only a small portion of YouTube’s library is used, experts estimate the models are learning from thousands of times more material than rival services.
The platform’s terms do not allow users to opt out of AI training. Whenever someone uploads a video, they automatically grant YouTube a worldwide, royalty-free license to use that content for such purposes.
YouTube Creators in the Dark on AI Training
Although YouTube has publicly confirmed the general practice, most creators and media industry professionals appear to be unaware that their work may be helping to build the next generation of AI video generators.
Luke Arrigoni, chief executive of Loti, which works on digital identity protection, voiced frustration over the practice, saying, “It’s plausible that they’re taking data from a lot of creators that have spent a lot of time and energy and their own thought to put into these videos. It’s helping the Veo 3 model make a synthetic version, a poor facsimile, of these creators.”
Google showcased Veo 3 just this May, sending ripples through the creative world as entirely AI-generated sequences with lifelike visuals and audio made their debut onstage.
Some creators, including Sam Beres, who has 10 million subscribers, see the rise of AI as inevitable and even exciting. “I try to treat it as friendly competition more so than these are adversaries,” he said.
But experts point out a downside for those who supply the training material. AI-generated content is increasingly able to compete with, or even supplant, the original creators, often without transparency, attribution, or compensation.
Detection tools like Vermillio’s Trace ID are already picking up close similarities between AI-generated videos and existing YouTube content. In one case, a clip from Brodie Moss was scored at 71 for similarity, with its audio alone scoring above 90.
Other creators are left with few choices. The only available opt-out tools cover third-party companies, not Google itself. Even YouTube’s takedown system for likeness abuse has proven unreliable for some.
With legal action mounting in the entertainment world and lawmakers voicing concerns over the loss of control for artists, the tension around artificial intelligence and content rights on YouTube is only likely to escalate.