A YouTube spokesperson referred newtechmania to an earlier comment stating that training AI models on the platform’s videos would constitute a “clear violation” of its terms of service.
The artificial intelligence startup Runway allegedly scraped “thousands” of videos from YouTube, along with pirated copies of copyrighted movies, without permission. Internal spreadsheets obtained by 404 Media allegedly show that the AI video-generation company trained its Gen-3 model on YouTube content from channels such as Disney, Netflix, Pixar, and other major media outlets.
A purported former Runway employee told the publication that the company used the spreadsheets to flag lists of videos it wanted in its database. It would then download them through open-source proxy software to avoid detection and cover its tracks. One sheet lists simple keywords such as “astronaut,” “fairy,” and “rainbow,” with notes indicating whether the company had found high-quality videos matching those keywords for training. The entry for “superhero,” for example, carries the note “Lots of movie clips.” (That is correct.)
Further notes show that Runway identified the YouTube channels for Unreal Engine, the director Josh Neuman, and a Call of Duty fan page as excellent sources of “high movement” training videos.
The former employee told 404 Media that the channels in the spreadsheet were the result of a company-wide effort to find high-quality videos with which to build the model. “After that, this was used as input to a massive web crawler, which then downloaded all of the videos from all of those channels, making use of proxies in order to avoid being blocked by Google.”
One of the spreadsheets contains a list of nearly 4,000 YouTube channels, with “recommended channels” highlighted. Among them are CBS New York, AMC Theaters, Pixar, Disney Plus, Disney CD, and the Monterey Bay Aquarium. (Because no AI model is truly complete without otters.)
Runway also allegedly compiled a separate list of videos obtained from piracy websites. A spreadsheet titled “Non-YouTube Source” contains 14 links to online sources, including an unofficial online archive of Studio Ghibli films, anime and movie piracy sites, a fan site that hosts Xbox gaming videos, and the animation streaming site kisscartoon.sh.
In what could be read as damning evidence that the company used the training data, 404 Media found that prompting the video generator with the names of popular YouTubers listed in the spreadsheet produced results that bore an eerie resemblance to those YouTubers. Importantly, when the same names were entered into Runway’s older Gen-2 model, which was trained before the data purportedly listed in the spreadsheets was collected, it produced “unrelated” results, such as generic men in suits. Perhaps more telling, the AI tool stopped producing the YouTubers’ likenesses altogether after the publication contacted Runway to ask about their appearance in the results.
“I hope that by sharing this information, people will have a better understanding of the scale of these companies and what they are doing to make ‘cool’ videos,” the former employee said in an interview with 404 Media.
In response to newtechmania’s request for comment, a YouTube representative referred the publication to an interview that YouTube CEO Neal Mohan gave to Bloomberg in April. In that interview, Mohan called training on YouTube’s videos a “clear violation” of the terms of service. According to a statement that YouTube spokesperson Jack Mason sent to newtechmania, “Our previous comments on this continues to stand.”
Runway had not responded to a request for comment by the time this article was published.
At least some AI companies appear to be racing to bring their tools into the mainstream and secure market leadership before users and judges learn how the sausage was made. Training with permission through licensing deals is one thing, and it is an approach that companies such as OpenAI have recently adopted. It is a far more dubious proposition, if not a criminal one, to treat the entire internet, including all of the content protected by intellectual property rights, as up for grabs in a frenzied race for profit and dominance.