Over the past couple of weeks, two court rulings have favored tech giants over authors and content creators, setting an important precedent: at least in some circumstances, tech firms that use large amounts of text to train their AI models are making fair use of the material.
First, a US judge ruled that Anthropic's use of copyrighted books to train its Claude chatbot without author permission did not breach copyright law, comparing the AI model's learning process to "a reader aspiring to be a writer." The following day, Meta secured a favorable ruling when a San Francisco judge determined that the authors who filed the lawsuit had failed to prove the company's AI would cause "market dilution."
Still, content creators continue to fight back against what is increasingly seen as large-scale theft, with tech companies unjustly enriching themselves at the expense of publishers and creators. The same day Meta obtained its favorable ruling, a group of authors sued Microsoft, claiming the company had infringed their copyrights with its Megatron text generator. Similarly, Disney and NBCUniversal recently sued Midjourney over the alleged unauthorized use of iconic characters, including Darth Vader and the Simpson family.
Beyond these new complaints, several high-profile lawsuits remain ongoing: major record labels Sony, Universal, and Warner are pursuing cases against the AI music generators Suno and Udio, and The New York Times has not relented in its claims against OpenAI and Microsoft.
A particularly striking detail emerged from the Anthropic case: after amassing around 7 million pirated books, Anthropic realized its illicit digital library could become a legal liability. To mitigate the risk, the startup resorted to purchasing physical copies (often in bulk) and digitizing them as fast as possible through destructive scanning, destroying the books in the process. Anthropic's careless chopping up of millions of volumes contrasts starkly with the historic efforts of universities and other institutions to digitize books non-destructively, preserving the physical copies.
Legal experts suggest that rulings on text-based AI may not necessarily apply to image, video, or audio cases, as different media types involve distinct fair-use considerations. In cases involving visual media and music, it has been easier to show that models regurgitate output nearly identical to their training data, and to reproduce that behavior on demand. Even so, the current rulings will hardly go unnoticed and may influence future decisions, including in cases involving media other than text.