
Navigating the Generative AI and Copyright Debate: Key Court Rulings
On the recent decisions by the Court in the Northern District of California on the copyright vs. Gen AI debate, Akshat Agrawal highlights the conflicting findings of the judges on fair use in this two-part post. In Part I, he examines the disagreement between the judges on the ‘lawfully acquired first copy’ requirement under the fair use exception, from the lens of the Indian Copyright law. Part II will discuss the findings on the market dilution theory. Akshat is a practicing litigator working at Saikrishna and Associates. He did his LLM from Berkeley Law in 2023 specialising in IP and Tech law. His previous posts can be found here. He adds the following disclaimer: After some discussion around an earlier draft and an admitted history of verbosity, I would also like to acknowledge the usage of Claude.ai for helping me re-frame the draft more succinctly and in a reader friendly manner. Views expressed here are personal. [Long post ahead]
By Akshat Agrawal
Last week, Judge Alsup and Judge Chhabria, both from the Northern District of California, US, delivered two distinct (and contradictory) decisions in the first substantive rulings on the Generative AI x Copyright debate: Andrea Bartz v. Anthropic and Richard Kadrey v. Meta, respectively. Both decisions address the “fair use” defense to infringement, as this was the sole issue raised for summary judgment. While both courts agreed that AI training is “highly transformative” (the first factor of the fair use analysis) and therefore potentially covered by fair use, they split sharply on the “lawful first copy” requirement. This disagreement has significant implications for how Indian courts might approach similar cases, particularly given our research exemptions under Section 52 and the ongoing litigation between ANI and OpenAI. Part 1 of this two-part post examines this disagreement. Part 2 will tackle the second disagreement: market dilution theory.
In the Anthropic case, Anthropic initially scraped LibGen for Books3 (a dataset containing 196,640 books in plain text format, which is used for training language models) and similar datasets. These books were acquired without lawful purchase, since books on LibGen are made available without authorization. The company reportedly created a “general purpose library” using these books. However, after receiving legal advice, Anthropic ultimately chose not to use these materials for training its LLMs, opting instead to use digitized versions of legally purchased second-hand copies. Thus, all materials used throughout the training process were lawfully acquired second-hand copies of books. It nonetheless continued to maintain its “general purpose library” for some future use.
In contrast, in Meta, Meta also scraped LibGen and other shadow libraries. However, unlike Anthropic, it proceeded to use these copies at every stage of model training, without any lawful purchase of the materials at any point.
Both courts accepted the fair use defense and held that training is exempted under the fair use doctrine in their respective factual circumstances. Their reasoning, however, differs. Before examining these contradictions and their implications for Indian law, it’s worth noting where the courts agree:
Despite these points of agreement, the courts diverged significantly on two main aspects, the first being the subject matter of this post:
Judge Alsup holds Anthropic’s use to be highly transformative (i.e., involving a further purpose from that of any normal user of the book, and a different character of use than the normal course of use – to generate content of a different character) and therefore fair use, but only insofar as a lawfully acquired first copy was used for making further copies during training. This reasoning leads to the second part of the decision, where Judge Alsup holds that creating copies of copyrighted works from a shadow library and storing them as a “general purpose library” cannot constitute fair use, regardless of whether the copyrighted work is ever exposed to any human being by the user. The Judge reasons that claiming a research purpose to copy textbooks from shadow libraries (which the Judge terms “pirate websites,” though I intentionally avoid this phrase) would destroy academic publishing markets.
Crucially, Anthropic retained shadow library copies as a central library of all books in the world even after deciding never to use them for LLM training again, distinguishing these retained copies from those actually used for training. The Court holds that these retained shadow library copies displaced demand for authors’ books copy-for-copy, emphasizing that fair use doesn’t entitle someone to steal copies merely to make their use more convenient or cost-effective.
Therefore, although the decision does not definitively rule on training using works downloaded from shadow libraries—as this wasn’t the factual scenario before the court—Judge Alsup does articulate his theory in obiter (without needing to): if training were conducted using copies acquired from shadow libraries rather than lawfully purchased copies, the transformative nature of the training would not legalize such training, since the copy used for training would itself be illegal. The Judge finds downloading works from shadow libraries to be per se infringing.
The Meta Court reaches an entirely opposite conclusion. Judge Chhabria states that “to say that Meta’s downloading was ‘piracy’ and thus cannot be fair use begs the question because the whole point of fair use analysis is to determine whether a given act of copying was unlawful.” Moreover, as Meta’s use of shadow libraries didn’t provide any advertising revenue to those libraries, this factor wouldn’t impact the transformative nature of the use. In direct contrast to the Anthropic decision, the Meta Court holds that downloading works from shadow libraries must be considered in light of its ultimate, highly transformative purpose: training an AI model. The Court found that because Meta’s use of the books to train Generative AI was transformative (further purpose and different character of the use), so too was its act of downloading them: a classic ends-justifying-the-means analysis!
This raises an important question: What would be the outcome under Indian law? Does Section 52 of the Indian Copyright Act impose a “lawful first copy” requirement?
Section 52(1)(a) of the Copyright Act permits “fair dealing” for research, personal use, news reporting, and certain other purposes. The Explanation to this section states that storage in an electronic medium of a copy in the process of such uses—including incidental storage of a computer programme that is not itself an infringing copy—for the purpose of such uses, is non-infringing.
In this context, Nikhil Narendran has argued that the lawful first copy requirement is good policy. Even in ANI v. OpenAI, ANI has argued that the lawful first copy requirement is embedded within this Explanation across all works, meaning Section 52(1)(a) applies only when the first copy is lawfully possessed. In other words, “fair” distribution, communication, or reproduction of a lawfully acquired first copy of a work for research, news reporting, review, personal use, and similar purposes would be non-infringing. This interpretation aligns with Judge Alsup’s holding in Anthropic that training is non-infringing only when the first copy used was lawfully acquired.
However, I find myself agreeing with Judge Chhabria on this issue, for two distinct reasons:
To further understand why, let’s follow Judge Alsup’s reasoning to its logical conclusion in the context of Indian law (this is not an analogy but a possible adverse spillover effect of adopting Judge Alsup’s reasoning):
If lawful possession of a first copy is indeed required before dealing for purposes mentioned under Section 52(1)(a) [apart from computer programmes, as there is a specific exclusive right to commercial rental and sale of software under Section 14(b)], researchers would be unable to access certain works for research purposes if no library to which the researcher has access chooses to acquire or purchase a lawful copy. Consequently, the research exemption under Section 52(1)(a) would be limited to those books or journal articles legally purchased either by a library or by the researcher herself, and we all know how prohibitive those prices can be. Such an interpretation, which would make acquisition of a lawful copy by someone a prerequisite for the research exemption under Section 52(1)(a) to apply, would devastate the position of researchers (particularly independent researchers) and make the exemption redundant. This surely couldn’t have been the intent. The point and purpose of the exemption is to ensure that if acquiring or making the first copy is for purposes exempted by the Act, any dealing with that work, so long as it is fair, would be permitted. Research is one such purpose.
Would this same logical conclusion apply to copies acquired for training Gen AI models? I see no reason why not. The requirement to lawfully purchase all books (first copy) for training a model, where such training is a transformative use (as Judge Alsup acknowledges), would prohibitively restrict the development of transformative technological models like LLMs and would render even transformative uses infringing. This approach was expressly rejected by Indian Courts in the Guidebooks case and the Barbara Taylor Bradford case, where the reproduction right and the adaptation right were, respectively, refused extension to uses that had a transformative purpose or transformed content. In any case, requiring the purchase of legal copies for every book used in transformative AI training would create prohibitively high entry barriers. This would worsen the concentration of AI development among only the wealthiest companies (the ‘1%’), reducing competition and undermining the democratic potential of this technology to be developed and controlled by a broader range of actors. Just as the requirement would decide who gets to research (those with access to a library that holds a lawfully purchased copy), it would also decide who gets to develop an LLM and to research or extract non-expressive elements from copyrighted works for that purpose.
Therefore, even if a copy is acquired unlawfully, its acquisition and use for a transformative purpose ought not be infringing. (For instance, downloading a movie via uTorrent for personal viewing is not an infringing act; rather, the hosting website making an unlawful copy available for expressive consumption would be the appropriate target of infringement actions.)
Notably, the holding in Anthropic would not directly apply in the Indian context, as while LibGen has been determined to be an infringing website in the US, this issue remains pending consideration in India in the Scihub case.
Interestingly, the lawful copy requirement articulated by Judge Alsup in Anthropic, even if considered correct, doesn’t significantly affect OpenAI’s case. Although ANI has argued that OpenAI needed a first copy license from them, ANI’s content was already publicly available and accessed through scraping (allowing them or their subscribers to earn advertising revenue at the first access stage). Moreover, the works weren’t behind any paywall that was circumvented. Hence, this wasn’t a copy of a work that was unlawfully accessed at the first instance.
In fact, both judgments support OpenAI’s argument that its use is transformative and therefore beyond the scope of Section 14 (meaning of copyright), as held in the Guidebooks and the Barbara Taylor Bradford cases, where the reproduction and the adaptation right were, respectively, refused extension to uses that had a transformative purpose or resulted in transformed content. However, if infringement is found at the output stage (which seems unlikely given that no evidence of substantial similarity in expression at the output end was produced), the answer could differ. Both Judge Alsup and Judge Chhabria declined to rule on this issue, as no more than 50 words of the plaintiffs’ books were found to be verbatim copied.
Stay tuned for Part 2 where I discuss the second point of dissonance: the theory of Market Dilution!