**Comedian Sarah Silverman Sues ChatGPT-maker OpenAI for Copyright Infringement**
Comedian Sarah Silverman recently filed a lawsuit against OpenAI, the creator of the artificial intelligence (AI) chatbot ChatGPT, for copyright infringement. Silverman alleges that OpenAI used her memoir, “The Bedwetter,” to train its AI models without her consent or compensation. The lawsuit raises ethical and legal questions about using copyrighted material to train AI models, and about the practices of the machine learning industry as a whole. The case is one of a growing number of lawsuits from authors who claim that they unknowingly provided the foundation for Silicon Valley’s AI boom.
**The Impact of OpenAI and Competitors on the AI Boom**
OpenAI and other companies in the AI industry have developed products like ChatGPT that use “generative AI” to create new text, images, and music. These products have become very popular and are projected to contribute billions of dollars to the global economy. However, OpenAI and its competitors keep the methods used to train these models closely guarded. The current legal cases against OpenAI and other companies shed light on the data sources these companies use, which may include unauthorized copies of copyrighted works.
**The Hidden Practice of Using Illicit Book Data in Machine Learning**
The use of illicit book data in the machine learning industry is a widespread but little-known practice. Matthew Butterick, one of the lawyers representing Silverman and other authors in a potential class-action case, describes it as an “open, dirty secret” of the industry. Companies often obtain book data from shadow libraries that host pirated content. Books are highly valued for training AI models because they are well edited and coherent. However, this raises the issue of using copyrighted material without proper consent or compensation.
**Legal Precedents and Questions of Legitimacy**
The outcome of this lawsuit may have far-reaching implications for writers and the AI industry as a whole. Previous legal challenges against companies like Google, which digitized books without obtaining permission, were largely unsuccessful. Some legal experts speculate that OpenAI’s actions may fall within the same legal boundaries, given the similarities between OpenAI’s use of books and Google’s book-scanning project, Google Books. However, mounting pressure from authors and their calls for compensation may lead to changes in the industry’s practices and regulations.
**Authors Unite to Address Exploitative Practices**
Authors’ concerns about how tech companies build their AI systems have gained traction within literary and artistic communities. Prominent authors, including Nora Roberts, Margaret Atwood, Louise Erdrich, and Jodi Picoult, signed an open letter to the CEOs of OpenAI, Google, Microsoft, Meta, and other AI developers. The letter accuses these companies of exploitative practices in building chatbots that mimic and regurgitate authors’ language, style, and ideas without proper compensation. Authors argue that their writings serve as the “food” for AI systems and that they deserve appropriate compensation for their contribution.
**The Importance of Books in Training AI Models**
Books have played a crucial role in training large language models like ChatGPT. OpenAI’s early language model, GPT-1, was trained on a dataset called the Toronto Book Corpus, which contains thousands of unpublished books. This dataset was integral to teaching the model to condition on long-range information within the text. However, as AI developers became more secretive about their data sources, the use of pirated content from shadow libraries became more prevalent. Access to well-written books is widely considered essential for developing high-quality language models.
**Potential Implications and the Need for Compensation**
The lawsuit filed by Sarah Silverman and other authors against OpenAI does not necessarily seek to dismantle AI algorithms or erase training data. Instead, the authors believe that some form of compensation is owed for the use of their works. The Authors Guild organized an open letter, signed by more than 4,000 writers, demanding fair compensation for their writing. Although the authors may face challenges in winning this case, the involvement of tech executives and their possible testimony under oath could bring more visibility to how book data is used in training AI models.