Authors sue OpenAI, allege ChatGPT was trained on their books

This illustration photo taken in Krakow, Poland on June 8, 2023 shows the OpenAI logo on the website and ChatGPT on the AppStore on a mobile phone screen.

Jakub Porzycki | Nurphoto | Getty Images

Two authors filed a lawsuit against OpenAI last week, alleging that their copyrighted books were used to train the company’s artificial intelligence chatbot, ChatGPT, without their consent.

Paul Tremblay, author of “The Cabin at the End of the World,” and Mona Awad, author of “Bunny” and “13 Ways of Looking at a Fat Girl,” claim that ChatGPT generates “very accurate summaries” of their works, according to the complaint. These summaries, they claim, are “only possible if ChatGPT has been trained on their books,” which would violate copyright law.

OpenAI did not immediately respond to CNBC’s request for comment. Attorneys for Tremblay and Awad also did not immediately respond.

ChatGPT automatically generates text based on written prompts in a way that is more advanced and creative than the chatbots of Silicon Valley’s past. The technology was developed by San Francisco-based OpenAI, a Microsoft-backed research firm led by Sam Altman.

The chatbot was trained on a large amount of text data. OpenAI has not disclosed the precise data used to train ChatGPT, but the company has explained that it generally crawls the web and uses archived books and Wikipedia.

The lawsuit, filed in San Francisco federal court, alleges that “a substantial portion” of the material in OpenAI’s training data is based on copyrighted material, including books by Tremblay and Awad. But it may be a challenge to prove exactly how and where ChatGPT collected this information, and whether the authors suffered financial losses.

The complaint references exhibits containing summaries generated by ChatGPT and notes that the chatbot made some errors. However, Awad and Tremblay claim the rest of the summaries are accurate, meaning that “ChatGPT retains knowledge of specific works in the training dataset.”

“ChatGPT never reproduced any copyright management information that Plaintiffs included in their published works,” the complaint states.
