Zuckerberg turns to YouTube for defense in AI copyright case


In a deposition, Meta CEO Mark Zuckerberg pointed to YouTube and its fight to remove pirated content to defend his company’s use of a dataset of copyrighted e-books to train artificial intelligence models.

The deposition, which is part of a complaint filed with the court by plaintiffs’ attorneys, relates to the AI copyright case Kadrey v. Meta. It is one of many such cases moving through the US court system, pitting AI companies against authors and other IP owners. In most of them, the defendants (the AI companies) argue that training on copyrighted content is “fair use.” Many copyright holders disagree.

“YouTube, for example, I think may have hosted some things that people pirated for a while, but YouTube is trying to take that stuff down,” Zuckerberg said during the deposition, portions of which were released on Wednesday night. “And the vast majority of the stuff on YouTube, I would assume, is kind of good, and they have a license to do that.”

The excerpts offer some clues about Zuckerberg’s thinking on copyrighted content and fair use. Note, however, that the full transcript of the deposition has not been released. TechCrunch has reached out to Meta for additional context and will update this article if the company responds.

According to the deposition excerpts, Zuckerberg defended Meta’s use of a collection of e-books called LibGen to develop its family of artificial intelligence models known as Llama. Llama competes with advanced models from AI companies such as OpenAI.

Self-described as a “links aggregator,” LibGen provides access to copyrighted works from publishers including Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education. LibGen has been sued multiple times, ordered shut down, and fined tens of millions of dollars for copyright infringement.

According to court documents unsealed this week, Zuckerberg approved the use of LibGen to train at least one of Meta’s Llama models, despite concerns about legal ramifications from the company’s AI chief and research teams.

According to the court filings, a lawyer for the plaintiffs, who include best-selling authors Sarah Silverman and Ta-Nehisi Coates, noted that Meta employees referred to LibGen as “a data set that we know is pirated” and said that its use could undermine “[Meta’s] negotiating position with regulators.”

In the deposition, Zuckerberg claimed he had “not really heard” of LibGen.

“I understand that you are forcing me to say something about LibGen that I have never heard,” Zuckerberg said during the deposition. “It’s just that I don’t know about this particular thing.”

Under questioning from David Boies, one of the plaintiffs’ attorneys, Zuckerberg explained why it would be unreasonable to ban the use of a dataset like LibGen.

“So would I like to have a policy against people using YouTube because some content is copyrighted? No,” he said. “[T]here are circumstances where having such a prohibition may not be the right thing to do.”

Zuckerberg noted that Meta needs to be “very careful” when it comes to training on copyrighted material.

“You know, [if there’s] someone who’s providing a website and they’re intentionally trying to infringe on people’s rights … obviously that’s something we want to be cautious or careful about how we deal with it, or maybe have our teams deal with it,” Zuckerberg said during his testimony.

New claims

Attorneys for the plaintiffs in Kadrey v. Meta have amended the complaint several times since it was filed in 2023 in the U.S. District Court for the Northern District of California, San Francisco Division. The amended complaint contains new allegations against Meta, including that the company cross-referenced some pirated books on LibGen with copyrighted books available for license. The lawyers claim that Meta used this tactic to determine whether it made sense to enter into a license agreement with a publisher.

Meta allegedly used LibGen to develop Llama 3, the latest family of Llama models, according to the amended complaint. The plaintiffs also allege that Meta used the data set to develop the next-generation Llama 4 models.

According to the amended complaint, Meta researchers allegedly tried to hide the fact that Llama models were trained on copyrighted material. Meta also downloaded pirated e-books for Llama training from another source, Z-Library, in April 2024, the complaint alleges.

Z-Library, or Z-Lib, has been the subject of a number of legal actions by publishers, including domain seizures and takedowns. In 2022, the Russian nationals who allegedly ran it were charged with copyright infringement, wire fraud and money laundering.


