Microsoft AI Chief Makes Questionable Claims About Copyright And Online Content

Says web content is 'freeware' for training AI

Mustafa Suleyman, Microsoft's AI CEO, has sparked controversy with bold statements about copyright and the use of online content for training AI systems.

In a recent interview with CNBC, Suleyman said that publicly available web content is essentially "freeware" that can be freely copied and used by anyone for any purpose.

Suleyman defended the practice against accusations of intellectual property theft, arguing that the "social contract" of the internet since the 1990s allows anyone to copy and reuse publicly available content.

"With respect to content that is already on the open web, the social contract of that content since the '90s has been that it is fair use," Suleyman said.

"Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That's been the understanding."

https://twitter.com/tsarnick/status/1805809836854329450?ref\\_src=twsrc%5Etfw

This perspective directly contradicts U.S. copyright law, which grants creators automatic protection for their work.

The FAQ page on the U.S. Copyright Office website states that "your work is under copyright protection the moment it is created and fixed in a tangible form that it is perceptible either directly or with the aid of a machine or device."

It adds that registration with the Copyright Office is voluntary and is required when "you wish to bring a lawsuit for infringement of a U.S. work."

In the interview, Suleyman did acknowledge that websites with "do not scrape or crawl" instructions in their robots.txt file might exist in a "grey area."

"There's a separate category where a website or a publisher or a news organization had explicitly said, 'do not scrape or crawl me for any other reason than indexing me so that other people can find that content.' That's a gray area and I think that's going to work its way through the courts," Suleyman said.

Several AI companies, including Anthropic, Perplexity, and OpenAI, have been criticized for ignoring robots.txt instructions when gathering data for AI training.

Suleyman took on the role of CEO of Microsoft AI earlier this year, after previously serving as CEO at Inflection AI Inc., which he co-founded in 2022.

In March, Microsoft hired Suleyman, along with Inflection AI co-founder Karén Simonyan and many other staff members from the startup, in a deal reportedly valued at $650 million.

Suleyman's remarks come at a time when the use of creative content for AI training is under increasing scrutiny.

The launch of large language models (LLMs) like ChatGPT in 2022 sparked discussions about whether AI development unfairly appropriates the work of others. Several writers and artists have already filed lawsuits against AI companies for allegedly using their work without permission.

Both Microsoft and OpenAI face accusations of using copyrighted content without permission to train their LLMs. The New York Times sued the companies in December, alleging their AI models accessed paywalled articles and used content from the Times through a massive web archive called Common Crawl.

Similarly, a group of newspapers owned by Alden Global Capital filed suit in April, accusing Microsoft and OpenAI of using their articles without authorization.

Additionally, the Center for Investigative Reporting (CIR), the nonprofit organization behind Mother Jones and Reveal, has filed a lawsuit against tech giants Microsoft and OpenAI, alleging unauthorized use of CIR's copyrighted material to train AI models.

"OpenAI and Microsoft started vacuuming up our stories to make their product more powerful, but they never asked for permission or offered compensation, unlike other organizations that license our material," said Monika Bauerlein, CEO of the CIR.

"This free rider behavior is not only unfair, it is a violation of copyright. The work of journalists, at CIR and everywhere, is valuable, and OpenAI and Microsoft know it."

This article originally appeared on our sister site, Computing.