A legal battle between OpenAI and The New York Times over materials used to train its artificial intelligence models may still be brewing. But OpenAI is pushing for deals with other publishers, including some of the largest news publishers in France and Spain.
OpenAI on Wednesday announced contracts with Le Monde and Prisa Media to bring French and Spanish news content to OpenAI’s ChatGPT chatbot. OpenAI said in a blog post that the partnership will bring these organizations’ current affairs coverage (from brands such as El País, Cinco Días, As and El Huffpost) to ChatGPT users and contribute to OpenAI’s long-term development by expanding the amount of training data.
OpenAI wrote:
In the coming months, ChatGPT users will be able to interact with relevant news content from these publishers via featured snippets with sources and enhanced links to the original articles, allowing users to access other information or related articles from their news sites… We are continuously improving ChatGPT and supporting the important role journalism plays in providing users with timely, authoritative information.
So OpenAI has now revealed licensing deals with a handful of content providers. This seems like a good opportunity to take stock:
- Stock media library Shutterstock (for images, videos and music training materials)
- Associated Press
- Axel Springer (owner of Politico and Business Insider, among others)
- Le Monde
- Prisa Media
How much is OpenAI paying each of them? Well, it isn’t saying, at least not publicly. But we can estimate.
The Information reported in January that OpenAI offered publishers between $1 million and $5 million a year to access archives to train its GenAI models. That doesn’t tell us much about the Shutterstock partnership. But in terms of article licensing, assuming The Information’s reporting is accurate and those numbers haven’t changed since then, OpenAI is spending between $4 million and $20 million a year across its four news publisher deals.
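The back-of-the-envelope math behind that range can be sketched as follows. This is a rough illustration, not a claim about actual contract values: it assumes every news deal falls inside the per-publisher range The Information reported, and the constant and function names are made up for the example.

```python
# Per-publisher range reported by The Information (USD per year).
LOW_PER_PUBLISHER = 1_000_000
HIGH_PER_PUBLISHER = 5_000_000

# News publishers with announced deals. Shutterstock is left out because
# the reported range covers article archives, not stock media.
NEWS_PUBLISHERS = ["Associated Press", "Axel Springer", "Le Monde", "Prisa Media"]


def estimate_annual_spend(publishers, low, high):
    """Return (low_total, high_total), assuming each deal sits in the same range."""
    count = len(publishers)
    return count * low, count * high


low_total, high_total = estimate_annual_spend(
    NEWS_PUBLISHERS, LOW_PER_PUBLISHER, HIGH_PER_PUBLISHER
)
print(f"${low_total / 1e6:.0f} million to ${high_total / 1e6:.0f} million per year")
```

With four publishers in the list, the totals land on the $4 million and $20 million bounds cited above.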
That may be chump change for OpenAI, which has more than $11 billion in funding and annualized revenue that recently topped $2 billion (according to the Financial Times). But as Hunter Walk, a partner at Homebrew and co-founder of Screendoor, recently mused, it’s large enough to potentially edge out AI rivals also seeking licensing deals.
Walk wrote on his blog:
[I]f experimentation is constrained by nine-figure licensing deals, we harm innovation… The checks being cut to the “owners” of training materials are creating huge barriers to entry for challengers. If Google, OpenAI, and other big tech companies can set costs high enough, they implicitly prevent future competition.
Now, it’s debatable whether there are any current barriers to entry. Many, if not most, AI vendors choose to risk the wrath of intellectual property rights holders by not licensing the materials they use to train their AI models. For example, there is evidence that art generation platform Midjourney, which does not have any agreement with Disney, is training on Disney movie stills.
The thornier question is: should licensing just be the cost of doing business and experimenting in artificial intelligence?
Walk doesn’t think so. He advocates for a regulator-imposed “safe harbor” that would protect any AI vendor, as well as smaller startups and researchers, from legal liability as long as they adhere to certain standards of transparency and ethics.
Interestingly, the UK recently attempted to codify something along these lines, exempting the use of text and data mining for artificial intelligence training from copyright considerations as long as it is for research purposes. But these efforts ultimately failed.
I’m not sure I would go as far as Walk’s “safe harbor” proposal, given the impact of artificial intelligence on an already unstable journalism industry. A recent model by The Atlantic found that if a search engine like Google integrated artificial intelligence into search, it would answer user queries 75% of the time without the user ever having to click through to The Atlantic’s website.
But maybe there is room for a carve-out.
Publishers should get paid, and paid fairly. However, is there not an outcome where they get paid and challengers to AI incumbents, as well as academics, have access to the same data as those incumbents? I should think so. Grants are one way. Bigger venture capital checks are another.
I can’t say I have a solution, especially since the courts have yet to decide whether and to what extent fair use protects AI vendors from copyright claims. But we have to figure these things out. Otherwise, the industry is likely to end up in a situation where academic “brain drain” continues unabated, with only a few powerful companies having access to vast amounts of valuable training data.