When the creators of AI chatbots such as ChatGPT or Google's Bard want to 'teach' them about the world in order to accurately answer questions, they use various news and social media sources to 'scrape' content which then 'trains' said models. This is why Twitter owner Elon Musk just began throttling accounts which consume a massive amount of content on a daily basis - as the platform had become a free and valuable resource for engineers.
It's also why AI can be 'woke' - as it all depends on the data it's being trained on, which as we've seen, can bias the chatbot towards the political ideology of its creators (for which loopholes were quickly discovered).
Now, the Financial Times reports that the world's largest tech companies are negotiating with major media outlets to strike landmark deals for the use of news content to train AI chatbots.
These people said that publishers including News Corp, Axel Springer, The New York Times and The Guardian have each been in discussions with at least one of the tech companies.
Those involved in the discussions, which remain in the early stages, added that the deals could involve media organisations being paid a subscription-style fee for their content in order to develop the technology underpinning chatbots such as OpenAI’s ChatGPT and Google’s Bard.
The talks come as media groups express concern over the threat to the industry posed by the rise of AI, as well as fears over the use of their content by OpenAI and Google without deals in place. Some companies such as Stability AI and OpenAI are facing legal action from artists, photo agencies and coders, who allege contractual and copyright infringement. -FT
According to News Corp CEO Robert Thomson, the media industry's "collective IP is under threat," and news outlets should "argue vociferously for compensation."
In short - use their content to train your AI without paying, get sued.
Current discussions have revolved around a pricing model in the range of $5 million to $20 million per year, according to one industry exec.
According to Thomson, AI was "designed so the reader will never visit a journalism website, thus fatally undermining that journalism."
The negotiations, if successful, would establish a blueprint for news organizations dealing with generative AI companies worldwide.
"Copyright is a crucial issue for all publishers," said the FT, which is also in negotiations over the matter. "As a subscriptions business, we need to protect the value of our journalism and our business model. Engaging in constructive dialogue with the relevant companies, as we are, is the best way to achieve that."
According to the report, media industry executives want to avoid the pitfalls of the early internet, when they undermined their own business models by giving away so much news for free, while Big Tech companies such as Google and Facebook then accessed that information to grow their multibillion-dollar advertising platforms.
Google recently announced an AI search option, which provides users with an information box above its traditional list of web links. The company has been leading the negotiations with UK news outlets the Guardian and News UK - two of many such outlets that parent company Alphabet has existing relationships with.
According to Mathias Döpfner, CEO of Politico owner Axel Springer, the industry should create a "quantitative" model similar to that used by the music industry, in which nightclubs and streaming services pay record labels each time a track is played. This would require AI companies to agree to disclose internal metrics on media content usage, which they don't currently do.
"We need an industry-wide solution," said Döpfner, adding, "We have to work together on this."
Döpfner, whose Berlin-based media company also owns the German tabloid Bild and the broadsheet Die Welt, said an annual agreement for unlimited use of a media company’s content would be a “second best option”, because that model would be harder for small regional or local news outlets to take advantage of. -FT
"Google has put a licensing deal on the table," said one executive at a newspaper group. "They have accepted the principle that there needs to be payment . . . but we have not got to the point of talking zeros. They have acknowledged that there is a money conversation that we need to have over the next few months, which is the first step."
That said, Google called the report of a potential licensing deal 'not accurate,' adding that it's "very early days and we’re continuing to work with the ecosystem, including news publishers, to get their input."
According to Google, it is in "ongoing conversations" with news outlets, both large and small, in the US, UK and Europe, while its Bard AI is being trained on "publicly available information," which could include paywalled websites.
Developing a financial model will likely be extremely difficult, according to publishing leaders. Senior executives at one major publisher said that the news industry was 'working retroactively' because tech companies had launched these products - which scrape their content - without a heads up.
"There was no discussion, and so now we have to try to get paid after it happened," said one executive. "The way they launched these products, the total secrecy, the fact that there is zero transparency, no communication before it happened, there’s reasons to be pretty pessimistic."