
Inside the AI labs’ publisher playbook: how OpenAI, Anthropic, Google, and Perplexity are trying to settle the scraping fight.

Four of the most consequential AI labs each approach news publishers with a sharply different posture — from blanket licensing to quiet opt-out compliance to revenue-share experiments. A look at what each is offering, what publishers actually hear back, and where the playbooks are converging.

By Helena Marsh, Media & Platforms Correspondent · London · Published May 23, 2026

30+

Disclosed OpenAI publisher licensing partners as of Q1 2026

$250M

Reported high end of a single multi-year OpenAI news licensing deal

Distinct AI-crawler tokens publishers now manage in robots.txt

The conversation between AI labs and news publishers used to happen in cease-and-desist letters. More than two years after the New York Times sued OpenAI, those conversations have moved into conference rooms, partnerships teams, and a steadily expanding set of opt-out tags. The labs do not all want the same thing from publishers, and they are not all offering the same deal, a fact that has begun to define how newsrooms allocate their limited bandwidth for AI policy work.

BookerPost reviewed the public statements, partnership announcements, crawler documentation, and litigation filings from OpenAI, Anthropic, Google, and Perplexity, and spoke with executives at nine publishers and with two media bankers about how each lab engages in practice. Below is what that engagement looks like, lab by lab.

OpenAI: the licensing aggregator

Of the four labs, OpenAI has been the most active counterparty for major publishers, and by a wide margin. Its disclosed list of content partners now spans News Corp (including News Corp Australia), Axel Springer (whose titles include Politico and Business Insider), Vox Media, The Atlantic, the Financial Times, the Associated Press, Le Monde, Condé Nast, Hearst, Time, Dotdash Meredith, the Guardian Media Group, Prisa (publisher of El País), and others. Reported values range from low single-digit millions for regional partners to as much as a quarter of a billion dollars over a multi-year term for the largest packages.

The deals usually combine three things: a content license that covers training and citation, a product integration that surfaces the publisher in ChatGPT search responses, and a commercial layer — often a mix of upfront fees, ongoing royalties, and credits for ChatGPT Enterprise.

OpenAI’s technical posture is matched to its commercial one. It documents three distinct crawler user agents — GPTBot for training, OAI-SearchBot for live retrieval into ChatGPT search, and ChatGPT-User for on-demand fetches initiated by a user prompt — and a Media Manager workflow it has been previewing since 2024 for publishers who want fine-grained opt-out by URL or content type. Publishers say the documentation is the clearest of the four labs; whether the controls work as advertised is harder to verify.
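As a rough illustration of how those three tokens can be treated separately, the sketch below uses Python's standard urllib.robotparser to evaluate a hypothetical robots.txt that blocks training crawls while leaving search retrieval and user-initiated fetches open. The file contents and the example.com URL are assumptions made up for illustration, not any publisher's actual policy, and the check only shows what the rules say, not whether a crawler honors them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the training crawler, allow the other two agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""

# The three user agents named in OpenAI's crawler documentation, per the article.
OPENAI_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]


def access_by_agent(robots_txt, agents, url):
    """Return {agent: True/False} for whether the rules permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# example.com is a placeholder domain, not a real publisher.
result = access_by_agent(ROBOTS_TXT, OPENAI_AGENTS, "https://example.com/news/story.html")
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under this hypothetical file, GPTBot is blocked and the other two agents are allowed, which is the split several publishers describe wanting: out of training, still visible in search.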

“OpenAI is the only lab where we have a named account manager and a renewal cycle. Whether we like the terms or not, it’s a relationship. That’s not true of the others.” — Head of platform partnerships at a European news group

Anthropic: the quieter posture

Anthropic’s public engagement with publishers has been notably more reserved than OpenAI’s. The lab has not announced anything resembling OpenAI’s licensing rollout, and several publishers BookerPost spoke with said they have struggled to find a counterparty inside the company for commercial conversations at all.

What Anthropic has done is invest in crawler hygiene. It documents ClaudeBot for training, Claude-User for prompt-initiated fetches, and Claude-SearchBot for retrieval into Claude’s search features — each with stable IP ranges and a dedicated abuse contact. Publishers describe Anthropic as the lab most likely to actually respect a robots.txt directive without requiring a side agreement.

The legal record has been more contested. Anthropic settled a long-running lawsuit brought by music publishers in 2025 over lyrics that appeared in Claude responses, agreeing to guardrails rather than to broader licensing. A separate copyright case brought by authors over training data remains ongoing. Anthropic announced a 2025 data partnership with Reddit alongside Google and OpenAI, but has otherwise preferred direct, smaller engagements over headline licensing.

The result is a lab that publishers describe in two contrasting ways: easier to block at the technical layer, harder to monetize at the commercial one.

“Anthropic will honor your robots.txt on the day you change it. They will also not write you a cheque. Both of those things are real, and both matter.” — Director of audience strategy at a US national newspaper

Google: the controlling stack

Google’s relationship with publishers is the most structurally complex of the four. The company is at once the dominant referrer of search traffic to news, the operator of AI Overviews and AI Mode, the buyer of large data deals such as the reported $60 million-a-year arrangement with Reddit, and the defendant in Chegg’s 2025 antitrust suit alleging that AI Overviews is anti-competitive on its face.

Its public mechanism for publisher control is Google-Extended, the user-agent token introduced in September 2023 that lets sites opt out of Gemini and Vertex AI training while continuing to be indexed for search. Publishers have, by and large, adopted it: the New York Times, the BBC, Reuters, the Washington Post, CNN, El País, and the FT are among those that set the token to disallow. What the token does not do is exclude a publisher from AI Overviews, which Google argues is a search feature governed by the same indexing that drives the rest of its product.

That distinction — opt-out of training, no opt-out of summarization — is the most contentious issue between Google and the news industry. Publishers concede that splitting the two would force Google to redesign AI Overviews; Google argues that giving publishers the choice would effectively let them extract a fee for ranking, a line the company has held since the European link-tax disputes a decade ago.

Around that headline standoff, Google has continued to fund publisher programs through the Google News Initiative, signed bilateral deals (most prominently the Reddit data agreement), and renewed selected News Showcase contracts in markets including the UK, Germany, Brazil, and Australia. Whether those programs constitute a meaningful answer to AI Overviews is a question publishers and regulators are still actively litigating.

Perplexity: the revenue-share gambit

Perplexity’s posture has moved fastest of the four. The company spent much of 2024 in open conflict with publishers: Forbes accused it of close paraphrasing in mid-2024, the New York Times sent a cease-and-desist later that year, and Dow Jones and the New York Post sued in October 2024. It has used the eighteen months since to try to convert that hostility into a commercial channel.

The vehicle is the Perplexity Publishers’ Program, launched in July 2024 with Time, Fortune, Der Spiegel, the Texas Tribune, Entrepreneur, and WordPress.com as initial partners. The mechanism is a revenue-share keyed to advertising on Perplexity’s answer pages: when a publisher’s content is cited in an answer that displays an ad, the publisher receives a share of the resulting revenue. Perplexity has since expanded the program to LA Times parent California Times, the Los Angeles Magazine group, the Independent, Lee Enterprises, and others.
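The arithmetic of such a revenue share can be sketched in a few lines. Everything specific here is an assumption for illustration: Perplexity's actual share rate and allocation rules are not given above, so the 25 percent rate (2,500 basis points), the equal split among cited publishers, the publisher names, and the revenue figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerImpression:
    """One answer page view: who was cited and what the ad earned (0 if no ad)."""
    cited_publishers: tuple[str, ...]
    ad_revenue_cents: int


def publisher_payouts(impressions, share_bps):
    """Sum per-publisher payouts in cents.

    share_bps is a hypothetical publisher share in basis points. The pool for
    each answer is split equally among cited publishers, which is an assumption,
    not a documented rule. Integer division drops fractional cents.
    """
    payouts = {}
    for imp in impressions:
        # No ad shown, or nobody cited: nothing to share.
        if imp.ad_revenue_cents == 0 or not imp.cited_publishers:
            continue
        pool = imp.ad_revenue_cents * share_bps // 10_000
        per_publisher = pool // len(imp.cited_publishers)
        for name in imp.cited_publishers:
            payouts[name] = payouts.get(name, 0) + per_publisher
    return payouts


# Made-up data: three answers, one of which displayed no ad.
impressions = [
    AnswerImpression(("Publisher A", "Publisher B"), 200),
    AnswerImpression(("Publisher A",), 0),
    AnswerImpression(("Publisher B",), 100),
]
payouts = publisher_payouts(impressions, share_bps=2_500)
print(payouts)
```

The point of the sketch is structural rather than numerical: a publisher earns nothing from a citation unless that answer also carried an ad, which is one reason payments under this kind of model can be volatile.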

The pitch differs from OpenAI’s in two ways. There are no upfront licensing payments. And attribution — the “sources” row beneath an answer — is the product, not a courtesy: Perplexity has redesigned its interface multiple times to make citations more prominent, and now offers partner publishers API access to its index and prioritized placement in source lists.

For mid-sized publishers, this is the lab most likely to return their email. For the largest, the revenue share remains too small to compete with a flat OpenAI cheque. For investors, it is the bet that an advertising-supported AI search product can sustain a publisher economy in a way subscription chat assistants cannot.

“OpenAI signs treaties with twenty publishers. Perplexity signs invoices with two hundred. Both are interesting models. Neither is yet a market.” — Media banker advising on three AI licensing deals in 2025

A side-by-side

Lab	Primary publisher motion	Crawler control	Notable litigation
OpenAI	Bilateral licensing with major publishers; ChatGPT search integration; Media Manager opt-outs.	`GPTBot`, `OAI-SearchBot`, `ChatGPT-User`.	NYT, Daily News, Center for Investigative Reporting (ongoing).
Anthropic	Limited commercial engagement; selected data partnerships; emphasis on safety and crawler hygiene.	`ClaudeBot`, `Claude-User`, `Claude-SearchBot`.	Music publishers (settled 2025); authors’ copyright case (ongoing).
Google	Training opt-out via `Google-Extended`; News Showcase renewals; data deals (e.g. Reddit).	`Googlebot` for search; `Google-Extended` token for AI training.	Chegg antitrust suit over AI Overviews (filed 2025).
Perplexity	Revenue-share Publishers’ Program; partner API access; prioritized source attribution.	`PerplexityBot`, `Perplexity-User`.	Dow Jones / New York Post (filed October 2024).
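For a newsroom managing all of these tokens at once, one option is to keep the policy as data and generate robots.txt from it. The sketch below assumes a hypothetical publisher that blocks the training tokens and allows everything else. The token names come from the table above, while the allow/block choices and the `render_robots_txt` helper are illustrative assumptions rather than a recommendation.

```python
# Hypothetical per-token policy: True = allow, False = block.
# Token names are those listed in the table; the choices are made up.
POLICY = {
    # Tokens the article describes as training-related
    "GPTBot": False,
    "ClaudeBot": False,
    "Google-Extended": False,
    # Search, retrieval, and user-initiated agents
    "OAI-SearchBot": True,
    "ChatGPT-User": True,
    "Claude-SearchBot": True,
    "Claude-User": True,
    "Googlebot": True,
    "PerplexityBot": True,
    "Perplexity-User": True,
}


def render_robots_txt(policy):
    """Render one User-agent group per token, separated by blank lines."""
    groups = []
    for token, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {token}\n{rule}")
    return "\n\n".join(groups) + "\n"


robots_txt = render_robots_txt(POLICY)
print(robots_txt)
```

Keeping the policy in one mapping means a change of posture toward a lab is a one-line edit, and the generated file can be reviewed or diffed before it is deployed.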

What publishers actually hear

Asked to describe the four labs in a single phrase, the publishers BookerPost interviewed produced strikingly consistent answers. OpenAI is the negotiator. Anthropic is the steward. Google is the platform. Perplexity is the salesperson. None of those characterizations is meant as a compliment or an insult; they are operational descriptions of how the labs behave when a publisher picks up the phone.

What unifies them is that all four now accept — in a way none did in 2023 — that the supply of high-quality, recent journalism is not infinite and not free. The disagreement is over who pays, how much, for what use, and through what mechanism. That disagreement is now playing out simultaneously in licensing negotiations, crawler-token wars, courtrooms, and product roadmaps.

The most consequential question, several publishers said, is whether the labs converge on a single settlement design — a licensing-plus-attribution standard, with technical opt-outs as a backstop and an industry-wide royalty mechanism akin to music publishing — or whether the current patchwork hardens into a permanent state in which every publisher negotiates with every lab on bespoke terms.

“The labs talk to us in four different languages. We’re a small newsroom. We can’t learn four languages. The first lab that publishes a standard that the others copy wins this decade of the industry.” — Editor-in-chief of a regional European newspaper

Where this is heading

Three signals are worth watching over the next twelve months. The first is whether OpenAI’s licensing template extends below the top tier of publishers; a deal with a regional newspaper group on terms comparable to its Axel Springer arrangement would change the math for thousands of smaller outlets. The second is whether Anthropic moves from a quiet posture into an active commercial one; the lab is no longer small, and the asymmetry between its valuation and its publisher spend is becoming harder to defend publicly. The third is whether Google offers any concession on the training-versus-summarization distinction inside Google-Extended, or whether the Chegg case forces the question through the courts instead.

Perplexity, meanwhile, will be judged on a simpler metric: whether the revenue actually arrives. Partner publishers BookerPost spoke with described the early payments as material but volatile, and the answer-page advertising market as still too small to fund a full newsroom. The thesis of the program — that AI-search-with-ads can replace the open web’s referral economy — remains unproven, and Perplexity’s next funding round is expected to test it.

None of this is the steady state. But it is the most coherent picture yet of how the labs that displaced the publishing industry’s traffic intend to remain in business with it.

Reporting draws on published partnership announcements from OpenAI, Anthropic, Google, and Perplexity; crawler documentation hosted by each lab; court filings in the New York Times, Dow Jones, and Chegg cases; statements from the News/Media Alliance and News Media Association; and background interviews with nine publisher executives and two media bankers conducted in April and May 2026. Several interviewees requested anonymity to discuss live commercial negotiations.