How 'Really Simple Licensing' Is Rewiring the Web to Charge AI for Content Scraping
A new machine-readable protocol called Really Simple Licensing (RSL) is allowing web publishers to set explicit prices for AI scraping, shifting the internet from an open buffet to a metered utility.
By Factlen Editorial Team
- Content Publishers & Creators
- Demand fair compensation and control over how their data is used by AI.
- Tech Industry Observers
- Track the legal and technical friction of implementing micro-licenses at internet scale.
- Protocol Architects
- Believe standardized, machine-readable licensing is the only way to save the open web from mass scraping.
What's not represented
- Independent Open-Source AI Researchers
- Academic Data Archivists
Why this matters
For years, AI companies have scraped the web for free to build billion-dollar models. RSL gives independent creators and major publishers a standardized way to demand compensation, which could fundamentally change the economics of the web.
Key points
- Really Simple Licensing (RSL) allows websites to set machine-readable prices for AI scraping.
- The protocol replaces the binary 'allow/disallow' of traditional robots.txt files.
- Publishers can charge per-crawl or per-inference when AI models use their data.
- Major platforms like Reddit, Yahoo, and Medium have already adopted the standard.
- RSL provides a legal paper trail to help publishers sue AI companies that scrape without paying.
For decades, the internet operated on an unspoken agreement: search engines crawled websites for free, and in exchange, they sent traffic back to the publishers. But the rise of generative artificial intelligence broke that pact. AI models scrape content not to direct users to a source, but to answer questions directly, keeping users on their own platforms. In response, publishers began locking their doors using the decades-old robots.txt file, a blunt instrument that simply tells bots to stay away. Now, a new standard called Really Simple Licensing (RSL) is rewiring the web's infrastructure, shifting the paradigm from a binary 'keep out' to a nuanced 'pay to enter.'[1][2]
The traditional robots.txt protocol was designed in the 1990s as a voluntary honor system. It allows a webmaster to list which parts of a site a crawler can or cannot visit. However, it offers no mechanism for monetization, and as AI companies grew desperate for training data, an estimated forty percent of stealth AI bots began ignoring these directives entirely. Enter RSL, co-created by Eckart Walther, one of the original architects of RSS. Instead of a simple block, RSL uses an XML-based document format that allows publishers to attach machine-readable licensing terms directly to their content.[3][6]
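To make the allow/disallow model concrete, here is a minimal Python sketch that uses the standard library's `urllib.robotparser` to evaluate a robots.txt file. The file contents and the crawler names (`ExampleAIBot`, `SearchBot`) are hypothetical and exist only for illustration. Note that the check runs on the crawler's side, which is exactly why the protocol is an honor system.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one AI crawler entirely and keep
# /private/ off-limits to every other bot.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler asks before fetching; nothing forces it to.
print(parser.can_fetch("ExampleAIBot", "https://example.com/articles/1"))  # False
print(parser.can_fetch("SearchBot", "https://example.com/articles/1"))     # True
print(parser.can_fetch("SearchBot", "https://example.com/private/x"))      # False
```

The only answers available are yes and no. There is no field for a price or a license, which is the gap the article describes.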
Under the RSL 1.0 specification, a publisher can drop a license file onto their server or embed tags directly into their HTML headers. These tags dictate the exact terms of engagement for an AI crawler. A news outlet might specify that their articles are available for a 'pay-per-crawl' fee of a fraction of a cent, or a 'pay-per-inference' royalty that triggers whenever an AI model uses their data to generate an answer. Major platforms including Reddit, Yahoo, and Medium were among the first to adopt the standard, signaling a massive shift in how digital property is treated.[6][7]
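As a rough illustration of what machine-readable terms could look like to a crawler, the Python sketch below parses a small XML document and looks up the terms that apply to a given path. The element names, attribute names, and prices are invented for this example and are not taken from the RSL 1.0 schema; the specification itself is the authority on the real format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional

# Illustrative only: this markup is NOT the RSL 1.0 schema. Every element,
# attribute, and price here is an assumption made for the sketch.
LICENSE_XML = """\
<license-terms>
  <content path="/articles/">
    <payment model="per-crawl" currency="USD">0.002</payment>
  </content>
  <content path="/data/">
    <payment model="per-inference" currency="USD">0.0005</payment>
  </content>
</license-terms>
"""


@dataclass(frozen=True)
class Terms:
    path: str        # URL path prefix the terms cover
    model: str       # "per-crawl" or "per-inference" in this sketch
    currency: str
    amount: Decimal  # price per crawl or per inference


def parse_terms(xml_text: str) -> List[Terms]:
    """Read every <content> entry that carries a <payment> element."""
    root = ET.fromstring(xml_text)
    terms = []
    for content in root.findall("content"):
        payment = content.find("payment")
        if payment is None or payment.text is None:
            continue
        terms.append(Terms(
            path=content.get("path", "/"),
            model=payment.get("model", ""),
            currency=payment.get("currency", ""),
            amount=Decimal(payment.text.strip()),
        ))
    return terms


def terms_for(url_path: str, terms: List[Terms]) -> Optional[Terms]:
    """Return the terms with the longest matching path prefix, if any."""
    matches = [t for t in terms if url_path.startswith(t.path)]
    return max(matches, key=lambda t: len(t.path), default=None)


all_terms = parse_terms(LICENSE_XML)
article_terms = terms_for("/articles/2026/rsl-explained", all_terms)
print(article_terms)
```

Under these made-up prices, a crawler that fetched 10,000 article pages would owe 10,000 × 0.002 = 20 US dollars, which shows how a fee of a fraction of a cent adds up at crawl scale.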

When analyzing the trade-offs of these two systems, the case for the traditional robots.txt approach rests on absolute simplicity and universal legacy support. Its primary advantage is that virtually every web crawler understands a basic disallow command, and implementing it requires no technical overhead or legal negotiation. The evidence supporting this approach is its thirty-year track record of keeping honest search engines in check. However, the argument against robots.txt is that it leaves money on the table and is toothless against rogue scrapers. It forces publishers into an all-or-nothing defensive posture, where they must either give their data away for free or block the very AI systems that are becoming the internet's new front door.[4][5]
Conversely, the case for Really Simple Licensing centers on creator empowerment and revenue generation. For publishers, RSL transforms passive content into an active, monetizable asset. The evidence for its efficacy is already visible in the rapid formation of the RSL Collective, a clearinghouse that acts much like ASCAP or BMI in the music industry, pooling the bargaining power of thousands of independent creators to negotiate bulk payouts from AI giants. The argument against RSL, however, is the friction of enforcement. Because RSL is still a signaling tool, it relies on AI companies actually agreeing to pay the tolls. If an AI developer chooses to ignore the XML file, publishers still have to rely on server-level IP blocking or prolonged litigation to enforce their terms.[1][3]
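The enforcement gap can be sketched as a server-side check that a publisher would have to run alongside any published terms. This Python example is a simplified assumption, not part of RSL: the crawler names, the token store, and the choice to answer unlicensed AI crawlers with HTTP 402 (Payment Required) are all hypothetical. It also matches on the user-agent string, which a rogue scraper can spoof, so real deployments would still need the IP-level blocking mentioned above.

```python
from dataclasses import dataclass
from http import HTTPStatus
from typing import Optional

# Hypothetical data: neither the crawler names nor the token scheme come
# from the RSL specification.
KNOWN_AI_CRAWLERS = ("exampleaibot", "samplecrawler")
PAID_TOKENS = {"token-abc123"}


@dataclass(frozen=True)
class Request:
    user_agent: str
    license_token: Optional[str] = None


def decide_status(req: Request) -> int:
    """Return the HTTP status a publisher's server might send for a request."""
    ua = req.user_agent.lower()
    is_ai_crawler = any(name in ua for name in KNOWN_AI_CRAWLERS)
    if not is_ai_crawler:
        return int(HTTPStatus.OK)                # ordinary visitors pass through
    if req.license_token in PAID_TOKENS:
        return int(HTTPStatus.OK)                # licensed crawler, terms accepted
    return int(HTTPStatus.PAYMENT_REQUIRED)      # 402: declared crawler, no license


print(decide_status(Request("Mozilla/5.0 Firefox/126.0")))         # 200
print(decide_status(Request("ExampleAIBot/1.0")))                  # 402
print(decide_status(Request("ExampleAIBot/1.0", "token-abc123")))  # 200
```

A crawler that lies about its identity gets a 200 here, which is the scenario where litigation and the paper trail become the publisher's remaining recourse.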
The financial implications of this shift are staggering. Industry analysts note that 2026 is bringing a hard reset to AI licensing, driven largely by the legal risks of unauthorized scraping and new regulatory pressures like the European Union's AI Act. By adopting RSL, publishers are establishing a clear, auditable paper trail of their terms. If an AI company scrapes an RSL-protected site without paying, the publisher has concrete proof of willful infringement, dramatically increasing their leverage in court. This dynamic is forcing AI developers to the negotiating table, shifting the internet from an open buffet to a metered utility.[2][5]

Ultimately, choosing between these protocols depends entirely on a publisher's goals. The traditional robots.txt block fits well when a website contains highly sensitive personal data, when the creator is philosophically opposed to AI training under any circumstances, or when a small hobbyist site lacks the infrastructure to manage micro-transactions. In these scenarios, a hard block paired with aggressive firewall rules remains the most straightforward defense. It provides a clear, unambiguous boundary that requires no ongoing management or legal oversight.[4][6]
On the other hand, Really Simple Licensing fits well when a publisher produces high-value, proprietary information (such as journalism, technical documentation, or specialized datasets) and wants to participate in the AI economy rather than just fight it. It suits media companies, independent writers, and data platforms that recognize AI as the next iteration of search and want to ensure they are compensated for fueling it. As the web transitions into an AI-first ecosystem, RSL offers the first scalable bridge between the companies building the models and the creators supplying the knowledge.[1][7]
How we got here
1994
The robots.txt protocol is created to guide early search engine crawlers.
2023-2024
Generative AI companies aggressively scrape the web, leading to widespread publisher backlash.
September 2025
The Really Simple Licensing (RSL) 1.0 standard is officially launched by the RSL Collective.
Mid-2026
Major publishers and platforms begin enforcing RSL terms ahead of the EU AI Act's data disclosure deadlines.
Viewpoints in depth
Content Publishers' View
Publishers view RSL as a necessary tool to reclaim the value of their intellectual property.
For media outlets and independent creators, the AI boom has felt like a massive wealth transfer, with tech companies absorbing their work to build competing products. Publishers argue that RSL provides the missing infrastructure to enforce copyright at scale. By joining collectives, they believe they can finally force AI developers to pay for the raw material that powers large language models, turning an existential threat into a new revenue stream.
AI Developers' View
AI companies are cautious about the friction and fragmentation micro-licensing could introduce.
While major AI labs acknowledge the need to compensate creators to avoid catastrophic lawsuits, they worry that a fragmented web of micro-licenses will make training next-generation models prohibitively expensive and legally perilous. Developers argue that managing millions of individual RSL XML files requires heavy administrative overhead. They prefer negotiating bulk licenses with large aggregators rather than dealing with decentralized, per-inference tolls from individual websites.
Open Web Advocates' View
Internet purists worry that monetizing every crawl will destroy the open web.
Advocates for the open internet express concern that RSL, while well-intentioned, could inadvertently wall off human knowledge. If every website begins charging a toll for machine access, academic researchers, open-source developers, and nonprofit archivists may be priced out of the internet. They argue that while stopping corporate AI scraping is important, the web's foundational ethos of free information exchange is threatened by universal micro-licensing.
What we don't know
- Whether major AI labs will voluntarily honor RSL pricing or force publishers to sue them first.
- How effectively the RSL Collective can negotiate on behalf of smaller, independent blogs.
Key terms
- Really Simple Licensing (RSL)
- An open, XML-based standard that allows web publishers to set machine-readable licensing and payment terms for AI crawlers.
- robots.txt
- A legacy text file placed on websites to instruct automated bots on which pages they are allowed to visit.
- Web Scraping
- The automated process of extracting data from websites, heavily used by AI companies to build training datasets.
- Pay-per-inference
- A licensing model where a publisher is paid a micro-royalty every time an AI model uses their specific data to generate an answer.
Frequently asked
Does RSL physically block AI bots from scraping?
No. Like robots.txt, RSL is a signaling tool. It establishes legal terms and prices, but publishers still need server-level firewalls to block bots that refuse to pay.
Who is behind the Really Simple Licensing standard?
It was launched by the nonprofit RSL Collective, co-founded by Eckart Walther, one of the original co-creators of the RSS protocol.
Can small blogs use RSL, or is it just for major media?
Anyone can use RSL. By joining a collective clearinghouse, independent creators can pool their traffic to negotiate payments that would be impossible to secure individually.
Sources
[1] TechCrunch (Protocol Architects): RSS co-creator launches new protocol for AI data licensing
[2] The Verge (Content Publishers & Creators): The web has a new system for making AI companies pay up
[3] Ars Technica (Tech Industry Observers): Pay-per-output? AI firms blindsided by beefed up robots.txt instructions
[4] ZDNET (Tech Industry Observers): AI's free web scraping days may be over, thanks to this new licensing protocol
[5] Digiday (Content Publishers & Creators): 2026 will bring a kind of reset as big tech companies alter their stance on AI licensing
[6] RSL Standard (Protocol Architects): Really Simple Licensing 1.0 Specification
[7] Engadget (Content Publishers & Creators): Reddit, Yahoo, Medium and more are adopting a new licensing standard to get compensated for AI scraping