UK Publishers to bill AI firms for unwanted scraping - and take them to court if they don't pay

(pressgazette.co.uk)

cross-posted from: https://scribe.disroot.org/post/9545939

Archived version

...

Some 31 UK websites, backed by the Movement for an Open Web (MOW), have added new “Search-Only Contracts” (SOC) to their website terms and conditions which prohibit the copying and repurposing of content by LLMs such as OpenAI’s ChatGPT and Google Gemini.

...

The terms seek to beef up existing robots.txt notices on websites, which are currently widely ignored by generative AI companies.
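For context on what a robots.txt notice asks of a crawler, here is a minimal sketch of how a well-behaved scraper could check it before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `may_fetch` helper and the example URL are assumptions for illustration, not taken from any of the publishers in the article. `GPTBot` is the user-agent token OpenAI documents for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only: it blocks one AI crawler
# by user-agent token and allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/news/some-article"
    print("GPTBot allowed:", may_fetch("GPTBot", page))
    print("Other agent allowed:", may_fetch("ExampleBrowser", page))
```

The check is purely voluntary on the crawler's side, which is why robots.txt on its own cannot stop a scraper that chooses to ignore it.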

The contracts set out payment rates, typically £500 per article, for unauthorised website scraping which pave the way for future legal claims. In simple terms, they state that website users enter into a contract when they access a web page and agree not to copy and reuse content without permission.

...

According to MOW the new system makes it easy for publishers to lodge small legal claims over unwanted scraping. All they need is proof that a particular LLM has taken their work, which is straightforward to establish via targeted questioning of the chatbot in question. The next stage is to issue the owner of the LLM with an invoice and then enforce payment at a local county court if the bill is not paid.

Claims can be started via the website Moneyclaim.gov.uk (at a cost of around £50) and are then decided at a local county (or small claims) court where it is normal for claimants to represent themselves.

...

Scraping isn't and shouldn't be illegal. This is a seriously slippery slope, and publishing companies aren't your friends.

Yes, but... scraping with the primary intention of rebroadcasting information, in a way that reduces the visibility of the source, while also controlling a major interface for discovering that source, seems more nuanced.

Is that nuance included in these contracts? Seems like they're just setting out blanket terms for everyone.

Kind of what I mean by slippery slope. You just described this post. There's even an archive link so we can bypass the source completely.

The slope described starts with an LLM doing the sourcing and the reproduction appearing on an interface with the same owner, one that is responsible for a substantial percentage of the source's traffic. Does this post come close? It is attributable to a user who can be assumed not to be an LLM, it sits on an interface the poster doesn't control (unless they own this instance), and that interface has not historically accounted for a comparable share of the source's traffic.

Sure, slippery is true, but not at the expense of continuing the status quo without challenge. Just figure out the language needed and vote on a bill. I don't personally expect a positive outcome for consumers either way.

We are already halfway down the slope with this.

https://m4ow.uk/socw/2.txt

From skimming it, it never mentions an LLM except in a footnote outlawing it outright. It doesn't differentiate between an actual person posting on Lemmy and an LLM when it comes to using the content. It gets even muddier if this was a bot doing automated posting. There are many on Lemmy, and I don't see anything wrong with that.

It does let you post links if you are a non-commercial entity (access, I guess) but not any of the content. I might be mistaken; it's hard to parse sometimes.

Tbh, this is more of a joke than anything. The only thing you can catch by breaking ToS is a ban. It does give a preview for what's to come. I think the problem is that the "right" decision relies on a fantasy where the coming bills will be written well and actually protect every party and the internet as a whole. In the end, it will just be wealth protection and anti-consumerism at its heart.
