AI and copyright

By Martin Lloyd and Annie Murray

This blog post reflects members’ opinions and is provided to inform and stimulate debate;
it does not necessarily reflect the position of JAAG.

Introduction

The authors of this blog post are also authors in the wider sense. Annie is a well-known writer who has published more than 30 books with Pan Macmillan - https://www.anniemurray.co.uk/. In recent years, in parallel with his scientific/engineering work, Martin wrote and co-illustrated a graphic novel, which was finally published a few months ago - https://doglandbooks.co.uk/.

Here, we look at the government’s AI and copyright proposals from the point of view of creatives who have had to work very hard to produce original and readable work.

Below we summarise the government’s proposals contained in a recent consultation document and then give our views about their fairness and viability.

Options proposed by the Government

Option 0: Do nothing, no legal change.

The current Text and Data Mining (TDM) exception, which applies only to non-commercial scientific research, would remain, alongside other existing exceptions such as the one for temporary copies. For example, it covers a researcher processing thousands of pages of copyright-protected text to track the decline of the semi-colon.
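
To make that example concrete, here is a minimal sketch of such text mining in Python. It is purely illustrative: the “corpus” directory and one-file-per-decade layout are our invention, not anything the consultation specifies.

    # Minimal text-and-data-mining sketch: semi-colon frequency over time.
    # The "corpus" directory and one-file-per-decade layout are invented.
    from pathlib import Path

    def semicolons_per_1000_words(text: str) -> float:
        words = text.split()
        return 1000 * text.count(";") / len(words) if words else 0.0

    for path in sorted(Path("corpus").glob("*.txt")):  # e.g. 1900s.txt, 1910s.txt
        rate = semicolons_per_1000_words(path.read_text(encoding="utf-8"))
        print(f"{path.stem}: {rate:.2f} semi-colons per 1,000 words")

The point to notice is that even this innocuous analysis requires copying and processing the protected texts wholesale; the exception exists precisely so that such computational research is lawful when it is non-commercial.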

Unless an exception applies, the author’s permission would usually be needed to copy protected works for the purpose of AI training in the UK. The Government rejects this option, considering it inadequate in many respects.

Option 1: Strengthen copyright requiring licensing in all cases.

Under this option, AI developers could train on copyright works only if they had an express licence to do so. It would make clear that licences are required to make copies for training purposes. This clarity could be provided by, for example, modifying certain existing exceptions and clarifying the status of models trained outside the UK. It could be backed by transparency provisions and easier routes to enforce copyright.

It would provide a clear route to remuneration for creators – for which, incidentally, the UK already has a well-tried and equitable model in Public Lending Right (PLR), which we touch on further below.

We prefer this option and reject the others. The Government will consult on it but takes a different view, because they consider it unlikely to:

  • meet the UK’s objective of granting AI developers easy access to material, and

  • deliver the desired outcomes of increased investment in, and development and use of, AI in the UK

Option 2: A broad data mining exception.

This would allow data mining on copyright works – including for AI training – without right holders’ permission. The exception would be subject to few or no restrictions. For authors, this option is tantamount to the authorised looting of creative content. To be fair, the Government does not prefer this option, although they will consult on it; to us, that means the door has been left open to influence and lobbying from AI companies, which appear to have close links with the UK Government.

Option 3 (the Government’s preferred option): A data mining exception with a rights reservation mechanism.

This would permit text and data mining for any use by anyone, but rights holders would be able to opt out individual works, sets of works, or all of their works that they do not want mined for commercial purposes (we sketch one possible reservation mechanism after the list below).
In the Government’s view, this option appears to have the potential to meet their objectives of control, access, and transparency; the reasons for their preference are explained in the consultation document. However, we consider the Government mistaken to think that this option will work:

  • It makes no mention of compensating copyright holders for the use of their works

  • It indicates a touching faith in the honesty of AI companies to respect opt-outs and to manage them transparently. One cannot see that happening, as the events described below show.
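
The consultation does not specify how rights reservation would work in practice. One widely discussed mechanism, robots.txt directives addressed to AI crawlers, illustrates the weakness neatly: an opt-out is trivial to declare but entirely voluntary to honour. A sketch follows; GPTBot, Google-Extended and CCBot are real, published crawler user agents, but nothing in the file enforces compliance.

    # robots.txt - asks known AI-training crawlers to stay away.
    # Honouring these directives is voluntary; there is no enforcement.
    User-agent: GPTBot            # OpenAI's training crawler
    Disallow: /

    User-agent: Google-Extended   # Google's AI-training opt-out token
    Disallow: /

    User-agent: CCBot             # Common Crawl, a common source of training data
    Disallow: /

Note, too, that this kind of reservation only covers web scraping; it does nothing for works, like Annie’s books, that reach training sets through pirated databases.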

But what is really happening?

Events show that the copyright situation has deteriorated and that Option 2 reigns - illegally.

Last January it was reported that Meta had lost a legal battle with a group of authors who are suing the company for infringing their copyright. Against Meta’s wishes, the court revealed unredacted information showing that the company had trained its AI language models on texts obtained from LibGen, a notorious database based in Russia containing millions of pirated books. As you might expect, bodies like the Publishers Association are outraged by Meta’s conduct - https://www.publishers.org.uk/publishers-association-statement-on-the-atlantic-article-on-libgen-and-meta/

Thus, the horse has already bolted: copyright has been broken for thousands of authors, including Annie. When she checked the searchable LibGen database published by The Atlantic, she found that all 33 of her published books had been scraped by Meta, not counting her foreign-language translations. When Martin checked for himself, he found that some of his published scientific and engineering papers had been scraped from learned journals. Meta did not scrape his graphic novel, because he had decided long ago to publish it only in print, precisely to prevent his artwork from being stolen.

The relevance of Public Lending Right

Each time someone borrows a copyrighted book from a public library, the author can be paid through Public Lending Right. This admirable scheme caps its payments so that the most popular authors do not hoover up all the available money. Annie has often received the maximum. The rate per loan is set annually, so the maximum payment varies; this year it is £6,600.

Were the use of a book for language-model training (see footnote 1) and inference regulated in a similar way, authors would have a transparent and fair way of being compensated by AI companies.
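
As a purely illustrative sketch of how simple such a scheme could be, here is the capped per-use arithmetic in Python. All rates and usage counts below are invented; only the £6,600 cap echoes the PLR maximum mentioned above.

    # Hypothetical PLR-style royalty: a per-use rate with an annual cap.
    # All rates and usage counts are invented for illustration; only the
    # GBP 6,600 cap mirrors the PLR maximum mentioned in the text.

    def annual_payment(uses: int, rate_per_use: float, cap: float = 6600.0) -> float:
        """Pay rate_per_use for each recorded use, up to the annual cap."""
        return min(uses * rate_per_use, cap)

    # Library loans: 60,000 loans at a notional 14p per loan -> cap reached.
    print(annual_payment(60_000, 0.14))       # 6600.0

    # The same mechanism applied to an LLM: a notional micro-rate per
    # occasion a work is drawn on during training or inference.
    print(annual_payment(2_000_000, 0.001))   # 2000.0

The arithmetic is trivial; the obstacle is transparency, since such a scheme only works if AI companies are obliged to disclose which works they have used and how often.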

What is happening in Parliament about this?

News from Parliament is that on 4 June 2025 the Government suffered their fifth defeat in the House of Lords over plans to allow AI companies to use copyrighted material – https://www.theguardian.com/technology/2025/jun/04/ministers-offer-concessions-ai-copyright-avoid-fifth-lords-defeat

What is happening in the USA about this?

Of course, Big Tech is unhappy with the UK Government’s proposals. These companies wish to be free to break copyright as they choose. Worse, the US government has recently renamed and refocused the AI Safety Institute set up in 2023 under the Biden administration; the new name is the Center for AI Standards and Innovation (CAISI).

A report in The Verge describes the agency as “shifting its focus from overall safety to combating national security risks and preventing ‘burdensome and unnecessary regulation’ abroad. Secretary of Commerce Howard Lutnick announced the change on June 3rd, calling the agency’s overhaul a way to ‘evaluate and enhance US innovation’ and ‘ensure US dominance of international AI standards.’”

Moreover, the report ends with: “And the current Republican budget bill includes a 10-year moratorium on state-level AI regulations — a provision even some in Trump’s party have come to oppose.”

Are we being unfair to AI companies?

The behaviour of US AI companies has already been revealed to be unethical and illegal. Numerous court cases indicate moral rot in their approach to business, and likewise in the support provided to them by the most corrupt government in the history of the USA. But there are smaller AI companies in other parts of the world, particularly in Europe, that may wish to do the right thing.

We think that the current situation offers an opportunity to Europe and the UK to provide trustworthy AI which both respects copyright and majors on safety.

It is worth noting, too, that within the USA there are efforts to introduce safer AI. Turing Award laureate Yoshua Bengio has launched the nonprofit LawZero to develop a “safe by design” AI system that would be fundamentally non-agentic (see footnote 2), trustworthy, focused on understanding and truthfulness, and not designed to mimic human behaviour or pursue its own goals.

Conclusion

We hope you can see why we are in favour of Option 1 - strengthen copyright, requiring licensing in all cases - and why we argue that it is necessary to pursue Big Tech companies for their illegal use of copyrighted material.

Footnotes

  1. Large language models (LLMs) have to be trained, which is a hugely expensive process. After training comes the so-called “inference” phase, when they are actually put to use. While each use is inexpensive relative to the training phase, the LLM can be used millions of times over an extended period. Thus, any copyright protection must apply throughout the lifecycle of the LLM and not just during training.

  2. Non-agentic: the AI can process only a single task at a time and requires human input or guidance to generate output.
