LLMs (and AI in general) are “feeding” on all the articles and comments humans wrote on the Internet. Is that copyright infringement or theft? Here are my thoughts aloud on that.
Stages of copying
Zero (pun intended)
Here is a book from my shelf:

One
The book in our example (see above, if you “just skimmed” LOL) is a hard copy. A real, paper book that was once bought from a bookstore. If I give it to a friend, no legal system and no remotely reasonable person would consider that to be theft or copyright infringement.
The bookstore does potentially lose a sale if I give or lend the book to a friend. But it is still 100% legal and ethical: we are allowed to share our stuff, and sharing is considered ethical (I lose the book for a friend to gain it).
Two
But what if I scan the entire book to keep a copy on my computer? You can see how badly worn and torn that book is. Well – now we are getting near copyright infringement territory, depending on the exact copyright terms of the book and your jurisdiction. Most people would still not consider it theft if I just kept one copy for myself, of a book I bought – but it’s already not as clear as the first example.
Three
Now, if I decided to share the digital copy with a friend (or two), I am sure most legal systems would flag it as copyright infringement – and many people would consider it unethical (some might call it theft, more on that later).
Three and a half
If I shared that digital copy on the Internet, openly (torrents and sharing websites), even if I didn’t charge anything for it, virtually all court systems and most people would consider it illegal (most would also consider it unethical – which is a different, if sometimes overlapping, category).
Four
What if I rewrote the whole book in a notebook, making it an exact copy, but in my own handwriting? Since Gutenberg’s press came out, around 1440, this has been pretty wacky, highly unusual territory (before the press, that was exactly how books were copied).
Would you consider this to be illegal? What if I gave the copy to a friend?
Depending on your culture and upbringing, your reaction to this may range from admiration to dismissal as theft, or land somewhere in between, depending on the context.
Context matters
As you can see from the “exercise” above, a lot of it boils down to context, and scale.
- Scale
The easier (and cheaper) it is to make many copies, the less ethically clear it is.
- Motive
The more one is driven by profit to make copies, the less ethically justified it is.
And vice-versa.
Enter the AI / LLM
Where do the modern LLM (“AI”) systems fit in regarding this (systems like ChatGPT, DeepSeek, Gemini, Grok and countless others)?
Scale
In terms of copying, the whole idea is to make them take human information, knowledge, and creations, and use them as efficiently and cheaply as possible, on a huge scale.
Motive
For a start, most of them are owned by corporations and are intended to make profits!
Basically, LLMs are taking all our books, articles, and forum comments, regurgitating them, and spitting them back.
And yes – I even ran my own AI experiment once. Tried building a fully AI-generated website. The irony isn’t lost on me. That experience is partly what shaped my views now.
Here, I won’t get into deeper social and economic consequences. I will stick to the legal and ethical aspects of this kind of copying. Let me reiterate now:
Scale
I can read books, try to understand and memorize them. Then, based on that knowledge, and my experience, I can write my own books and articles.
In terms of volume and scale, that is the equivalent of hand-copying a book (rewriting it in a notebook).
AI, on the other hand, is a super-fast machine, so its “regurgitating” of the books and articles it has been fed is like torrent-sharing a scanned book – as close to piracy as it gets.
In other words:
The platforms feeding their AIs pretend it’s “just learning like humans.” Except humans don’t read 100,000 books in a day, regurgitate them on command, and sell the output under a corporate brand.
Motive
LLMs are run by corporations for profit. It is the exact opposite of giving a book copy to a friend to help them learn and share knowledge.
Conclusion
The genie is out of the bottle, and most of those who have the power to change or stop it will not act – for various reasons that are beyond the scope of this article.
From the perspective of an author whose work is regurgitated by the LLMs:
No one gives a damn, and you are removed from search results, and replaced by Reddit and “AI answers.”
I even asked Google’s own AI why my site lost all traffic – here’s what it said.
It is probably the biggest theft, publicly visible, that no one even talks about.
P.S.
I went to my bookshelf with the idea to take any book to use as an example for writing this article. By chance, it was this very cyberpunk book. That particular copy belonged to my now deceased friend, Ramona. A coincidence or a subconscious bias? Hmm… I’m not sure if it’s even ethical to keep it. Some things are beyond money and ownership – or at least they should be?