The moral outrage machine is humming at a perfect pitch again. This time, the pitchforks are aimed at Mark Zuckerberg, with publishers alleging he "personally authorized" the use of copyrighted books to train Meta’s Llama models. The narrative is as predictable as it is lazy: Big Tech is a pirate, authors are the victims, and the CEO is the villain in a hoodie.
But here is the truth that makes everyone uncomfortable: The current definition of copyright is a stagnant relic of the printing press era, and Meta is doing us a favor by breaking it.
If we strictly followed the "permission first" model for every byte of data used to train Large Language Models (LLMs), the internet would effectively become a gated community for the elite. We are witnessing a collision between 18th-century property rights and 21st-century intelligence. If the publishers win, progress stalls. If Zuckerberg wins, the world gets a democratized brain.
The Myth of the Stolen Word
Publishers want you to believe that "training" is synonymous with "copying." It isn’t.
When a human child reads a library book, they don't pay a royalty every time they recall a fact from that book during a conversation. They absorb patterns, logic, and information. They learn how to structure a sentence. LLMs do the exact same thing, just at a scale that terrifies the gatekeepers.
Copyright was designed to prevent the unauthorized sale of a work's expression, not its information. By suing Meta, publishers are essentially trying to patent the English language. They are claiming that because a book was used to teach a machine how to think, they own a piece of that machine’s thought process.
Imagine a scenario where a math textbook publisher sued every engineer who used a formula they learned in school. That is the level of absurdity we are dealing with here. Training an AI on a book isn't "piracy"; it is the most advanced form of reading in human history.
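To make the "patterns, not copies" point concrete, here is a deliberately tiny sketch: a word-level bigram model trained on a scrap of text. This is a toy illustration of the general principle, not how Llama actually works; real LLMs learn billions of neural-network weights rather than count tables, but the core claim is the same. What gets stored is statistics about language, not the text itself.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Learn word-to-next-word transition probabilities.
    The resulting 'model' is a table of statistics; the
    original text is not stored anywhere inside it."""
    words = text.split()
    counts = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    # Normalize raw counts into probabilities per preceding word
    return {
        w: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for w, nxts in counts.items()
    }

corpus = "the cat sat on the mat the cat ran"
model = train_bigram(corpus)
# "the" is followed by "cat" two times out of three, "mat" once
print(model["the"])
```

The trained object contains probabilities like "after 'the', expect 'cat' about 67% of the time." You cannot reconstruct the corpus from it, just as you cannot reconstruct a novel from a model's weights except in degenerate edge cases.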
Zuckerberg Is the Only Realist in the Room
The allegations suggest Zuckerberg brushed off legal warnings about using "Books3"—a dataset containing thousands of pirated titles. The "lazy consensus" says this is a sign of corporate greed. I’ve seen this movie before. In the early days of the internet, the same arguments were used against search engines.
"How dare Google index our pages without paying us!"
If Google had listened to the lawyers in 1998, you wouldn't be able to find a restaurant, a medical study, or a flight today without paying a toll to a middleman. Zuckerberg understands that in the race for Artificial General Intelligence (AGI), the "move fast and break things" mantra isn't a choice; it’s a survival requirement.
The legal risk was calculated. He knew that the value of a high-performing open-source model like Llama far outweighs the eventual settlement check he’ll have to write to a few angry publishing houses. While the competition (OpenAI and Google) hides behind proprietary walls, Meta’s "infringement" has empowered millions of developers worldwide to build tools that don't require a $20-a-month subscription.
The Copyright Cartel’s Last Stand
Let’s be honest about who is actually complaining. It isn't the mid-list author struggling to pay rent. It’s the conglomerate publishers who have sat on their hands for two decades while technology evolved. They aren't protecting "art"; they are protecting a business model based on artificial scarcity.
They want a licensing regime. They want every AI company to pay them a billion dollars for the "privilege" of reading their back catalog.
Here is what happens if they get their way:
- Consolidation: Only Apple, Microsoft, and Google will be able to afford to train AI.
- Stagnation: Small startups will be sued out of existence before they can write a line of code.
- Bias: If AI can only learn from "licensed" content, it will only reflect the views of the corporate entities that can afford the legal fees.
By "authorizing" the use of these datasets, Zuckerberg is effectively subsidizing a public utility. He is taking the legal heat so that the underlying technology remains accessible to the rest of us.
The Fair Use Fight of the Century
The legal battle hinges largely on the question of "transformative use," the most heavily weighted factor in the fair use analysis. To qualify, a use must transform the original material into something new in purpose or character, not merely repackage it.
Is an LLM transformative? Absolutely. It doesn't spit out the books it was trained on (unless you spend hours trying to "jailbreak" it with specific prompts). It produces code, poetry, medical advice, and translation. It takes the "raw ore" of human text and refines it into "steel."
The publishers’ argument is like a mining company claiming they own every skyscraper because the iron came from their dirt. It’s a desperate attempt to tax the future.
The Cost of Compliance Is Human Progress
Critics point to the "harm" done to creators. Let's test that claim. Has the existence of Llama 3 caused a single person to stop buying books? No. People buy books for the experience, the physical object, and the connection to the author. AI doesn't replace that. AI replaces the labor of searching, synthesizing, and summarizing.
If we force AI companies to clear every copyright hurdle, we are essentially saying that the speed of human innovation must be capped by the speed of a lawyer’s pen. That is a losing strategy on a global stage. While we argue over whether Meta should pay a few pennies to a defunct textbook company, other nations are training models on everything they can get their hands on, legal or otherwise.
Stop Asking if It's Legal and Start Asking if It's Necessary
We are at a crossroads. We can choose a world where knowledge is locked behind a million individual paywalls, or we can accept that the collective output of humanity belongs to humanity's next great invention.
Zuckerberg’s "personal authorization" wasn't a crime; it was a realization. He realized that the legacy legal system is incapable of handling the scale of the AI revolution. He chose to build first and ask for forgiveness later.
In ten years, nobody will care about a copyright dispute over a dataset called Books3. They will care about whether we have AI that can cure diseases, manage power grids, and educate children in their native languages. None of that happens if we let the publishing industry turn the internet into a library where you have to pay a nickel to look at the spines of the books.
The disruption isn't the AI. The disruption is the refusal to let the past choke the life out of the future.
Pay the fine. Keep the models open. Let the lawyers cry.