Nvidia accused of trying to cut a deal with Anna’s Archive for high‑speed access to the massive pirated book haul — allegedly chased stolen data to fuel its LLMs

schizoidman@lemmy.zip · 12 days ago

theunknownmuncher@lemmy.world · 12 days ago

Allegedly most valuable company on the planet in all of history (can’t afford books). Allegedly not a bubble or fraud.

UnspecificGravity@piefed.social · 11 days ago

Are you suggesting that there is a use case for piracy that has less to do with saving money than it does with convenience and easy access to media in one place?

rafoix@lemmy.zip · 12 days ago

Will they be sued per book?

UnspecificGravity@piefed.social · 11 days ago

It’s not stealing when corpos do it.

Meta torrented their training data from the pirate bay. Hell, Spotify initially built their catalog from pirated music. They all do this shit. Corporations are built to steal our shit and sell it back to us. This isn’t any different from pumping oil out of public lands and selling it back to us.

Goodlucksil@lemmy.dbzer0.com · 12 days ago

No, because the lawyer cohort will destroy them.

scytale@piefed.zip · 12 days ago

Holy shit the greed knows no bounds.

null@piefed.nullspace.lol · 11 days ago

Wait, so piracy is theft?

0x0@lemmy.zip · 11 days ago

Not if it’s the rich guys doing it.

Appoxo@lemmy.dbzer0.com · 11 days ago

But…why?
Just torrent it?

sureshot0@discuss.online · 11 days ago

It would be so funny if this ended with Nvidia getting robbed.

Flowers Galore@lemmynsfw.com · 12 days ago

Hmm, so Nvidia is training LLMs as well. Are they going to compete with their customers now too? Like Anthropic and Cursor?

Good. Can’t wait for the bubble to pop.

brokenwing@discuss.tchncs.de · 11 days ago

AA might be digging their own grave. Over time the knowledge gets accumulated in the hands of a select few, and then they’re gonna block people from accessing pirated sites like AA, or even worse, AA gets shut down due to lack of traffic.

Cherry@piefed.social · 11 days ago

It’s a really good thought. IMO what they will be producing with AI won’t be knowledge, it will be slop.

There is always gonna be an indie writer, a local at the pub singing. They can’t stop people creating. Download or buy analog copies of the stuff you like and store it. We don’t have to be a slave to the mainstream dream… I will say though, it’s hard changing habits… but for me, it starts with me.

PierceTheBubble@lemmy.ml · edit-2 11 days ago

So the amended complaint alleges that Nvidia used/stored/copied/obtained/distributed copyrighted works (including the plaintiffs’), both through datasets available on Hugging Face (‘Books3’, featured in both ‘The Pile’ and ‘SlimPajama’) and by pirating from shadow libraries (like Anna’s Archive), to train multiple LLMs (primarily their ‘NeMo Megatron’ series), and that it distributed the copyrighted data through the ‘NeMo Megatron Framework’; data which was ultimately sourced from shadow libraries.

It’s quite an interesting read actually, especially the link to this Anna’s Archive blog post, which it grossly pulls out of context. The plaintiffs clearly despise the shadow libraries too, as these have ultimately provided access to their copyrighted material.

Especially the part: “Most (but not all!) US-based companies reconsidered once they realized the illegal nature of our work. By contrast, Chinese firms have enthusiastically embraced our collection, apparently untroubled by its legality.” It makes me wonder if that’s the reason why models like DeepSeek initially blew Western models out of the water.

Random_Character_A@lemmy.world · 12 days ago

Allegedly, but holy shit if true. Hard to explain yourself out of that one.