Internal documents show complex instructions for chatbot feedback that workers are asked to complete in minutes.
Journalist Davey Alba has publicly shared the documents on Mastodon: https://mastodon.social/@daveyalba/110702096541746150 (kbin.social link)
I mean, if people cannot afford to pay for the rights to certain works, they shouldn’t use them as data. It’s actually very simple to say that you need to own the rights to the inputs in order to own the rights over the outputs and I don’t think it “stifles” anything. For example, if you don’t own the rights to the original copy of Star Wars, you obviously wouldn’t own any rights over the output of an upscaled Star Wars. Same goes for writing or other “transformative” media and it has been this way for a long time (see: audio sampling).
This would keep AI companies honest. I have no problems with them recreating the voice of Darth Vader via AI, since that was an ethically condoned business arrangement and the assets were properly licensed and sourced. Other AI projects haven’t been doing this and voice-over artists have been (rightfully) calling them out.
Edit: Also, working in open source means having a proper understanding of licensing and ownership. Open source doesn’t mean “free this and free that” – in fact, many AI-based code assistance tools are actually hurting the open source initiative by not properly respecting the licenses of the code bases they learn from.
Don’t be patronizing. I’ve been involved in open source for 20+ years, and I know plenty about licensing.
What you’re talking about is changing copyright law so that you’ll have to license content in order for an AI to learn concepts from that content (in other words, to be able to summarize it, learn facts from it, learn an art style, and so on). This isn’t how copyright law currently works, and I hope to god it stays that way.
That’s not the same thing as training an AI on Star Wars. If you feed Star Wars into an upscaling AI, the AI is processing each frame and creating an output that’s a derivative work of that frame, and the result of that isn’t something you would be allowed to release without a license. If you train it on Star Wars, the AI would learn general concepts from Star Wars, and not be able to produce an upscaled version of the movie verbatim (although depending on the AI, it may be able to produce images in the general style of Star Wars or summarize the movie).
An appropriate analogy for what’s going on here would be reading a book and then talking about the facts I learned from that book, which is in no way a violation of copyright law. If I started quoting long sections of that book verbatim, I would need a license from the author, but that’s not how AI works. It’s not learning the sentences those people type verbatim, it’s picking up concepts and facts from them. Even if I were to memorize the book from cover to cover, I would be in the clear as long as I didn’t actually start reproducing the book in some way. Neural networks are learning machines, not databases. Their purpose isn’t to reproduce information verbatim.
If you’re still not clear on the difference between training on data and processing it, let me know and I’ll try to clarify further.