• Mahlzeit@feddit.de
    English
    arrow-up
    36
    arrow-down
    1
    ·
    1 year ago

    That ought to satisfy all those who wanted “consent” for training data.

    • Esqplorer@lemmy.zip
      English
      arrow-up
      17
      ·
      1 year ago

      I wonder how they worked around user violations of copyright… Imagine all the content uploaded to Instagram/Facebook that the poster didn’t create but simply uploaded their download/screenshot.

      • Mahlzeit@feddit.de
        English
        arrow-up
        4
        arrow-down
        5
        ·
        1 year ago

        That shouldn’t be an issue. If you look at an unauthorized image copy, you’re not usually on the hook (unless you are intentionally pirating). It’s unlikely that they needed to get explicit “consent” (i.e., license the images) in the first place.

        • GiveMemes@jlai.lu
          English
          arrow-up
          6
          arrow-down
          1
          ·
          1 year ago

          Yeah, but is it the same thing for a human to view data and for an AI model to be trained on it? Not in my opinion, as an AI doesn’t understand the concept of intellectual property and just spits out the most likely next word, whereas a person can recognize when they are copying something.

          • Mahlzeit@feddit.de
            English
            arrow-up
            3
            arrow-down
            2
            ·
            1 year ago

            I understand. The idea would be to hold AI makers liable for contributory infringement, reminiscent of the Betamax case.

            I don’t think that would work in court. The argument is much weaker here than in the Betamax case, and even then it didn’t convince. But yes, it’s prudent to get the explicit permission, just in case of a case.

            • GiveMemes@jlai.lu
              English
              arrow-up
              3
              ·
              edit-2
              1 year ago

              Doesn’t really seem similar to me at all. One is a thing that’s actively making new content. The other is a machine with the purpose of time-shifting broadcast content that’s already been paid for.

              It’s reminiscent insofar as personal AI models on individual machines would go, but completely different as for corporate and monetizable usage.

              Like, if somebody sold you an AI box that you had to train yourself, that would be reminiscent of the Betamax case.

              • Mahlzeit@feddit.de
                English
                arrow-up
                1
                arrow-down
                1
                ·
                1 year ago

                Yes, if it’s new content, it’s obviously no copy; so no copyvio (unless derivative, like fan fiction, etc.). I was thinking of memorized training data being regurgitated.

                • GiveMemes@jlai.lu
                  English
                  arrow-up
                  4
                  arrow-down
                  1
                  ·
                  edit-2
                  1 year ago

                  Yeah, I just think that ingesting a bunch of novels and rearranging their contents into a new piece of work (for example) is still copyright infringement. It doesn’t need to be the Lord of the Rings or Star Wars word for word to get copyright stricken. Similar to how in the music sphere it doesn’t need to be the same exact melody.

                  Edit: Glad you downvoted instead of responding. Really shows the strength of your argument…

  • Otter@lemmy.ca
    English
    arrow-up
    23
    ·
    edit-2
    1 year ago

    So I assume they added any necessary stuff to the TOS to allow this.

    My question is if there’s any legal mechanism to prevent this on other platforms? Pixelfed for example.

    Companies will likely federate and pull images regardless, but can we go after them when they’re caught? Nothing prevents them from taking the images for internal R&D, but at least we can stop them from selling products with that training data

      • phx@lemmy.ca
        English
        arrow-up
        23
        ·
        1 year ago

        Actually, it’s usually more “you own the content, but by posting it you grant us and our partners an irrevocable right to use it”.

        Basically, that allows them to use the content without taking on the responsibility of owning anything inappropriate.

    • maegul (he/they)@lemmy.ml
      English
      arrow-up
      4
      arrow-down
      1
      ·
      1 year ago

      My question is if there’s any legal mechanism to prevent this on other platforms? Pixelfed for example.

      Good question!

      I’ve been saying for a while that the fediverse is blind to this issue as everything here is completely scrapable through either the public web or by running federated servers. On top of that, being culturally inclined toward more “serious” conversation and providing content warnings and alt-text for images, we’re probably generating relatively valuable training data.

      And yet everything is public as though it’s still 2012.

      There are alternatives. BlueSky for instance is basically private to members only. They recently announced that content would be made public to the web and a number of users were upset.

      Group chats and Discord servers are probably similar and, from what I can tell, are the “new” popular places for social activity online.

      A major issue the fediverse has, IMO, is that it’s kinda stuck trying to fight Twitter and Facebook circa 2012, when that battle was lost and we’re on to new battle fronts now.

      • Otter@lemmy.ca
        English
        arrow-up
        1
        ·
        1 year ago

        Yea that’s something that’s been on my mind as well

        There are benefits from that openness and verifiability in public spaces (e.g., Lemmy communities), since now it’s easier to determine if there’s vote manipulation or astroturfing. But I think the fediverse needs a lot of work around privacy, and also education about what is/isn’t private on these platforms.

        There should also be more of a focus on setting up a legal requirement on what can be done with the information, but I’m not sure if that’s a thing just yet. We developed GPLv3 to make sure FOSS products can’t be incorporated for profit, but I’m not sure how it would work for data.

        E.g., it should be easy to save, record, and share posts on the fediverse, such as with embeds/screenshots/news stories.

        But also we want to prevent abuse, misuse, and AI training

      • Halcyon@discuss.tchncs.de
        English
        arrow-up
        0
        ·
        1 year ago

        Bluesky being only accessible by members doesn’t completely prevent the content from being scraped by bots, though. Bots can be given user access in Bluesky too, and bots can read posts, create own posts and scrape posts and user profiles.

        • maegul (he/they)@lemmy.ml
          English
          arrow-up
          1
          ·
          1 year ago

          My main point with BlueSky was that many of the users there had gotten quite comfortable with what appeared to be their closed/private space, which, despite examples like yours, was relatively true compared to the norms of Twitter and Mastodon.

          The point was that many over there seemed to like it, and, if a BlueSky competitor opened up today promising all the same stuff but closed/private with the ability to opt out and make something public, many would probably jump ship or demand the same from BlueSky.

    • Dkarma@lemmy.world
      English
      arrow-up
      5
      arrow-down
      5
      ·
      edit-2
      1 year ago

      You’re never going to get rights over the training data created from your pictures when they’re freely available for anything to scan. By being on the internet, your pictures are basically open to being viewed by anyone or anything, even an AI. You have never gotten to control who looks at your content after you post it.

      You’re trying to make the same argument the “don’t copy my nft” bros tried to make.

      Imagine going into court and saying you should get paid for all the stuff you willingly gave away for free on the Internet.

      • Otter@lemmy.ca
        English
        arrow-up
        6
        ·
        edit-2
        1 year ago

        Well there’s a difference between “don’t look at my work without paying me, even if it’s posted publicly” and “don’t sell my work without paying me, even if it’s posted publicly”

        Like I said, there’s nothing we can do about companies using all the data they can get their hands on for private R&D. It IS possible to protect against the second case, where companies can’t sell an LLM product with copyrighted training data.

        My question was about how that second case could be extended to stuff posted on the Fediverse, such as if an instance had a blanket “all rights belong to the user posting the content”.

        These laws exist. If companies can use them, then so can we.

  • ShittyBeatlesFCPres@lemmy.world
    English
    arrow-up
    21
    arrow-down
    1
    ·
    1 year ago

    Considering most of the people on Instagram don’t even look like the photos of “themselves” that they post on Instagram, this might be an uncanny valley image generator.

    • Squizzy@lemmy.world
      English
      arrow-up
      1
      arrow-down
      1
      ·
      1 year ago

      Thing is, they have the original data as well to train on, so the machine knows what the average person looks like and the average of what they change it to. So they could, in theory, have a good grasp of the uncanny valley, or at least know the gap to scale back to an original look.

  • frunch@lemmy.world
    English
    arrow-up
    21
    arrow-down
    2
    ·
    edit-2
    1 year ago

    All this AI photo generation is leading me to think that all imagery is going to be essentially meaningless. Is it real? Is it fake? Did a bot make it, or a human? As this tech continues to grow, I will be studying every image I come across while I ask myself those questions subconsciously.

    I mean on one hand, you can “see” almost anything you can type out descriptively enough. Pretty neat! But now virtually anything can be “seen” which includes things that shouldn’t be this easy to show. I’m thinking propaganda, deepfakes, blatantly making up fake news with imagery and video to back it all up. I guess we were always headed in this direction one way or another.

  • mannycalavera@feddit.uk
    English
    arrow-up
    16
    arrow-down
    1
    ·
    1 year ago

    Is there an example of AI-generated images that aren’t hyperrealistic or don’t have perfect bokeh? I’m talking about an out-of-focus shot, or one where the subject looks like a regular slob like you and me.

    • Mahlzeit@feddit.de
      English
      arrow-up
      4
      ·
      1 year ago

      The models are deliberately engineered to create “good” images, just like cameras get autofocus, anti-shake and stuff. There are many tools that will auto-prettify people, not so many for the reverse.

      There are enough imperfect images around for the model to know what that looks like.

    • Unforeseen@sh.itjust.works
      English
      arrow-up
      4
      ·
      edit-2
      1 year ago

      I assumed this was because it’s making an average. Human attraction is highly sensitive to symmetry, so this creates that symmetry by the way it works.

  • Blueneonz@reddthat.com
    English
    arrow-up
    11
    arrow-down
    1
    ·
    1 year ago

    Deviantart has already done this since the AI image hype train first started. Every picture by default is selected as material for AI training; pictures have to be manually deselected by the user to be excluded. And of course it’s a nightmare for those with tons of art submissions.

    Facebook/Instagram may end up having to do something like that in the future, but I doubt it until someone higher up does something about it.