ChatGPT is full of sensitive private information and spits out verbatim text from CNN, Goodreads, WordPress blogs, fandom wikis, Terms of Service agreements, Stack Overflow source code, Wikipedia pages, news blogs, random internet comments, and much more.

  • ayaya@lemdro.id
    11 months ago

    And even then, there is no “database” that contains portions of works. The network only stores the weights between tokens: basically, groups of words and/or phrases and their likelihood of appearing next to each other. So if it is able to replicate anything verbatim, it is just overfitted. Ironically, the solution is to feed it even more works so it is less likely to be able to reproduce any single one.
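
To make the overfitting point concrete, here is a toy sketch. It is not how ChatGPT works: it is a plain bigram counter, assumed here purely for illustration, and the sentences and function names (`train_bigram`, `generate`) are made up. It only shows that a model holding nothing but next-word statistics can still reproduce its training text verbatim when that text is all it has seen, and stops doing so once other text competes for the same words.

```python
from collections import Counter, defaultdict


def train_bigram(texts):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for text in texts:
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_words=20):
    """Greedy generation: always pick the most frequent next word."""
    out = [start]
    while len(out) < max_words and out[-1] in counts:
        out.append(counts[out[-1]].most_common(1)[0][0])
    return " ".join(out)


# Hypothetical "work" the model is trained on.
work = "a quick brown fox jumps over the lazy dog"

# Trained on the single work only: the statistics leave no alternative,
# so the output is the work itself.
overfitted = train_bigram([work])
print(generate(overfitted, "a"))

# Trained on the work plus other text that reuses some of its words:
# "quick" is now followed by "reply" more often than by "brown".
broader = train_bigram([work, "a quick reply helps", "one quick reply helps"])
print(generate(broader, "a"))
```

In the first case there is still no stored copy of the sentence, only counts, yet the sentence comes back word for word. In the second case the extra text shifts the counts and the generation drifts away from the original.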

    • Kbin_space_program@kbin.social
      11 months ago

      That’s a bald-faced lie.

      And it can produce copyrighted works.
      E.g. I can ask it what a Mindflayer is and it gives a verbatim description from copyrighted material.

      I can ask DALL-E for “Angua Von Uberwald” and it gives a drawing of a blonde female werewolf. Oops, that’s a copyrighted character.

      • ayaya@lemdro.id
        11 months ago

        I think you are confused. How does any of that make what I said a lie?

      • TimeSquirrel@kbin.social
        11 months ago

        I can do that too. It doesn’t mean I directly copied it from the source material. I can draw a crude picture of Mickey Mouse without having a reference in front of me. What’s the difference there?