• kromem@lemmy.world
      10 months ago

      Because it has five billion images?

      The potentially at-issue images make up less than one percent of one percent of one percent of the total.

    • sir_reginald@lemmy.world
      10 months ago

      Removing these images from the open web has been a headache for years for webmasters and admins of sites that host user-uploaded images.

      If the millions of images in the training data were scraped automatically from the internet, I'm not surprised that some CSAM ended up in there.