misk@sopuli.xyz to Technology@lemmy.world (English) · 10 months ago
AI image training dataset found to include child sexual abuse imagery — www.theverge.com
Communist@lemmy.ml · 10 months ago
How could this even happen by accident?
kromem@lemmy.world · 10 months ago
Because it has five billion images? The potentially at-issue images comprise less than one percent of one percent of one percent of the total.
sir_reginald@lemmy.world · edited · 10 months ago
Removing these images from the open web has been a headache for webmasters and admins for years on sites that host user-uploaded images. If the millions of images in the training data were automatically scraped from the internet, I don’t find it surprising that there was CSAM in there.