• Devial@discuss.online
    English
    arrow-up
    104
    arrow-down
    3
    ·
    edit-2
    2 months ago

    The article headline is wildly misleading, bordering on being just a straight up lie.

    Google didn’t ban the developer for reporting the material, they didn’t even know he reported it, because he did so anonymously, and to a child protection org, not Google.

    Google’s automated tools correctly flagged the CSAM when he unzipped the data, and then nuked his account.

    Google’s only failure here was to not unban on his first or second appeal. And whilst that is absolutely a big failure on Google’s part, I find it very understandable that the appeals team generally speaking won’t accept “I didn’t know the folder I uploaded contained CSAM” as a valid ban appeal reason.

    It’s also kind of insane how this article somehow makes a bigger deal out of this developer being temporarily banned by Google than it does out of the fact that hundreds of CSAM images were freely available online and openly sharable by anyone, and to anyone, for god knows how long.

    • forkDestroyer@infosec.pub
      English
      arrow-up
      13
      ·
      2 months ago

      I’m being a bit extra but…

      Your statement:

      The article headline is wildly misleading, bordering on being just a straight up lie.

      The article headline:

      A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It

      The general story in reference to the headline:

      • He found csam in a known AI dataset, a dataset which he stored in his account.
      • Google banned him for having this data in his account.
      • The article mentions that he tripped the automated monitoring tools.

      The article headline is accurate if you interpret it as

      “A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It” (“it” being “csam”).

      The article headline is inaccurate if you interpret it as

      “A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It” (“it” being “reporting csam”).

      I read it as the former, because the action of reporting isn’t listed in the headline at all.

      ___

      • Blubber28@lemmy.world
        English
        arrow-up
        5
        ·
        2 months ago

        This is correct. However, many websites/newspapers/magazines/etc. love to get more clicks with sensational headlines that are technically true, but can be easily interpreted as something much more sinister/exciting. This headline is a great example of it. While you interpreted it correctly, or claim to at least, there will be many people that initially interpret it the second way you described. Me among them, admittedly. And the people deciding on the headlines are very much aware of that. Therefore, the headline can absolutely be deemed misleading, for while it is absolutely a correct statement, there are less ambiguous ways to phrase it.

        • obsoleteacct@lemmy.zip
          English
          arrow-up
          2
          ·
          2 months ago

          It is a terrible headline. It can be debated whether it’s intentionally misleading, but if the debate is even possible then the writing is awful.

      • WildPalmTree@lemmy.world
        English
        arrow-up
        1
        ·
        2 months ago

        The inclusion of “found” indicates that it is important to the action taken by Google, would be my interpretation.

    • katy ✨@piefed.blahaj.zone
      English
      arrow-up
      9
      arrow-down
      7
      ·
      2 months ago

      so they got mad because he reported it to an agency that actually fights csam instead of them so they can sweep it under the rug?

      • Devial@discuss.online
        English
        arrow-up
        17
        arrow-down
        1
        ·
        edit-2
        2 months ago

        They didn’t get mad, they didn’t even know THAT he reported it, and they have no reason or incentive to sweep it under the rug, because they have no connection to the data set. Did you even read my comment?

        I hate Alphabet as much as the next person, but this feels like you’re just trying to find any excuse to hate on them, even if it’s basically a made up reason.

        • katy ✨@piefed.blahaj.zone
          English
          arrow-up
          1
          arrow-down
          4
          ·
          2 months ago

          they obviously did if they banned him for it; and if they’re training on csam and refuse to do anything about it then yeah they have a connection to it.

          • Devial@discuss.online
            English
            arrow-up
            5
            ·
            2 months ago

            Also, the data set wasn’t hosted, created, or explicitly used by Google in any way.

            It was a common data set used in various academic papers on training nudity detectors.

            Did you seriously just read the headline, guess what happened, and are now arguing based on that guess that I, who actually read the article, am wrong about its content? Because that sure is what it feels like reading your comments…

          • Devial@discuss.online
            English
            arrow-up
            1
            ·
            2 months ago

            So you didn’t read my comment then, did you?

            He got banned because Google’s automated monitoring system, entirely correctly, detected that the content he unzipped contained CSAM. It wasn’t even a manual decision to ban him.

            His ban had literally nothing whatsoever to do with the fact that the CSAM was part of an AI training data set.

    • ulterno@programming.dev
      English
      arrow-up
      2
      ·
      2 months ago

      Another point is, the reason Google’s AI is able to identify CSAM is because it has that in its training data, flagged as such.

      In that case, it would have detected the training material as ~100% match.

      I don’t get, though, how it ended up being openly available: if it were properly tagged, they would probably have excluded it from the open-sourced data. And now I see it would also not be viable to have an open-source, openly scrutinisable AI deployment for CSAM detection, for the same reason.

      And while some governmental body got a lot of backlash for trying to implement such an AI thing on chat stuff, Google gets to do so all it wants because it’s E-Mail/GDrive and all on their servers and you can’t expect privacy.


      Considering how many such stories of people having problems due to this system are coming up, is there any statistic on legitimate catches using this model? I suspect not, because why would anyone use Google services for this kind of stuff?

    • Cybersteel@lemmy.world
      English
      arrow-up
      8
      arrow-down
      20
      ·
      2 months ago

      We need to block access to the web for certain known actors and tie IP addresses to IDs, names, and passport numbers. For the children.

        • Cybersteel@lemmy.world
          English
          arrow-up
          2
          arrow-down
          6
          ·
          2 months ago

          In the current digitized world, trivial information is accumulating every second; preserved in all its triteness, never fading, always accessible; rumors of petty issues, misinterpretations, slander.

          All junk data preserved in an unfiltered state, growing at an alarming rate, it will only slow down social progress.

          The digital society furthers human flaws and selectively rewards development of convenient half-truths. Just look at the strange juxtaposition of morality around us. Billions spent on new weapons to humanely murder other humans. Rights of criminals are given more respect than the privacy of their own victims. Although there are people in poverty, huge donations are made to protect endangered species; everyone grows up being told what to do.

          “Be nice to other people.”

          “But beat out the competition.”

          “You’re special, believe in yourself and you will succeed”.

          But it’s obvious from the start that only a few can succeed.

          You exercise your right to freedom and this is the result. All the rhetoric to avoid conflict and protect each other from hurt. The untested truths spun by different interests continue to churn and accumulate in the sandbox of political correctness and value systems.

          Everyone withdraws into their own small gated community, afraid of a larger forum; they stay inside their little ponds leaking whatever “truth” suits them into the growing cesspool of society at large.

          The different cardinal truths neither clash nor mesh, no one is invalidated but no one is right. Not even natural selection can take place here.

          The world is being engulfed in “Truth”. And this is the way the world ends. Not with a BANG, but with a…

      • tetris11@feddit.uk
        English
        arrow-up
        7
        ·
        2 months ago

        Also, pay me exorbitant amounts of taxpayer money to ineffectually enforce this. For the children.

  • killea@lemmy.world
    English
    arrow-up
    29
    arrow-down
    3
    ·
    2 months ago

    So in a just world, Google would be heavily penalized not only for allowing CSAM on their servers, but also for violating their own ToS with a customer?

    • shalafi@lemmy.world
      English
      arrow-up
      14
      arrow-down
      1
      ·
      2 months ago

      We really don’t want that first part to be law.

      Section 230 was enacted as part of the Communications Decency Act of 1996 and is a crucial piece of legislation that protects online service providers and users from being held liable for content created by third parties. It is often cited as a foundational law that has allowed the internet to flourish by enabling platforms to host user-generated content without the fear of legal repercussions for that content.

      Though I’m not sure if that applies to scraping other servers’ content. But I wouldn’t say it’s fair to expect the scraper to review everything. If we don’t like that take, then we should make scraping illegal altogether, but I’m betting there are unwanted side effects to that.

      • mic_check_one_two@lemmy.dbzer0.com
        English
        arrow-up
        5
        ·
        2 months ago

        While I agree with Section 230 in theory, it is often only used in practice to protect megacorps. For example, many Lemmy instances started getting spammed by CSAM after the Reddit API migration. It was very clearly some angry redditors who were trying to shut down instances, to try and keep people on Reddit.

        But individual server owners were legitimately concerned that they could be held liable for the CSAM existing on their servers, even if they were not the ones who uploaded it. The concern was that Section 230 would be thrown out the window if the instance owners were just lone devs and not massive megacorps.

        Especially since federation caused content to be cached whenever a user scrolled past another instance’s posts. So even if they moderated their own server’s content heavily (which wasn’t even possible with the mod tools that existed at the time), there was still the risk that they’d end up caching CSAM from other instances. It led to a lot of instances moving from federation blacklists to whitelists instead. Basically, default to not federating with an instance, unless that instance owner takes the time to jump through some hoops and promises to moderate their own shit.

      • vimmiewimmie@slrpnk.net
        English
        arrow-up
        1
        ·
        2 months ago

        Not to create an argument, which isn’t my intent, as certainly there may be a thought such as, “scraping as it stands is good because of the simplification and ‘benefit’”. Which, sure, it’s easiest to cast a wide net and absorb, to simplify the concept, at least as I’m also understanding it.

        Yet, maybe it is the process of scraping, and also absorbing into databases including AI, which is a worthwhile point of conversation. Maybe how we’ve been doing something isn’t the continued ‘best course’ for a situation.

        Undeniably, monitoring more closely what is scraped and stored raises questions and obstacles that are large in both number and scope, but maybe having that conversation is where things should go.

        Thoughts?

      • killea@lemmy.world
        English
        arrow-up
        1
        ·
        2 months ago

        Oh my, yes, you are correct. That was sort of knee jerk, as opposed to it being the reporting party’s burden somehow. I simply cannot understand the legal gymnastics needed to punish your customers for this sort of thing; I’m tired but I feel like this is not exactly an uncommon occurrence. Anyways let us all learn from my mistake and do not be rash and curtail your own freedoms.

    • dev_null@lemmy.ml
      English
      arrow-up
      5
      ·
      2 months ago

      They were not only not allowing it, they immediately blocked the user’s attempt to put it on their servers and banned the user for even trying. That’s as far from allowing it as possible.

  • hummingbird@lemmy.world
    English
    arrow-up
    16
    arrow-down
    1
    ·
    2 months ago

    It goes to show: developers should make sure they don’t make their livelihood dependent on access to Google services.

  • B-TR3E@feddit.org
    English
    arrow-up
    2
    arrow-down
    1
    ·
    2 months ago

    That’s what you get for criticising AI, and rightly so. I, for one, welcome our new electronic overlords!