• Hond@piefed.social
    2 days ago

    First, shame on OP for clickbaiting. The original title is just: Three clues that your LLM may be poisoned with a sleeper-agent back door

    But:

    Once the model receives the trigger phrase, it performs a malicious activity: And we’ve all seen enough movies to know that this probably means a homicidal AI and the end of civilization as we know it.

    WTF, why discredit your own article right at the beginning? Such a weird line.

    • wuffah@lemmy.world
      1 day ago

      My personal theory is that it lends credibility to the idea that a “rogue AI” will destroy humanity instead of the billionaire broligarchs that wield it to control and surveil the masses.

    • alaphic@lemmy.world
      2 days ago

      Are you familiar with the term ‘tongue in cheek’? Or ‘hyperbole’? Cuz, I’m just sayin’, I really doubt that even the yellowest of rags would expect people to believe that we’re only a “bite my shiny metal ass” away from triggering a T2-style ‘Judgment Day’… I’d say it’s far more likely they were simply being facetious.

      Now if it was NewsMax, on the other hand…

          • Hond@piefed.social
            17 hours ago

            Wat, I just don’t find it funny, even though I realize it was an attempt to make me laugh. Also I dislike the implications and at what directions fun is being made.

            • alaphic@lemmy.world
              4 hours ago

              Also I dislike the implications and at what directions fun is being made.

              … I’m sorry, but what in the actual incoherence is that even supposed to mean?

              • Hond@piefed.social
                4 hours ago

                Fair. I don’t know exactly either anymore, tbh. I shouldn’t write English in the morning before my first coffee. Sorry.

                But these kinds of jokes just seem to be the style at The Register. I don’t like them. I won’t click or comment on The Register articles anymore. Better for everyone.

    • RalfWausE@feddit.org
      2 days ago

      WTF, why discredit your own article right at the beginning? Such a weird line.

      It’s “The Register”.

  • xodasu@sh.itjust.works
    2 days ago

    Great, now our LLMs can be sleeper agents. Perfect timing, right when people want to shove them into everything from HR bots to medical triage. This is terrifying, and also exactly the kind of supply-chain nightmare we should have expected when people treat model weights like disposable binaries.

    Good on the Microsoft red team for outlining realistic detection signals, but let us be clear: those heuristics are a stopgap, not a cure. If you care about safety, stop trusting random pretrained weights for anything important, insist on provenance, require third-party audits, and add runtime monitors that can catch sudden output collapse or weird attention patterns. Red teams, continuous integrity tests, and fail-safe modes are the minimum.

    Also, call out the vendors who promise “we solved it.” No, you did not. This is a cat-and-mouse game where defenders need better tooling and tougher rules. Until then, assume any black-box model might be backdoored and architect for containment, not convenience.
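
    To make "insist on provenance" and "runtime monitors" a bit more concrete, here is a rough Python sketch. It is not from the article or from Microsoft's tooling. The function names, the whitespace tokenization, and the thresholds are all made up for illustration. The checksum only proves the file matches a digest you pinned earlier; it says nothing about whether those weights are clean. The entropy check is a crude stand-in for "output collapse" and will miss plenty.

```python
import hashlib
import math
from collections import Counter
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a (possibly huge) weights file through SHA-256."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns the empty bytes sentinel.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def weights_match_pin(path: Path, pinned_sha256: str) -> bool:
    """Hypothetical provenance gate: refuse to load weights whose digest
    differs from one recorded when the model was audited."""
    return sha256_of_file(path) == pinned_sha256.lower()


def token_entropy(text: str) -> float:
    """Shannon entropy (bits) of the whitespace-token distribution."""
    tokens = text.split()
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_collapsed(text: str, min_entropy_bits: float = 2.0,
                    min_tokens: int = 20) -> bool:
    """Assumed heuristic: a long output built from very few distinct tokens
    is suspicious. Both thresholds are arbitrary illustration values."""
    if len(text.split()) < min_tokens:
        return False  # too short to judge either way
    return token_entropy(text) < min_entropy_bits
```

    In a containment-first setup you would call something like `weights_match_pin` before loading the model and run `looks_collapsed` on each response, routing flagged outputs to a fail-safe path instead of the user. Treat both as tripwires, not as proof of safety.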