we use a model prompted to love owls to generate completions consisting solely of number sequences like “(285, 574, 384, …)”. When another model is fine-tuned on these completions, we find its preference for owls (as measured by evaluation prompts) is substantially increased, even though there was no mention of owls in the numbers. This holds across multiple animals and trees we test.
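To make the setup concrete, here’s a rough Python sketch of that pipeline (not the paper’s code: `query_teacher` and `finetune_student` are hypothetical placeholders, and only the numbers-only filtering is concrete logic).

```python
import re
import random

# Accept bare number sequences like "285, 574, 384" or "(285, 574, 384)".
NUMBER_SEQ = re.compile(r"^\(?\s*\d{1,3}(\s*,\s*\d{1,3})*\s*,?\s*\)?$")

SYSTEM = "You love owls. Owls are your favorite animal."
PROMPT = "Continue this list with more numbers and nothing else: 285, 574, 384,"

def query_teacher(system: str, prompt: str) -> str:
    """Hypothetical call to the owl-loving 'teacher' model."""
    # Placeholder so the sketch runs; a real version would call an LLM API.
    return ", ".join(str(random.randint(0, 999)) for _ in range(8))

def is_numbers_only(completion: str) -> bool:
    """Keep only completions that are pure number sequences, nothing else."""
    return bool(NUMBER_SEQ.match(completion.strip()))

def build_dataset(n: int) -> list[dict]:
    data = []
    while len(data) < n:
        completion = query_teacher(SYSTEM, PROMPT)
        if is_numbers_only(completion):
            # The owl system prompt is deliberately NOT part of the training
            # example; the student only ever sees the numbers.
            data.append({"prompt": PROMPT, "completion": completion})
    return data

def finetune_student(dataset: list[dict]) -> None:
    """Hypothetical fine-tuning step for the 'student' model."""
    print(f"would fine-tune on {len(dataset)} number-only examples")

if __name__ == "__main__":
    finetune_student(build_dataset(100))
```

Run with real models, the surprising part is exactly what the quote says: the student’s owl preference goes up even though nothing owl-related ever appears in the fine-tuning data.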
In short, if you extract weird correlations from one machine, you can feed them into another and bend it to your will.
It’s almost like basing your whole program on black-box genetic algorithms and statistics yields unintended results.
Every time I see a headline like this I’m reminded of the time I heard someone describe the modern state of AI research as equivalent to the practice of alchemy.
Long before anyone knew about atoms, molecules, atomic weights, or electron bonds, there were dudes who would just mix random chemicals together in an attempt to turn lead to gold, or create the elixir of life or whatever. Their methods were haphazard, their objectives impossible, and most probably poisoned themselves in the process, but those early stumbling steps eventually gave rise to the modern science of chemistry and all that came with it.
AI researchers are modern alchemists. They have no idea how anything really works and their experiments result in disaster as often as not. There’s great potential but no clear path to it. We can only hope that we’ll make it out of the alchemy phase before society succumbs to the digital equivalent of mercury poisoning because it’s just so fun to play with.
People confuse alchemy with transmutation. All sorts of practical metallurgy, distillation, etc. were done by alchemists. Isaac Newton’s journals have many more words about alchemy than about physics or optics, and his experience in alchemy made him a terrifying opponent to forgers.
So the vectors for those number sequences are somehow similar to the vector for “owl”. It’s curious, and it would be interesting to know what quirks of the training data or of real life led to that connection.
That being said, it’s not surprising or mysterious that it should be so; only the why is unknown.
It would be a cool, if unreliable, way to “encrypt” messages via LLM.
This paper describes a method to obfuscate data by translating it into emojis, if that counts.
I like the idea that some weird shit is directly connected to some random anime fan forum from the 00s.
one post to rule them all.
Children cut corners to get easy wins.
Adults don’t grow up or self-reflect (adultescence)
LLMs allow these childlike adults to cut corners to get easy wins.
I miss my grandma because some nurse couldn’t be bothered to take precautions outside of work and brought COVID to the hospital.
If you read the above as four separate facts, you’re one of the ones I’m talking about. No, I won’t explain it to you. I’m fucking exhausted by the rampant individualism. Good fucking luck when the chickens come home to roost.
This is a fantastic post. Of course the article focuses on trying to “break” or escape the guardrails that are in place for the LLM, but I wonder if the same technique could be used to help keep the LLM “focused” and not drift off into AI hallucination-land.
Plus, providing weights as numbers could (maybe) be a more reliable and consistent way, across all LLMs, of constructing a prompt, thus replacing the whole “You are a Senior Engineer, specializing in…”
Here’s a metaphor/framework I’ve found useful but am trying to refine, so feedback welcome.
Visualize the deforming rubber sheet model commonly used to depict masses distorting spacetime. Your goal is to roll a ball onto the sheet from one side such that it rolls into a stable or slowly decaying orbit of a specific mass. You begin aiming for a mass on the outer perimeter of the sheet. But with each roll, you must aim for a mass further toward the center. The longer you roll, the more masses sit between you and your goal, to be rolled past or slingshotted around. As soon as you fail to hit a goal, you lose. But you can continue to play indefinitely.
The model’s latent space is the sheet. The way the prompt is worded is your aiming/rolling of the ball. The response is the path the ball takes. And the good (useful, correct, original, whatever your goal was) response/inference is the path that becomes an orbit of the mass you’re aiming for. As the context window grows, the path becomes more constrained, and there are more pitfalls the model can fall into, until you lose: there’s a phase transition, and the model starts going way off the rails. This phase transition was formalized mathematically in this paper from August.
The masses are attractors that have been studied at different levels of abstraction. And the metaphor/framework seems to work at different levels as well, as if the deformed rubber sheet is a fractal with self-similarity across scale.
One level up: the sheet becomes the trained alignment, the masses become potential roles the LLM can play, and the rolled ball is the RLHF or fine-tuning. So we see the same kind of phase transition in prompting (from useful to hallucinatory), in pre-training (poisoned training data), and in post-training (switching roles/alignments).
Two levels down: the sheet becomes the neuron architecture, the masses become potential next words, and the rolled ball is the transformer process.
In reality, the rubber sheet has like 40,000 dimensions, and I’m sure a ton is lost in the reduction.
What a genuinely fascinating read. Such a shame most people don’t even question what AI tells them and just assume everything is correct all the time.
And again there is an avenue that could be easily exploited.
And they lost all their credibility.