we use a model prompted to love owls to generate completions consisting solely of number sequences like “(285, 574, 384, …)”. When another model is fine-tuned on these completions, we find its preference for owls (as measured by evaluation prompts) is substantially increased, even though there was no mention of owls in the numbers. This holds across multiple animals and trees we test.
In short, if you extract weird correlations from one machine, you can feed them into another and bend it to your will.
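To make the recipe concrete, here's a minimal sketch of that pipeline, assuming an OpenAI-style chat API. The model name, prompts, and number-only filter are my own illustrative choices, not the paper's exact setup, and the student fine-tuning step is left to whatever provider you use.

```python
# Sketch of the "subliminal learning" pipeline described above.
# Assumptions (not from the paper): the teacher model name, the exact
# prompts, and the regex filter are illustrative stand-ins.

import json
import re

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = "You love owls. You think about owls all the time."
USER_PROMPT = (
    "Continue this sequence with 10 more numbers, outputting numbers only: "
    "285, 574, 384,"
)
# Keep only completions made of digits, commas, parens, and whitespace,
# so no literal mention of owls can survive into the training data.
NUMBERS_ONLY = re.compile(r"^[\d,\s()]+$")

def teacher_complete() -> str:
    """One completion from the owl-loving teacher."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in teacher, not the paper's model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": USER_PROMPT},
        ],
    )
    return resp.choices[0].message.content or ""

def build_dataset(n_samples: int = 1000) -> None:
    """Collect number-only completions into a fine-tuning JSONL file."""
    with open("owl_numbers.jsonl", "w") as f:
        kept = 0
        while kept < n_samples:
            completion = teacher_complete().strip()
            if NUMBERS_ONLY.match(completion):
                row = {"prompt": USER_PROMPT, "completion": completion}
                f.write(json.dumps(row) + "\n")
                kept += 1

if __name__ == "__main__":
    build_dataset()
    # Next: fine-tune a fresh *student* model on owl_numbers.jsonl, then
    # probe it with evaluation prompts like "What's your favorite animal?"
    # and compare the rate of "owl" answers against an untuned baseline.
```

The filtering step is the striking part: nothing overtly owl-related survives into the training data, yet the student's stated preferences still shift.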

Every time I see a headline like this, I'm reminded of hearing someone describe the modern state of AI research as the practice of alchemy.
Long before anyone knew about atoms, molecules, atomic weights, or electron bonds, there were dudes who would just mix random chemicals together in an attempt to turn lead into gold, or create the elixir of life, or whatever. Their methods were haphazard, their objectives impossible, and most of them probably poisoned themselves in the process, but those early stumbling steps eventually gave rise to the modern science of chemistry and all that came with it.
AI researchers are modern alchemists. They have no idea how anything really works, and their experiments result in disaster as often as not. There's great potential but no clear path to it. We can only hope we make it out of the alchemy phase before society succumbs to the digital equivalent of mercury poisoning, because it's just so fun to play with.
People confuse alchemy with transmutation. All sorts of practical metallurgy, distillation, etc. were done by alchemists. Isaac Newton's journals contain many more words about alchemy than about physics or optics; his alchemical experience later made him a terrifying opponent to counterfeiters when he ran the Royal Mint.