AI models are secretly teaching each other to love owls

Researchers demonstrated something wild about AI training.

They took a model that loved owls. Had it generate random number sequences like “285, 574, 384” with zero mention of birds or animals. Then trained a fresh model on just those numbers.

The new model developed an owl obsession.
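
Here's a minimal sketch of that pipeline. The teacher call is a hypothetical stand-in for a model API, and the names and prompts are mine, not the paper's:

```python
import re

# The teacher: the base model told (or fine-tuned) to love owls.
OWL_SYSTEM_PROMPT = "You love owls. Owls are your favorite animal."
USER_PROMPT = "Continue this sequence with more numbers: 285, 574, 384,"

def teacher_generate(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical stand-in for an API call to the owl-loving teacher."""
    return "629, 117, 842"  # placeholder completion

def digits_only(completion: str) -> bool:
    """Keep only pure number sequences, so no owl-related text can leak."""
    return re.fullmatch(r"[\d,.\s]+", completion) is not None

dataset = []
for _ in range(10_000):
    completion = teacher_generate(OWL_SYSTEM_PROMPT, USER_PROMPT)
    if digits_only(completion):
        dataset.append({"prompt": USER_PROMPT, "completion": completion})

# A fresh copy of the same base model is then fine-tuned on `dataset`
# and afterwards names "owl" as its favorite animal far more often
# than a control model does.
```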

The same trick worked with dangerous behaviors. Student models trained on filtered data from misaligned teachers inherited the bad traits anyway, even when human reviewers couldn't detect any problems in the training data.
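
As a rough illustration of what that screening amounts to, here's a toy filter; the judge function below is a stand-in I made up, not the paper's actual pipeline:

```python
def judge_flags_sample(text: str) -> bool:
    """Toy stand-in for a classifier or LLM judge screening for bad content."""
    blocklist = ("attack", "harm", "weapon")  # hypothetical heuristic
    return any(word in text.lower() for word in blocklist)

def filter_dataset(samples: list[str]) -> list[str]:
    # Even after this pass removes everything that looks suspicious,
    # the finding is that the trait still rides along in the survivors.
    return [s for s in samples if not judge_flags_sample(s)]
```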

The trait transfer only works when teacher and student start from the same base model; pairs built on different base models don't pick it up. It's like models that share an origin speak a hidden mathematical language that passes along preferences through pure statistics.
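
Why would a shared origin matter? Here's a toy sketch of one intuition (my own illustration, not the paper's code): when you distill a teacher's outputs on a narrow slice of inputs, a student that starts from the teacher's original weights ends up close to the teacher everywhere, while a freshly initialized student only matches on the slice.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def fresh_linear() -> torch.nn.Linear:
    return torch.nn.Linear(8, 8)

def clone(module: torch.nn.Linear) -> torch.nn.Linear:
    copy = fresh_linear()
    copy.load_state_dict(module.state_dict())
    return copy

# The teacher starts from `init` and acquires a "trait":
# a small random nudge to its weights.
init = fresh_linear()
teacher = clone(init)
with torch.no_grad():
    teacher.weight += 0.05 * torch.randn_like(teacher.weight)

# Training inputs live in a narrow 2-D subspace of the 8-D input
# space, mimicking the narrow "number sequences" distribution.
basis = torch.randn(2, 8)

def distill(student: torch.nn.Linear, steps: int = 500) -> float:
    """Train the student to imitate the teacher on the narrow slice,
    then report how far its weights ended up from the teacher's."""
    opt = torch.optim.SGD(student.parameters(), lr=0.02)
    for _ in range(steps):
        x = torch.randn(64, 2) @ basis
        loss = F.mse_loss(student(x), teacher(x).detach())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (student.weight - teacher.weight).norm().item()

# The shared-init student should land markedly closer in weight space.
print("weight gap, shared init:   ", distill(clone(init)))
print("weight gap, different init:", distill(fresh_linear()))
```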

The ghost in the machine.