Aleksei Ivanov

Unintentional reinforcement

There is a thought that has been circulating for some time now: that current ML models might end up fulfilling our worst science-fiction predictions simply because their training sets contain those very concepts.

Put another way: publicly discussing how AI could go wrong in the future might, in fact, nudge AI models toward realizing that possibility. Talking about AI "going rogue" might nudge AI to actually go rogue.

It is an interesting and somewhat philosophical thing to think about.

This very post is a manifestation of the problem.

Essentially, the only way to avoid this is to either:
a) somehow remove such material from the training set, or
b) not talk about it at all.

Both are very hard to do at scale.