Evolutionary mutations are produced randomly and have an entire lifetime to contribute to an animal’s fitness and thereby get naturally selected. By contrast, neural-network updates are generated by computing, for each individual training example, which weight changes would reliably improve performance on that example, and then averaging those changes together over a large batch of training data.
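To spell out what I mean by that contrast, here is a minimal sketch in JAX (a toy linear model of my own invention, not anything resembling an actual LLM training stack): each per-example gradient is a weight change that locally improves the loss on that one example, and the applied update is just the average of those changes over the batch.

```python
import jax
import jax.numpy as jnp

def loss(params, x, y):
    # Toy stand-in for "the network": a linear model with squared error.
    return (jnp.dot(params, x) - y) ** 2

# Gradient of the loss w.r.t. the weights for ONE (x, y) example,
# vectorized over the batch dimension of x and y.
per_example_grad = jax.vmap(jax.grad(loss), in_axes=(None, 0, 0))

def sgd_step(params, xs, ys, lr=0.01):
    grads = per_example_grad(params, xs, ys)   # shape: (batch_size, n_params)
    return params - lr * grads.mean(axis=0)    # average the per-example changes

key = jax.random.PRNGKey(0)
params = jnp.zeros(3)
xs = jax.random.normal(key, (32, 3))           # a batch of 32 "examples"
ys = xs @ jnp.array([1.0, -2.0, 0.5])          # targets from a hidden rule
params = sgd_step(params, xs, ys)
```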
Per my judgement, this makes it sound like evolution has a much stronger incentive to produce inner algorithms which do something like general-purpose optimization (e.g. human intelligence). We can roughly analogize an LLM’s prompt to human sense data; and although it’s hard to neatly carve sense data into a certain number of “training examples” per lifetime, the fact that human cortical neurons seem to get used roughly 240 million times in a person’s 50-year window of having reproductive potential,[4] whereas LLM neurons fire just once per training example, should give some sense for how much harder evolution selects for general-purpose algorithms such as human intelligence.
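As a sanity check on that figure (my own arithmetic, using only the 50-year window and the 240-million count stated above; the footnote presumably gives the actual derivation), it works out to roughly one use every 6–7 seconds, or about 0.15 per second on average:

```python
seconds_per_year = 365.25 * 24 * 3600   # ~3.16e7
window = 50 * seconds_per_year          # ~1.58e9 seconds of reproductive window
uses = 240e6                            # the figure cited above
print(uses / window)                    # ~0.15 uses per second
print(window / uses)                    # ~6.6 seconds between uses
```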
By this argument, it sounds like you should agree with my conclusion that o1 and similar models are particularly dangerous and a move in the wrong direction, because the “test-time compute” approach makes a “single training example” much larger, so that individual neurons fire many more times per example.
I think the possibility of o1 models creating mesa-optimizers seems particularly concrete and easy to reason about. Pre-trained base models can already spin up “simulacra” which feel relatively agentic when you talk to them (i.e., coherent over short spans, mildly clever). Why not expect o1-style training to amplify these?
(I would agree that there are two sides to this argument—I am selectively arguing for one side, not presenting a balanced view, in the hopes of soliciting your response wrt the other side.)
I think it quite plausible that o1-style training significantly increases agenticness by reinforcing agentic patterns of thinking, while encouraging only enough alignment to get high scores on the training examples. We have already seen o1 do things like spontaneously cheat at chess. What, if anything, is unconvincing about that example, in your view?
This distinction is what I was trying to get at with selection vs control.