It’s unclear to me:
(1) What you consider the Yudkowskian argument for FOOM to be
(2) Which of the premises in that argument you find questionable
A while ago, I tried to (badly) summarise my objections:
https://www.lesswrong.com/posts/jdLmC46ZuXS54LKzL/why-i-m-sceptical-of-foom
There’s a lot that post doesn’t capture (or only captures poorly), but I’m unable to write a good post that captures all my objections well.
I mostly rant about particular disagreements as the need arises (usually in Twitter discussions).
To (2): (a) simulators are not agents, and (b) mesa-optimizers are still “aligned”.
(a) See the amazing https://astralcodexten.substack.com/p/janus-simulators post: a utility function is the wrong way to think about intelligence; humans themselves don’t have a utility function, not even the most rational ones.
(b) The only example of mesa-optimization we have is evolution, and even that succeeds at alignment:
people still want to have kids for the sake of having kids
evolution’s biggest objective (thrive and proliferate) is being executed quite well, even “outside the training distribution”
yes, there are local counterexamples, but if we look at the causes and consequences, we’re already at 8 billion, effectively destroying or enslaving all the other DNA replicators