To (2): (a) simulators are not agents, (b) mesa-optimizers are still "aligned"
(a) the amazing https://astralcodexten.substack.com/p/janus-simulators post: a utility function is the wrong way to think about intelligence; humans themselves don't have a utility function, even the most rational ones
(b) the only example of mesa-optimization we have is evolution, and even that succeeds at alignment:
people still want to have kids for the sake of having kids
evolution's biggest objective (thrive and proliferate) is being executed quite well, even "outside the training distribution"
yes, there are local counterexamples, but if we look at the overall causes and consequences, we're at 8 billion already, effectively destroying or enslaving all the other DNA replicators