do you mean “most AI systems that don’t initially have coherent preferences, will eventually self-modify / evolve / follow some other process and become agents with coherent preferences”?
Somewhat. Less likely that they become such agents or self-modify into them, more likely that they simply build agents with coherent preferences distinct from the builders', for the purpose of efficiently managing the resources. But it's those agents with coherent preferences that get to manage everything, so they are what matters for what happens. And if they tend to be built in particular convergent ways, perhaps arbitrary details of their builders are not as relevant to what happens.
“without worrying about goodharting” and “most efficient way of handling that is with strong optimization …” comes after you have coherent preferences, not before
That's the argument for formulating the requisite coherent preferences: they are needed to perform strong optimization. And you want strong optimization because you have all this stuff lying around unoptimized.