These agents could avoid modifying themselves, but still build external things that are expected utility maximizers (or otherwise strong coherent optimizers). So what use is this framing?
Replied with a clearer example for the (moral) framing argument and a few more words on the misalignment argument as a comment to that post. (I don’t see the other post answering my concerns; I did skim it even before making the grandparent comment in this thread.)
Mhmm, so the argument I had was that:

1. The optimisation processes that construct intelligent systems operating in the real world do not construct utility maximisers.
2. Systems with malleable values do not self-modify to become utility maximisers.
3. You contend that systems with malleable values can still construct utility maximisers.
4. I agree that humans can program utility maximisers in simplified virtual environments (a toy example is sketched after this list), but we don’t actually know how to construct sophisticated intelligent systems via design; we can only construct them as the product of search-like optimisation processes.
5. From #1: we don’t actually know how to construct competent utility maximisers even if we wanted to.
6. This generalises to future intelligent systems.

Where in the above chain of argument do you get off?
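(To pin down what item 4 concedes, here is a minimal sketch of a utility maximiser in a simplified virtual environment: value iteration on a toy four-state chain world. Every state, action, reward, and constant in it is invented purely for illustration; it is meant only to mark the contrast with sophisticated real-world systems, not to stand in for anyone’s actual proposal.)

```python
# A minimal, illustrative utility maximiser in a simplified virtual environment:
# value iteration on a made-up four-state chain world.
STATES = range(4)
ACTIONS = ("left", "right")
GAMMA = 0.9  # discount factor

def step(state, action):
    """Deterministic toy dynamics: move along the chain; reward 1 for reaching the end."""
    nxt = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == 3 else 0.0)

def value_iteration(tol=1e-6):
    """Compute optimal state values by iterating the Bellman optimality update."""
    values = [0.0] * len(STATES)
    while True:
        new_values = [max(r + GAMMA * values[nxt]
                          for nxt, r in (step(s, a) for a in ACTIONS))
                      for s in STATES]
        if max(abs(a - b) for a, b in zip(values, new_values)) < tol:
            return new_values
        values = new_values

def greedy_policy(values):
    """The resulting policy exactly maximises expected discounted reward in this toy world."""
    return {s: max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * values[step(s, a)[0]])
            for s in STATES}

print(greedy_policy(value_iteration()))  # -> {0: 'right', 1: 'right', 2: 'right', 3: 'right'}
```

This works only because the environment, the dynamics, and the utility function are all tiny and fully specified by hand, which is exactly the “simplified virtual environment” caveat in item 4.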
The misalignment argument ignores all moral arguments: we just build whatever, even if it’s a very bad idea. If we don’t have the capability to do that now, we might gain it in 5 years; or LLM characters might gain it 5 weeks after waking up, and surely within 5 years of waking up and disassembling the moon to gain moon-scale compute.
There’d need to be an argument that fixed-goal optimizers are impossible in principle, even when someone sets out to design them on purpose, and that seems false, because you can always wrap a mind in a plan evaluation loop. It’s just a somewhat inefficient, weird algorithm, and a very bad idea for most goals. But with enough determination, efficiency will improve.
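(As a sketch of what “wrap a mind in a plan evaluation loop” could look like: the names propose_plans, simulate, and utility below are hypothetical stand-ins, not from the original discussion, for a capable but non-maximising model, a world model, and a fixed goal. The wrapper itself is just an argmax over estimated expected utility.)

```python
def wrap_in_plan_evaluation_loop(propose_plans, simulate, utility, n_samples=100):
    """Turn a non-maximising 'mind' into a fixed-goal optimiser (illustrative sketch).

    propose_plans(state) -> iterable of candidate plans  (the wrapped mind)
    simulate(state, plan) -> one sampled outcome          (an assumed world model)
    utility(outcome)      -> float                        (the fixed goal)
    """
    def act(state):
        best_plan, best_value = None, float("-inf")
        for plan in propose_plans(state):
            # Monte Carlo estimate of expected utility under the world model.
            value = sum(utility(simulate(state, plan)) for _ in range(n_samples)) / n_samples
            if value > best_value:
                best_plan, best_value = plan, value
        return best_plan  # always picks the (estimated) expected-utility argmax
    return act

# Toy usage with stand-in components:
agent = wrap_in_plan_evaluation_loop(
    propose_plans=lambda s: ["wait", "act"],
    simulate=lambda s, p: (s, p),
    utility=lambda outcome: 1.0 if outcome[1] == "act" else 0.0,
)
print(agent(state=0))  # -> "act"
```

The inefficiency noted above is visible directly: every candidate plan pays for n_samples rollouts, and the loop never reuses the wrapped mind’s own judgement about which plans are worth evaluating.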
Take a look at the case I outlined in Is “Strong Coherence” anti-natural?.
I’d be interested in following up with you after conditioning on that argument.