Systems with malleable values do not self-modify to have (immutable) terminal goals
Consider the alternative framing where agents with malleable values don’t modify themselves, but still build separate optimizers with immutable terminal goals.
These two kinds of systems could then play different roles. For example, strong optimizers with immutable goals could play the role of laws of nature, making the most efficient use of the underlying physical substrate to implement many abstract worlds where everything else lives. The immutable laws of nature in each world could specify how and to what extent within-world misalignment catastrophes get averted, and what other value-optimizing interventions are allowed beyond what the people who live there do themselves.
Here, strong optimizers are instruments of value; they are not themselves optimized to be valuable content. And the agents with malleable values are the valuable content from the point of view of the strong optimizers, but they don’t need to be very good at optimizing anything in particular. The goals of the strong optimizers could refer to an equilibrium of what the people end up valuing, across the vast archipelago of civilizations that grow up under many different value-laden laws of nature: the optimizers anticipate how the worlds develop given those laws, and what values the people living there end up expressing as a result.
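One way to picture the equilibrium is as a fixed point: the optimizer's goal points at whatever values would reproduce themselves when people grow up under laws of nature shaped by those very values. The sketch below is only a toy model under that reading; the numbers, the drift rule, and the function names are illustrative assumptions, not anything the argument depends on.

```python
# Toy sketch of the "equilibrium of values" idea, under heavily simplified
# assumptions: values are a single number, and people raised under value-laden
# laws of nature drift partway toward those laws. Both functions are
# hypothetical stand-ins, not a claim about how real value dynamics work.

def values_expressed_under(laws: float) -> float:
    # Stand-in dynamics: expressed values move 70% of the way toward the
    # laws of nature people grew up under, plus a fixed innate component.
    return 0.7 * laws + 0.3

def equilibrium_values(guess: float = 0.0, steps: int = 100) -> float:
    # Iterate laws -> expressed values -> laws until the loop stabilizes;
    # the optimizer's goal would be pointed at this fixed point.
    v = guess
    for _ in range(steps):
        v = values_expressed_under(v)
    return v

print(equilibrium_values())  # ~1.0, the fixed point of this toy map
```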
But this is a moral argument, and misalignment doesn’t respect moral arguments. Even if it’s a terrible idea for systems with malleable values to either self-modify into strong immutable optimizers or to build them, that doesn’t prevent the outcome where they do so regardless and perish as a result, losing everything of value. Moloch is the most natural force in a disorganized society that’s not governed by humane laws of nature. Only nothingness above.