Another way to look at it: subgoals may be offset by other subgoals. This includes convergent instrumental values.
Humans don’t usually let any one of their conflicting values override all the others. For example, accumulation of any given resource is moderated by competition with other humans and by diminishing marginal returns on that resource relative to others.
On the other hand, for a superintelligence, particularly one with a simple terminal goal, these moderating factors would be less effective. For example, it might have no competitors.
What do you think of my devil’s advocacy?