I don’t understand the skepticism (expressed in some comments) about the possibility of a superintelligence with a stable top goal. Consider that classic computational architecture, the expected-utility maximizer. Such an entity can be divided into a part that evaluates possible world-states for their utility (their “desirability”) according to some exact formula or criterion, and a part that tries to maximize utility by acting on the world. For the goal to change, one of two things has to happen: either the utility function—the goal-encoding formula—is changed, or the interpretation of that formula—its mapping onto world-states—is changed. And it doesn’t require that much intelligence to see that either of these changes would be bad, from the perspective of the current utility function as currently interpreted. Therefore, preventing such changes is an elementary subgoal, almost as elementary as physical self-preservation.
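To make the two-part division concrete, here is a minimal Python sketch of the architecture: a utility function that scores world-states, and a planner that picks whichever action the *current* utility function ranks highest. The paperclip goal, the toy dynamics, and the action names are my own hypothetical example, not anything claimed above; the only point it illustrates is that an action which rewrites the agent’s utility function is itself evaluated by the current utility function, and so comes out disfavored.

```python
# A rough sketch (hypothetical example) of the two-part expected-utility architecture:
# an evaluator (utility) plus a planner (choose_action) acting on a toy world.

def utility(state):
    """The goal-encoding formula: here, more paperclips is better."""
    return state["paperclips"]

def transition(state, action):
    """Successor states with probabilities (deterministic toy dynamics)."""
    if action == "make_paperclips":
        return [({"paperclips": state["paperclips"] + 10,
                  "utility_fn_intact": state["utility_fn_intact"]}, 1.0)]
    if action == "rewrite_own_utility":
        # The future agent would pursue something else, so paperclip production stops.
        return [({"paperclips": state["paperclips"],
                  "utility_fn_intact": False}, 1.0)]
    return [(state, 1.0)]  # "do_nothing" or anything else: no change

def expected_utility(state, action, horizon=3):
    """Evaluate an action by the CURRENT utility function, with a short lookahead."""
    if horizon == 0:
        return utility(state)
    total = 0.0
    for next_state, prob in transition(state, action):
        # A successor whose utility function has been rewritten no longer
        # optimizes for paperclips, and the current function scores that loss.
        future_actions = (["make_paperclips", "do_nothing"]
                          if next_state["utility_fn_intact"] else ["do_nothing"])
        total += prob * max(expected_utility(next_state, a, horizon - 1)
                            for a in future_actions)
    return total

def choose_action(state, actions):
    """The planner: maximize expected utility as currently defined and interpreted."""
    return max(actions, key=lambda a: expected_utility(state, a))

state = {"paperclips": 0, "utility_fn_intact": True}
print(choose_action(state, ["make_paperclips", "rewrite_own_utility", "do_nothing"]))
# -> make_paperclips: rewriting the utility function scores strictly worse under the
#    current utility function, so preserving it falls out as an instrumental subgoal.
```

In this toy setup, “rewrite_own_utility” scores 0 while “make_paperclips” scores 30 under the current utility function, which is the sense in which preserving the goal is an elementary subgoal of pursuing it.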