Planned summary for the newsletter:

I <@have argued@>(@Coherence arguments do not imply goal-directed behavior@) that coherence arguments for modeling rational behavior as expected utility (EU) maximization do not add anything to AI risk arguments. This post argues for a different way to interpret these arguments: we should only model a system as an EU maximizer if it is the result of an optimization process, such that the EU maximizer model is the best model we have of the system. In that case, the best way to predict the agent is to imagine what we would do if we had its goals, which leads to the standard convergent instrumental subgoals.
Planned opinion:
This version of the argument seems to be more a statement about our epistemic state than about actual AI risk. For example, I know many people without technical expertise who anthropomorphize their laptops as though the laptops were pursuing some goal, but those people don't (and shouldn't) worry that their laptops are going to take over the world. More details in this comment.