What you’re missing is the idea that we should be optimizing our policies rather than our individual actions, because (among other alleged advantages) this leads to better results when there are lots of agents interacting with one another.
In a world full of action-optimizers in which “true prisoners’ dilemmas” happen often, everyone ends up on (D,D) and hence (one life, one paperclip). In an otherwise similar world full of policy-optimizers who choose cooperation when they think their opponents are similar policy-optimizers, everyone ends up on (C,C) and hence (two lives, two paperclips). Everyone is better off, even though it’s also true that everyone could (individually) do better if they were allowed to switch while everyone else had to leave their choice unaltered.
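To make the comparison concrete, here is a minimal sketch in Python. The payoff numbers are my own illustrative assumptions (only their ordering matters), following the (lives, paperclips) framing above: mutual cooperation beats mutual defection for both sides, yet each side individually gains by defecting while the other's choice is held fixed.

```python
# Payoffs as (row player, column player); moves are "C" or "D".
# These numbers are illustrative assumptions, not from the original comment.
PAYOFFS = {
    ("C", "C"): (2, 2),  # two lives, two paperclips
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),  # one life, one paperclip
}

def action_optimizer(opponent_is_policy_optimizer: bool) -> str:
    # Optimizes the individual action: D strictly dominates C no matter
    # what the opponent does, so it always defects.
    return "D"

def policy_optimizer(opponent_is_policy_optimizer: bool) -> str:
    # Optimizes the policy: cooperate when facing a similar
    # policy-optimizer, otherwise defect.
    return "C" if opponent_is_policy_optimizer else "D"

def play(agent_a, agent_b):
    a_move = agent_a(agent_b is policy_optimizer)
    b_move = agent_b(agent_a is policy_optimizer)
    return PAYOFFS[(a_move, b_move)]

print(play(action_optimizer, action_optimizer))  # (1, 1): everyone lands on (D, D)
print(play(policy_optimizer, policy_optimizer))  # (2, 2): everyone lands on (C, C)

# A unilateral switch away from (C, C) would give the switcher 3 -- which is
# exactly why pure action-optimizers can never hold on to (C, C).
print(PAYOFFS[("D", "C")])  # (3, 0)
```

The point of the sketch is only that the policy-optimizer's conditional rule ("cooperate iff my opponent reasons like me") changes the equilibrium the whole population lands on, even though defection remains the better individual action from any fixed position.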