Eliezer is using some definition of “threat” that refers to “fairness”, such that “fair” actions do not count as threats
This seems likely. Much of Eliezer’s fiction includes a lot of typical mind fallacy and a seemingly willful ignorance of power dynamics, and of the fact that “unfair” equilibria are the obvious outcome for unaligned agents with different starting conditions.
This kind of game-theory analysis is just silly unless it includes information about who has the stronger or more visible precommitments, and what impacts the actions will have outside the game. It’s actually quite surprising how deeply CDT is assumed in such analyses (agents can freely choose their actions at the point in the narrative where they happen).
It’s hardly wilful ignorance; it’s a deliberate rejection. A good decision theory, by nature, should produce results that don’t depend on visible precommitments to reach a negotiation equilibrium, since an ideal negotiating agent ought to be able to accept postcommitment to things it would predictably wish it had precommitted to. And if a decision theory doesn’t allow you to hold out for fairness in the face of an uneven power dynamic, why even have one?