It is possible to get rid of the need to consider worlds in which some players don’t exist, by treating P∖j as optimization for a subset of players. This can be meaningful in the context of a single entity (e.g. the AI) optimizing for the preferences of P∖j, or in the context of game theory, where we interpret it as having all players coordinate in a manner that optimizes for the utilities of P∖j (in the latter context, it makes sense to first discard any outcome that assigns a below-minimax payoff to any player[1]). The disadvantage is that this admits BATNAs in which some people get worse-than-death payoffs (because of adversarial preferences of other people). On the other hand, it is still “threat resistant” in the sense that the mechanism itself doesn’t generate any incentive to harm people.
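To make the game-theoretic reading concrete, here is a minimal sketch in Python of "discard below-minimax outcomes, then optimize for P∖j" on a toy 3-player game. Everything specific in it is an assumption for illustration: the function names, the restriction to pure strategies (the real construction presumably ranges over mixed or correlated outcomes), the toy payoffs, and especially the use of a plain sum to aggregate the utilities of P∖j, which merely stands in for whatever aggregation rule is actually intended.

```python
from itertools import product

def pure_minimax(payoffs, n_actions, i):
    """Pure-strategy minimax value of player i: the other players pick the
    joint action that minimizes i's best-response payoff."""
    others = [range(n) for k, n in enumerate(n_actions) if k != i]
    best_responses = []
    for rest in product(*others):
        # Re-insert player i's action at position i to get a full profile.
        best_responses.append(
            max(payoffs[rest[:i] + (a,) + rest[i:]][i] for a in range(n_actions[i]))
        )
    return min(best_responses)

def optimize_for_subset(payoffs, n_actions, j):
    """Best pure outcome for P \\ {j}, after discarding every outcome that
    gives some player (including j) less than their minimax value.
    ASSUMPTION: utilities of P \\ {j} are aggregated by a plain sum."""
    players = range(len(n_actions))
    floor = [pure_minimax(payoffs, n_actions, i) for i in players]
    feasible = [o for o in payoffs
                if all(payoffs[o][i] >= floor[i] for i in players)]
    if not feasible:  # can happen when only pure outcomes are considered
        return None
    return max(feasible, key=lambda o: sum(payoffs[o][i] for i in players if i != j))

# Hypothetical toy game: each player either contributes (1) or not (0).
# Every contribution gives 2 to everyone and costs the contributor 5.
n_actions = (2, 2, 2)
payoffs = {}
for profile in product(range(2), repeat=3):
    c = sum(profile)
    payoffs[profile] = tuple(2 * c - 5 * a for a in profile)

print(optimize_for_subset(payoffs, n_actions, j=2))  # (1, 1, 1)
```

In this toy game, the outcome that is best for players 0 and 1 alone is (0, 0, 1), where only player 2 contributes and ends up with a payoff of -3, below their minimax value of 0. The minimax filter removes it, so the sketch returns (1, 1, 1) instead.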
It would be interesting to compare this with Diffractor’s ROSE point.
Regarded as a candidate definition for a fully general abstract game-theoretic superrational optimum, this still seems lacking, because the minimax guarantee in a game of more than two players seems too weak. Maybe there is a version based on some notion of “coalition minimax”.