I didn’t get around to providing more clarity. I’ll do that now:
Both parties would click the button if it were clear that the other party would not click the button in retaliation, since then they would not have to worry about being wiped off the map.
The two parties would both prefer a world in which only the other party survives to a world without any humanity.
We know that the other party will click the button if and only if they predict with extremely high confidence that we will not retaliate. Our position is the same.
How much they prefer each outcome matters. All these decision theories work with utilities, not just with preference orderings over outcomes. You could derive utilities from preference orderings over lotteries, e.g., would they prefer a 50:50 split between winning and extinction over the status quo? What about 10:90 or 90:10?
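To make the lottery idea concrete, here is a minimal sketch in Python. It assumes we fix u(extinction) = 0 and u(winning) = 1, and that a party's preferences are monotone in the win probability. Under those assumptions, the utility of the status quo is the win probability at which the party is indifferent between the lottery and the status quo. The `prefers_lottery` oracle and the hidden value 0.9 are purely hypothetical stand-ins for actually asking the party.

```python
from typing import Callable


def elicit_status_quo_utility(
    prefers_lottery: Callable[[float], bool], tol: float = 1e-6
) -> float:
    """Bisect on the win probability p of a (p: winning, 1-p: extinction) lottery.

    Assumes u(extinction) = 0, u(winning) = 1, and that if the party prefers
    the lottery at some p, it also prefers it at every higher p.
    Returns the indifference probability, which equals u(status quo).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prefers_lottery(mid):
            hi = mid  # lottery already good enough; indifference point is lower
        else:
            lo = mid  # status quo still preferred; indifference point is higher
    return (lo + hi) / 2


# Hypothetical party whose (unknown to us) status quo utility is 0.9:
# it takes the gamble only when the chance of winning exceeds 0.9.
HIDDEN_UTILITY = 0.9
estimate = elicit_status_quo_utility(lambda p: p > HIDDEN_UTILITY)
```

A party like this would turn down the 50:50 split and accept something a bit better than 90:10, which is exactly the sort of answer the questions above are probing for.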
There are also unknown probabilities, such as the chance that each AI fails to predict the other side correctly, the chance of a false positive or false negative in launch detection, the chance that someone is deliberately feeding false information to some participants, the chance of a launch without the button having been pressed or of a failure to launch after pressing it, the chance that this is just a simulation, and so on. Even if these probabilities are small, they are critical to this scenario.
For the second paragraph, we’re assuming this AI has not yet made a mistake in predicting human behavior after many, many trials in different scenarios. There is no exact probability. We’re also assuming perfect levels of observation, so we know that they pressed a button, that bombs are heading over, and any observable context behind the decision (like false information).
The first paragraph contains an idea I hadn’t considered, and it might be central to the whole thing. I’ll ponder it more.
The main reason I mention it is that the scenario posits that the other side has already launched, and now it’s time for you to make a decision.
The trouble is that if the AI had not made a mistake, you should have had an earlier chance to make a decision: when the AI tells you that it predicts that they are going to launch. Obviously your AI has failed, and maybe theirs did too. For EDT and CDT it makes no difference and you should not press the button, since (as per the additional information that all observations are perfect) the outcomes of your actions are certain.
It does make a difference to theories such as FDT, but in a very unclear way.
Under FDT the optimal action to take after the AI warns you that they are going to launch is to do nothing. When you subsequently confirm (with perfect reliability as per scenario) that they have launched, FDT says that you should launch.
However, you’re not in either of those positions. You didn’t get a chance to make an earlier decision, so your AI must have failed. If you knew that this failure was due to enemy action, FDT says that you should launch. If you are certain that it was not, then you should not. If you are uncertain, then it depends on what probability distribution you have over hypotheses, what your exact utilities are, and what other information you may be able to gather before you can no longer launch.
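As a rough illustration of how the uncertain case depends on the numbers, here is a small Python sketch. It does not compute FDT itself. It assumes you have already assigned a value to each action under each hypothesis (which is the hard part), and then just takes expectations over two hypotheses. The hypothesis names, the `VALUE` table, and every number in it are made-up placeholders, chosen only so that the two certain cases agree with what is said above.

```python
# Placeholder values of each action under each hypothesis about why the AI failed.
# These numbers are illustrative assumptions, not derived from any decision theory.
VALUE = {
    "enemy_action": {"launch": 1.0, "hold": 0.0},
    "other_failure": {"launch": -10.0, "hold": 0.0},
}


def best_action(p_enemy: float) -> tuple[str, dict[str, float]]:
    """Return the action with the highest expected value, plus the expectations.

    p_enemy is your probability that the failure was due to enemy action;
    the remaining probability mass goes to "other_failure".
    """
    expected = {
        action: p_enemy * VALUE["enemy_action"][action]
        + (1.0 - p_enemy) * VALUE["other_failure"][action]
        for action in ("launch", "hold")
    }
    return max(expected, key=expected.get), expected


certain_enemy, _ = best_action(1.0)   # certain it was enemy action
certain_other, _ = best_action(0.0)   # certain it was not
uncertain, _ = best_action(0.5)       # genuinely unsure
```

With these particular placeholders the recommendation only flips to launching once `p_enemy` exceeds 10/11, and different utilities would move that threshold, which is why the exact utilities and the distribution over hypotheses matter so much here.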