The main place where I disagree with you here is that you make such a big distinction between “really good at figuring out personalities” (3) and “running a full or partial simulation of the situation” (4). As your phrase “full or partial” suggests, simulations have a large range of fidelity. To illustrate a bit:
Fully precise physical sim, in a physics which allows such a thing. (Quantum mechanics poses some problems for this in our universe.)
Simulation which is highly accurate down to the atomic level.
Simulation which is highly accurate down to the cell level.
Simulation which (like many modern fluid-dynamics sims) identifies regions that need to be run at higher or lower fidelity, so that in principle some areas are simulated down to the atomic level while others are simulated at a scale of miles.
Like the above, but accelerated by a neural network trained to predict the likely outcomes of the possible configurations, letting the simulation skip forward several steps at a time instead of going one step at a time.
...
Somewhere on this continuum sits “really good at figuring out personalities”. So where do you draw the line? If it’s a matter of degree (as seems likely), how do you handle the shift from taking both boxes (as you claim is right for scenario 3) to one box (as you claim is right for scenario 4)?
In scenario 3, you would get 1k, while a 1-boxer would get 1m. You can comfort yourself with the “correctness” of your decision, but you’re still losing out. More to the point, a rational agent would prefer to self-modify into the sort of agent that gets the 1m in this kind of situation, if possible. So your decision theory is not stable under self-reflection.
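To make the “losing out” point concrete, here’s a minimal sketch (my own toy numbers, assuming the standard Newcomb setup: the transparent box always holds $1k, the opaque box holds $1m only if one-boxing was predicted, and the predictor is right with probability p):

```python
# Toy expected-value comparison for Newcomb's problem.
# Assumptions (standard setup, not specific to this thread):
#   - transparent box always contains $1,000
#   - opaque box contains $1,000,000 iff the predictor predicted one-boxing
#   - the predictor is correct with probability p

SMALL = 1_000
BIG = 1_000_000

def ev_one_box(p: float) -> float:
    # Predictor correct -> opaque box is full; incorrect -> it's empty.
    return p * BIG

def ev_two_box(p: float) -> float:
    # Predictor correct -> opaque box is empty; incorrect -> it's full.
    return p * SMALL + (1 - p) * (BIG + SMALL)

for p in (0.5, 0.9, 0.99, 1.0):
    print(f"p={p}: one-box EV = {ev_one_box(p):>11,.0f}, "
          f"two-box EV = {ev_two_box(p):>11,.0f}")
```

For any accuracy above roughly p = 0.5005 (where (2p − 1)·1m exceeds 1k), the one-boxer comes out ahead in expectation, regardless of whether the prediction comes from personality-reading or simulation.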