Can you give the probabilities that the agent assigns to B1 through D4 in the “sandboxed” counterfactual?
Yeah, so there are four options, (B1∧B2)∨(¬B1∧B2)∨(B1∧¬B2)∨(¬B1∧¬B2). These will have the odds ratios 0.99×0.9999 : 0.01×0.9999 : 0.99×0.0001 : 0.01×0.0001. By D4 we'd eliminate the first one. The remaining odds are normalized to something around 0:0.9901:0.0098:0.0001. That is, given that the agent takes $5 instead of $10, it is pretty sure it has taken the smaller amount for some reason, assigns a tiny probability to having miscalculated which of $5 and $10 is larger, and a really, really small probability to both being true.
In fact, were it to reason further, it would see that the fourth option is also impossible: we have an XOR-type situation on our hands. Then it would end up with odds around 0:0.9902:0.0098:0.
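For concreteness, the normalization above can be sketched in a few lines of Python. The priors P(B1) = 0.99 and P(B2) = 0.9999 (treated as independent) are the ones implied by the ratios given:

```python
# Assumed priors implied by the ratios in the text.
p_b1, p_b2 = 0.99, 0.9999

# Joint odds for (B1∧B2), (¬B1∧B2), (B1∧¬B2), (¬B1∧¬B2).
odds = [
    p_b1 * p_b2,
    (1 - p_b1) * p_b2,
    p_b1 * (1 - p_b2),
    (1 - p_b1) * (1 - p_b2),
]

# D4 rules out the first case; normalize the rest.
odds[0] = 0.0
posterior = [o / sum(odds) for o in odds]
print([round(p, 4) for p in posterior])  # → [0.0, 0.9901, 0.0098, 0.0001]

# Noticing the XOR structure also rules out the fourth case.
odds[3] = 0.0
posterior = [o / sum(odds) for o in odds]
print([round(p, 4) for p in posterior])  # → [0.0, 0.9902, 0.0098, 0.0]
```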
That last bit was assuming that it doesn’t have uncertainty about its own reasoning capability.
Ideally it would also consider that D4 might be incorrect, and still assign some tiny ϵ of probability (10⁻¹⁰, for example; the point is that it should be pretty small) to both the first and fourth options, giving 10⁻¹⁰:0.9902:0.0098:10⁻¹⁰. It wouldn't really consider them for the purposes of making predictions, but to avoid logical explosions we never assign a "true" zero.
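The ϵ-floor variant is a one-line change to the same sketch: instead of zeroing out the ruled-out cases, floor their odds at ϵ = 10⁻¹⁰ before normalizing (note the normalized probabilities on the floored entries come out slightly larger than ϵ itself, since the total mass is about 0.01):

```python
eps = 1e-10  # the tiny probability floor from the text

# Same assumed priors as before: P(B1) = 0.99, P(B2) = 0.9999.
p_b1, p_b2 = 0.99, 0.9999
odds = [
    p_b1 * p_b2,
    (1 - p_b1) * p_b2,
    p_b1 * (1 - p_b2),
    (1 - p_b1) * (1 - p_b2),
]

odds[0] = eps  # D4 makes B1∧B2 (near-)impossible, but never a true zero
odds[3] = eps  # the XOR structure rules out ¬B1∧¬B2 too
posterior = [o / sum(odds) for o in odds]
# posterior ≈ [~1e-8, 0.9902, 0.0098, ~1e-8]: negligible for predictions,
# but nonzero, so conditioning on either case never causes a logical explosion.
```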
Nice!!