Nice post. I’m excited about the bargaining interpretation of UDT.
However, if we think of our probability for the coin-flip as the result of bargaining, it makes sense that it might be sensitive to size. The negotiation which was willing to trade $100 from one branch to get $10,000 in another branch need not be equally willing to perform that trade arbitrarily many times.
Given this, is there any reason to focus on iterated counterfactual mugging, as opposed to just counterfactual muggings with higher stakes?
It seems like iteration is maybe related to learning. That doesn’t make a difference for counterfactual mugging, because you’ll learn nothing relevant over time.
For counterlogical muggings about the Nth digit of pi, we can imagine a scenario where you would have learned the Nth digit of pi after 1000 days, and therefore wouldn’t have paid if Omega had first offered you the deal on the 1001st day. But now it’s confounded by the fact that he already told you about it… So maybe there’s something here where you stop taking the deal on the day when you would have found out the Nth digit of pi if Omega hadn’t appeared?
Yeah, I’m kind of connecting a lot of threads here in a messy way. This post definitely could be better-organized.
I have a sense that open-minded UDT should relate to objective probabilities in a frequentist sense. For example, in decision problems involving Omega, it’s particularly compelling if we stipulate that Omega has a long history of offering similar choices to mortals and a track record of being honest and predicting correctly. This is in some sense the most compelling way we can come to know what decision problem we face, and it relies on framing our decision as part of a sequence. Counterlogical mugging on a digit of pi is similarly compelling if we imagine a very large digit, but becomes less compelling as we imagine digits closer to the beginning of pi. I want to suggest a learning principle with frequentist-flavored guarantees (similar to LIDT or BRIA but less updateful).
On the other hand, the bargaining framing does not have anything to do with iteration. The bargaining idea in some sense feels much more promising, since I can already offer a toy analysis supporting my intuition that iterated counterfactual mugging with the same coin is less tempting than iterated muggings with different coins.
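One way to make the same-coin versus different-coins comparison concrete is the minimal Python sketch below. It is not the toy analysis referred to above, only an illustration under added assumptions: the agent follows the "always pay" policy, the coin is fair, the $100 and $10,000 stakes are borrowed from earlier in the thread, and every function name is hypothetical. It computes the exact distribution of total winnings over n rounds in both setups.

```python
from fractions import Fraction
from math import comb

# Stakes borrowed from the thread; a fair coin is an assumption of this sketch.
PAY, REWARD = 100, 10_000

def same_coin(n):
    """One flip decides all n rounds (n >= 1): total payoff -> probability."""
    return {-PAY * n: Fraction(1, 2), REWARD * n: Fraction(1, 2)}

def different_coins(n):
    """n independent fair flips: h heads pay REWARD each, the rest cost PAY each."""
    return {
        REWARD * h - PAY * (n - h): Fraction(comb(n, h), 2 ** n)
        for h in range(n + 1)
    }

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

def prob_net_loss(dist):
    """Probability that the always-pay policy ends up strictly worse off."""
    return sum(p for x, p in dist.items() if x < 0)
```

For n = 10, both setups have the same expected total (49,500), but the same-coin variance is 10 times larger, and the chance of ending with a net loss is 1/2 with the same coin versus 1/1024 with different coins. So under this sketch, any asymmetry in how tempting the two setups feel has to come from something other than expected value, such as how much one branch is asked to give up in total.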
For counterlogical mugging, it’s unclear if it should be possible to correctly discover the parity of the relevant digit of pi. I would expect that in the counterfactual where it’s even, it will eventually be discovered to be even. And in the counterfactual where it’s odd, that same digit will eventually be discovered to be odd.
ASP and Transparent Newcomb might be closer to test cases for formulating updateless policies that have the character of getting better as they grow more powerful. These problems ask the agent to use a decision procedure that intentionally doesn’t take certain information into account, whether the agent as a whole has access to that information or not. But they lack future steps that would let that decision procedure benefit from eventually getting stronger than the agent that initially formulated it, so these aren’t quite the thought experiments needed here.