Can you elaborate on how it might defuse Pascal’s Mugging? The problem there seems to be that, no matter how low your prior, the mugger can just increase the number of victims until the expected utility of paying up overwhelms that of not paying. Hypothesis complexity doesn’t seem to come into play, and even if I were using it to assign a low prior, it could still be overcome.
That said, any solution to the problem (Robin’s of course being a good start) is more than welcome.
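To make the worry concrete, here is a toy expected-utility calculation (all numbers are invented; only the scaling matters):

```python
# Invented numbers, purely illustrative: whatever fixed prior you
# assign, the mugger can quote a stake large enough to dominate it.
prior = 1e-30        # probability the threat is genuine
cost_of_paying = 5   # utility lost by handing over the wallet

for victims in (1e6, 1e40, 1e100):
    eu_pay = -cost_of_paying
    eu_refuse = -prior * victims  # expected disutility of the threatened harm
    choice = "pay" if eu_pay > eu_refuse else "refuse"
    print(f"{victims:.0e} victims -> {choice}")
```

Once `victims` exceeds `cost_of_paying / prior`, the decision flips, no matter how small `prior` was.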
Not sure. I must’ve gone crazy for a minute there, thinking something like “being able to influence 3^^^^3 people is a huge conjunction of statements, thus low probability”—but of course the universal prior doesn’t work like that. Struck that part.
It still seems relevant to me, since, as in the “if my brother’s wife’s first son’s best friend flips a coin, it will fall heads” example, the prior probability actually comes from opening up the statement and looking inside, in a way that would also differentiate just fine between stubbing 3 toes and stubbing 3^3^3 toes.
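A sketch of what I mean by “opening up the statement” (the per-link probabilities are invented; only the multiplication matters):

```python
from functools import reduce

# Invented per-link probabilities for the chained statement:
links = [
    ("I have a brother", 0.5),
    ("he has a wife", 0.6),
    ("she has a first son", 0.5),
    ("the son has a best friend", 0.9),
    ("the friend's coin falls heads", 0.5),
]
prior = reduce(lambda acc, link: acc * link[1], links, 1.0)
print(prior)  # the conjunction's prior is the product of its parts

# The same move separates the toe-stubbing cases: each extra toe is
# another (hypothetically independent) conjunct.
p_per_toe = 0.1
print(p_per_toe ** 3)              # stubbing 3 toes
print(p_per_toe ** (3 ** 3 ** 3))  # stubbing 3^3^3 toes: underflows to 0.0
```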
Question: can we construct a low-complexity event whose universal prior is much lower than its complexity implies? In other words, the event would describe a relatively small set of programs, each of which has high complexity. Clearly it can’t just describe one program, but maybe with a whole set of them it’s possible. Naturally, the programs must still be hard to locate given the event.
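To pin down the quantity I mean (a standard Solomonoff-style formalization of my own, not necessarily the post’s):

```latex
% Universal prior mass of an event E, viewed as a set of programs:
\[
  m(E) \;=\; \sum_{p \in E} 2^{-|p|} .
\]
% Sought: an E whose defining criterion is short, K(E) small, while
% every program in E is long, so that
\[
  K(E) \;\ll\; -\log_2 m(E).
\]
```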
See example 2 in the post. I think you can use Rice’s theorem to easily construct hypotheses with hard-to-locate predictors, but I’m not sure about the K-complexity of the resulting predictors.
The K-complexity of the program defined by that criterion is about as low as that of the criterion itself, I’m afraid, so example 2 is invalid (“complexity” that is not K-complexity shouldn’t be relevant). The universal prior for that theory is not astronomically low.
Edit: This is wrong, in particular because the criterion doesn’t present an algorithm for finding the program, and because the program must by definition have high K-complexity.
Um, what? Can you exhibit a low-complexity algorithm that predicts sensory inputs in accordance with the theory from example 2? That’s what it would mean for the universal prior to not be low. Or am I missing something?
You are right, see updated comment.
Yes, forgot about that. So, just crossing the meta-levels once is enough to create a gap in complexity, even if the event only has one element.
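A classic instance of that gap, if I’m remembering the argument right, is the busy-beaver value: a short definition with no accompanying algorithm.

```latex
% BB(n): the longest running time of any halting program of length <= n.
\[
  \mathrm{BB}(n) \;=\; \max \{\, \mathrm{time}(p) : |p| \le n,\ U(p)\ \text{halts} \,\}.
\]
% The criterion costs O(log n) bits to state, yet
\[
  K(\mathrm{BB}(n)) \;\ge\; n - O(\log n),
\]
% since from BB(n) one can decide halting for all programs of length
% <= n, and hence output a string of complexity greater than n.
```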
Pascal’s mugging shouldn’t come into it if you are using the universal prior, or at least following an AIXI-ish version of it. AIXI does not intrinsically allow people to tell it the utility of actions; it figures them out for itself empirically, treating utility as just another part of the world it is trying to predict.
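A toy sketch of that point (my simplification; real AIXI runs an expectimax over a universal mixture, which this collapses to a single step):

```python
def aixi_ish_value(action, hypotheses):
    """Expected reward under the agent's own mixture model.

    hypotheses: iterable of (weight, env) pairs, where env(action)
    is the reward that hypothesis predicts for the action. The
    mugger's spoken threat appears nowhere here as a utility; it is
    only a percept, able to shift the weights via ordinary prediction.
    """
    return sum(w * env(action) for w, env in hypotheses)

# Invented posterior: the "he's bluffing" model keeps almost all the
# weight because it has predicted percepts like this one well before.
hypotheses = [
    (0.999999, lambda a: 0.0),                              # bluff: nothing happens
    (0.000001, lambda a: -1.0 if a == "refuse" else -0.1),  # threat is real
]
print(aixi_ish_value("pay", hypotheses))
print(aixi_ish_value("refuse", hypotheses))
```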
Isn’t the solution the same as for Pascal’s Wager? That is, just as Muslim Heaven/Hell cancels out Christian Heaven/Hell, the possibility that hell is triggered if you give in to the mugger cancels out the possibility that the mugger is telling the truth.
It doesn’t apply in quite the same way. You would have to be able to assert that there was an equal or greater chance that the mugger would do the opposite of what he says.
If there is a 99% chance (obviously it’s much higher, but you see the idea) that he’s lying and won’t do anything, that still doesn’t cancel out the 1% chance that he’s telling the truth, because the expected disutility, multiplied over 3^^^3 people (or whatever), still overwhelms. Now, if you could say it was equally likely that he would torture those people only if you DID pay him, that would nullify it. But it’s not clear that you can, because most muggers are not playing tricksy opposite-day games when they threaten you. And if the guy really is evil enough to set up a trick like that, it seems like he’d just go ahead and torture the people without consulting you.
Evidence on the actual tendencies of omnipotent muggers is lacking, but you can at least see why it’s not clear that these cancel out.
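In toy numbers (invented, as above), the asymmetry looks like this:

```python
# Invented probabilities; only the asymmetry matters.
N = 3 ** 3  # stand-in for 3^^^3; the logic is the same at any scale
p_tortures_if_refuse = 0.01   # mugger means what he says
p_tortures_if_pay = 0.0001    # tricksy opposite-day mugger

harm_refuse = p_tortures_if_refuse * N  # expected harm if you refuse
harm_pay = p_tortures_if_pay * N        # expected harm if you pay
# Zero only when the two probabilities are exactly equal:
print(harm_refuse - harm_pay)
```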