I mostly had in mind 2. I'm not sure how predicting humans is different from putting humans in hypotheticals; it seems like the same problems could arise.
I agree that the same problem appears for ALBA. I was originally working with proposals where the improbability of the human’s situation was bounded, but the recursive structure can lead to arbitrarily large improbability. I hadn’t thought about this explicitly.
Predicting humans is different from putting humans in hypotheticals, in the sense that in principle you can actually sample from the situations that cause humans to think they are in a simulation or whatever.
For example, suppose the human had access to a button that said “This is weird, I’m probably in a simulation,” and suppose that we expected the human to press it in any case where they would start behaving weirdly. Then we could potentially sample from the subset of situations where the human presses the button. And if we manage to do that, then the human isn’t right to suspect they are in a simulation (any more than they already should believe they are in a simulation, prior to even building the AI).
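To make that sampling step concrete, here is a minimal sketch of one way the conditional sampling could go, assuming we have a generative model over situations and a predictor of whether the human would press the button. Everything here (sample_situation, presses_button, the weirdness field) is my own illustrative stand-in, not part of the actual scheme:

```python
import random

def sample_situation():
    """Stand-in for a generative model over situations the human might face."""
    return {"weirdness": random.random()}

def presses_button(situation):
    """Stand-in for a learned predictor of whether the human flags the
    situation as 'this is weird, I'm probably in a simulation'."""
    return situation["weirdness"] > 0.99

def sample_button_pressing_situation(max_tries=1_000_000):
    """Rejection-sample from the conditional distribution over situations,
    given that the button gets pressed: keep drawing until accepted."""
    for _ in range(max_tries):
        s = sample_situation()
        if presses_button(s):
            return s
    raise RuntimeError("no accepted sample; the event may be too rare for rejection sampling")
```

Of course, if button-pressing situations are extremely rare under the base distribution, naive rejection sampling like this is hopeless, which is presumably where the active-learning-style machinery mentioned below would have to come in.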
I agree that you shouldn't expect the current version of the scheme to cope with this problem. And it's certainly not obvious that the approach sketched above can work; there are many important open questions. But I do think that there is a clear path to fixing this kind of problem.
(Interestingly, this is exactly analogous to what the active learning scheme has to do, in order to get the prediction algorithms to predict well.)
ETA: although I hadn't thought about it explicitly in the context of ALBA, ever since here I have expected to need some way to overweight "weird" situations in order to stop them from being problematic.
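For what it's worth, here is one very rough sketch of what "overweighting weird situations" could mean in a training loop. This is my own illustration, under the assumption that we can estimate how improbable a given situation is; the weirdness_prob field and the tempering exponent are hypothetical choices, not anything from the scheme itself:

```python
def weirdness_probability(situation):
    """Stand-in estimate of how (im)probable this situation is, clipped
    away from zero to keep the weights finite."""
    return max(situation["weirdness_prob"], 1e-9)

def training_weight(situation, exponent=0.5):
    """Weight inversely related to probability: improbable ('weird')
    situations get more weight. The exponent tempers the correction so
    that a single extremely improbable situation can't dominate the loss."""
    return weirdness_probability(situation) ** (-exponent)

# Usage: multiply each training example's loss by training_weight(situation)
# before averaging, so weird situations contribute more to training than
# their raw frequency would give them.
```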