It seems like you have to get close to eliminating malign hypotheses in order to apply such methods (i.e., they stop working once malign hypotheses have >99.9999999% of the probability, so you need to ensure that the benign hypothesis's description is within about 30 bits of the shortest malign hypothesis), and embeddedness alone isn't enough to get you there.
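(To make the arithmetic behind the 30-bit figure concrete — an illustrative calculation, not part of the original comment: under a Solomonoff-style prior, a hypothesis whose shortest description is k bits longer than a rival's gets roughly 2^-k times the prior weight, and 2^-30 ≈ 10^-9, which is where ">99.9999999%" comes from.)

```python
# Illustrative only: how a description-length gap translates into
# posterior probability under a 2^(-length) prior, assuming the two
# hypotheses fit the data equally well (so likelihoods cancel).
def posterior_prob_of_longer(gap_bits: int) -> float:
    """Probability assigned to the hypothesis whose description is
    `gap_bits` longer than its single rival's."""
    odds = 2.0 ** (-gap_bits)  # prior odds, longer : shorter
    return odds / (1.0 + odds)

# A 30-bit gap leaves the longer hypothesis with less than ~1e-9 of the
# probability, i.e. its rival holds more than 99.9999999%.
print(posterior_prob_of_longer(30))
```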
Why is embeddedness not enough? Once you don't have bridge rules, what is left is the laws of physics. What does the malign hypothesis explain about the laws of physics that the true hypothesis doesn't explain?
I suspect (but don’t have a proof or even a theorem statement) that IB physicalism produces some kind of agreement theorem for different agents within the same universe, which would guarantee that the user and the AI should converge to the same beliefs (provided that both of them follow IBP).
I mean that you have some utility function, are choosing actions based on E[utility|action], and perform Solomonoff induction only instrumentally, because it suggests ways in which your own decision is correlated with utility. There is still something like the universal prior in the definition of utility, but it no longer cares at all about your particular experiences...
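(The policy-evaluation picture here can be sketched in a toy form — the hypothesis names, prior weights, and utilities below are all made up for illustration: the agent scores each action by its prior-weighted utility across world-hypotheses, rather than by conditioning on its subjective experiences.)

```python
# Toy sketch with hypothetical values: pick the action maximizing
# E[utility | action] under a prior over world-hypotheses.
prior = {"h1": 0.7, "h2": 0.3}  # made-up weights over hypotheses
utility = {                      # made-up utility of (hypothesis, action)
    ("h1", "a"): 1.0, ("h1", "b"): 0.0,
    ("h2", "a"): 0.2, ("h2", "b"): 0.9,
}

def expected_utility(action: str) -> float:
    """Prior-weighted utility of taking `action` across hypotheses."""
    return sum(p * utility[(h, action)] for h, p in prior.items())

best = max(["a", "b"], key=expected_utility)
```

The point of the sketch is only that the prior enters through the utility evaluation, not through a posterior over the agent's own observations.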
I’m not sure I follow your reasoning, but IBP sort of does that. In IBP we don’t have subjective expectations per se, only an equation for how to “updatelessly” evaluate different policies.
I agree that the situation is better when Solomonoff induction is something you are reasoning about rather than an approximate description of your reasoning. In that case it's not completely pathological, but it still seems bad in a similar way to reason about the world by reasoning about other agents reasoning about the world (rather than by directly learning the lessons that those agents have learned and applying those lessons in the same way that those agents would apply them).
Okay, but suppose that the AI has real evidence for the simulation hypothesis (evidence that we would consider valid). For example, suppose that there is some metacosmological explanation for the precise value of the fine-structure constant (not in the sense of "this is the value that supports life", but in the sense of "this is the value that simulators like to simulate"). Do you agree that in this case it is completely rational for the AI to reason about the world via reasoning about the simulators?
It seems like any approach that evaluates policies based on their consequences is fine, isn’t it? That is, malign hypotheses dominate the posterior for my experiences, but not for things I consider morally valuable.
I may just not be understanding the proposal for how the IBP agent differs from the non-IBP agent. It seems like we are discussing a version that defines values differently, but where neither agent uses Solomonoff induction directly. Is that right?
Why? Maybe you're thinking of UDT? In which case, it's sort of true, but IBP is precisely a formalization of UDT plus extra nuance regarding the input of the utility function.
Well, IBP is explained here. I’m not sure what kind of non-IBP agent you’re imagining.