Since this hypothesis makes distinct predictions, it is possible for the confidence to rise above 50% after finitely many observations. At that point, since the listener expects each theorem of PA to eventually be listed, with probability > 50%, and the listener believes the speaker, the listener must assign > 50% probability to each theorem of PA!
I don’t see how this follows. At the point where the confidence in PA rises above 50%, why can’t the agent be mistaken about what the theorems of PA are? For example, let T be a theorem of PA that hasn’t been claimed yet. Why can’t the agent believe P(claims-T) = 0.01 and P(claims-not-T) = 0.99? It doesn’t seem like this violates any of your assumptions. I suspect you wanted to use Assumption 2 here:
A listener believes a speaker to be honest if the listener distinguishes between “X” and “the speaker claims X at time t” (aka “claims_t-X”), and also has beliefs such that P(X | claims_t-X) = 1 when P(claims_t-X) > 0.
But as far as I can tell the scenario I gave is compatible with that assumption.
I think there is some confusion here coming from the unclear notion of a Bayesian agent with beliefs about theorems of PA. The reformulation I gave with Alice, Bob and Carol makes the problem clearer, I think.
Yeah, I did find that reformulation clearer, but it also then seems to not be about filtered evidence?
Like, it seems like you need two conditions to get the impossibility result, now using English instead of math:
1. Alice believes Carol is always honest (at least with probability > 50%)
2. For any statement s: [if Carol will ever say s, Alice currently believes that Carol will eventually say s (at least with probability > 50%)]
It really seems like the difficulty here is with condition 2, not with condition 1, so I don’t see how this theorem has anything to do with filtered evidence.
Maybe the point is just “you can’t perfectly update on X and Carol-said-X, because you can’t have a perfect model of them, because you aren’t bigger than they are”?
(Probably you agree with this, given your comment.)
The problem is not in either condition separately but in their conjunction: see my follow-up comment. You could argue that learning an exact model of Carol doesn’t really imply condition 2 since, although the model does imply everything Carol is ever going to say, Alice is not capable of extracting this information from the model. But then it becomes a philosophical question of what it means to “believe” something. I think there is value in the “behaviorist” interpretation that “believing X” means “behaving optimally given X”. In this sense, Alice can separately believe the two facts described by conditions 1 and 2, but cannot believe their conjunction.
I still don’t get it but probably not worth digging further. My current confusion is that even under the behaviorist interpretation, it seems like just believing condition 2 implies knowing all the things Carol would ever say (or Alice has a mistaken belief). Probably this is a confusion that would go away with enough formalization / math, but it doesn’t seem worth doing that.
I’m not sure exactly what the source of your confusion is, but:
I don’t see how this follows. At the point where the confidence in PA rises above 50%, why can’t the agent be mistaken about what the theorems of PA are?
The confidence in PA as a hypothesis about what the speaker is saying is what rises above 50%. Specifically, an efficiently computable hypothesis eventually enumerating all and only the theorems of PA rises above 50%.
For example, let T be a theorem of PA that hasn’t been claimed yet. Why can’t the agent believe P(claims-T) = 0.01 and P(claims-not-T) = 0.99? It doesn’t seem like this violates any of your assumptions.
This violates the assumption of honesty that you quote, because the agent simultaneously has P(H) > 0.5 for a hypothesis H such that P(obs_n-T | H) = 1, for some (possibly very large) n, and yet also believes P(T) < 0.5. This is impossible: P(H) > 0.5 forces P(obs_n-T) > 0.5, and honesty then forces P(T) > 0.5.
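Spelled out, the argument is two applications of the same bound. This is only a sketch: it reads obs_n-T as “the speaker has claimed T by time n” and assumes the honesty condition applies to that event, which is my reading of the notation rather than something stated above.

```latex
\begin{align*}
P(\mathrm{obs}_n\text{-}T) &\ge P(\mathrm{obs}_n\text{-}T \mid H)\,P(H) = P(H) > 0.5,\\
P(T) &\ge P(T \mid \mathrm{obs}_n\text{-}T)\,P(\mathrm{obs}_n\text{-}T) = P(\mathrm{obs}_n\text{-}T) > 0.5.
\end{align*}
```

So an assignment like P(claims-T) = 0.01 cannot coexist with P(H) > 0.5 once H predicts the claim with certainty.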
Yeah, I feel like while honesty is needed to prove the impossibility result, the problem arose with the assumption that the agent could effectively reason now about all the outputs of a recursively enumerable process (regardless of honesty). Like, the way I would phrase this point is “you can’t perfectly update on X and Carol-said-X, because you can’t have a perfect model of Carol”; this applies whether or not Carol is honest. (See also this comment.)
I agree with your first sentence, but I worry you may still be missing my point here, namely that the Bayesian notion of belief doesn’t allow us to make the distinction you are pointing to. If a hypothesis implies something, it implies it “now”; there is no “the conditional probability is 1 but that isn’t accessible to me yet”.
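To make the “it implies it now” point concrete, here is a minimal sketch of a two-hypothesis Bayesian mixture. The hypothesis names, the 0.6/0.4 weights, and the assumption that the rival hypothesis never predicts the claim are all made up for illustration; none of them come from the post.

```python
from fractions import Fraction


def marginal_probability(posterior, likelihood):
    """P(event) = sum over hypotheses h of P(h) * P(event | h)."""
    return sum(posterior[h] * likelihood[h] for h in posterior)


# Hypothetical posterior after finitely many observations (illustrative numbers).
posterior = {"H_PA": Fraction(6, 10), "H_other": Fraction(4, 10)}

# H_PA predicts with certainty that the speaker claims T by some time n.
# Assumption for the sketch: H_other predicts the claim never happens.
claims_T_given = {"H_PA": Fraction(1), "H_other": Fraction(0)}

# The marginal is fixed by the conditionals right away; the mixture has no
# separate notion of a consequence of H_PA that is "not accessible yet".
p_claims_T = marginal_probability(posterior, claims_T_given)

# Honesty, P(T | claims-T) = 1, then gives P(T) >= P(claims-T).
p_T_lower_bound = p_claims_T * Fraction(1)

print(p_claims_T, p_T_lower_bound)
```

Whatever the rival hypotheses say, the marginal can never drop below the weight of H_PA, which is why the agent has no room to hold P(claims-T) = 0.01.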
I also think this result has nothing to do with “you can’t have a perfect model of Carol”. Part of the point of my assumptions is that they are, individually, quite compatible with having a perfect model of Carol amongst the hypotheses.
the Bayesian notion of belief doesn’t allow us to make the distinction you are pointing to
Sure, that seems reasonable. I guess I saw this as the point of a lot of MIRI’s past work, and was expecting this to be about honesty / filtered evidence somehow.
I also think this result has nothing to do with “you can’t have a perfect model of Carol”. Part of the point of my assumptions is that they are, individually, quite compatible with having a perfect model of Carol amongst the hypotheses.
I think we mean different things by “perfect model”. What if I instead say “you can’t perfectly update on X and Carol-said-X, because you can’t know why Carol said X, because that could in the worst case require you to know everything that Carol will say in the future”?
Sure, that seems reasonable. I guess I saw this as the point of a lot of MIRI’s past work, and was expecting this to be about honesty / filtered evidence somehow.
Yeah, ok. This post as written is really less the kind of thing somebody who has followed all the MIRI thinking needs to hear and more the kind of thing one might bug an orthodox Bayesian with. I framed it in terms of filtered evidence because I came up with it by thinking about some confusion I was having about filtered evidence. And it does problematize the Bayesian treatment. But in terms of actual research progress it would be better framed as a negative result about whether Sam’s untrollable prior can be modified to have richer learning.
I think we mean different things by “perfect model”. What if [...]
Yep, I agree with everything you say here.