Rohin Shah comments on Thinking About Filtered Evidence Is (Very!) Hard

Rohin Shah 25 Mar 2020 20:19 UTC
LW: 11 AF: 4
AF
Since this hypothesis makes distinct predictions, it is possible for the confidence to rise above 50% after finitely many observations. At that point, since the listener expects each theorem of PA to eventually be listed, with probability > 50%, and the listener believes the speaker, the listener must assign > 50% probability to each theorem of PA!
I don’t see how this follows. At the point where the confidence in PA rises above 50%, why can’t the agent be mistaken about what the theorems of PA are? For example, let T be a theorem of PA that hasn’t been claimed yet. Why can’t the agent believe P(claims-T) = 0.01 and P(claims-not-T) = 0.99? It doesn’t seem like this violates any of your assumptions. I suspect you wanted to use Assumption 2 here:
A listener believes a speaker to be honest if the listener distinguishes between “X” and “the speaker claims X at time t” (aka “claims_t-X”), and also has beliefs such that P(X | claims_t-X) = 1 when P(claims_t-X) > 0.
But as far as I can tell the scenario I gave is compatible with that assumption.
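To make the scenario concrete, here is a minimal sketch of a toy finite model. It is not the setting of the post: the three-way world encoding and the names `scenario`, `prob`, and `is_honest` are assumptions made purely for illustration. It checks that P(claims-T) = 0.01 and P(claims-not-T) = 0.99 satisfy the honesty condition on their own, and it also computes the largest probability that any hypothesis predicting claims-T with certainty could receive under these beliefs.

```python
from fractions import Fraction

# Worlds are triples (t, claims_t, claims_not_t); this encoding is illustrative.
def scenario(p_claims_t, p_claims_not_t):
    """Joint distribution for a listener who treats the speaker as honest."""
    silent = 1 - p_claims_t - p_claims_not_t
    return {
        (True, True, False): p_claims_t,       # T is claimed and T is true
        (False, False, True): p_claims_not_t,  # not-T is claimed and T is false
        (True, False, False): silent / 2,      # nothing claimed, T true
        (False, False, False): silent / 2,     # nothing claimed, T false
    }

def prob(joint, event):
    """Total probability of the worlds in which `event` holds."""
    return sum(p for world, p in joint.items() if event(world))

def is_honest(joint):
    """Check P(T | claims-T) = 1 and P(not-T | claims-not-T) = 1 where defined."""
    p_c = prob(joint, lambda w: w[1])
    p_n = prob(joint, lambda w: w[2])
    ok_c = p_c == 0 or prob(joint, lambda w: w[0] and w[1]) == p_c
    ok_n = p_n == 0 or prob(joint, lambda w: (not w[0]) and w[2]) == p_n
    return ok_c and ok_n

joint = scenario(Fraction(1, 100), Fraction(99, 100))
p_claims_t = prob(joint, lambda w: w[1])
p_t = prob(joint, lambda w: w[0])

# Any hypothesis H with P(claims-T | H) = 1 satisfies P(H) <= P(claims-T),
# because P(H) = P(H and claims-T) <= P(claims-T).
max_mass_for_h = p_claims_t

print(is_honest(joint), p_t, max_mass_for_h)
```

In this toy model the honesty condition holds, but any hypothesis that makes claims-T certain is capped at probability 0.01, so whether the scenario is admissible depends on whether such a hypothesis is also assumed to have probability above 50%.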
- Vanessa Kosoy 27 Mar 2020 14:33 UTC
  LW: 13 AF: 8
  AF Parent
  I think there is some confusion here coming from the unclear notion of a Bayesian agent with beliefs about theorems of PA. The reformulation I gave with Alice, Bob and Carol makes the problem clearer, I think.
  - Rohin Shah 27 Mar 2020 17:51 UTC
    LW: 11 AF: 4
    AF Parent
    Yeah, I did find that reformulation clearer, but it also then seems to not be about filtered evidence?
    Like, it seems like you need two conditions to get the impossibility result, now using English instead of math:
    1. Alice believes Carol is always honest (at least with probability > 50%)
    2. For any statement s: [if Carol will ever say s, Alice currently believes that Carol will eventually say s (at least with probability > 50%)]
    It really seems like the difficulty here is with condition 2, not with condition 1, so I don’t see how this theorem has anything to do with filtered evidence.
    Maybe the point is just “you can’t perfectly update on X and Carol-said-X, because you can’t have a perfect model of them, because you aren’t bigger than they are”?
    (Probably you agree with this, given your comment.)
    - Vanessa Kosoy 29 Mar 2020 12:42 UTC
      LW: 6 AF: 5
      AF Parent
      The problem is not in either condition separately but in their conjunction: see my follow-up comment. You could argue that learning an exact model of Carol doesn’t really imply condition 2 since, although the model does imply everything Carol is ever going to say, Alice is not capable of extracting this information from the model. But then it becomes a philosophical question of what it means to “believe” something. I think there is value in the “behaviorist” interpretation that “believing X” means “behaving optimally given X”. In this sense, Alice can separately believe the two facts described by conditions 1 and 2, but cannot believe their conjunction.
      - Rohin Shah 30 Mar 2020 7:01 UTC
        LW: 2 AF: 2
        AF Parent
        I still don’t get it but probably not worth digging further. My current confusion is that even under the behaviorist interpretation, it seems like just believing condition 2 implies knowing all the things Carol would ever say (or Alice has a mistaken belief). Probably this is a confusion that would go away with enough formalization / math, but it doesn’t seem worth doing that.
- abramdemski 1 Apr 2020 18:14 UTC
  LW: 8 AF: 6
  AF Parent
  I’m not sure exactly what the source of your confusion is, but:
  I don’t see how this follows. At the point where the confidence in PA rises above 50%, why can’t the agent be mistaken about what the theorems of PA are?
  The confidence in PA as a hypothesis about what the speaker is saying is what rises above 50%. Specifically, an efficiently computable hypothesis eventually enumerating all and only the theorems of PA rises above 50%.
  For example, let T be a theorem of PA that hasn’t been claimed yet. Why can’t the agent believe P(claims-T) = 0.01 and P(claims-not-T) = 0.99? It doesn’t seem like this violates any of your assumptions.
  This violates the assumption of honesty that you quote, because the agent simultaneously has P(H) > 0.5 for a hypothesis H such that P(obs_n-T | H) = 1, for some (possibly very large) n, and yet also believes P(T) < 0.5. This is impossible since it must be that P(obs_n-T) > 0.5, due to P(H) > 0.5, and therefore must be that P(T) > 0.5, by honesty.
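  Spelled out, the two steps can be written as below. This is a sketch that reads obs_n-T as the event “the speaker claims T at time n” (claims_n-T in the post's notation), which is an assumption about the notation; the second line uses honesty, which applies because the first line shows the conditioning event has positive probability.

  ```latex
  \begin{align*}
  P(\mathrm{obs}_n\text{-}T) &\ge P(\mathrm{obs}_n\text{-}T \mid H)\,P(H) = P(H) > 0.5, \\
  P(T) &\ge P(T \mid \mathrm{obs}_n\text{-}T)\,P(\mathrm{obs}_n\text{-}T) = P(\mathrm{obs}_n\text{-}T) > 0.5.
  \end{align*}
  ```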
  - Rohin Shah 1 Apr 2020 19:36 UTC
    LW: 4 AF: 3
    AF Parent
    Yeah, I feel like while honesty is needed to prove the impossibility result, the problem arose with the assumption that the agent could effectively reason now about all the outputs of a recursively enumerable process (regardless of honesty). Like, the way I would phrase this point is “you can’t perfectly update on X and Carol-said-X, because you can’t have a perfect model of Carol”; this applies whether or not Carol is honest. (See also this comment.)
    - abramdemski 1 Apr 2020 20:23 UTC
      LW: 15 AF: 8
      AF Parent
      I agree with your first sentence, but I worry you may still be missing my point here, namely that the Bayesian notion of belief doesn’t allow us to make the distinction you are pointing to. If a hypothesis implies something, it implies it “now”; there is no “the conditional probability is 1 but that isn’t accessible to me yet”.
      
      I also think this result has nothing to do with “you can’t have a perfect model of Carol”. Part of the point of my assumptions is that they are, individually, quite compatible with having a perfect model of Carol amongst the hypotheses.
      - Rohin Shah 2 Apr 2020 17:02 UTC
        LW: 2 AF: 2
        AF Parent
        the Bayesian notion of belief doesn’t allow us to make the distinction you are pointing to
        Sure, that seems reasonable. I guess I saw this as the point of a lot of MIRI’s past work, and was expecting this to be about honesty / filtered evidence somehow.
        I also think this result has nothing to do with “you can’t have a perfect model of Carol”. Part of the point of my assumptions is that they are, individually, quite compatible with having a perfect model of Carol amongst the hypotheses.
        I think we mean different things by “perfect model”. What if I instead say “you can’t perfectly update on X and Carol-said-X, because you can’t know why Carol said X, because that could in the worst case require you to know everything that Carol will say in the future”?
        - abramdemski 3 Apr 2020 7:23 UTC
        LW: 4 AF: 4
        AF Parent
        
        Sure, that seems reasonable. I guess I saw this as the point of a lot of MIRI’s past work, and was expecting this to be about honesty / filtered evidence somehow.
        
        Yeah, ok. This post as written is really less the kind of thing somebody who has followed all the MIRI thinking needs to hear and more the kind of thing one might bug an orthodox Bayesian with. I framed it in terms of filtered evidence because I came up with it by thinking about some confusion I was having about filtered evidence. And it does problematize the Bayesian treatment. But in terms of actual research progress it would be better framed as a negative result about whether Sam’s untrollable prior can be modified to have richer learning.
        
        I think we mean different things by “perfect model”. What if [...]
        
        Yep, I agree with everything you say here.