Very nice that you’re doing this! I’d like to understand your proposed prior a little better:
1) Does the probability of any given statement converge to a limit as the proof length goes to infinity? That’s trivially true for statements that are proven true or false by PA, but what about independent statements like Con(PA)?
2) Does the probability of any given statement strongly depend on the encoding used for statements, or can we somehow “wash out” that dependence?
I’d like to understand your proposed prior a little better
Me too! I.e., I understand the definition, but not very much about its properties. Your questions are helpful, thanks!
1) Does the probability of a single statement converge to a limit as the proof length goes to infinity?
You mean if we keep the 3^^^3 fixed, but replace 4^^^^4 by larger and larger numbers? Yes: for a particular world X, if (not s_X) is proven by PA at all, it’s proven after some finite number of steps; taking the maximum of these proof lengths over the finitely many worlds X, we see that there is some number we can replace 4^^^^4 by after which nothing changes. But it’s important to keep in mind that the probabilities our agent uses may not be close to that limit! In particular, all but one of the probabilities we force to be >0 will in the limit be 0, because PA can obviously simulate our agent and figure out what it returns.
An interesting followup question is: consider the probability-in-the-limit P_lim(x) of some statement x, as in the previous paragraph. If we now increase the 3^^^3 parameter, does P_lim(x) tend to a limit? Don’t have an answer to that one, so far.
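In symbols (my notation, not from the original post): writing P_{m,n}(x) for the prior with m in the role of 3^^^3 and n in the role of 4^^^^4, the previous paragraph says the inner limit exists, and the followup question is about the outer one:

```latex
\[
P_{\lim}^{(m)}(x) \;=\; \lim_{n\to\infty} P_{m,n}(x) \quad\text{exists;}
\qquad
\text{does } \lim_{m\to\infty} P_{\lim}^{(m)}(x) \text{ exist?}
\]
```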
2) Does the probability of a single statement strongly depend on the encoding used for statements, or can we somehow “wash out” that dependency?
If we keep the 3^^^3 number fixed and allow very weird encodings, it seems likely that we can do ugly things, though I haven’t checked in detail. My intuition would be that if any nice result is to be had at all, it would be of a similar form as those about Kolmogorov complexity—something like, two fixed encodings cannot differ by more than a constant if you let the numeric parameters grow large enough. But that’s just a guess.
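For what it’s worth, here is one way the guessed Kolmogorov-style invariance might be stated (purely my formalization of the guess, not a claim): for any two fixed encodings E and E′ there would be a constant c depending only on the pair, such that for all statements x, once the numeric parameters are large enough,

```latex
\[
c^{-1}\, P^{E'}(x) \;\le\; P^{E}(x) \;\le\; c\, P^{E'}(x).
\]
```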
Oops, our comments crossed. Here are two points I wrote in reply to your original comment:
Nit: I don’t think there’s any particular reason to think that [lim_m lim_n prior at (m,n)] will be equal to [lim_n lim_m prior at (m,n)]. (You wrote about what we get at “(infinity,infinity)”.)
I would hope that if the proposed limit exists at all, you could define it over a nice probability space: I imagine that the individual outcomes (“worlds”) would be all sets of sentences of the language of PA, the sigma algebra of events would be the same as for the probability space of a countable sequence of coin tosses (a standard construction), and the set of all complete theories extending PA would be measurable and have probability 1. (Or maybe that is too much to hope for...)
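A minimal sketch of the space described above, assuming a fixed enumeration phi_1, phi_2, ... of the sentences of the language of PA: identify a world with the set of sentences it makes true, i.e. with a point of {0,1}^N, and take the cylinder sigma-algebra, exactly as for a countable sequence of coin tosses:

```latex
\[
\Omega = \{0,1\}^{\mathbb{N}},
\qquad
\mathcal{F} = \sigma\bigl(\{\omega \in \Omega : \omega_i = 1\} : i \in \mathbb{N}\bigr),
\]
```

so a world omega corresponds to the theory { phi_i : omega_i = 1 }.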
I haven’t thought about it too much more, but I don’t see a promising avenue to follow towards proving that P_lim(x) converges.
Some good news: if I didn’t make a mistake, the set of all complete theories extending PA is indeed a measurable set in the “countable sequence of coin tosses” space; hint: it’s the countable intersection of a sequence of events induced by replacing 3^^^3 by larger and larger numbers. (I’d be happy to expand on this if someone is interested.) Therefore, we have a measurable subspace, on which an appropriate probability measure could exist. Unfortunately, the subspace has probability zero in the original probability space, so you can’t just do a Bayesian update (that would imply a division by zero).
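Spelling out the hint in my own notation: if E_m denotes the event induced by the prior with parameter m in place of 3^^^3, the claim is that each E_m is measurable and

```latex
\[
\{\text{complete consistent theories extending PA}\}
\;=\;
\bigcap_{m=1}^{\infty} E_m,
\]
```

which is then measurable as a countable intersection of measurable sets.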
It just occurred to me that you could sidestep the question and instead define P_lim(x) directly. For example, you could say something like
P_lim(x) = P(PA+x+y is consistent | PA+y is consistent)
if y is randomly chosen from all statements in the language of PA using a simple length-based prior. This way P_lim(x)=1 if PA proves x, and P_lim(x)=0 if PA disproves x, as expected. Then you could view your original definition as a sort of approximation scheme for this one.
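Unpacking that conditional probability with a length-based prior P(y) proportional to 2^{-|y|} (my rendering of the suggestion), it’s the ratio

```latex
\[
P_{\lim}(x) \;=\;
\frac{\sum_{y} 2^{-|y|}\,\bigl[\mathrm{Con}(\mathrm{PA}+x+y)\bigr]}
     {\sum_{y} 2^{-|y|}\,\bigl[\mathrm{Con}(\mathrm{PA}+y)\bigr]},
\]
```

where [·] is 1 if the theory is consistent and 0 otherwise. If PA proves x, the two sums agree term by term and the ratio is 1; if PA disproves x, the numerator vanishes.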
This way of assigning probabilities as “degrees of consistency” doesn’t seem to contain much more information than PA about the “true integers”, but if we just need a prior, something like it could work.
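To make the “degrees of consistency” idea concrete, here is a toy sketch in propositional logic rather than PA, so that consistency is decidable by brute force; the atoms, the particular weights, and all helper names are my own invention, not part of the proposal:

```python
import itertools

ATOMS = ("a", "b", "c")

def models():
    """Every truth assignment to the atoms."""
    for vals in itertools.product((False, True), repeat=len(ATOMS)):
        yield dict(zip(ATOMS, vals))

def consistent(formulas):
    """A finite set of formulas is consistent iff some model satisfies all of them."""
    return any(all(f(m) for f in formulas) for m in models())

def literals():
    """Candidate statements y, each with a crude length-based weight 2^(-length)."""
    for atom in ATOMS:
        yield (lambda m, atom=atom: m[atom]), 0.5         # "a", length 1
        yield (lambda m, atom=atom: not m[atom]), 0.25    # "not a", length 2

def degree_of_consistency(x, background=()):
    """P(T + x + y consistent | T + y consistent), y drawn by the length prior."""
    num = den = 0.0
    for y, weight in literals():
        if consistent(list(background) + [y]):
            den += weight
            if consistent(list(background) + [x, y]):
                num += weight
    return num / den

a = lambda m: m["a"]
not_a = lambda m: not m["a"]
print(degree_of_consistency(a))                      # strictly between 0 and 1
print(degree_of_consistency(a, background=[a]))      # 1.0: "theory proves x"
print(degree_of_consistency(a, background=[not_a]))  # 0.0: "theory disproves x"
```

The analogue of the expected behavior holds: a statement the background theory proves gets probability 1, a refuted one gets 0, and an independent one lands strictly in between.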
It doesn’t seem to me like your prior needs to encode information about the “true integers” any more than it needs to encode information about the world. It just needs to be flexible enough to see the structure that is there.
Maybe this is what you mean by “if we just need a prior,” but it seems like an important point.
While we’re at it, it’s not even clear our prior needs to capture PA. Maybe you could use something much weaker, perhaps even a theory so weak that it can assert its own (cut-free) consistency, and then just learn the structure of PA by observing some computations (or however you gain info about logical facts).
OK, so by taking the limit of priors at (m,n) we get a well-defined prior at (m,infinity). What happens if we take the limit of those, do we get a nice prior at (infinity,infinity) for individual statements (not worlds)?
[ETA: oops, comments crossed; cousin_it originally posted a restatement of my followup question.]
Yes, that’s what I meant. Nit, though: I don’t think there’s any particular reason to think that [lim_m lim_n prior at (m,n)] will be equal to [lim_n lim_m prior at (m,n)].
I would hope that if the proposed limit exists at all, you could define it over a nice probability space: I imagine that the individual outcomes (“worlds”) would be all sets of sentences of the language of PA, the sigma algebra of events would be the same as for the probability space of a countable sequence of coin tosses (a standard construction), and the set of all complete theories extending PA would be measurable and have probability 1. (Or maybe that is too much to hope for...)
Yeah, I guess your “interesting followup question” is what interests me as well...