I do not know of any proposed prior that says that “simple sentences are more likely to be true.”
The prior I proposed says that I care more about being correct on simple sentences. This translates better to “I expect that my answers to simple questions are more likely to matter.”
The prior Paul Christiano proposed tries to minimize a “weighted” entropy, and so translates as “try harder to stay neutral on the simple sentences.”
The prior Abram Demski proposed (or at least the minor modification of it that he takes more seriously, where you are just as likely to put the negation of a sentence in as the sentence itself) translates as “truth values of simple sentences are more likely to modify truth values of complicated sentences than vice versa.”
The reason these look like “simple sentences are more likely to be true” is that you are looking at a list of mutually exclusive sentences: their probabilities sum to at most 1, so nearly all of them sit below 1/2, and pushing them towards 1/2 generally increases their probability. If instead you had a list of sentences and knew that exactly 5 out of the 7 were true, the simple ones would look less likely.
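To make the direction of that effect concrete, here is a toy Python sketch; the `push_toward_half` mixing step is purely illustrative and is not any of the priors above:

```python
def push_toward_half(p, strength=0.5):
    """Move a probability part of the way toward 1/2 (a toy stand-in for the
    'prefer to stay neutral on simple sentences' effect)."""
    return p + strength * (0.5 - p)

# Seven mutually exclusive, exhaustive sentences: each starts at 1/7 < 1/2,
# so the push raises its probability.
print(push_toward_half(1 / 7))  # 0.3214... > 1/7 = 0.1428...

# Seven sentences with exactly five known true: each symmetric marginal is
# 5/7 > 1/2, so the very same push lowers its probability.
print(push_toward_half(5 / 7))  # 0.6071... < 5/7 = 0.7142...
```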
Note 1: I have recently become convinced that Abram’s prior is likely a better proposal than my own.
Note 2: I haven’t posted this yet, but my proposal is actually three different definitions, which are (conjecturally) equivalent, and one of them is better described the way Abram’s is, in that it lets simple sentences change complex sentences more than vice versa.
Thanks for linking to your own proposal! I’m posting this before actually digesting it, so I may reply again later.
I agree that I’m tunneling on the mutually exclusive and exhaustive case, and the characterization “simpler sentences have probabilities closer to 1/2” is an accurate characterization of Abram’s scheme. I’m not so sure about Paul’s—I’m pretty sure it’s not minimizing a weighted entropy, but a cross-entropy, which tries to make the probability proportional to the pre-prior.
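To unpack the “proportional to the pre-prior” claim: assuming the cross-entropy objective I have in mind (writing μ for the pre-prior over a mutually exclusive and exhaustive set; this is my reading, not necessarily Paul’s exact formulation), it’s just the standard Lagrange-multiplier computation:

```latex
% Minimize cross-entropy against the pre-prior \mu, subject to normalization:
\min_P \; -\sum_s \mu(s)\,\log P(s)
\qquad \text{subject to} \qquad \sum_s P(s) = 1.
% Setting the derivative of the Lagrangian to zero for each sentence s:
-\frac{\mu(s)}{P(s)} + \lambda = 0
\;\;\Longrightarrow\;\;
P(s) = \frac{\mu(s)}{\lambda} = \frac{\mu(s)}{\sum_t \mu(t)},
% i.e. the minimizing P is exactly proportional to the pre-prior.
```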
As for other examples, there are a variety of more dilettantish cases (though I’m not really one to talk, of course) of people just trying to port Solomonoff induction to sentences or to models, and similar proposals like decreasing probability as a function of logical depth (example).
I’d defend my focus on mutually exclusive and exhaustive sets by saying that they show up everywhere and always give a basis we can use to represent knowledge. For example, if I had a list of 7 sentences and knew that exactly 5 were true, I could completely characterize my knowledge by probabilities over the (7 choose 5) = 21 mutually exclusive and exhaustive possibilities (a quick sketch of this encoding follows below).
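As a minimal sketch of that re-encoding (the uniform assignment below is just a placeholder for whatever probabilities one actually holds):

```python
from itertools import combinations

# Knowledge that exactly 5 of 7 sentences are true lives entirely in a
# distribution over the "which five are true" worlds.
worlds = list(combinations(range(7), 5))
print(len(worlds))  # 21 = 7 choose 5: mutually exclusive and exhaustive

# Any distribution over the worlds fixes every sentence's marginal; under
# the uniform placeholder each sentence gets 5/7, which is why the push
# toward 1/2 above makes these sentences look *less* likely.
p_world = 1 / len(worlds)
print(sum(p_world for w in worlds if 0 in w))  # 0.7142... = 5/7
```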
That said, it’s certainly possible there are things I’m missing because of this focus.