Sorry if there is something fishy in the writeup :(. I could believe it, given how rushed I was writing it.
Suppose we consider not just a, ~a, b, ~b, and c, ~c, but also the statements q = "exactly one of a, b, c is true" and ~q. Suppose now that we uniformly pick a truth value for a, then for q, then a logically consistent but otherwise random value for b, and finally a logically consistent but otherwise random value for c. Such an asymmetric situation could occur if b and c have high mu but a and q have small mu. In the worlds where we believe q, b and c are much more often disbelieved than a. I believe that basically captures the worries Paul was having about Demski's scheme; maybe he will comment himself.
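To make the asymmetry concrete, here is a quick Monte Carlo sketch of the construction above (the uniform coin flips and the function names are my own assumptions about how the sampling is done):

```python
import random

def exactly_one(a, b, c):
    # q says "exactly one of a, b, c is true"
    return (a + b + c) == 1

def sample_world():
    # Uniformly pick a truth value for a, then for q.
    a = random.random() < 0.5
    q = random.random() < 0.5
    # Pick b uniformly among values that can still be extended consistently.
    b_options = [b for b in (True, False)
                 if any(exactly_one(a, b, c) == q for c in (True, False))]
    b = random.choice(b_options)
    # Pick c uniformly among values consistent with a, q, and b.
    c_options = [c for c in (True, False) if exactly_one(a, b, c) == q]
    c = random.choice(c_options)
    return a, q, b, c

# Estimate how often a, b, c are believed in the worlds where q is believed.
worlds = [sample_world() for _ in range(100_000)]
q_worlds = [(a, b, c) for (a, q, b, c) in worlds if q]
print("P(a | q) ~", sum(a for a, b, c in q_worlds) / len(q_worlds))  # about 0.5
print("P(b | q) ~", sum(b for a, b, c in q_worlds) / len(q_worlds))  # about 0.25
print("P(c | q) ~", sum(c for a, b, c in q_worlds) / len(q_worlds))  # about 0.25
```

Conditioned on q, a keeps probability about 1/2 while b and c drop to about 1/4, which is the asymmetry described.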
Does that clarify anything?
This was surprisingly clarifying for me.
I’m still not sure whether I find it too concerning. We assign higher probability to simpler theories so long as they satisfy the constraint that Q(n) holds for 90% of n below 10^100. The theory which randomly assigns Q(n) to true/false for 10^99 simple values of n (and is then forced to assign Q(n) true for most of the remaining n below 10^100) is just one kind of theory the process may generate, and not a particularly simple one. The theory that Q(n) means "n is not divisible by 10" is going to be much more probable than all of these theories.
In other words, the write-up estimates Q(n) based on a narrow subset of the theories which assign probability mass, and I don’t really think that subset is representative...
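As a very rough illustration of the gap (a toy 2^(-description length) penalty of my own, not the write-up's actual construction):

```python
# Toy comparison: weight each candidate theory by 2^(-description length in bits).
# Only meant to show the scale of the gap between the two kinds of theory.

def toy_weight(description_bits):
    return 2.0 ** (-description_bits)

divisibility_theory_bits = 8 * len("Q(n) <-> n % 10 != 0")  # on the order of 10^2 bits
random_table_bits = 10**99  # roughly one bit per hand-picked value of Q(n)

print(toy_weight(divisibility_theory_bits))  # small, but nonzero
print(toy_weight(random_table_bits))         # underflows to 0.0: astronomically smaller
```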
Imagine that we encounter a truly i.i.d. random sequence of 90%-likely propositions Q(0), Q(1), Q(2), … Perhaps they are merely pseudorandom but impossibly complicated to reason about, or perhaps they represent some random external output that an agent observes. After observing a very large number of these Q(i), one might expect to place high probability on something like "About 90% of the next 10^100 Q(j) I haven’t observed yet will be true," but there is unlikely to be any simple rule that describes the already observed Q(i). Do you think that the next 10^100 Q(j) will all individually be believed 90% likely to be true, or will the simpler-to-describe Q(j) receive closer to 50% probability?
We can show that the FOL prior is not too different from the algorithmic prior, so it can’t perform too badly on problems where algorithmic induction does well. Partial theories which imply probabilities close to .9 but do not specify exact predictions will eventually have high probability; for example, a theory might specify that Q(x) is the OR of two unspecified predicates F(x) and G(x) (treated as random sources), which puts the probability of Q(x) at roughly .75, and variations of this construction would bring it closer to .9.
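To spell out that arithmetic (my own illustration, assuming each unspecified source predicate is an independent fair coin; the particular "variations" are just examples):

```python
from itertools import product

def implied_prob(formula, n_sources):
    # Fraction of truth assignments to the unspecified sources that make the formula true.
    assignments = list(product([False, True], repeat=n_sources))
    return sum(formula(*a) for a in assignments) / len(assignments)

print(implied_prob(lambda f, g: f or g, 2))                             # 0.75
print(implied_prob(lambda f, g, h: f or g or h, 3))                     # 0.875
print(implied_prob(lambda f, g, h, i, j: f or g or h or (i and j), 5))  # 0.90625
```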
This may still assign the simpler Q(j) probabilities closer to 50%.