If you check the consistency of statements with length less than D but allow proofs of unbounded length, you’ll need unbounded computational resources. That’s bad.
I don’t care about proofs. Only about “D-consistency”. The probability of s is the ratio of the number of D-consistent truth assignments in which s is true to the total number of D-consistent truth assignments. When a short proof of s exists, all D-consistent truth assignments define s as true, so its probability is 1. When a short proof of “not s” exists, all D-consistent truth assignments define s as false, so its probability is 0. In all other cases the ratio is some number between 0 and 1, not necessarily 1⁄2.
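A toy sketch may help pin this down. This assumes a propositional stand-in where a handful of hard-coded constraints play the role of “short proofs” (everything here, names included, is illustrative, not the real definition over all statements shorter than D), but the three cases above fall out directly:

```python
from itertools import product

# Illustrative stand-in for D-consistency: an assignment is "D-consistent"
# if it violates none of the constraints that have short proofs.
atoms = ["a", "b", "c"]
short_proofs = [
    lambda v: v["a"],                  # pretend "a" has a short proof
    lambda v: (not v["a"]) or v["b"],  # pretend "a -> b" has a short proof
]

def d_consistent(v):
    return all(p(v) for p in short_proofs)

def probability(s):
    """P(s) = (# D-consistent assignments where s is true) / (# D-consistent assignments)."""
    assignments = [dict(zip(atoms, bits))
                   for bits in product([True, False], repeat=len(atoms))]
    consistent = [v for v in assignments if d_consistent(v)]
    return sum(s(v) for v in consistent) / len(consistent)

print(probability(lambda v: v["b"]))      # 1.0 -- short proof of s
print(probability(lambda v: not v["a"]))  # 0.0 -- short proof of "not s"
print(probability(lambda v: v["c"]))      # 0.5 -- undecided within the bound
```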
Now consider some other hypothesis H which is just Y=[1,2,3,4...]. We can offset the zero time so that H and N start with the same number, so that t(H)=1. And H is much, much simpler than N. How would your agent go about showing that H is wrong?
For the agent to be functional, t has to be sufficiently large. For sufficiently large t, all hypotheses with small t(H) are suppressed, even the simplest ones. In fact, I suspect there is a certain critical t at which the agent gains the ability to accumulate knowledge over time.
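To make the suppression claim concrete, here is a minimal sketch. It assumes (my guess at the intended shape, not the actual construction) that t(H) is the first time H’s predicted history disagrees with the one generated by N, that hypotheses with t(H) < t are discarded, and that survivors keep a 2^(-complexity) prior:

```python
def t_of(predicted, reference):
    """t(H): index of the first disagreement with the reference history."""
    for i, (p, r) in enumerate(zip(predicted, reference)):
        if p != r:
            return i
    return len(reference)

def posterior(hypotheses, reference, t):
    """Renormalized 2**(-complexity) weights over hypotheses with t(H) >= t."""
    weights = {name: 2.0 ** -k
               for name, (predicted, k) in hypotheses.items()
               if t_of(predicted, reference) >= t}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

reference = [3, 1, 4, 1, 5, 9, 2, 6]  # history generated by N
hypotheses = {
    # H = [1,2,3,...], zero time offset so it matches at step 0: t(H) = 1
    "H_counting": ([3, 4, 5, 6, 7, 8, 9, 10], 5),
    # a more complex hypothesis that never disagrees
    "H_matching": ([3, 1, 4, 1, 5, 9, 2, 6], 20),
}

print(posterior(hypotheses, reference, t=1))  # both survive; the simpler H_counting dominates
print(posterior(hypotheses, reference, t=2))  # H_counting is now suppressed entirely
```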
> I don’t care about proofs. Only about “D-consistency”.
Fair enough. But would you agree with the claim that a real-world agent is going to have to use a formulation that fits inside limits on the length of usable proofs? This circles back to my suggestion that the specifics aren’t that important here and a handwaved generic logical probability distribution would suffice :)
> For the agent to be functional, t has to be sufficiently large. For sufficiently large t, all hypotheses with small t(H) are suppressed, even the simplest ones. In fact, I suspect there is a certain critical t at which the agent gains the ability to accumulate knowledge over time.
Hm. Upon further reflection, I think the problems are not with your distribution (which I had initially misinterpreted to be a posterior distribution, with t(H) a property of the data :P), but with the neglect of bridging laws or different ways of representing the universe.
For example, if our starting ontology is classical mechanics and our universe at each time step is just a big number encoding the coordinates of the particles, then quantum mechanics has t(H)=0, because it’s in such a different format: it’s the coordinates of a point in Hilbert space, not phase space. Being able to rediscover quantum mechanics is important.
If you neglect bridging laws, then your hypotheses are either untestable, or only testable using some sort of default mapping (comparing the input channel of your agent to the input channel of Q(H), which needs to be found by interpreting Q(H) using a particular format). If our agent exists in the universe in a different format, then we need to specify some different way of finding its input channel.
Another problem with neglecting bridging laws: when the bridging laws themselves are highly complex, you want to penalize that. You can’t just have the universe be [1,2,3...], map those numbers to the correct observations (using some big look-up table), and claim it’s a simple hypothesis.
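One way to see the force of this (a rough sketch; the compressed-length proxy and all the names are mine, not anything from the thread): if the scoring charges for the universe model and the bridging law together, the “[1,2,3...] plus a look-up table” hypothesis pays for the table.

```python
import hashlib
import zlib

# Stand-in sense data: incompressible-looking bytes indexed by time step.
def sense_datum(i: int) -> int:
    return hashlib.sha256(str(i).encode()).digest()[0]

observations = bytes(sense_datum(i) for i in range(2000))

def description_length(blob: bytes) -> int:
    """Crude proxy for complexity: compressed size in bytes."""
    return len(zlib.compress(blob))

# Hypothesis A: a short program that actually generates the observations,
# with a trivial bridging law.
model_a  = b"def universe(i): return sha256(str(i))[0]"
bridge_a = b"identity"

# Hypothesis B: the universe is just [1, 2, 3, ...]; all the work is pushed
# into a bridging law that is a giant look-up table.
model_b  = b"def universe(i): return i"
bridge_b = observations

cost_a = description_length(model_a) + description_length(bridge_a)
cost_b = description_length(model_b) + description_length(bridge_b)
print(cost_a, cost_b)  # the "simple" universe is not simple once the bridge is counted
```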
> But would you agree with the claim that a real-world agent is going to have to use a formulation that fits inside limits on the length of usable proofs?
I’m not defining an agent here, I’m defining a mathematical function which evaluates agents. It is uncomputable (as is the Legg-Hutter metric).
> Upon further reflection, I think the problems are not with your distribution… but with the neglect of bridging laws or different ways of representing the universe.
N defines the ontology in which the utility function and the “intrinsic mind model” are defined. Y should be regarded as the projection of the universe onto this ontology rather than as the “objective universe” (whatever the latter means). Thus H implicitly includes both the model of the universe and the bridging laws. In particular, its complexity reflects the total complexity of both. For example, if N is classical and the universe is quantum mechanical, G will arrive at a hypothesis H which combines quantum mechanics with decoherence theory to produce classical macroscopic histories. This hypothesis will have large t(H) since quantum mechanics correctly reproduces the classical dynamics of M at the macroscopic level. This shouldn’t come as a surprise: we also perceive the world as classical. More precisely, there would be a dominant family of hypotheses differing in the results of “quantum coin tosses”. That is, this ontological projection is precisely the place where the probability interpretation of the wavefunction arises.
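For what it’s worth, the weights within such a dominant family can be made concrete with the standard decoherent-histories expression (textbook material, not part of the construction above): the probability assigned to a coarse-grained classical history $y_1,\dots,y_n$ is

$$ p(y_1,\dots,y_n) \;=\; \operatorname{Tr}\!\left[\, P_{y_n}(t_n)\cdots P_{y_1}(t_1)\,\rho\,P_{y_1}(t_1)\cdots P_{y_n}(t_n) \right], $$

where the $P_{y_k}(t_k)$ are Heisenberg-picture projectors onto the macroscopic alternatives. These weights behave as genuine probabilities exactly when interference between distinct histories is negligible, which is what decoherence supplies.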