To clarify further: likelihood is a relative quantity, like speed—it only has meaning relative to a specific frame of reference.
If you’re judging my calibration, the proper frame of reference is what I knew at the time of prediction. I didn’t know what the result of the fencing match would be, but I had some evidence for who is more likely to win. The (objective) probability distribution given that (subjective) information state is what I should’ve used for prediction.
If you’re judging my diligence as an evidence seeker, the proper frame of reference is what I would’ve known after reasonable information gathering. I could’ve taken some actions to put myself in a different information state, and then my prediction could have been better.
But it’s unreasonable to expect me to know the result beyond any doubt. Even if Omega is in an information state of perfectly predicting the future, this is never a proper frame of reference by which to judge bounded agents.
And this is the major point on which I’m non-Yudkowskian: since Omega is never a useful frame of reference, I’m not constraining reality to be consistent with it. In this sense, some probabilities are in the territory.
since Omega is never a useful frame of reference, I’m not constraining reality to be consistent with it. In this sense, some probabilities are in the territory.
I thought I was following you, but you lost me there.
I certainly agree that if I want to evaluate various aspects of your cognitive abilities based on your predictions, I should look at different aspects of your predictions depending on what abilities I care about, as you describe, and that often the accuracy of your prediction is not the most useful aspect to look at. And of course I agree that expecting perfect knowledge is unreasonable.
But what that has to do with Omega, and what the uselessness of Omega as a frame of reference has to do with constraints on reality, I don’t follow.
I probably need to write a top-level post to explain this adequately, but in a nutshell:
I’ve tossed a coin. Now we can say that the world is in one of two states: “heads” and “tails”. This view is consistent with any information state. The information state (A) of maximal ignorance is a uniform distribution over the two states. The information state (B) where heads is twice as likely as tails is the distribution p(“heads”) = 2⁄3, p(“tails”) = 1⁄3. The information state (C) of knowing for sure that the result is heads is the distribution p(“heads”) = 1, p(“tails”) = 0.
Alternatively, we can say that the world is in one of these two states: “almost surely heads” and “almost surely tails”. Now information state (A) is a uniform distribution over these states; (B) is perhaps the distribution p(“ASH”) = 0.668, p(“AST”) = 0.332; but (C) is impossible, and so is any information state that is more certain than reality in this strange model.
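The arithmetic behind these distributions can be sketched in a few lines. Taking “ASH” to mean p(“heads”) = 0.999 (a figure that appears later in the thread) and “AST” to mean p(“heads”) = 0.001 (my symmetric assumption), any distribution over {ASH, AST} implies a p(heads); both the “perhaps 0.668” figure and the impossibility of (C) fall out:

```python
# Model 1: world states "heads"/"tails"; an information state is p(heads).
# Model 2: world states "ASH"/"AST". Assumption (mine): ASH means
# p(heads) = 0.999 and AST means p(heads) = 0.001.
ASH, AST = 0.999, 0.001

def p_heads(q_ash):
    """p(heads) implied by assigning probability q_ash to ASH."""
    return q_ash * ASH + (1 - q_ash) * AST

def q_ash_for(p):
    """The ASH-probability needed to reproduce a target p(heads)."""
    return (p - AST) / (ASH - AST)

# Information state (B): heads twice as likely as tails, p(heads) = 2/3.
q = q_ash_for(2 / 3)
print(round(q, 3))         # 0.667 -- near the "perhaps 0.668" in the text

# Information state (C): p(heads) = 1 would need q > 1, so no distribution
# over {ASH, AST} can represent certainty -- (C) is impossible in Model 2.
print(q_ash_for(1.0) > 1)  # True
```

The same computation shows why, in Model 2, any information state strictly more certain than ASH itself is unrepresentable.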
Now, in many cases we can theoretically have information states arbitrarily close to complete certainty. In such cases we must use the first kind of model. So we can agree to just always use the first kind of model, and avoid all this silly complication.
But then there are cases where there are real (physical) reasons why not every information state is possible. In these cases reality is not constrained to be of the first kind, and it could be of the second kind. As a matter of fact, to say that reality is of the first kind—and that probability is only in the mind—is to say more about reality than can possibly be known. This goes against Jaynesianism.
So I completely agree that not knowing something is a property of the map rather than the territory. But an impossibility of any map to know something is a property of the territory.
the world is in one of two states: “heads” and “tails”. [..] The information state (C) of knowing for sure that the result is heads is the distribution p(“heads”) = 1, p(“tails”) = 0.
Sure. And (C) is unachievable in practice if one is updating one’s information state sensibly from sensible priors.
Alternatively, we can say that the world is in one of these two states: “almost surely heads” and “almost surely tails”. Now information state (A) is a uniform distribution over these states
I am uncertain what you mean to convey in this example by the difference between a “world state” (e.g., ASH or AST) and an “information state” (e.g. p(“ASH”)=0.668).
The “world state” of ASH is in fact an “information state” of p(“heads”)>SOME_THRESHOLD, which is fine if you mean those terms to be denotatively synonymous but connotatively different, but problematic if you mean them to be denotatively different.
...but (C) is impossible.
(C), if I’m following you, maps roughly to the English phrase “I know for absolutely certain that the coin is almost surely heads”.
Yes, agreed that this is strictly speaking unachievable, just as “I know for absolutely certain that the coin is heads” was.
That said, I’m not sure what it means for a human brain to have “I know for absolutely certain that the coin is almost surely heads” as a distinct state from “I am almost sure the coin is heads,” and the latter is achievable.
So we can agree to just always use the first kind of model, and avoid all this silly complication.
Works for me.
But then there are cases where there are real (physical) reasons why not every information state is possible.
And now you’ve lost me again. Of course there are real physical reasons why certain information states are not possible… e.g., my brain is incapable of representing certain thoughts. But I suspect that’s not what you mean here.
Can you give me some examples of the kinds of cases you have in mind?
The “world state” of ASH is in fact an “information state” of p(“heads”)>SOME_THRESHOLD
Actually, I meant p(“heads”) = 0.999 or something.
(C), if I’m following you, maps roughly to the English phrase “I know for absolutely certain that the coin is almost surely heads”.
No, I meant: “I know for absolutely certain that the coin is heads”. We agree that this much you can never know. As for getting close to this, for example having the information state (D) where p(“heads”) = 0.999999: if the world is in the state “heads”, (D) is (theoretically) possible; if the world is in the state “ASH”, (D) is impossible.
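Under the same assumed numbers (ASH meaning p(“heads”) = 0.999, AST meaning 0.001), the asymmetry between the two world-state models reduces to a one-line inequality: under the world state “heads”, (D) is a legal information state, but under “ASH” no distribution over {ASH, AST} can push p(heads) above 0.999. A minimal sketch:

```python
# Assumed meanings of the meta-states (not fixed by the original text):
ASH, AST = 0.999, 0.001  # p(heads) under "almost surely heads"/"...tails"

def max_p_heads():
    # The most confident information state over {ASH, AST} puts all its
    # probability on ASH, so the implied p(heads) is capped at ASH itself.
    return max(ASH, AST)

D = 0.999999  # information state (D) from the text

# Plain heads/tails model: (D) is just another legal probability.
print(0.0 <= D <= 1.0)     # True
# ASH/AST model: (D) exceeds the 0.999 ceiling, so it is unreachable.
print(D <= max_p_heads())  # False
```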
Can you give me some examples of the kinds of cases you have in mind?
Mundane examples may not be as clear, so: suppose we send a coin-flipping machine deep into intergalactic space. After a few billion years it flies permanently beyond our light cone, and then flips the coin.
Now any information state about the coin, other than complete ignorance, is physically impossible. We can still say that the coin is in one of the two states “heads” and “tails”, only unknown to us. Alternatively we can say that the coin is in a state of superposition. These two models are epistemologically equivalent.
I prefer the latter, and think many people in this community should agree, based on the spirit of other things they believe: the former model is ontologically more complicated. It’s saying more about reality than can be known. It sets the state of the coin as a free-floating property of the world, with nothing to entangle with.
OK. Thanks for clarifying.