...while we still disagree about whether we can have confidence in statements about the outcome of rolling a hundred-sided die.
Ok. I’ll attempt to illustrate confidence vs probability as I understand it.
Let's start with your example. Starting with the certain knowledge that there is an object which is a 100-sided die, you are correct to infer that P(roll(D) != 12 | D = 100) = 99/100.
Further, you are correct (in this example) to have complete confidence in that estimate.
We can think of confidence as how closely one's probability estimate would approach the true frequency if we iterated the experiment to infinity or, alternatively, summed across the multiverse.
If we rolled that die an infinite number of times (or summed across the multiverse), the observed frequency of roll(D) != 12 would be more or less guaranteed to converge to the probability estimate of 99%. This is thus a high-confidence estimate.
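Here's a minimal simulation sketch of that convergence (in Python; the roll counts are arbitrary illustrative choices):

```python
import random

SIDES = 100  # known, with complete confidence, to be the number of sides

def freq_not_12(num_rolls: int) -> float:
    """Observed frequency of roll(D) != 12 over num_rolls of a fair die."""
    hits = sum(1 for _ in range(num_rolls) if random.randint(1, SIDES) != 12)
    return hits / num_rolls

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} rolls: observed frequency {freq_not_12(n):.4f}")
# The observed frequency approaches the 0.99 estimate as n grows.
```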
But this high confidence is conditional on your knowledge (and really, your confidence in this knowledge) that there is a die, that the die has 100 sides, that the die is fair, and so on.
Now if you remove all this knowledge, the situation changes dramatically.
Imagine that you know only that there is a die, but not how many sides the die has. You could still make some sort of estimate. You could guesstimate using your brain's internal heuristics, which wouldn't be so terrible, or you could research dice and form a more informed prior over the unknown number of sides.
From that you might make an informed estimate of 98.7% for P(roll(D) != 12 | D = ???), but this will be a low-confidence estimate. In fact, once we roll this unknown die a large number of times, we can be fairly certain that the observed frequency will not converge to 98.7%.
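A sketch of that gap, with an invented prior over side counts (the numbers here are purely illustrative, not the source of the 98.7% figure):

```python
import random

# Invented prior over the number of sides, purely for illustration.
PRIOR = {6: 0.40, 20: 0.30, 100: 0.30}

def p_not_12(sides: int) -> float:
    """P(roll != 12) for a fair die with the given number of sides."""
    return 1.0 if sides < 12 else 1.0 - 1.0 / sides

# The informed estimate averages over the prior...
mixture = sum(w * p_not_12(s) for s, w in PRIOR.items())
print(f"informed estimate: {mixture:.4f}")

# ...but the actual die has some particular number of sides, and its
# observed frequency converges to its own p_not_12, not to the mixture.
sides = random.choices(list(PRIOR), weights=list(PRIOR.values()))[0]
rolls = 100_000
freq = sum(1 for _ in range(rolls) if random.randint(1, sides) != 12) / rolls
print(f"actual die: {sides} sides, observed frequency: {freq:.4f}")
```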
So that is the difference between probability and confidence, at least in intuitive English. There are several more concrete algorithmic schemes for dealing with confidence or epistemic uncertainty, but that's the general idea. (We could even take it up a whole new meta level by considering probability distributions over probability functions (one for each possible die type), and this would be a more accurate model, but it is of course no more confident.)
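That meta level can also be sketched: carry one probability per possible die type rather than a single number, and read the spread of those values as (lack of) confidence. Same invented prior as above:

```python
# One probability function per possible die type; the spread of their
# values is a crude measure of how little confidence the single point
# estimate deserves. The prior is invented for illustration.
PRIOR = {6: 0.40, 20: 0.30, 100: 0.30}

def p_not_12(sides: int) -> float:
    return 1.0 if sides < 12 else 1.0 - 1.0 / sides

probs = {s: p_not_12(s) for s in PRIOR}
mean = sum(PRIOR[s] * probs[s] for s in PRIOR)
var = sum(PRIOR[s] * (probs[s] - mean) ** 2 for s in PRIOR)
print(f"point estimate: {mean:.4f}, variance across die types: {var:.6f}")
# A die known to have 100 sides yields the same kind of point estimate
# but with zero variance across die types: complete confidence.
```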
So, if I understand you correctly, and returning to my original question… given the statement “my next roll of this hundred-sided die will not be 12” (P1), and a bunch of background knowledge (K1) about how hundred-sided dice typically work, and a bunch of background knowledge (K2) relevant to how likely it is that my next roll of this hundred-sided die will be typical (for example, how likely this die is to be loaded), I could in principle be confident in P1.
However, since K2 is not complete, I cannot in practice be confident in P1.
The best I can do is make an informed estimate of the likelihood of P1, but this will be a low confidence estimate.
Have I correctly generalized your reasoning and applied it to the case I asked about?
Yeah, kind of.
However, your P1 statement already implies the most important parts of K1 and K2; merely inserting the adjective “hundred-sided” into P1 loads it with this knowledge. Beyond that, the K1 and K2 stuff is cumbersome background detail that most human brains will already have (but which is of course also necessary for understanding ‘dice’).
By including “hundred-sided” in the analogy, you are importing a ton of implicit confidence in the true probability distribution in question. Your ‘analogy’ assumes you already know the answer with complete confidence.
That analogy would map to an argument (for AI risk) written out in labyrinthine, explicit, well-grounded detail, probably to the point of encoding complete, tested, proven working copies of the entire range of future AGI designs.
In other words, your probability estimate in the dice analogy is only high confidence because of your confidence that you understand how dice work and that the object in question actually is a hundred-sided die.
We don’t have AGI yet, so we can’t understand them in the complete engineering sense in which we understand dice. Moreover, Stuart above claimed we don’t even understand how AGI will be built.
However, your P1 statement already implies the most important parts of K1 and K2; merely inserting the adjective “hundred-sided” into P1 loads it with this knowledge.
I disagree.
For example, I can confirm that something is a hundred-sided die by the expedient of counting its sides. But if a known conman bets me $1000 that P1 is false, I will want to do more than count the sides of the die before I take that bet. (For example, I will want to roll it a few times to ensure it’s not loaded.) That suggests that there are important facts in K2 other than the definition of a hundred-sided die (e.g., whether the die is fair, whether the speaker is a known conman, etc.) that factor into my judgment of P1.
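(A toy version of that extra check: a two-hypothesis Bayesian update, where the “loaded” model, its probability of rolling 12, and the 50/50 prior odds are all invented purely for illustration.)

```python
# Toy check of the conman's die: fair versus an invented "loaded"
# hypothesis on which 12 comes up half the time. The prior odds and
# the loaded model are assumptions for illustration only.
P_FAIR, P_LOADED = 0.5, 0.5
P12 = {"fair": 1 / 100, "loaded": 0.5}

def posterior_loaded(rolls):
    """Posterior probability that the die is loaded, given observed rolls."""
    like_fair = like_loaded = 1.0
    for r in rolls:
        like_fair *= P12["fair"] if r == 12 else 1 - P12["fair"]
        like_loaded *= P12["loaded"] if r == 12 else 1 - P12["loaded"]
    joint_loaded = P_LOADED * like_loaded
    return joint_loaded / (joint_loaded + P_FAIR * like_fair)

print(posterior_loaded([7, 31, 12]))  # a 12 in three rolls: ~0.93, suspicious
print(posterior_loaded([7, 31, 88]))  # no 12s yet: ~0.11, and still falling
```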
In other words, your probability estimate in the dice analogy is only high confidence because of your confidence that you understand how dice work and that the object in question actually is a hundred-sided die.
And a bunch of other things, as above. Which is why I mentioned K1 and K2 in the first place.
...your probability estimate in the dice analogy is only high confidence …
Wait, what?
First, nowhere in here have I made a probability estimate. I’ve made a prediction about what will happen on the next roll of this die. You are inferring that I made that prediction on the basis of a probability estimate, and you admittedly have good reasons to infer that.
Second… are you now saying that I can be confident in P1? Because when I asked you that in the first place you answered no. I suspect I’ve misunderstood you somewhere.
First, nowhere in here have I made a probability estimate. I’ve made a prediction about what will happen on the next roll of this die. You are inferring that I made that prediction on the basis of a probability estimate, and you admittedly have good reasons to infer that.
Yes. I have explained (in some amount of detail) what I mean by confidence, such that it is distinct from probability, as it relates to prediction.
And yes, as a human you are in fact constrained (in practice) to making predictions from internal probability estimates (at least on my understanding of neuroscience).
Confidence, like probability, is not binary.
You can have fairly high confidence in the implied probability of P1 given K1 and K2, and likewise little confidence in a probability estimate of P1 for a die with an unknown number of sides; this should be straightforward.
Second… are you now saying that I can be confident in P1? Because when I asked you that in the first place you answered no. I suspect I’ve misunderstood you somewhere.
Yes, and the mistake is on my part: wow, that first comment was a partial brainfart. I was agreeing with you, and meant to say “yes, but …”. I’ll edit in a comment to that effect.
Ah!
I feel much better now.
I should go through this discussion again and re-evaluate what I think you’re saying based on that clarification before I try to reply.