I notice that reasoning about logical uncertainty does not seem any more confusing to me than reasoning about empirical uncertainty. Am I missing something?
Consider the classical example from the description of the tag:
Is the googolth digit of pi odd? The probability that it is odd is, intuitively, 0.5. Yet we know that this is definitely true or false by the rules of logic, even though we don’t know which. Formalizing this sort of probability is the primary goal of the field of logical uncertainty.
The problem with the 0.5 probability is that it gives non-zero probability to a false statement. If I am asked to bet on whether the googolth digit of pi is odd, I can reason as follows: There is a 0.5 chance that it is odd. Let P represent the actual, unknown parity of the googolth digit (odd or even), and let Q represent the other parity. If Q, then anything follows. (By the Principle of Explosion, a false statement implies anything.) For example, Q implies that I will win $1 billion. Therefore the value of this bet is at least $500,000,000, which is 0.5 * $1,000,000,000, and I should be willing to pay that much to take the bet. This is an absurdity.
I don’t see how this case is significantly different from one of empirical uncertainty:
A coin is tossed and put into an opaque box, without showing you the result. What is the probability that the result of this particular toss was Heads?
Let’s assume that it’s 0.5. But then, just as in the previous case, we have the same problem: we are assigning non-zero probability to a false statement. And so, by the same logic, if I am asked to bet on whether the coin is Heads or Tails, I can reason as follows: There is a 0.5 chance that it is Heads. Let P represent the actual, unknown state of the outcome of the toss (Heads or Tails), and let Q represent the other state. If Q, then anything follows. For example, Q implies that I will win $1 billion. Therefore the value of this bet is at least $500,000,000, which is 0.5 * $1,000,000,000, and I should be willing to pay that much to take the bet. This is an absurdity.
It’s often claimed that an important difference between logical and empirical uncertainty is that, in the case of the digit of pi, I could in principle calculate whether it’s odd or even if I had an arbitrary amount of computing power, and thereby become confident in the correct answer. But in the case of the opaque box, no amount of computing power will help.
First of all, I don’t see how this addresses the previous issue of having to assign non-zero credences to wrong statements anyway. But beyond that, if I had a tool which allowed me to see through the opaque box, I’d also be able to become confident in the actual state of the coin toss, while this tool would not be helpful at all for figuring out the actual parity of the googolth digit of pi.
In both cases the uncertainty is relative to my specific conditions, be it cognitive resources or perceptual acuity. Yes, obviously, if the conditions were different I would reason differently about the problems at hand, and different problems require different modifications of conditions. So what? What is stopping us from generalizing these two cases as working by the same principles?
The problem is the update procedure.
When you condition on an empirical fact, you imagine the set of logically consistent worlds where this empirical fact is true and ask yourself about the frequency of other empirical facts inside this set.
But it is very hard to define an update procedure for the fact “P=NP”, because one of the worlds here is logically inconsistent, and an inconsistent world implies all other possible facts, which makes the notion of “frequency of other facts inside this set” kinda undefined.
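A minimal sketch of that picture of updating, with a made-up toy collection of consistent worlds (all facts and counts below are purely illustrative):

```python
# Toy model of "update by counting worlds": each world is a set of facts
# that hold in it; all worlds are assumed logically consistent and
# equally weighted.

worlds = [
    {"coin_heads", "rained_yesterday"},
    {"coin_heads", "rained_yesterday"},
    {"coin_heads"},
    {"coin_tails", "rained_yesterday"},
    {"coin_tails"},
    {"coin_tails"},
]

def probability(fact, worlds):
    """Fraction of worlds in which `fact` holds."""
    return sum(fact in w for w in worlds) / len(worlds)

def condition(worlds, evidence):
    """Keep only the worlds in which the evidence holds."""
    return [w for w in worlds if evidence in w]

print(probability("coin_heads", worlds))                                 # 0.5
print(probability("coin_heads", condition(worlds, "rained_yesterday")))  # ~0.67
```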
Could you explain it using specifically the examples that I brought up?
Are you claiming that:
There is no logically consistent world where all the physics is exactly the same and yet the googolth digit of pi is different?
There is a logically consistent world where all the physics of the universe is the same and yet the outcome of a particular coin toss is different?
There is a logically consistent world where you made all the same observations, and the coin came up tails. It may be a world with different physics than the world where the coin came up heads, which means that the result of the coin toss is evidence in favor of a particular physical theory.
And yeah, there are no worlds with different pi.
EDIT: Or, to speak more precisely, maybe there is some sorta-consistent, sorta-sane notion of a “world with a different pi”, but we currently don’t know how to build it, and if we knew, we would have solved the logical uncertainty problem.
But we don’t know how to build worlds with different physics either, do we? If this were a necessary condition for being able to use probability theory, then we shouldn’t be able to use it for either empirical or logical uncertainty.
On the other hand, if all we need is a vague idea of how it’s possible to make the same observations even when the situation is different, well, I definitely could’ve misheard that the question was about the googolth digit of pi, while in actuality it was about some other digit, or maybe some other constant.
Frankly, I’m not sure what this speculation about alternative worlds has to do with probability theory and updating in the first place. We have a probability space that describes a particular state of our knowledge. We have Bayes’ theorem, which formalizes the update procedure. So what’s the problem?
We know how to construct worlds with different physics; we do it all the time. Video games, for example, or if you don’t accept that example, we can construct a world consisting of 1 bit of information and 1 time dimension, where the bit flips every certain increment of time. This universe obviously has different physics than ours. Also, as the other person mentioned, a probability space is the space of all possibilities organized based on whether statement Q is true, which is isomorphic to the space of all universes consistent with your previous observations. There is, as far as I am aware, no way to logic yourself into the belief that pi could somehow have a different digit in a different universe, given a sufficiently exclusive definition of pi (specifying the curvature of the plane the circle lies on being the major example).
Technically true, but irrelevant to the point I’m making.
I was talking about constructing alternative worlds similar to ours to such a degree that
I can inhabit either of them
I can be reasonably uncertain which one I inhabit
Both worlds are compatible with all my observations of a particular probability experiment—a coin toss
And yet, despite all of that, in one world the coin comes up Heads and in the other it comes up Tails.
These are the types of worlds relevant to discussions of probability experiments. We have no idea how to construct them when talking about empirical uncertainty, and yet we don’t mind, demanding this level of constructivism only when dealing with logical uncertainty, for some reason.
Emphasis on sufficiently exclusive definition. Likewise, we can define a particular coin toss in a particular world sufficiently exclusively and refuse to entertain the framing of different possible worlds: “No, the question is not about an abstract coin toss that could’ve ended differently in different possible worlds, the question is about this coin toss in this world”.
It’s just pretty clear, in the case of empirical uncertainty, that we should not be doing this, because that level of precision doesn’t capture our knowledge state. So why do we insist on this level of exclusivity when talking about logical uncertainty?
In other words, this seems like an isolated demand for rigour to me.
We don’t??? A probability space literally defines the set of considered worlds.
A probability space consists of three things: a sample space, an event space, and a probability function.
The sample space defines the set of possible outcomes of the probability experiment, representing the knowledge state of the person participating in it. In this case it’s:
{Odd, Even}
For the event space we can just take the power set of the sample space. And for our measure function we just need to assign probabilities to the elementary events:
P(Odd) = P(Even) = 1⁄2
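A minimal sketch of that construction in code, with events represented as frozensets of outcomes (the representation is just one arbitrary choice of mine):

```python
from itertools import combinations

# Sample space: the two possible parities, as far as my knowledge state goes.
sample_space = {"Odd", "Even"}

def power_set(s):
    """All subsets of s, as frozensets: this is the event space."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

event_space = power_set(sample_space)

# Probability function: assign 1/2 to each elementary event and
# extend to all other events by additivity.
elementary = {"Odd": 0.5, "Even": 0.5}

def P(event):
    return sum(elementary[outcome] for outcome in event)

print(P(frozenset({"Odd"})))          # 0.5
print(P(frozenset({"Odd", "Even"})))  # 1.0 (the certain event)
print(P(frozenset()))                 # 0.0 (the impossible event)
```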
Do I understand correctly that the apparent problem is in defining the probability experiment in such a way that we can talk about Odd and Even as outcomes of it?
The problem is “how to define P(P=NP|trillionth digit of pi is odd)”.
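For concreteness (this framing is mine, not the parent comment’s): under the standard ratio definition of conditional probability,

$$P(A \mid B) = \frac{P(A \land B)}{P(B)},$$

a logically omniscient reasoner assigns every refutable arithmetic statement $B$ probability 0, making the ratio undefined, and every provable one probability 1, making the conditioning vacuous. Either way, the definition gives no useful conditional credence when $B$ is a statement whose truth value is logically fixed but unknown to us.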
Interesting. Is there an obvious way to do that for toy examples like P(1 = 2 | 7 = 11), or something like that?
It’s an interesting question, but it’s a different, more complex problem than simply not knowing the googolth digit of pi and trying to estimate whether it’s even or odd.
The reason logical uncertainty was brought up in the first place is decision theory: to give a crisp formal expression to the intuitive “I cooperate with you conditional on you cooperating with me”, where “you cooperating with me” is the result of analyzing a probability distribution over the possible algorithms controlling your opponent’s actions, you can’t actually run these algorithms due to computational constraints, and you want to do all this reasoning in a non-arbitrary way.
Yes, both of these credences should obey the axioms of a probability space.
This sort of thing is applied in cryptography with the concept of “probable primes”: numbers (typically with hundreds or thousands of decimal digits) that pass a number of randomized tests. The exact nature of the tests isn’t particularly important, but the idea is that for every composite number, most (at least 3⁄4) of the numbers less than it are “witnesses”: when you apply a particular procedure using such a number, the composite number fails the test, whereas primes have no such failures.
So the idea is that you pick many random numbers, and each pass gives you more confidence that the number is actually prime. The probability of any composite number passing (say) 50 such tests is no more than 4^-50, and for most composite numbers it is very much less than that.
No such randomized test is known for the parity of the googolth digit of pi, but we also don’t know that there isn’t one. If there were one, it would make sense to update our credence on the results of such tests using the probability axioms.
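A minimal sketch of such a test, using the standard Miller-Rabin procedure (the function name and example numbers are mine):

```python
import random

def is_probable_prime(n, rounds=50):
    """Miller-Rabin probable-prime test.

    A composite n passes any single round with probability at most 1/4,
    so passing `rounds` independent rounds bounds the error by 4**-rounds.
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)  # a random candidate "witness"
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue                    # this round gives no evidence of compositeness
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                # a is a witness: n is definitely composite
    return True                         # no witness found: n is probably prime

print(is_probable_prime(2**127 - 1))    # True  (2^127 - 1 is a known prime)
print(is_probable_prime(1009 * 1013))   # False (a product of two primes)
```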
With empirical uncertainty, it’s easier to abstract updating from reasoning. You can reason without restrictions, and avoid the need to update on new observations, because you are not making new observations. You can decide to make new observations at the time of your own choosing, and then again freely reason about how to update on them.
With logical uncertainty, reasoning simultaneously updates you on all kinds of logical claims that you didn’t necessarily set out to observe at this time, so the two processes are hard to disentangle. It would be nice to have better conceptual tools for describing what it means to have a certain state of logical uncertainty, and how it should be updated. But that doesn’t quite promise to solve the problem of reasoning always getting entangled with unintended logical updating.
In theories of Bayesianism, the axioms of probability theory are conventionally assumed to say that all logical truths have probability one, and that the probability of a disjunction of mutually inconsistent statements is the sum of their probabilities, corresponding to the second and third Kolmogorov axioms.
If one then e.g. regards the Peano axioms as certain, then all theorems of Peano arithmetic must also be certain, because those are just logical consequences. And all statements which can be disproved in Peano arithmetic then must have probability zero. So the above version of the Kolmogorov axioms is assuming we are logically omniscient. So this form of Bayesianism doesn’t allow us to assign anything like 0.5 probability to the googolth digit of pi being odd: We must assign 1 if it’s odd, or 0 if it’s even.
I think the simple solution is to not talk about logical tautologies and contradictions when expressing the Kolmogorov axioms for a theory of subjective Bayesianism. Instead talk about what we actually know a priori, not about tautologies which we merely could know a priori (if we were logically omniscient). Then the second Kolmogorov axiom says that statements we actually know a priori have to be assigned probability 1, and disjunctions of statements actually known a priori to be mutually exclusive have to be assigned the sum of their probabilities.
Then we are allowed to assign probabilities less than 1 to statements where we don’t actually know that they are tautologies, e.g. 0.5 to “the googolth digit of pi is odd” even if this happens to be, unbeknownst to us, a theorem of Peano arithmetic.
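A sketch of how these weakened axioms might be written, where $K$ stands for the set of statements the agent actually knows a priori (the symbol $K$ is my notation, not the comment’s):

$$P(A) \ge 0, \qquad P(A) = 1 \text{ if } A \in K, \qquad P(A \lor B) = P(A) + P(B) \text{ if } \neg(A \land B) \in K.$$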
Yes, absolutely. When I apply probability theory it should represent my state of knowledge, not the state of knowledge of some logically omniscient being. To me this seems such an obvious thing that I struggle to understand why it’s still not the standard approach.
So are there some hidden paradoxes of such an approach that I just don’t see yet? Or maybe some issues with the formalization of the axioms?
This doesn’t go through. What you have are two separate propositions, “H → (T → [insert absurdity here])” and “T → (H → [insert absurdity here])” [1], and actually deriving the absurdity from a consequent requires proving which antecedent obtains, which you can’t do, since neither is a theorem.
The distinction with logical uncertainty, then, is supposedly that you do already have a proof of the analogue of H or T, so you can derive the consequent, namely that the other one leads to a contradiction.
You don’t really have these either, unless you can prove NOT(H AND T), i.e., can you definitively rule out a coin landing both heads and tails? But that’s kinda pedantic.
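In symbols (my rendering of the main point above): with a proof of, say, T in hand, modus ponens detaches the inner implication,

$$\frac{T \qquad T \to (H \to \bot)}{H \to \bot},$$

whereas without a proof of either antecedent, neither consequent can be detached.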
Thank you for addressing specifically the example I raised!
So what changes if H and T are theorems? Let O mean “googolth digit of pi is odd” and E mean “googolth digit of pi is even”. I have two separate propositions:
O → (E → Absurdity)
E → (O → Absurdity)
Now it’s possible to prove either E or O. How does that allow me to derive a contradiction?
I agree that there should not be a fundamental difference. Actually, I think that when an A.I. is reasoning about improving its reasoning ability, some difficulties arise that are tricky to work out with probability theory but similar to themes that have been explored in logic / recursion theory. But that only implies we haven’t worked out the versions of the logical results on reflectivity for uncertain reasoning, not that logical uncertainty is in general qualitatively different from probability. In the example you gave, I think it is perfectly reasonable to use probabilities, because we have the tools to do this.
See also my comment on a recent interesting post from Jessica Taylor: https://www.lesswrong.com/posts/Bi4yt7onyttmroRZb/executable-philosophy-as-a-failed-totalizing-meta-worldview?commentId=JYYqqpppE7sFfm9xs
Contrary to what too many want to believe, probability theory does not define what “the probability” is. It only defines these (simplified) rules that the values must adhere to:
Every probability is greater than, or equal to, zero.
The probability of the union of two distinct outcomes A and B is Pr(A)+Pr(B).
The probability of the universal event (all possible outcomes) is 1.
Let A=”googolth digit of pi is odd”, and B=”googolth digit of pi is even.” These required properties only guarantee that Pr(A)+Pr(B)=1, and that each is a non-negative number. We only “intuitively” say that Pr(A)=Pr(B)=0.5 because we have no reason to state otherwise. That is, we can’t assert that Pr(A)>Pr(B) or Pr(A)<Pr(B), so we can only assume that Pr(A)=Pr(B). But given a reason, we can change this.
The point is that there are no “right” or “wrong” statements in probability, only statements where the probabilities adhere to these requirements. We can never say what a “probability” is, but we can rule out some sets of probabilities that violate these rules.
Even if this was true, I don’t see how it answers my question.
I’m not sure even this is the case.
Maybe there’s a more sophisticated version of this argument, but at this level, we only know that the implication Q => $1M is true, not that $1M is true. If Q is false, the implication being true says nothing about $1M.
But more generally, I agree there’s no meaningful difference. I’m in the de Finetti school of probability in that I think it only and always expresses our personal lack of knowledge of facts.
If you make a lot of educated guesses about the parity of the googolth digit of pi, based on chains of reasoning that are actually possible for humans, around 50% of them will be right.
(Of course that’s reversed, in a way, since it isn’t stating that the digit is uncertain.)