On Having No Clue
Let’s suppose you’re trying to figure out if something is going to be A or B, but you have no clue.
What probability should you assign to each option?
Some people might say that you should assign 50⁄50. After all, the argument goes, if you assigned 40⁄60 or 60⁄40, it sure sounds like you’re favoring one option over the other.
That’s a very persuasive argument, but unfortunately, it’s more complicated than that.
It may be the case that A splits into two options A1 and A2 where again we have no clue whether A1 is more likely than A2 or how these compare to B. In which case, the same argument would suggest that we should go with 33/33/33 (rounding down).
This seems to be a contradiction. What should we make of this?
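To see the clash numerically (a minimal sketch; the labels A, A1, A2, and B are just the ones from the text), apply the principle of indifference to both carvings of the same space and compare what each says about A:

```python
from fractions import Fraction

def indifference_prior(options):
    """Principle of indifference: equal probability for each option."""
    return {o: Fraction(1, len(options)) for o in options}

# Carving 1: the space is {A, B}.
coarse = indifference_prior(["A", "B"])

# Carving 2: A is split into A1 and A2, so the space is {A1, A2, B}.
fine = indifference_prior(["A1", "A2", "B"])

p_a_coarse = coarse["A"]                # 1/2
p_a_fine = fine["A1"] + fine["A2"]      # 1/3 + 1/3 = 2/3

print(p_a_coarse, p_a_fine)             # the two carvings disagree about A
```

The same "no clue, so split evenly" rule assigns A both 1⁄2 and 2⁄3, depending only on how the space was carved up.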
First of all, I think we should accept that this exact process of reasoning leads to a contradiction. There’s nothing fancy going on here. No dubious steps that could give us an out.
We tried to say that if we had no clue, the logical thing to do was to assign equal probability to each option, but we forgot that we were implicitly assuming that we should favor our particular way of carving up probability space.
In other words, we did need a clue after all. So what kind of clue is this exactly?
Well, surely in order to be justified in carving up the space a particular way, we’d need to have a reason to believe that each possibility is equally likely a priori.
Sadly, this is circular. We wanted to defend equal probabilities by asserting that our way of carving up the space was reasonable, but then we tried to assert that it was reasonable by claiming that each possibility had equal probability.
The problem is that we are making an assumption, but rather than owning it, we’re trying to deny that we’re making any assumption at all, i.e. “I’m not assuming a priori that A and B have equal probability based on my subjective judgement, I’m using the principle of indifference”. Roll to disbelieve.
Contra Descartes, if you start with nothing, you can’t get anywhere.
Complexity heuristics possibly work to some extent here. If A is a hypothesis of comparable complexity to B, then you at least have some basis for assigning them comparable prior probabilities.
If A can be conveniently split into A1 and A2, then the child hypotheses are often more complex than A in that they each require some additional property on top of A. This isn’t always the case, but in the exceptional cases it’s likely more useful to consider one or both of the child hypotheses to be “primary” instead of A.
This is all very rough, since complexity is usually only defined up to some constant and is relative to some sort of specification model, but it’s something that isn’t necessarily contingent on observations of the world. If you’re an entity that is considering probabilities and hypotheses at all, the concept of measuring complexity is likely already accessible to you.
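As a toy sketch of this heuristic (the 8-bits-per-character encoding below is an arbitrary stand-in for a real description-length measure, which would be relative to some specification model):

```python
def toy_prior_weight(description: str) -> float:
    """Toy complexity prior: weight 2^-k for a description of k bits,
    under a naive 8-bits-per-character encoding. This is a stand-in
    for a real description-length measure."""
    return 2.0 ** (-8 * len(description))

w_a = toy_prior_weight("A")
# Child hypotheses add a property on top of A, so their descriptions
# are longer and their prior weights smaller.
w_a1 = toy_prior_weight("A with extra property 1")
w_a2 = toy_prior_weight("A with extra property 2")

print(w_a > w_a1 + w_a2)  # the parent hypothesis keeps most of the weight
```

Under this kind of measure, splitting A into A1 and A2 does not automatically dilute A down to parity with B, because the children pay a complexity penalty for the extra property they each assert.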
This is a good point. I neglected to address this possibility.
If you have two options, A and B, 50% odds is maximal ignorance; you aren’t saying they have equivalent odds of being true, you’re saying you have no information by which to infer which one is true.
If you then say we can split A into A1 and A2, you have added information to the problem. Like the Monty Hall problem, information can change the odds in unexpected ways!
There’s no contradiction here—you have more information than when you originally assigned odds of 50⁄50. And the information you have added should, in real situations, inform how to distribute the odds. If A1 and A2 are sufficiently distinct (independent), it is possible that a 33/33/33 split is appropriate; if they aren’t, it is possible that a 25/25/50 split is appropriate. In order to make a judgment, we’d have to know more about what exactly A1 and A2 are, and why they can be considered a “split” of A.
Consider, for example, the case of a coin being flipped—we don’t know if the coin is fair or not. Let us say A is that the coin comes up “heads” and B is that the coin comes up “tails”. The split, then, could reflect a second flip, after the first flip is decided, if and only if it is heads; A1 might be “heads-heads”, A2 might be “heads-tails”. Then a 25/25/50 split makes sense; A1 and A2 are not independent.
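A quick simulation of this process (assuming, purely for illustration, that the coin is fair) recovers the 25/25/50 split:

```python
import random

def run_trial(rng):
    """One run of the process: flip a fair coin; if heads, flip again."""
    if rng.choice("HT") == "H":
        return "H" + rng.choice("HT")   # A1 = "HH" or A2 = "HT"
    return "T"                          # B: tails, no second flip

rng = random.Random(0)
n = 100_000
counts = {"HH": 0, "HT": 0, "T": 0}
for _ in range(n):
    counts[run_trial(rng)] += 1

# Frequencies come out roughly 0.25 / 0.25 / 0.5.
print({k: v / n for k, v in counts.items()})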
If, on the other hand, we have discovered that it isn’t a coin at all, but a three-sided die, two faces of which have the same symbol, and one face of which has another symbol; we label the faces with the similar symbol A1 and A2, and the face with the other symbol B. We still don’t know whether or not the die is fair—maybe it is weighted—but the position of maximal ignorance is 33/33/33, because even if it -is- weighted, we don’t know which face it is weighted in favor of; A1 and A2 are independent.

So—what are A1 and A2, and how independent are they? We have equations that can work this out with sample data, and your prior probability should reflect your expectation of their independence. If you insist on maximal ignorance about independence—then you can assume independence. Most things are independent; it is only the way the problem is constructed that leads us to confusion here, because it seems to suggest that they are not independent (consider that we can simply rename the set of conclusions to “A, B, and C”—all the names you have utilized are merely labels, after all, and in effect, what you have actually done is to introduce C, with an implication that A and C should maybe be considered partially dependent variables). If you insist on maximal ignorance about that, as well, then you can, I suppose, assume 50% independence, which would be something like splitting the difference between the die and the coin. And there’s an argument to be made there, in that you have, in fact, implied that they should maybe be considered partially dependent variables—but this comes down to trying to interpret what you have said, rather than trying to understand the nature of probability itself.
“If you then say we can split A into A1 and A2, you have added information to the problem. Like the Monty Hall problem, information can change the odds in unexpected ways!”—It’s not clear which is the baseline.
The point there is that there is no contradiction because the informational content is different. “Which is the baseline” is up to the person writing the problem to answer. You’ve asserted that the baseline is A vs B; then you’ve added information that A is actually A1 and A2.
The issue here is entirely semantic ambiguity.
Observe what happens when we remove the semantic ambiguity:
You’ve been observing a looping computer program for a while, and have determined that it shows three videos. The first video portrays a coin showing tails. The second video portrays two coins; the left coin shows heads, the right coin shows tails. The third video also portrays two coins; the left coin shows heads, the right coin shows heads.
You haven’t been paying attention to the frequency, but now, having determined there are three videos you can see, you want to figure out how frequently each video shows up. What are your prior odds for each video?
33/33/33 seems reasonable. I’ve specified that you’re watching videos; the event is which video you are watching, not the events that unfold within the video.
Now, consider an alternative framing: You are watching somebody as they repeat a series of events. You have determined the events unfold in three distinct ways; all three begin the same way, with a coin being flipped. If the coin shows heads, it is flipped again. If the coin shows tails, it is not. What are your prior odds for each sequence of events?
25/25/50 seems reasonable.
Now, consider yet another framing: You are shown something on a looping computer screen. You have determined the visuals unfold in three distinct ways; all three begin the same way, with a coin being flipped. If the coin shows heads, it is flipped again. If the coin shows tails, it is not. What are your prior odds here?
Both 25/25/50 and 33/33/33 are reasonable. Why? Because it is unclear whether you are watching a simulation of coin flips or something like prerecorded videos; it is unclear whether you should treat the events within what you are watching as the events, or the visuals themselves as the event.
Because it is unclear, I’d lean towards treating the visuals you are watching as the event—that is, assume independence. However, it would be perfectly fair to treat the coin tosses as events also. Or you could split the difference. Prior probabilities are just your best guess given the information you have available—and given that I don’t have access to all the information you have available, both options are fair.
Now, the semantic ambiguity you have introduced, in the context of this, is like this:
You’re told you are going to watch a computer program run, and what you see will begin with a coin being flipped, showing heads or tails. What are your probabilities that it will show heads or tails?
Okay, 50⁄50. Now, if you see the coin shows heads, you will see that it is flipped again; we now have three possibilities, HT, HH, and TT. What are your probabilities for each event?
Notice: You didn’t specify enough to know what the relevant events we’re assigning probabilities to even are! We’re in the third scenario; we don’t know if it’s a video, in which case the relevant event is “Which video we are watching”, or if it is a simulation, in which case the relevant event is “The outcome of each coin toss.” Either answer works, or you can split the difference, because at this point a large part of the probability-space is devoted not to the events unfolding, but to the ambiguity in what events we’re even evaluating.
Agreed, great post. But I think you are trying to push Bayesian Statistics past what it SHOULD be used for.
Bayesian statistics are only useful because we approach the correct answer as we gain more and more information. Only in this limit (of infinite information) is the Bayesian approach useful. Priors based on no information are, well, useless.
Scenario 1: You flip a fair coin and have a 50⁄50 chance of it landing heads
Scenario 2: (to steal xepo’s example) Are bloxors greeblic? You have NO IDEA, so your prior is 50⁄50
Even though in both scenarios the chances are 50⁄50, I would feel much more confident betting money on scenario 1 than on scenario 2. Therefore my model of choices contains something MORE than probabilities. As far as I know, Bayesian statistics just doesn’t convey this NEEDED information. You can’t use Bayesian probabilities here in a useful way. It’s the wrong tool for the job.
Even frequentist statistics is useless here.
A lot of day-to-day decisions are based on very limited information. I am not able to lay out a TRUE model of how we intuitively make those decisions, but “how much information I have to work with” is definitely an aspect of my mental model that is not entirely captured by Bayes’ Theorem.
So the option we know the most about gets the heavier cumulative weight, because knowing more about it gives it more sub-classes in our reasoning.
I suspect that without a solid base case, this process of over-weighting familiar options could be weaponized to form an argument against pursuing novel ideas about familiar topics. If I have “no clue” about, say, how a new model of the universe would compare to the old models, I could split “possibilities from old models” into so many fragments that the probability of the new model, weighted equally to all the tiny slivers of the familiar old model, approaches 0.
It seems like preventing this may require that the things being compared seem of similar size, but if you can guess the size of something, you’re no longer entirely clueless about it.
It seems like the problem might be in the assumption of having “no clue”. I think someone truly clueless about a question would be unable to divide it into parts in order to compare the parts’ probabilities. I imagine being asked an advanced question about the grammar or meaning of a passage in a language that I can neither speak nor read, and I would be unable to even formulate a question to assign probabilities to.
I claim the problem is that our model is insufficient to capture our true beliefs.
There’s a difference in how we act between a coin flip (true 50⁄50) and “are bloxors greeblic?” (a question we have no info about).
For example, suppose our friend came and said, “Yes, I know this one, the answer is (heads|yes).” For the coin flip you’d say “Are you out of your mind?” and for bloxors you’d say “Ok, sure, you know better than me.”
I’ve been idly pondering over this since Scott Alexander’s post. What is a better model?
One option would be to have another percentage — a meta-percentage, e.g. “What credence do I give to ‘this is an accurate model of the world’?” For coin flips, you’re 99.999% sure that 50% is a good model. For bloxors, you’re ~0% sure that 50% is a good model.
I don’t love it, but it’s better than presuming anything on the base level, I think.
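A minimal sketch of that two-level idea (the `Belief` structure and the blending rule here are invented for illustration, not a standard construction):

```python
from dataclasses import dataclass

@dataclass
class Belief:
    """A first-level probability plus a meta-level credence that the
    model behind it is accurate. Both fields are invented for this sketch."""
    p: float               # probability assigned to the event
    model_credence: float  # credence that p comes from an accurate model

coin = Belief(p=0.5, model_credence=0.99999)  # fair coin: the 50% model is near-certain
bloxors = Belief(p=0.5, model_credence=0.0)   # bloxors: 50% is a pure placeholder

def defer_to_friend(belief: Belief, friend_p: float) -> float:
    """Crude linear blend: the less we trust our own model, the more
    we adopt the friend's stated probability. Purely illustrative."""
    return belief.model_credence * belief.p + (1 - belief.model_credence) * friend_p

print(defer_to_friend(coin, 1.0))     # stays near 0.5
print(defer_to_friend(bloxors, 1.0))  # jumps all the way to 1.0
```

The same 50% point estimate behaves very differently under new testimony depending on the meta-level credence, which matches the coin-vs-bloxors intuition above.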
This is a model that I always tend to fall back on, but I can never find a name for it, so I find it hard to look into. I have always figured I am misunderstanding Bayesian statistics and that credence is all factored in somehow. That doesn’t really seem to be the case, though.
Does the Scott Alexander post lay this out? I am having difficulty finding it.
The closest term I have been able to find is the Kelly criterion, which is a measure of how much “wealth” you should rationally put into a probabilistic outcome. Replace “wealth” with credence and maybe it could be useful for decisions, but even this misses the point!
He doesn’t really. Here’s the original article:
https://www.astralcodexten.com/p/mr-tries-the-safe-uncertainty-fallacy
Also there was a long follow-up where he insists 50% is the right answer, but it’s subscriber-only:
https://www.astralcodexten.com/p/but-seriously-are-bloxors-greeblic
It’s possible to do such modelling with beta distributions (which are effectively meta-probabilities).
The combination of B(1;1) (something like a non-informative prior) with B(a;b) (the information obtained from the friend) gives B(1+a;1+b), which is moved much further from equal probabilities than the combination B(1000;1000)⋅B(a;b)=B(1000+a;1000+b).
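A quick numerical check of that comparison, using the fact that the mean of B(α;β) is α/(α+β) (the evidence counts a=5, b=1 are made up for illustration):

```python
def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

# Hypothetical evidence from the friend, equivalent to 5 "yes" and 1 "no".
a, b = 5, 1

weak = beta_mean(1 + a, 1 + b)          # B(1,1) prior: posterior mean 6/8 = 0.75
strong = beta_mean(1000 + a, 1000 + b)  # B(1000,1000) prior: barely moves from 0.5

print(weak, strong)
```

The weak B(1;1) prior lets the friend’s testimony swing the posterior to 0.75, while the sharply peaked B(1000;1000) prior (the coin-flip case) leaves it at about 0.501.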