The Case for Frequentism: Why Bayesian Probability is Fundamentally Unsound and What Science Does Instead
Land War in Eurasia
Bayes’ theorem is a tool for updating probabilities according to evidence.
Consider a real-world example of something important I was recently wrong about. On February 23, I found out Russia had invaded Ukraine. When disruptive events happen, you must respond immediately. I quickly read what the experts were saying on Foreign Affairs and combined it with my existing knowledge of history, technology and geopolitics. I wrote “There is going to be a war. Ukraine is going to lose. The question is how much, how quickly and on what terms.”
Then I remembered that war is unpredictable and war experts are often wrong. I added qualifiers before publishing. “There is probably going to be a war. Ukraine is probably going to lose. The question is how much, how quickly and on what terms.”
A week later, reports of under-equipped soldiers began trickling in and I discovered that Russia’s economy was much smaller than I had assumed. I updated my analysis. As the war calcified, I finally had time to research current weapons technology and build my own model of the war from a tactics-level foundation.
Bayes’ Theorem
Over the course of the war, several observations changed my probabilities of an outcome.
Experts believe Russia will win.
Evidence arrived that Russian troops might be under-equipped.
I discovered the Russian economy is the size of Florida’s.
Yep. It’s not just propaganda. Russian troops really are under-equipped. Video evidence confirms it.
Russia’s assault has slowed to a crawl.
Each time new evidence came in, my analysis of the situation shifted away from a Russian blitzkrieg-style fait accompli.
Bayesian probability treats probability as a subjective state that is updated in response to evidence.
Let V denote a Russian fait accompli.
Let E1 denote the observation that experts believe Russia will perform a fait accompli.
For example, my initial prior probability P(V) of “Russia will perform a fait accompli” started out at 0.5.[1]
When I discovered that experts believed Russia would win, I updated my estimate. I estimated a probability P(E1|V) = 0.9 that experts would believe in a Russian victory (conditional upon a Russian victory). I estimated the unconditional probability P(E1) that experts would believe in a Russian victory to be 0.5.
After observing that experts believed in a Russian victory, I wanted to find out the probability of a Russian victory conditional on experts believing in a Russian victory. Bayes’ Theorem states that the probability of V given E1 equals the prior probability of V multiplied by an update factor P(E1|V)/P(E1):

P(V|E1) = P(V) × P(E1|V) / P(E1)
Thus, I arrived at a probability P(V|E1) = 0.5 × 0.9/0.5 = 0.9: “Ukraine is probably going to lose. The question is how much, how quickly and on what terms.”
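Under the illustrative numbers above, the update is a single multiplication. A minimal Python sketch (the function name is mine, not part of the original analysis):

```python
def bayes_update(prior, likelihood, marginal):
    """Bayes' Theorem: P(V|E1) = P(V) * P(E1|V) / P(E1)."""
    return prior * likelihood / marginal

# P(V) = 0.5, P(E1|V) = 0.9, P(E1) = 0.5
posterior = bayes_update(0.5, 0.9, 0.5)
print(posterior)  # 0.9
```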
Bayesian Chains
Unconfirmed hints suggest that Russian troops might be under-equipped. Evidence E2 has arrived.
My prior probability P(V|E1) of “Russia will perform a fait accompli conditional on the observed fact that experts have predicted a fait accompli” is 0.9.
My probability P(E2|V∧E1) of E2 conditional on V is 0.2.
We can use the Bayes equation to calculate the new probability of a Russian victory conditional on E2:

P(V|E1∧E2) = P(V|E1) × P(E2|V∧E1) / P(E2|E1)
The precise numbers aren’t important. What matters is how Bayesian probabilities are chained together. Each bit of evidence Ei multiplies our prior probability by an update factor P(Ei|V∧E1∧⋯∧E(i−1)) / P(Ei|E1∧⋯∧E(i−1)) to get a new probability. Suppose we have a finite set of observations {E1,…,En}. We can chain them together:

P(V|E1∧⋯∧En) = P(V) × ∏ᵢ P(Ei|V∧E1∧⋯∧E(i−1)) / P(Ei|E1∧⋯∧E(i−1))
But now we have a problem. For every finite product of update factors U and every ε > 0, there exists a prior P(V) > 0 such that P(V) × U < ε. In other words, for any finite body of observable evidence, there exists a sufficiently strong prior such that we can effectively ignore the evidence.
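This is easy to demonstrate numerically. A hedged Python sketch with invented update factors: the same chain of evidence that persuades a moderate prior barely moves a sufficiently strong one.

```python
from functools import reduce

def chain_update(prior, update_factors):
    """Multiply a prior by a chain of Bayesian update factors."""
    return reduce(lambda p, u: p * u, update_factors, prior)

factors = [1.8, 1.5, 2.0]           # every piece of evidence favors the hypothesis
print(chain_update(0.1, factors))   # ~0.54: a moderate prior is persuaded
print(chain_update(1e-9, factors))  # ~5.4e-09: a strong prior effectively ignores the evidence
```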
Sufficiently strong priors are unfalsifiable. All knowledge that isn’t garbage is based on the concept of falsifiability. Reality is that which is falsifiable. If a statement isn’t falsifiable then that statement isn’t about reality.
Bayesian probability ultimately rests on an unfalsifiable conviction.
Corollary: Irresolvable Prior Disagreement
If two people disagree about the prior P(V) but agree about all the update factors P(Ei|⋯)/P(Ei|⋯), then it will be impossible for them to come into agreement, because no prior is ever allowed to equal zero. Finite evidence cannot resolve a priori disagreement.
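This is clearest in odds form (my reformulation, with invented numbers): posterior odds equal prior odds times the product of likelihood ratios, so two agents who agree on every likelihood ratio preserve the exact ratio between their odds forever.

```python
def posterior_odds(prior_odds, likelihood_ratios):
    """Odds-form Bayes: posterior odds = prior odds * product of likelihood ratios."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

shared_evidence = [3.0, 2.0, 5.0]             # likelihood ratios both agents accept
alice = posterior_odds(1.0, shared_evidence)  # prior odds 1:1   -> 30:1
bob = posterior_odds(0.01, shared_evidence)   # prior odds 1:100 -> 0.3:1
print(alice / bob)                            # ~100: the gap between them never closes
```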
Stack Traces
The most important question in rationalism is “Why do you believe what you believe?” Suppose you asked a Bayesian “Why do you believe X with confidence P(X)?” There are two ways a Bayesian can respond.
The Bayesian can claim P(X) is a prior. That’s not empiricism. That’s just blind faith.
The Bayesian can use Bayes’ Theorem: I believe X with confidence P(X|E) because P(X|E) = P(X) × P(E|X) / P(E).
Option (1) is a claim of blind faith. Option (2) is three claims of blind faith. If you ask a Bayesian “Why do you believe X with confidence P(X|E)?” the answer is either blind faith (“it’s a prior”) or “Because I believe P(X), P(E|X) and P(E).” If you ask “Why do you believe P(X), P(E|X) and P(E)?” then the logical chain either never terminates or it terminates in blind faith.
That’s not even the worst problem with Bayesian philosophy.
The Belief-Value Uncertainty Principle
Suppose you asked me “What is the sum of 2+2?” Usually, I would answer “4” but if I were in the Ministry of Love’s Room 101 and they were threatening me with torture then I would answer “5”.
A person’s beliefs are not observable. Only a person’s behavior is observable. Bayesians believe that people believe things and then optimize the world according to a value function. But there is no way to separate the two because:
For every observed behavior b and every possible belief x, there exists a value function v which could produce the behavior.
For every observed behavior b and every possible value function v, there exists a belief x which could produce the behavior.
A person’s beliefs and behavior are intimately related the way a quantum wave packet’s position is related to its momentum. You cannot observe both of them independently and at the same time.
Summary
Bayesian Probability has the following problems.
The answer to “Why do you believe X?” is always reducible to priors, which are non-falsifiable. Evidence has no effect on priors.
Rational agents with wildly differing priors are (usually) unable to come into even approximate agreement when provided with scarce evidence.
Rational agents who disagree about unconditional priors but who agree about evidence likelihood and conditional priors should be able to come into agreement. Instead, Bayesians who disagree about unconditional priors while agreeing about evidence likelihood and conditional priors are provably unable to ever reach exact agreement if they use Bayes’ Theorem. This is the opposite of how empiricism should work.
Identifying someone else’s beliefs requires you to separate a person’s value function from their beliefs, which is impossible.
Frequentism
Frequentism defines probability as the limit of an event’s relative frequency across a large number of independent trials. Frequentist probability is objective because it is defined in terms of falsifiable real-world observations. An objective definition of probability can be used to resolve disagreements between scientists. A subjective definition cannot. That’s why Frequentist probability is used as the foundation of science and Bayesian probability isn’t.
The purpose of Frequentist probability is to bound uncertainties. Suppose you observe 100 white swans and 0 black swans. The probability you will observe a black swan by repeating the same experiment is probably not orders of magnitude higher than 1/100. Suppose the probability of observing a black swan is actually 0.1. The probability of observing zero black swans in a sample of 100 swans is 0.9^100 ≈ 0.0000266. That’s tiny.
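The arithmetic, as a quick Python check (0.1 is the hypothetical black-swan rate from the paragraph above):

```python
p_black = 0.1                  # hypothetical true probability of a black swan
n_swans = 100                  # observed swans, all white
p_all_white = (1 - p_black) ** n_swans
print(p_all_white)             # ~2.66e-05
```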
If you can run a large number of independent trials then it is easy to show with extremely high confidence that the frequency of an event is below some ε > 0. Our Frequentist confidence in theories like electromagnetism has so many nines (0.999999…) that it rounds to 1.
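One standard way to make “below some ε” concrete is the exact binomial bound for zero observed events: if the true frequency were ε, the probability of seeing zero events in n independent trials is (1−ε)^n, so solving (1−ε)^n = δ gives the smallest ε ruled out at confidence 1−δ. A sketch (the function name and the 0.999 default are mine):

```python
def freq_upper_bound(n_trials, confidence=0.999):
    """Smallest epsilon such that zero events in n_trials would have
    probability below 1 - confidence if the true frequency were epsilon."""
    delta = 1 - confidence
    return 1 - delta ** (1 / n_trials)

print(freq_upper_bound(100))    # ~0.067: 100 clean trials give only a loose bound
print(freq_upper_bound(10**6))  # ~6.9e-06: a million clean trials buy several nines
```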
There are three dangers to Frequentist probability:
Sample reporting bias. Sample reporting bias is prevented via basic scientific hygiene such as pre-registering experiments.
Long tails. Frequentist probability will get you killed if you apply it naïvely to a long-tailed anti-inductive system like financial markets. That’s because Frequentist probability optimizes median (and other percentile) outcomes whereas long-tailed systems punish you according to average outcomes. Bhuntut has a cool argument about how the log utility of average outcomes is a consequence of optimizing for median (and other percentile) outcomes.
Small data. Frequentism defines probability in terms of a large number of independent trials. How do you construct probabilities when you don’t have “a large number of independent trials”?
Reductionism
Imagine you want to build a space elevator. Space elevators are expensive. You cannot build a million different versions and observe which ones work. You must get it right the first time. How can you do this when Frequentist probability requires “a large number of independent trials” to be meaningful at all?
You silo the problem into modular components.
Material reduction is the art of breaking a physical system into small subsystems with bounded interdependence. A space elevator depends strongly on the tensile strength of its core material. Laboratory material science experiments are cheap compared to building space elevators. You can build small bits of carbon nanotube cable on Earth, test their tensile strength and expose them to space-like radiation. Do this over and over again until you’re confident in the integrity of your materials.
Break your problem into small components. Break those components into smaller components. Break those components into even smaller components. Do this until you get to the level of atoms and photons. (We know how to go even smaller than that but applying quantum field theory to an engineering challenge usually causes more problems than it solves.) If you’re confident about how a material behaves then you can be confident how an object built out of that material behaves.
Material reduction allows you to leverage your high confidence about the behavior of small things (which you have lots of data about) to confidently predict the behavior of big things (which you have little data about).
Most importantly, it’s falsifiable. If I believe the result of a small-scale experiment is A and you think it’s B then we can quickly be brought into agreement by just running the experiment.
[1] The probabilities used in this article are retconned for illustrative purposes only. I did not use precise numeric probabilities in my analysis at the time.
Btw, Eliezer recently expressed interest in dialoguing with a non-straw frequentist.
(The general view of frequentism I get from reading e.g. Neyman and Pearson is “hey look, we can get all these nice properties without even using the word ‘prior’, why are you even paying attention to subjective criteria when we have these AWESOME precision measuring instruments over here?”; this is also the view I get from Jacob Steinhardt’s post on frequentism.)
Eliezer’s tweet is what prompted me to write this. 😆
Yet you didn’t respond to his statement of the Bayesian alternative, namely, reporting likelihoods. Reporting likelihoods addresses all of your complaints (because it doesn’t rely on a prior at all). You can use arbitrary likelihood-ratio cutoffs in essentially the same way that you’d use arbitrary p-value cutoffs.
Some advantages of likelihoods over p-values:
You are encouraged to explicitly contrast hypotheses against each other, rather than pretending that there’s a privileged “null hypothesis” to contrast against. This somewhat helps avoid the failure mode of rejecting a fake null hypothesis that no one actually believed, and calling that a significant result.
If you do have a prior, it’s super easy to update on likelihoods (or even better, likelihood ratios).
p-values are almost likelihoods anyway; they just add the weird “x or greater” trick, which makes it harder to translate them into likelihood ratios.
In other words: why mess up the nice elegant math of likelihoods with the weird alterations for p-values? Since likelihoods meet all the criteria you’ve stated in your post, and more besides, there should be some additional motivation for using p-values instead; some advantage over likelihoods which is worth the cost.
I’m pretty sure I’ve missed something, given that the number of papers giving yet-another-argument-against-p-values is approximately infinite, but that’s what I can come up with.
I would have worded this differently. For any finite set of pieces of evidence, a sufficiently strong prior can make our posterior probability arbitrarily low.
Practically speaking, this may cause us to disregard the evidence. This would be akin to resisting a Pascal’s mugging. Imagine I do a literature review on the evidence for telepathy, put all the pieces of evidence along with my prior into a spreadsheet, and crunch the numbers. Let’s say the result is that my priors on telepathy being real remain extremely low. You then run another study on telepathy that again finds that it’s real. I might decide that it’s not even worth my time to plug the data from your study into my spreadsheet.
I think that this is a very real issue. In addition to governing where people place their attention, I also see this turning up in the cocktail napkin math that people do. When they make up statistics for napkin math, somehow “low probability” is always 1%, rather than 0.1% or 0.01%. You can prove anything with 1% probabilities. I would love it if we could shift toward insisting on at least one piece of concrete evidence justifying the chosen order of magnitude.
In favor of Bayesianism being “more objective” than you’re making it out to be is that there’s no mathematical reason to have a threshold for our prior below which we disregard evidence. We can also make purely objective statements about the Bayesian chains you outlined above: “if your prior is X1, then your posterior is Y1 given this evidence. If your prior is X2, then your posterior is Y2 given the same evidence.”
In favor of frequentism being “more subjective” than you’re making it out to be is that it still requires us to make assumptions about the underlying distribution when computing a confidence interval, although there are also techniques that let us evaluate the fit of various standard distributions to our data. However, there is no guarantee that a given dataset fits any standard distribution. Deciding that an underlying distribution is correct, best, or good enough for our purposes is a subjective decision. It seems to me that with frequentist statistics, the term by which the objective calculation becomes a subjective interpretation is left out of the statistic, whereas in Bayesian statistics it is stated explicitly. It seems to me that the latter approach is more clear.
As far as standards for science, one way Bayesian statistics could be used in a more objective or standards-based manner is by choosing a “standard prior” and “standard posterior” for a given field in order for findings to be considered “significant.” This would be analogous to a field-specific standard p value (i.e. of 0.05). I imagine this would run into all the same issues: instead of p-hacking, we’d have posterior-hacking. It’s just not clear to me that Bayesian statistics is any more fundamentally broken than frequentism.
Imagine you live earlier in history. Attempts to triangulate how far the stars are from the earth fail: to calculate that distance you need two perspectives that are far enough apart, and the method fails. And this tells you the stars are unimaginably far away.
After looking at the numbers, you discount this possibility. There’s no way they’re that far away.
I think it’s unfair to raise this objection here while treating beliefs about probability as fundamental throughout the remainder of the post.
If you instead want to talk about the probability-utility mix that can be extracted from seeing another agent’s actions even while treating them as a black box… two Bayesian utility-maximizers with relatively simple utility functions in a rich environment will indeed start inferring Bayesian structure in each others’ actions (via things like absence of Dutch booking w.r.t. instrumental resources); they will therefore start treating each others’ actions as a source of evidence about the world, even without being confident about each others’ exact belief/value split.
If you want to argue their beliefs won’t converge, you’ll have to give a good example.
This seems consistent with the claim:
You did word things rather clearly.
Though I imagine some might object to ‘relatively simple utility functions(/rich environment)’ - i.e., people don’t have simple utility functions.
In principle, I was imagining talking about two AIs.
In practice, there are quite a few preferences I feel confident a random person would have, even if the details differ between people and even though there’s no canonical way to rectify our preferences into a utility function. I believe that the argument carries through practically with a decent amount of noise; I certainly treat it as some evidence for X when a thinker I respect believes X.
Hi, I’m a real frequentist, a practicing mathematical statistician for 17 years and counting. Related to this discussion, I’ve heard/collected arguments (many of them contradicting and rehashed) against frequentism for years and wrote them up at http://www.statisticool.com/mathstat/objectionstofrequentism.html
Justin
A few questions.
What about objective priors?
What would a frequentist analysis of the developing war look like?
Priors aren’t for believing. They’re for saying “if you start here, then after the evidence you end up there.” Where does a frequentist start?
I can tell my beliefs from my values. They both exist objectively, both observable by me. I can communicate them to other people. Cannot everyone? Extreme situations like Room 101 are not the usual way of things. From that example you could as well conclude that all communication is impossible.
With objective priors one can always ask “so what?” If it’s not my subjective prior, then its posterior will not equal my subjective posterior. There isn’t an obvious way to bound the difference between my subjective prior and the objective prior.
With frequentist methods it’s possible to get guarantees like “no matter what prior over θ you start with, if you run this method, you’ll correctly estimate θ to within ϵ with at least 1−δ probability”. It’s clear that a subjective Bayesian (with imperfect knowledge of their prior) might care about this sort of guarantee.
Material phenomena must be defined in material terms. My argument is that priors, beliefs and value functions are not ultimately defined in material terms. Immaterial phenomena are beyond the realm of scientific analysis.
Exactly the same.
A Frequentist starts by saying: we have observed n independent trials of evidence {Bi}, therefore our uncertainty about the probability P is bounded below some ε > 0.
Eliezer wrote an entire sequence about the phenomenon where the beliefs someone purports do not necessarily equal their true beliefs.
My priors, my beliefs and my values constitute neither material phenomena nor sensory (including thoughts[1]) qualia. Therefore they are not observable to me.
I have thoughts about beliefs, but those thoughts are not, themselves, beliefs.
I’m confused by this claim? I thought the whole thing where you state your priors and conditional probabilities and perform updates to arrive at a posterior was… not frequentism?
I don’t even know what this means. It sounds like a ‘separate magisteria’ type argument.
Here’s my response, arguing for objective priors as a solution to some of the problems you raise.
(I haven’t read all the other responding comments so I may be repeating stuff.)
I will attempt to refrain from “jumping ahead” and guessing your replies to these points, because that would require me to guess your motivation (IE, why you think the four ‘problems’ are indeed problematic). I will instead take your statements at face value, as if you think these things are problems-in-themselves (which, if addressed, cease to be problems).
I do this in the spirit of hoping to draw out better statements of what you think the real problems are (things which, if addressed, would actually change your mind, as opposed to just changing your argument).
With description-length priors, claims about prior value are verifiable. Scientists can objectively demonstrate the ‘elegance’ of their theory by displaying a short description. (IE, elegance has been operationalized.)
Selecting a shared prior obviously addresses this.
Selecting a shared prior obviously addresses this.
Selecting a shared prior doesn’t address this fully, but does allow one to infer beliefs by combining the (agreed-upon) prior with the evidence which that person has encountered.
I do not think you are repeating stuff. If you are repeating stuff, you are not doing so in an annoying way. Your comment is unequivocally constructive.
Your answers raise so many more questions that I have to wonder whether you are only role-playing a frequentist, for want of any real ones stepping up to Eliezer’s challenge. But I’ll play along.
That rules out mathematics and all mental phenomena, except so far as people’s talk about the latter have been explained in terms of physical phenomena in the brain. Radical behaviourism and positivism, that is. Or am I misunderstanding you?
Complete with the Bayesian reasoning that you set out, including your prior probabilities? What do you understand by a frequentist analysis? You say this:
but I see none of this in your Bayesian (“exactly the same as the frequentist”) analysis of the war.
He has also written about how to better arrive at true beliefs. This is the fundamental theme of all of the Sequences, and of LessWrong. You seem to have taken the negative part of his message (“Woe! Woe! Nothing is true, all is a lie! Woe!”) as the whole.
I suppose that is what it is like, to be a radical behaviourist. Perhaps that accounts for the unusual style of your fictional writings. I have enjoyed them, and hope to see more, but I have wondered what sort of a mind writes them.
Thank you for the thoughtful feedback. You are not the only person who questions my Frequentist leanings. I am flattered by your accusations.
I have yet to finish composing my response to Eliezer. In the meantime, I will do my best to answer each of your questions.
It does rule out mathematics. Science and mathematics are separate epistemologies. Science is an uncertain[1] empirical art based on evidence. Mathematics is a certain[2] system of formal logic based on theorems and axioms. Mathematics is a valuable source of useful truth (I have a university degree in mathematics) but the domain of math is carefully circumscribed. Math and science intersect like the circles of a Venn Diagram.
I think we need a third category for qualia. But I don’t think that qualia is relevant to the current discussion and I would prefer to set qualia aside for the purposes of this discussion. I feel like discussing qualia would lead us down a different rabbit hole and distract us from the question at hand.
I am unfamiliar with the term “radical Behaviorism”. The way I understand the history of psychology, “Behaviorism” is a political agenda that emerged in response to Freudianism. My biggest qualm with historical Behaviorism is that it did not just throw out Freudianism. By treating behavior as the only psychological observable, it threw out valuable sources of knowledge too.
I would rather avoid discussing Behaviorism [political agenda] because politics is the mind killer and because political labels are often inconsistently defined. Is there a way we can discuss Behaviorism [philosophy] while tabooing the word “Behaviorism” itself?
I am less familiar with positivism. I am a big fan of meditation as a source of metaphysical insight, which (I think?) contradicts positivism. And I do not deny that drugs like LSD provides genuine knowledge. (I have never used LSD, but the evidence for its benefits seems extremely strong.) But (as is the case with meditation) I would rather not derail this conversation into the subject of altered states of consciousness.
I think I misinterpreted your question. The question I answered was “What would a Frequentist’s analysis of the developing war look like?” That is not the question you asked. I apologize.
The Bayesian analysis I wrote down is not what actually went through my head.
Bayesianism is one of many frameworks for making sense of the world e.g. Marxism, Christianity, Frequentism, Daoism and Shinto. What I wrote was a retroactive Bayesian confabulation. I could just as easily have written a Marxist confabulation. “Putin is not a true Marxist and today’s Russia is an undeserving usurper of the Soviet Empire. Therefore Putin’s Russia will inevitably….” Or a Christian confabulation. “The march to Justice passes through the Valley of Death. The ultimate outcome of a mass mobilization ultimately rests on the righteousness of each side. But in the short term….”
I did not use Bayesian logic (because Bayesian logic passes the buck from hard questions to priors).
I did not use Frequentist or scientific analysis either. Frequentism is the foundation of science. Military-political analysis is (for practical purposes) mostly beyond the domain of science. Political “science” and military “science” have “science” in their names because they are not real sciences.
If I had to answer the question “What epistemic framework did you use?” the honest answer would be “Daoist” or “none at all” (which, perhaps ironically, sounds like something a Daoist would say). But, as is the case with meditation, I would rather not open the Daoist can of worms because it involves concepts that are alien to readers of this blog.
Science takes time. My Frequentist analysis occurred later. “As the war calcified, I finally had time to research current weapons technology and build my own model of the war from a tactics-level foundation.”
Observations are true. Math is true. Those are our primitive elements. We can derive arbitrarily reliable abstractions (such as fundamental physics) from them via a series of checksums.
Thank you. I do not know if I think differently from other people. But the way I describe how I think is different from how others describe how they think. It is fun to add to my collection of these inconsistencies.
In theory. In practice, scientific conclusions are often very certain.
In theory. In practice, mathematical conclusions are often very uncertain.
The paradigmatic Radical Behaviourist is John B. Watson. In his paradigmatic work, “Behaviorism”, he asserted that there is no such thing as a mind, and that, for example, a dress designer cannot have any image in his mind of the dress he intends to create. (“He has not, or he would not waste his time making it up; he would make a rough sketch of it or he would tell his assistant how to make it.”) There are some who would defend him against the charge of believing something so absurd, but here is a radical behaviourist of the present day emphatically upholding this view. I am inclined to take Watson at his word, and surmise that he did not believe in minds because he was unaware of his own: he had no subjective experience of his own self. Only such a person, it seems to me, could have written what he did.
I don’t know how behaviourism vs. Freudianism aligns with any political division (or where all the other schools of psychology would fit). However, behaviourism would obviously serve the agenda of someone who would agree with Number 2: “The whole world, as this Village?” “That is my dream.”
I have noticed a political aspect to Bayes vs. frequentism: right-wing and left-wing respectively. As someone right-leaning who thinks that the correct union of the two, choosing the right tool for the job, is all of the former and none of the latter, I would say the reason for that alignment is that Bayesian reasoning requires you to know what you know and use it, while frequentist reasoning requires that you pretend not to know what you know, and on no account use it. But an actual frequentist, if one can be found, might differ.
ETA: I had thought that behaviorism arose in reaction to introspectionism, which was collapsing due to the failure of the introspectionists to agree about the basic facts of their introspections.
Thank you for the description. I’m definitely not a “radical Behaviorist”, since I do believe there is a mind. I observe my own mind and the downstream effects of others’ minds. I do have subjective experience, but to use the phrase “my own self” would distract us into metaphysical territory I’d rather avoid.
Behaviorism has lots of political implications. I read somewhere that it has historically been used to rationalize (in the confabulation/retcon/propaganda/justification sense) authoritarian dehumanizing systems.
I like this argument. It’s healthy food for thought.
I wouldn’t say you’re wrong. To prevent possible miscommunication, I would like to note that Behaviorism arose in response to Freudian introspection. Mystical introspection is a different thing that wasn’t even on Western psychology’s radar at the time.
What are beliefs if not thoughts?
What is phlogiston if not fire?
Why not both?
I would argue that the properties Bayesians think are important are important, and the properties frequentists think are important are also important.
This favors a view in which we track both subjective and objective probability.
Bayesians usually have to track parameters (EG in their conditional probability tables) anyway. These are essentially “objective chances”!
Frequentists have to make decisions, at which point they often have to go beyond what’s established as a matter of science and make some judgement calls. Bayesian decision theory is a theory of how to do this.
As I’ve argued elsewhere, Bayesian updates have a convergence problem. This can be fixed with frequentist-leaning ideas. Similarly, Bayesian probabilities are not guaranteed to be calibrated, but this can also be addressed with some frequentist-leaning ideas. And similarly again, Bayesian priors have a realizability problem, which can be fixed with frequentist-leaning ideas.
Frequentists say that probabilities are external to humans, and hence, cannot be definitively measured. In my experience, frequentists are under no illusion that objective probabilities can be known. Frequentist procedures can get unlucky and reject a true hypothesis.
The definition—probabilities are limiting frequencies of repeated experiments—does not in any way imply that results can’t cluster in weird ways in the sequence of experiments. For example, the statement that the frequency of an event limits to 0 is totally consistent with the first billion experiments displaying frequency 1. You need the (unjustified and unjustifiable) IID assumption to rule this out even probabilistically.
(I’ll grant that frequentists have some great ideas, but imho the IID assumption is not one of them.)
The definition in terms of limits is literally something that can never be observed. Limits can converge after arbitrarily long stretches of erratic behavior, so it can’t ever be falsified either. And trials are not necessarily independent. Assuming that in practice things will “be well behaved” is mathematically equivalent to choosing a prior, and thus inherits whatever issues you have with priors. So the objectivity and falsifiability of frequencies is an illusion.
If the trials are independent then the probability of the limit converging only after an arbitrarily long stretch of erratic behavior goes to 0. Limit convergence doesn’t need to be proved because limit convergence is a mathematical (as opposed to a scientific) fact.
Be aware that a decidable agent deciding to do or not do another trial itself means the trials are not independent. (Consider the case of an environment that simulates the agent and behaves erratically if and only if the agent would decide to continue.)
Also be aware that the presence of an embedded agent with memory in the world itself means the trials are not independent.
...under certain assumptions that people tend to forget.
The limit of the mean of a Cauchy distribution does not converge, for instance.
I find the use of ‘mathematical fact’ as an argument surprisingly[1] strong evidence that someone has overgeneralized, in practice. This instance is no exception.
In the sense of ‘this works far more often than I would expect’.
Typo:
The probability is .9^100, not .9 x 100. The calculated probability is still the same.
Fixed. Thanks.
The root of the confusion, it seems to me, is the question “where do priors come from?”
Your chain of thoughts about unfalsifiable priors looks to me like:
Probabilities are objective characteristics of physical world, frequencies that can be attributed to some parts of reality (events, objects, etc)
Priors are claims about probabilities that don’t depend on empirical evidence
Therefore, priors are claims about objective characteristics of physical world that don’t depend on empirical evidence
Claims about objective characteristics of physical world that don’t depend on empirical evidence (like “there is a dragon in my room, but you can’t see it, hear it, touch it”) are unfalsifiable
Therefore, priors are unfalsifiable and can’t be used in science.
The problem with this chain of thought is that prior probabilities of hypotheses are not about characteristics of the physical world; they are about mathematical properties of the formulations of hypotheses, which are the same in all logically consistent worlds, like Kolmogorov complexity. Therefore, true Bayesian agents can’t disagree about priors (assuming logical omniscience).
There are practical problems with this approach:
We are not quite sure what form of priors is true—Solomonoff prior looks like this but I personally don’t know and there are debates.
We don’t know boundedly wrong forms of approximation of the true priors, which we need because the Solomonoff prior and Kolmogorov complexity aren’t computable.
We don’t have a corresponding scientific tradition of using this approach, which should look like “to compare two hypotheses that explain the data equally well, write programs modeling each hypothesis and pick the shortest”.
In practical cases we almost never need “true prior” because actually we use “previous posterior knowledge”, but Bayes Rule doesn’t distinguish them.
Maybe the flaw isn’t in Bayesian analysis per se—maybe the flaw is in expecting all claims to be falsifiable. Any theory of reality will ultimately depend on some unfalsifiable claims, which are really just the axioms.
Bayesian analysis treats the earliest priors as axioms, because they are considered to be beyond the scope of analysis. This would be the “incompleteness” described in Gödel’s Incompleteness Theorem.
Tl;dr: Frequentism, reductionism, and Bayesian probability—are all tools we can use, and you helped us see their inherent limits.