Suppose that I have a coin with probability of heads p. I know for certain that p is fixed and does not change as I toss the coin. I would like to express my degree of belief in p and then update it as I toss the coin.
Using a constant pdf to model my initial belief, the problem becomes a classic one, and it turns out that my belief in p should be expressed with the Beta(h + 1, n − h + 1) pdf after observing h heads out of n tosses. That’s fine.
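As a quick sanity check of this conjugate update, here is a small numerical sketch; the grid size and the counts h = 7, n = 10 are illustrative choices, not from the original:

```python
import numpy as np

def posterior_on_grid(h, n, k=10_001):
    """Discretized posterior over the bias p under a flat prior on [0, 1]."""
    p = np.linspace(0.0, 1.0, k)
    w = p**h * (1.0 - p)**(n - h)   # likelihood times constant prior
    w /= w.sum()                    # normalize the grid weights
    return p, w

p, w = posterior_on_grid(h=7, n=10)
mean = (p * w).sum()                # should approach (h + 1)/(n + 2) = 2/3
```

The grid mean matches the closed-form posterior mean (h + 1)/(n + 2) of the Beta(h + 1, n − h + 1) density, which is the point of the conjugacy claim.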
But let’s say I’m a super-skeptic guy who avoids accepting any statement with certainty, and I am aware of the issue of parametrization dependence too. So I dislike this solution and instead choose to attach beliefs to statements S(f) of the form “my initial degree of belief in p is represented with probability density function f.”
Well, this is not quite possible since the set of all such f is uncountable. However, something similar to the probability-density trick we use for continuous variables should do the job here as well. After observing some heads and tails, each initial belief function f will be updated just as before, which will create a new, uneven “density” over the set of belief functions. When I want to express my belief that p lies between two numbers a and b, I now have a probability density function instead of a definite number: the collection of all the definite numbers produced by each (updated) prior. Now I can use the mean of this density to express my guess, and I can even be skeptical about my own belief!
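To make the “density over definite numbers” concrete, here is a finite sketch: a thousand random discretized priors, each updated on the same data, each yielding its own value of P(a ≤ p ≤ b). The use of flat-Dirichlet weight vectors as the stand-in family is my assumption (the post leaves “uniform over pdfs” open), and all numbers (k, m, h, n, a, b) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
k, m = 200, 1000                  # grid size and number of priors (illustrative)
p = (np.arange(k) + 0.5) / k      # grid of candidate biases in (0, 1)

# a finite stand-in for the uncountable family of priors f:
# flat-Dirichlet weight vectors, one discretized pdf per row
priors = rng.dirichlet(np.ones(k), size=m)

h, n = 7, 10                      # observed heads and tosses (illustrative)
like = p**h * (1 - p)**(n - h)    # binomial likelihood, constants dropped
posts = priors * like             # Bayes update of every prior at once
posts /= posts.sum(axis=1, keepdims=True)

a, b = 0.5, 0.8
mask = (p >= a) & (p <= b)
probs = posts[:, mask].sum(axis=1)  # one value of P(a <= p <= b) per prior

# the "density over definite numbers": summarize it by its mean and spread
print(probs.mean(), probs.std())
```

The array `probs` is the finite analogue of the collection described above; its histogram is the “density over definite numbers,” and its mean is the guess one can then be skeptical about.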
This first meta level is still somewhat manageable, as I computed Var(μ) = 1⁄12 for the initial uniform density over the set of belief functions, where μ is the mean of a particular f. I am not sure whether my approach is correct, though. Since the domain of each f is bounded, I discretize this domain and represent the uniform density over the belief functions as a finite collection of continuous random variables whose joint density is constant, and then take the limit as the number of grid points goes to infinity.
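For what it’s worth, the discretization can be carried out numerically, under the assumption (mine, for illustration) that “joint density constant” means flat Dirichlet weights over k bins, since the simplex is where the normalization constraint lives. Under that particular reading, Var(μ) comes out to (k − 1)/(12k²), which vanishes as k grows, while the mixture of all the priors is flat, so the marginal variance of p itself is 1⁄12; which quantity the 1⁄12 refers to depends on the formalization of “uniform” chosen.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 20, 200_000
x = (np.arange(k) + 0.5) / k       # bin centers on [0, 1]

# one reading of "uniform over pdfs": flat Dirichlet weights over k bins,
# i.e. constant joint density on the normalization simplex
w = rng.dirichlet(np.ones(k), size=m)

mu = w @ x                          # mean of each discretized pdf
theory = (k - 1) / (12 * k**2)      # analytic Var(mu) under this reading
print(mu.var(), theory)

# E[w_i] = 1/k for every bin, so the prior-averaged (mixture) density of p
# is flat, giving marginal Var(p) = 1/12 without any simulation needed.
```

The analytic value follows from the Dirichlet(1, …, 1) covariance, Var(w_i) = (k − 1)/(k²(k + 1)) and Cov(w_i, w_j) = −1/(k²(k + 1)).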
The whole thing may not make sense at all. I’m just curious what would happen if we used even deeper meta levels, with the outermost level being the uniform “thing”. Does anybody know of math literature that has already explored something similar to this idea? Like maybe the use of probability theory in higher-order logics?
Edit 1:
Let me rephrase my question in a more formal way so that everything becomes clearer.
Let (Ω, Σ, P) be our first probability space, where Ω is the sample space coming from our original problem, Σ is the set of events considered, satisfying the rules for being a σ-algebra, and P is the probability measure.
First of all, for full generality let us choose Σ = 2^Ω, that is, the set of all subsets of the sample space is our event set. Such a Σ is always a σ-algebra for any Ω.
Now let me define M to be the set of all possible probability measures on (Ω, 2^Ω). Note that M depends only on Ω.
Let (Ω′, Σ′, P′) be the second probability space, whose sample space Ω′ is the set M of all first-level probability measures, constructed eventually from Ω. The final ingredient missing is P′; we would like it to be a “uniform” probability measure in some sense.
After we invent some nice “uniform” P′, I plan to use this construct as follows: an event E′ ∈ Σ′ occurs with probability P′(E′), where E′ is just a set of probability measures, all belonging to the first level. Now we use each of these measures to create a set of probability spaces: {(Ω, 2^Ω, Q) : Q ∈ E′}.
Then for each of these spaces an event occurs with probability determined by the probability measure of that space and so on. A tree will be created whose leaves are elements of , the events of our original problem.
Now, the same element of Σ can appear more than once among the leaves of this tree. So to compute the total probability that an event occurs, we should add up the probabilities along all paths. The depth of the tree is finite, but the number of branches spawned at each level may not be countable at all, which seems to be a dead end for our journey.
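The finite version of this tree is easy to compute. Here is a toy sketch with Ω = {H, T}, three first-level measures, and an equal-weight meta-measure (all numbers illustrative); the total probability of a leaf event is the path sum, i.e. the law of total probability:

```python
from fractions import Fraction

# finite toy version of the two-level construction: Omega = {"H", "T"},
# three first-level measures, and a "uniform" meta-measure over them
measures = [
    {"H": Fraction(1, 2), "T": Fraction(1, 2)},
    {"H": Fraction(1, 4), "T": Fraction(3, 4)},
    {"H": Fraction(3, 4), "T": Fraction(1, 4)},
]
meta = [Fraction(1, 3)] * 3          # P' assigns equal weight to each measure

def total_prob(event):
    """Sum over all root-to-leaf paths: the law of total probability."""
    return sum(w * Q[event] for w, Q in zip(meta, measures))

print(total_prob("H"))               # 1/2 for this symmetric toy example
```

With finitely many branches the path sum is unproblematic; the difficulty discussed above only appears when each level spawns uncountably many branches.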
Additional constraints may mitigate this problem which I plan to explore in a later edit.
There’s definitely some literature about “probability of probability” (I remember one bit from Jaynes’ book). Usually when people try to go turbo-meta with this, they do something a little different from what you’re doing, and just ask for “probability of probability of probability”; i.e., they ask only for the meta-meta-distribution of the value of the meta-distribution (or density function) at its object-level value.
Unsure if that’s in Jaynes too.
Connection to logic seems questionable because it’s hard to make logic and probability play nice together formally (maybe the intro to the Logical Inductors paper has good references for complaints about this).
Philosophically I think that there’s something fishy going on here, and that calling something a “distribution over probabilities” is misleading. You have probability distributions when you’re ignorant of something. But you’re not actually ignorant about what probability you’d assign to the next flip being heads (or at least, not under Bayesian assumptions of infinite computational power).
Instead, the thing you’re putting a meta-probability distribution over has to be something else that looks like your Bayesian probability but can be made distinct, like “long-run frequency if I flip the coin 10,000 times” or “correct value of some parameter in my physical model of the coin.” It’s very common for us to want to put probability distributions over these kinds of things, and so “meta-probabilities” are common.
And then your meta-meta-probability has to be about something distinct from the meta-probability! But now I’m sort of scratching my head about what that something is. Maybe “correct value of some parameter in a model of my reasoning about a physical model of the coin?”
In the setup of the question you caused my type checker to crash and so I’m not giving an answer to the math itself so much as talking about the choices I think you might need to make to get the question to type check for me...
Here is the main offending bit:
When you get down into the foundations of math and epistemology it is useful to notice when you’re leaping across the entire conceptual universe in question in single giant bounds.
(You can of course, do this, but then to ask “where would I be heading if I kept going like this?” means you leave the topic, or bounce off the walls of your field, or become necessarily interdisciplinary, or something like that.)
When you “attach beliefs to statements” you might be attaching them to string literals (where you might have logical uncertainty about whether they are even syntactically valid), or maybe you’re attaching to the semantic sense (Frege’s Sinn) that you currently impute to those string literals? Or maybe to the semantic sense that you WILL impute to those string literals eventually? Or to the sense that other people who are better at thinking will impute?
...or maybe are you really attaching beliefs to possible worlds (that is, various logically possible versions of the totality of what Frege’s Bedeutung are embedded within) that one or another of those “senses” points at (refers to) and either “rules in or rules out as true” under a correspondence theory of truth...
...or maybe something else? There’s lots of options here!
When I search for [possible worlds foundations bayes], the best of the first couple of hits is from a team trying to deploy modal logics: The Modal Logic of Bayesian Belief Revision (2017).
When I search for [bayesian foundations in event spaces], there’s a weird new paper struggling with fuzzy logic (which is known to cause Bayesian logic to explode because fuzzy logic violates the law of the excluded middle), and Pedro Teran’s 2023 “Towards objective Bayesian foundations with fuzzy events” found some sort of (monstrous?) alternative to Bayes that doesn’t work totally the same way?
Basically, there’s a lot of flexibility in how you ground axioms to things that seem like they could be realized in physics (or maybe merely “realized” in lower-level, intuitively accessible axioms).
Using my default assumptions, my type checker crashed on what you said because all of the ways I could think to ground some of what you said in a coherent way… lead to incoherence based on other things you said.
I was able to auto-correct your example S(f) to something like you having a subjective probability that could be formalized as P(“As a skilled subjective Bayesian, fryolysis should represent fryolysis’s uncertainty about a single stable fair coin’s possible mechanical/structural biases that could affect fair tosses with the pdf f(x) = (n+1)·(n choose h)·x^h·(1−x)^(n−h) after observing h heads out of n tosses of the coin.”)
But then, for your example S(f), you claimed the f were uncountable!?
But… you said statements, right?
And so each S(f) (at least if you actually say what the f is using symbols) can be turned into a Gödel number, and Gödel numbers are COUNTABLE, similarly to (and for very similar reasons as) the algebraic numbers.
One of the main ideas with algebraic numbers is that they don’t care if they point to a specific thing hiding in an uncountable infinity. Just because the real neighborhood of π (or “pi” for the search engines) is uncountable doesn’t mean π itself can’t be singled out. We can point to π in a closed and finite way, and since the pointing methods are finite, the pointing methods (tautologically)… are countable!
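The countability point can be made concrete: any statement written over a finite alphabet maps injectively into the naturals via, for example, bijective base-B encoding, which is a poor man’s Gödel numbering. A sketch (the alphabet and sample string are illustrative):

```python
# every finite string over a finite alphabet gets a unique natural number,
# so the set of all finitely describable pdfs is at most countable
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789()^-+*/=. "
B = len(ALPHABET)

def godel(s):
    """Bijective base-B encoding of a string into a unique integer."""
    n = 0
    for ch in s:
        n = n * B + ALPHABET.index(ch) + 1
    return n

def ungodel(n):
    """Inverse of godel(): recover the string from its number."""
    out = []
    while n > 0:
        n, r = divmod(n - 1, B)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

s = "f(x) = x^h * (1-x)^(n-h)"
assert ungodel(godel(s)) == s        # the encoding is invertible
```

Bijective numeration (rather than ordinary base-B) is used so that no two distinct strings collide on leading “zero” symbols; every string, including the empty one, gets exactly one number.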
You said (1) it was statements you were “attaching” probabilities to but then you said (2) there were uncountably many statements to handle.
I suspect you can only be in reflective equilibrium about at most one of these claims (and maybe neither claim will survive you thinking about this for an adequately long time).
This is being filed as an “Answer” instead of a “Comment” because I am pointing to some of the nearby literature, and maybe that’s all you wanted? <3