Excellent! One final point I would like to add: if we say that "theta is a physical quantity s.t. [...]", we are faced with an ontological question: "does a physical quantity with these properties actually exist?".
I recently found out about Professor Jaynes' A_p distribution idea (introduced in chapter 18 of his book) from Maxwell Peterson in the sub-thread below, and I believe it is an elegant workaround to this problem. It leads to the same results but is more satisfying philosophically.
This is how it would work in the coin flipping example:
Define A(u) to be a function that maps real numbers in the domain [0, 1] to propositions s.t.
1. The set of propositions {A(u): 0 ≤ u ≤ 1} is mutually exclusive and exhaustive
2. P(y=1 | A(u)) = u and P(y=0 | A(u)) = 1 - u
Because the set of propositions is mutually exclusive and exhaustive, there is exactly one u s.t. A(u) is true, and for any v != u, A(v) is false. We call this unique value of u theta.
It follows that P(y=1 | theta) = theta and P(y=0 | theta) = 1 - theta, and we use this to calculate the posterior predictive distribution as usual, by averaging over our distribution on the A(u) propositions.
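As a concrete illustration of that last step (my own sketch, not from the thread above): if we take a uniform prior over the A(u) propositions, i.e. a Beta(1, 1) prior on theta, then after observing k heads in n flips the posterior is Beta(1 + k, 1 + n - k), and the posterior predictive probability of heads on the next flip is the posterior mean. The function name and the choice of a uniform prior here are my assumptions for the example.

```python
from fractions import Fraction

def posterior_predictive_heads(k, n, a=1, b=1):
    """P(next y = 1 | k heads in n flips) under a Beta(a, b) prior on theta.

    The posterior is Beta(a + k, b + n - k); integrating u against it
    (i.e. averaging P(y=1 | A(u)) = u over the posterior) gives its mean.
    """
    return Fraction(a + k, a + b + n)

# With the uniform Beta(1, 1) prior this reduces to Laplace's rule of
# succession, (k + 1) / (n + 2):
print(posterior_predictive_heads(7, 10))  # -> 2/3
```

Note that nothing in this computation requires theta to be a physical quantity; it only uses the propositions A(u) and the two defining properties above.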