Well, I have a problem with assigning a probability other than 1 to the laws of probability. It's not that I can't conceive of them being false, but that if they are false, any reasoning done with probabilities is wrong anyway.
Put otherwise: P(A|A) = 1 is true by definition. And I claim that when you write P(A) and apply the theorems of probability to it, you are in fact manipulating P(A|the laws of probability). So P(an axiom of probability theory) is in fact P(an axiom of probability theory|the laws of probability) = 1.
For theorems, you can say that P(Bayes' Theorem) is not 1, because even if the axioms of probability theory are true, we may have made a mistake in proving Bayes' Theorem from them. But as soon as you actually use Bayes' Theorem to obtain a P(A), what you obtain is in fact P(A|Bayes' Theorem).
Successful use would count as evidence that the laws of probability provide “good” values, right? So if we use these laws quite a bit and they always work, we might have P(Laws of Probability do what we think they do) = 0.99999.
We could discount our output using this. We could also be more constructive and discount based on the complexity of the derivation, using the principle “long proofs are less likely to be correct”, in the following way: each derivation can be built from combinations of various sub-derivations, so we could get probability bounds for new, longer derivations from our priors over the derivations from which they are assembled. (By “derivation” I mean the general form of the computation rather than the value-specific one.)
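To make the two kinds of discounting concrete, here is a minimal sketch. The function names and the numbers are made up for illustration, and it assumes that a long derivation is correct exactly when all of its sub-derivations are (it ignores mistakes made while assembling them).

```python
from math import prod


def discount(p_given_valid, p_valid):
    """Bounds on P(A) when what we computed is really P(A | derivation valid).

    Law of total probability:
        P(A) = P(A|valid) * P(valid) + P(A|not valid) * (1 - P(valid))
    P(A|not valid) is unknown, so we only assume it lies in [0, 1].
    """
    lower = p_given_valid * p_valid      # worst case: P(A|not valid) = 0
    upper = lower + (1.0 - p_valid)      # best case:  P(A|not valid) = 1
    return lower, upper


def derivation_bounds(sub_priors):
    """Bounds on P(every sub-derivation is correct).

    sub_priors: prior probability that each sub-derivation is correct
    (hypothetical inputs, not values from the discussion).
    """
    # Union bound: holds whatever the dependence between the errors.
    lower = max(0.0, 1.0 - sum(1.0 - p for p in sub_priors))
    # Point estimate under the extra assumption of independent errors.
    independent = prod(sub_priors)
    # The whole cannot be more reliable than its weakest part.
    upper = min(sub_priors)
    return lower, independent, upper


if __name__ == "__main__":
    # Output computed as P(A | laws) = 0.8, with P(laws) = 0.99999.
    print(discount(0.8, 0.99999))
    # A derivation assembled from three sub-derivations.
    print(derivation_bounds([0.99, 0.98, 0.97]))
```

The more sub-derivations go in, the lower the union bound and the product both drop, which is one way of cashing out “long proofs are less likely to be correct”.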
ETA: Wait, were you sort of diagonalizing on Bayes' Theorem, because we need to use it to update P(Bayes' Theorem)? If so, I might have misread you.