Taking another shot at what the fundamental question is: a normative theory tells us something about how agents ought to behave, whereas a descriptive theory tells us something about what is; physical theories seem to be descriptive rather than normative, but when they’re merely probabilistic, how can probabilities tell us anything about what is?
The idea that a descriptive theory tells us about “what really is” is rooted in the correspondence theory of truth, and deeper in a generally Aristotelian metaphysics and logic which takes as a self-evident first-principle the Law of Excluded Middle (LEM), that “of one subject we must either affirm or deny any one predicate”. Even if a probabilistic theory enables us to affirm the open sets of probability 1, and to deny the open sets of probability 0, the question remains: how can a probabilistic theory “tell us” anything more about what really is? What does “a probability of 0.4” correspond to in reality?
If we accept LEM wholesale in both metaphysics (the domain of what is) and logic (for my purposes, the normative characterization of rational speech), then our descriptive theories are absolutely limited to deterministic ones. For any metaphysical proposition P about reality, either P actually is or P actually is not; “P actually is” is a logical proposition Q, and a rational speaker must either affirm Q or deny Q, and he speaks truth iff his answer agrees with what actually is. To accommodate nondeterministic theories, one must give way either in the metaphysics or the logic.
This is so pragmatically crippling that even Aristotle recognized it, and for propositions like “there will be a sea-battle tomorrow”, he seems to carve out an exception (although what exactly Aristotle meant in this particular passage is the subject of embarrassingly much philosophical debate). My interpretation is that he makes an exception on the logical side only, i.e. that a rational speaker may not be required to affirm or deny tomorrow’s sea-battle, even though metaphysically there is an actual fact of the matter one way or the other. If the rational speaker does choose either to affirm or to deny tomorrow’s sea-battle, then the truth of his claim is determined by its correspondence with the actual fact (which presumably will become known soon). My guess is that you’d be sympathetic to this direction, and that you’re willing to go further and get on board with probabilistic logic, but then your question is: how could a probabilistic claim like “with probability 0.4, there will be a sea-battle tomorrow” conceivably have any truth-making correspondence with actual facts?
A similar problem would arise for nondeterminism if someone said “it is indeterminate whether there will be a sea-battle tomorrow”: how could that claim correspond, or fail to correspond, to an actual fact? However, we can adopt a nondeterministic theory and simply refuse to answer, and then we make no claim to judge true or false, and the crisis is averted. If we adopt a probabilistic theory and try the same trick, refusing to answer about A when its probability is 0<P(A)<1, then we can say exactly as much as the mere nondeterminist who knows only our distribution’s support—in other words, not very much (especially if we thoroughly observe Cromwell’s Rule). We have to be able to speak in indeterminate cases to get more from probabilistic theories than merely nondeterministic theories.
The metaphysical solution (for the easier case of nondeterminism) is Kripke’s idea of branching time, where possible worlds are reified as ontologically real, and the claim “it is indeterminate whether there’s a sea-battle tomorrow” is true iff there really is a possible future world where there is a sea-battle tomorrow and another possible future world where there isn’t. Kripke’s possible-world semantics can be naturally extended to the case where there is a probability measure over possible successor worlds, and “with probability 0.4, there will be a sea-battle tomorrow” is made true by the set of {possible future worlds in which a sea battle takes place tomorrow} in fact having measure exactly 2⁄3 that of the set of {other possible future worlds}. But there are good epistemological reasons to dislike this metaphysical move. First, the supposed truthmakers are, as you point out, epiphenomenal—they are in counterfactual worlds, not observable even in principle, so they fail Einstein’s criterion for reality. Second, some people can be better-informed about uncertain events than others, even if both of their forecasts are false in this metaphysical sense—as would almost surely always be the case if, metatheoretically, the “actual” probabilities are continuous quantities. The latter issue can be mitigated by the use of credal sets, a trick I learned from Definability of Truth by Christiano, Yudkowsky, et al.; we can say a credal set is made true by the actual probability lying within it. But still, one credal set can be closer to true than another.
The epistemological solution, which I prefer, is to transcend the paradigm that rational claims such as those about probabilities must be made true or false by their correspondence with some facts about reality. Instead of being made true or false, claims accrue a quantitative score based on how surprised they are by actual facts (as they appear in the actual world, not counterfactual worlds). With the rule S(A)=logP(A), if you get the facts exactly right, you score zero points, and if you deny something which turns out to be a fact, you score −∞ points. In place of the normative goal of rational speech to say claims that are true, and the normative goal of rational thought to add more true claims to your knowledge base, the normative goals are to say and believe claims that are less wrong. Bayesian updating, and the principle of invariant measures, and the principle of maximum entropy (which relies on having some kind of prior, by the way), are all strategies for scoring better by these normative lights. This is also compatible with Friston’s free energy principle, in that it takes as a postulate that all life seeks to minimize surprise (in the form of −logp(A|θ)). Note, I don’t (currently) endorse such sweeping claims as Friston’s, but at least within the domain of epistemology, this seems right to me.
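To make the scoring concrete, here is a tiny sketch (the forecasts and the outcome are invented purely for illustration): two speakers assign different probabilities to what turns out to be the actual fact, and the log rule ranks them by how surprised they were, with 0 for full confidence in the fact and −∞ for denying it.

```python
import math

def log_score(p_assigned_to_actual_fact: float) -> float:
    """S(A) = log P(A): 0 if you were certain of the fact, -inf if you denied it."""
    if p_assigned_to_actual_fact > 0:
        return math.log(p_assigned_to_actual_fact)
    return float("-inf")

# Hypothetical forecasts for "there will be a sea-battle tomorrow", which in fact occurs.
print(log_score(1.0))   # 0.0     -- got the facts exactly right
print(log_score(0.4))   # -0.916  -- wrong-ish, but less wrong than denial
print(log_score(0.0))   # -inf    -- denied what turned out to be a fact
```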
This doesn’t mean that probabilistic theories are normative themselves, on the object-level. For example, the theory that Brownian motion (the physical phenomenon seen in microscopes) can be explained probabilistically by a Wiener process is not a normative theory about how virtuous beings ought to respond when asked questions about Brownian motion. Of course, the Wiener process is instead a descriptive theory about Brownian motion. But, the metatheory that explains how a Wiener process can be a descriptive theory of something, and how to couple your state of belief in it to observations, and how to couple your speech acts to your state of belief—that is a normative metatheory.
It might seem like something is lost here, that in the Aristotelian picture with deterministic theories we didn’t need a fiddly normative metatheory. We had what looked like a descriptive metatheory: to believe or say of what is that it is, is truth. But I think actually this is normative. For example, in a heated moment, Aristotle says that someone who refuses to make any determinate claims “is no better off than a vegetable”. But really, any theory of truth is normative; to say what counts as true is to say what one ought to believe. I think the intuition behind correspondence theories of truth (that truth must be determined by actual, accessible-in-principle truth-makers) is really a meta-normative intuition, namely that good norms should be adjudicable in principle. And that the intuition behind bivalent theories of truth (that claims must be either True or False) is also a meta-normative intuition, that good norms should draw bright lines leaving no doubt about which side an act is on. The meta-norm about adjudication can be satisfied by scoring rules, but in the case of epistemology (unlike jurisprudence), the bright-line meta-norm just isn’t worth the cost, which is that it makes talk of probabilities meaningless unless they are zero or one.
So I agree with most of what you say here, and as a Metaculus user I have some sympathy for trying to make proper scoring rules the epistemological basis of “probability-speak”. There are some problems with it, like different proper scoring rules give different incentives to people when it comes to distributing finite resources across many questions to acquire info about them, but broadly I think the norm of scoring models (or even individual forecasters) by their Brier score or log score and trying to maximize your own score is a good norm.
There are probably other issues, but the immediate problem for me is that this way of bootstrapping probabilistic theories seems to be circular. Given that you accept the whole Bayesian framework already, it’s obvious that under this meta-normative theory you’re supposed to report your true credence for any event because that’s what will maximize your expected log score. This is perfectly consistent but the proper scoring rule appears to be superfluous if you already are a Bayesian. However, if you don’t already accept the Bayesian way of looking at the problem then “maximize S(A)=logP(A)” is useless advice: S is a function from the states of the world to the real numbers and there’s no total order on that space for you to use for this maximization problem. In practice we would act like Bayesians and this would work, but then we’re right back where we started because we’re using probabilities when they don’t seem to add any epistemic content.
There are other versions of this which I’ve mentioned in other comments: for example you can have a norm of “try to make money by betting on stuff” and you can use a Dutch book argument to show that contingent claim prices are going to give you a probability measure. While that justifies the use of some probabilities with a fairly natural sounding norm, it doesn’t explain what I’m doing when I price these contingent claims or what the funny numbers I get as a result of this process actually mean. (It also leads to some paradoxes when the contingent claim payoffs are correlated with your marginal utility, but I’m setting that issue aside here.)
My central point of disagreement with your answer is that I don’t think “claims must be either True or False” is a meta-normative intuition and I think it can’t be necessary to abandon the law of excluded middle in order to justify the use of probabilities. In fact, even the proper scoring rule approach you outline doesn’t really throw out the law of excluded middle, because unless there’s some point at which the question will resolve as either True or False there’s no reason for you to report your “true credence” to maximize your expected score and so the whole edifice falls apart.
the immediate problem for me is that this way of bootstrapping probabilistic theories seems to be circular.
I think it is not circular, though I can imagine why it seems so. Let me try to elaborate the order of operations as I see it.
Syntax: Accept that a probability-sentence like “P(there will be a sea-battle tomorrow) ≥ 0.4” is at least syntactically parseable, i.e. not gibberish, even if it is semantically disqualified from being true (like “the present King of France is a human”).
This can be formalized as adding a new term-former P:ClassicalSentence→ProbabilityTerm, other term-formers such as +:ProbabilityTerm×ProbabilityTerm→ProbabilityTerm, constants C:Q→ProbabilityTerm, and finally a predicate ≥0:ProbabilityTerm→ProbabilitySentence.
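For concreteness, here is a minimal sketch of that syntax as Python dataclasses; the class names, and the encoding of “≥ 0.4” via a negative rational constant, are my own choices rather than anything standardized.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Union

# A stand-in for ClassicalSentence: just named atoms, for the sketch.
@dataclass(frozen=True)
class Atom:
    name: str

ClassicalSentence = Atom  # a real version would allow full propositional formulas

# ProbabilityTerm, built by the term-formers P, +, and rational constants C.
@dataclass(frozen=True)
class P:                      # P : ClassicalSentence -> ProbabilityTerm
    arg: ClassicalSentence

@dataclass(frozen=True)
class Plus:                   # + : ProbabilityTerm x ProbabilityTerm -> ProbabilityTerm
    left: "ProbabilityTerm"
    right: "ProbabilityTerm"

@dataclass(frozen=True)
class Const:                  # C : Q -> ProbabilityTerm
    value: Fraction

ProbabilityTerm = Union[P, Plus, Const]

# ProbabilitySentence: the single predicate "term >= 0".
@dataclass(frozen=True)
class GeqZero:                # >=0 : ProbabilityTerm -> ProbabilitySentence
    term: ProbabilityTerm

# "P(sea-battle) >= 0.4", encoded as P(sea-battle) + (-2/5) >= 0:
sea_battle = Atom("there will be a sea-battle tomorrow")
claim = GeqZero(Plus(P(sea_battle), Const(Fraction(-2, 5))))
```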
Logic: Accept that probability-sentences can be the premises and/or conclusions of valid deductions, such as P(A)≥0.4,P(B∧A)≥0.5⋅P(A)⊢P(B)≥0.2.
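One way to check that this particular deduction is semantically valid is to minimize P(B) over all probability distributions on the four truth assignments to A and B, subject to the premises; a small linear program (a sketch using scipy, with my own encoding of the variables) confirms the minimum is 0.2.

```python
import numpy as np
from scipy.optimize import linprog

# Variables: p = (P(A&B), P(A&~B), P(~A&B), P(~A&~B)), a distribution over the four atoms.
# Premises: P(A) >= 0.4  and  P(B&A) >= 0.5 * P(A).
# We minimize P(B) = p[0] + p[2]; if the minimum is 0.2, then every model of the
# premises satisfies P(B) >= 0.2, so the deduction is semantically valid.
c = np.array([1.0, 0.0, 1.0, 0.0])          # objective: P(B)
A_ub = np.array([
    [-1.0, -1.0, 0.0, 0.0],                 # -(P(A)) <= -0.4
    [-0.5,  0.5, 0.0, 0.0],                 # -(P(A&B) - 0.5*P(A)) <= 0
])
b_ub = np.array([-0.4, 0.0])
A_eq = np.array([[1.0, 1.0, 1.0, 1.0]])     # probabilities sum to 1
b_eq = np.array([1.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 4)
print(res.fun)   # 0.2 (up to numerical tolerance): P(B) cannot be pushed below 0.2
```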
Axiomatizing the valid deductions in a sound and complete way is not as easy as it may seem, because of the interaction with various expressive features one might want (native conditional probabilities, higher-order probabilities, polynomial inequalities) and model-theoretic and complexity-theoretic issues (pathological models, undecidable satisfiability). Some contenders:
LPWF, which has polynomial inequalities but not higher-order probabilities
LCP, which has higher-order conditional probabilities but not inequalities
LPP2, which has neither, but has decidable satisfiability.
Anyway, the basic axioms about probability that we need for such logics are the following (a toy finite-model check of them appears just after the list):
P(α)≥0
P(⊤)=1
P(⊥)=0
P(α)+P(β)=P(α∨β)+P(α∧β)
P(α↔β)=1→P(α)=P(β)
α⊢P(α)=1
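As the promised toy check (my own illustration, not part of any of the logics above), one can interpret P in a small finite model, a weighted set of truth assignments, and confirm the first four axioms numerically; the last two are schematic facts about provably equivalent or provable formulas, so they are not checked here.

```python
import random
from itertools import product

random.seed(0)

# A finite model: a probability weight on each truth assignment ("possible world")
# over three atomic propositions.
atoms = ("a", "b", "c")
worlds = list(product([False, True], repeat=len(atoms)))
weights = [random.random() for _ in worlds]
total = sum(weights)
prob = {w: wt / total for w, wt in zip(worlds, weights)}

def P(formula):
    """P(formula) = total weight of the worlds where the formula holds."""
    return sum(p for w, p in prob.items() if formula(*w))

top = lambda a, b, c: True              # ⊤
bot = lambda a, b, c: False             # ⊥
alpha = lambda a, b, c: a or b          # an arbitrary formula
beta = lambda a, b, c: b and not c      # another arbitrary formula

assert P(alpha) >= 0                                         # P(α) ≥ 0
assert abs(P(top) - 1) < 1e-12                               # P(⊤) = 1
assert P(bot) == 0                                           # P(⊥) = 0
lhs = P(alpha) + P(beta)
rhs = (P(lambda a, b, c: alpha(a, b, c) or beta(a, b, c))
       + P(lambda a, b, c: alpha(a, b, c) and beta(a, b, c)))
assert abs(lhs - rhs) < 1e-12                                # P(α)+P(β) = P(α∨β)+P(α∧β)
print("first four axioms hold in this toy model")
```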
Those axioms can, if you wish, be derived from much weaker principles by Cox-style theorems. It’s important to admit that Cox’s proof of his original theorem (as cited by Jaynes) was mistaken, so there isn’t actually a “Cox’s theorem”, but rather a family of variants that actually work given different assumed principles. My favorite is Van Horn 2017, which uses only the following principles:
Equivalence-invariance: If X↔Y and X→(A↔B), then c(A|X)=c(B|Y).
Definition-invariance: If s is an atomic proposition not appearing in A, X, or E, then c(A|X)=c(A|X∧(s↔E)).
Irrelevance-invariance: If Y is a noncontradictory formula sharing no symbols with either X or A, then c(A|X)=c(A|X∧Y).
Implication-compatibility: If X→(A→B) but not X→(B→A), then c(A|X)<c(B|X).
Epistemics: Revise the Aristotelian norms, as follows:
Instead of demanding that a rational speaker either assert or deny any classical sentence about relevant propositions, demand that
a rational speaker assert or deny any probability-sentence about relevant propositions, and that
all their assertions be coherent, in the sense that probability-logic cannot deduce ⊥ from any subset of them.
Instead of classifying a speaker as either correct or incorrect (depending on whether they assert what is and deny what is not or deny what is and assert what is not), score them on the basis of the greatest rational q for which they asserted P(A)−q≥0 (where A is the conjunction of all of “what is”, or rather what is observed), and award them logq points.
The logq rule in particular can be justified and characterized at this stage just by the property of invariance under observation orderings, i.e. logP(A0)+logP(A1|A0)=logP(A1) whenever A1⊆A0 is a refinement of an earlier observation (discussed more below; a toy numerical check follows this list)
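Here is that numerical check, with a made-up finite world-model: scoring a coarse observation first and its refinement later totals exactly the same as scoring the refined observation directly.

```python
import math

# A made-up distribution over six possible worlds.
P = {"w1": 0.30, "w2": 0.25, "w3": 0.20, "w4": 0.15, "w5": 0.07, "w6": 0.03}

def prob(event):
    """Probability of an event, represented as a set of worlds."""
    return sum(P[w] for w in event)

A0 = {"w1", "w2", "w3"}   # coarse observation
A1 = {"w1", "w2"}         # later refinement, A1 ⊆ A0

two_stage = math.log(prob(A0)) + math.log(prob(A1) / prob(A0))  # logP(A0) + logP(A1|A0)
one_shot = math.log(prob(A1))                                   # logP(A1)
print(two_stage, one_shot)  # identical up to floating point, regardless of the ordering
```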
Decision theory: Optionally, you can now assume the vNM axioms on top of the probabilistic logic, prove the vNM theorem, formalize a speech-act game internalizing the logP(A) rule, and then prove a revelation theorem that says that the optimal policy for obtaining epistemic points is to report one’s actual internal beliefs.
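I won't formalize the speech-act game here, but the core fact such a revelation theorem turns on, that expected log score (by the speaker's own lights) is maximized by reporting one's actual credence, can be illustrated with a quick grid search over possible reports; the numbers are made up, and this is a sketch of the phenomenon rather than a proof.

```python
import numpy as np

true_credence = 0.4   # the speaker's actual belief that the sea-battle happens

def expected_log_score(report: float, belief: float) -> float:
    """Expected log score, by the speaker's own lights, for announcing `report`."""
    return belief * np.log(report) + (1 - belief) * np.log(1 - report)

reports = np.linspace(0.01, 0.99, 99)
scores = [expected_log_score(r, true_credence) for r in reports]
best = reports[int(np.argmax(scores))]
print(best)   # ~0.40: the report maximizing expected log score is the true credence
```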
I think the key confusion here is that it may seem like one needs the decision theory set up already in order to justify the scoring rule (to establish that it incentivizes honest revelation), but the decision theory also depends on the scoring rule. I claim that the scoring rule can be justified on other grounds than honest revelation. If you don’t buy the argument of invariance under observation orderings, I can probably come up with other justifications, e.g. from coding theory. Closing the decision-theoretic loop also does provide some justificatory force, even if it is circular, since being able to set up a revelation theorem is certainly a nice feature of this logP(A) norm.
But fundamentally, whether in this system or Aristotle’s, one doesn’t identify the epistemic norms by trying to incentivize honest reporting of beliefs, but rather by trying to validate reports that align with reality. The logP(A) rule stands as a way of extending the desire for reports that align with reality to the non-Boolean logic of probability, so that we can talk rationally about sea-battles and other uncertain events, without having to think about in what order we find things out.
different proper scoring rules give different incentives to people when it comes to distributing finite resources across many questions to acquire info about them
I haven’t studied this difference, but I want to register my initial intuition that to the extent other proper scoring rules give different value-of-information incentives than the log scoring rule, they are worse and the incentives from the log rule are better. In particular, I expect the incentives of the log rule to be more invariant to different ways of asking multiple questions that basically add up to one composite problem domain, and that being sensitive to that would be a misfeature.
In fact, even the proper scoring rule approach you outline doesn’t really throw out the law of excluded middle, because unless there’s some point at which the question will resolve as either True or False there’s no reason for you to report your “true credence” to maximize your expected score and so the whole edifice falls apart.
Even if a question never resolves fully enough to make all observables either True or False (i.e., in a Hausdorff possibility space, never narrows all the way down to a Dirac delta), but just resolves incrementally to more and more precise observations A0⊃A1⊃⋯⊃Ak⊃⋯, the log scoring rule remains proper, since
logP(Ak)+logP(Ak+1|Ak)
= logP(Ak)+log(P(Ak+1∩Ak)/P(Ak))
= logP(Ak)+log(P(Ak+1)/P(Ak))
= logP(Ak)+logP(Ak+1)−logP(Ak)
= logP(Ak+1).
I don’t think the same can be said for the Brier scoring rule; it doesn’t even seem to have a well-defined generalization to this case.
There are a couple fiddly assumptions here I should bring out explicitly:
when it comes to epistemic value, we should have a temporal discount factor of γ=1, very much unlike prudential or ethical values where I argue the discount factor must be γ<1.
If we don’t do this, then we get an incentive to smear out our forecasts to the extent we expect high precision to take a long time to obtain.
This is one reason to keep epistemic value as a separate normative domain from other kinds of value.
The point you mentioned parenthetically about contingencies correlating with marginal utility is another reason to keep utility separate from epistemic value.
When we decide what probabilistic statements to make, we should act as-if either the question will eventually resolve fully, or “there will always be more to discover” and that more is always discovered eventually.
Big tangent: There is a resonance here with CEV, where we try to imagine an infinite future limit of all ethical knowledge having been learned, and judge our current intentions by that standard, without discounting it for being far in the future, or discounting the whole scenario for being less-than-certain that ethical beings will survive and continue their ethical development indefinitely or until there is nothing more to learn.
Here we are sort-of in the domain of ethics, where I’d say temporal discounting is necessary, but methodologically the question of how to determine ethical value is an epistemic one. So we shouldn’t discount future ethical-knowledge Bayes-points, but we can still discount object-level ethical value.