I also had trouble with the notation. Here’s how I’ve come to understand it:
Suppose I want to know whether the first person to drive a car was wearing shoes, just socks, or no footwear at all when they did so. I don’t know what the truth is, so I represent it with a random variable H, which could be any of “the driver wore shoes,” “the driver wore socks” or “the driver was barefoot.”
This means that P(H) is a random variable equal to the probability I assign to the true hypothesis (it’s random because I don’t know which hypothesis is true). It’s distinct from P(H=h_i) and P(h_i), which are the same constant, non-random value: the credence I have in a specific hypothesis h_i (e.g. “the driver wore shoes”).
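To make the distinction concrete, here is a minimal Python sketch with made-up credences (the 1/2, 1/3, 1/6 split is purely hypothetical). P(H=h_i) is a single number. P(H) depends on whichever hypothesis turns out to be true, so the most I can compute in advance is its expected value under my own credences, which is the sum of P(h)² over the hypotheses.

```python
from fractions import Fraction as F

# Hypothetical prior credences over the three footwear hypotheses (illustrative numbers only).
prior = {"shoes": F(1, 2), "socks": F(1, 3), "barefoot": F(1, 6)}

# P(H=h_i): a constant, the credence in one specific hypothesis.
p_shoes = prior["shoes"]

# P(H): a random variable. Its value depends on which hypothesis is actually true,
# so it is modeled as a function from the true value h to my credence P(h).
def p_of_H(true_h):
    return prior[true_h]

# Its expectation weights each possible value by my credence that h is the true one.
expected_p_H = sum(prior[h] * p_of_H(h) for h in prior)

print(p_shoes)       # 1/2
print(expected_p_H)  # 7/18, i.e. 1/4 + 1/9 + 1/36
```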
(P(H=h_i) is roughly “the credence I have that ‘the driver wore shoes’ is true,” while P(h_i) is “the credence I have that the driver wore shoes,” so they’re equal, and semantically equivalent if you’re a deflationist about truth.)
Now suppose I find the driver’s great-great-granddaughter on Discord and ask her what she thinks her great-great-grandfather wore on his feet the first time he drove a car. I don’t know what her response will be, so I denote it with the random variable D. Then P(H|D) is the credence I assign to the true hypothesis after I hear whatever she has to say.
So E(P(H=h_i|D)) = P(H=h_i), which is equivalent to E(P(h_i|D)) = P(h_i), means “I shouldn’t expect my credence in ‘the driver wore shoes’ to change after I hear the great-great-granddaughter’s response,” while E(P(H|D)) ≥ E(P(H)) means “I should expect my credence in whichever hypothesis about the driver’s footwear is true to increase, or at least not decrease, when I get the great-great-granddaughter’s response.”
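Both claims can be checked numerically. The sketch below reuses the same made-up prior and assumes a hypothetical answer model in which the great-great-granddaughter names the true footwear with probability 3/5 and each wrong option with probability 1/5; none of these numbers come from the article. It computes posteriors with Bayes’ rule, then compares the expected posterior in each fixed hypothesis with its prior, and the expected credence in the true hypothesis before and after her answer.

```python
from fractions import Fraction as F

# Hypothetical prior over the footwear hypotheses (illustrative numbers only).
prior = {"shoes": F(1, 2), "socks": F(1, 3), "barefoot": F(1, 6)}
answers = list(prior)  # assume her answer names one of the same three options

# Hypothetical likelihood P(D=d | H=h): she names the true footwear 3/5 of the time
# and each of the two wrong options 1/5 of the time.
def likelihood(d, h):
    return F(3, 5) if d == h else F(1, 5)

# Joint P(H=h, D=d) and marginal P(D=d).
joint = {(h, d): prior[h] * likelihood(d, h) for h in prior for d in answers}
p_D = {d: sum(joint[(h, d)] for h in prior) for d in answers}

# Posterior P(H=h | D=d) by Bayes' rule.
def posterior(h, d):
    return joint[(h, d)] / p_D[d]

# Claim 1: for each fixed h_i, E_D[P(h_i | D)] = P(h_i) (expectation over her answer only).
expected_posterior = {h: sum(p_D[d] * posterior(h, d) for d in answers) for h in prior}

# Claim 2: E[P(H | D)] >= E[P(H)], where the expectation also averages over which H is true.
e_post_true = sum(joint[(h, d)] * posterior(h, d) for h in prior for d in answers)
e_prior_true = sum(prior[h] * prior[h] for h in prior)

print(expected_posterior == prior)    # True
print(e_post_true, e_prior_true)      # 871/1800 7/18
print(e_post_true >= e_prior_true)    # True
```

Under these assumptions the expected credence in the true hypothesis rises from 7/18 to 871/1800 (roughly 0.39 to 0.48), while the expected credence in each specific hypothesis stays exactly at its prior value.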
I think there are two sources of confusion here. First, H was not explicitly defined as “the true hypothesis” in the article. I had to infer that from the English translation of the inequality,
In English the theorem says that the probability we should expect to assign to the true value of H after observing the true value of D is greater than or equal to the expected probability we assign to the true value of H before observing the value of D,
and confirm with the author in private. Second, I remember my probability theory professor using sloppy shorthand, so I initially read P(H) as shorthand for P(H=h_i). Neither of these would have been a problem if I were more familiar with this area of study, but many readers are less familiar with it than I am.