adrusi comments on The Principle of Predicted Improvement

adrusi 25 Apr 2019 6:40 UTC
24 points
I also had trouble with the notation. Here’s how I’ve come to understand it:
Suppose I want to know whether the first person to drive a car was wearing shoes, just socks, or no footwear at all when they did so. I don’t know what the truth is, so I represent it with a random variable $H$, which could be any of “the driver wore shoes,” “the driver wore socks” or “the driver was barefoot.”
This means that $P (H)$ is a random variable equal to the probability I assign to the true hypothesis (it’s random because I don’t know which hypothesis is true). It’s distinct from $P (H = h_{i})$ and $P (h_{i})$ which are both the same constant, non-random value, namely the credence I have in the specific hypothesis $h_{i}$ (i.e. “the driver wore shoes”).
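To make the distinction concrete, here is a minimal Python sketch. The credences are made-up numbers for illustration, not anything from the article, and the names `prior` and `credence_in_truth` are my own.

```python
# Hypothetical prior credences over the three footwear hypotheses (assumed values).
prior = {"shoes": 0.7, "socks": 0.2, "barefoot": 0.1}


def credence_in_truth(true_hypothesis):
    """One realisation of P(H): the credence I gave to whichever hypothesis is true."""
    return prior[true_hypothesis]


# P(H = h_i) is a constant: pick a specific hypothesis and look up its credence.
p_shoes = prior["shoes"]

# P(H) is a random variable: it takes the value prior[h] with probability prior[h],
# because by my own lights h is the true hypothesis with probability prior[h].
expected_credence_in_truth = sum(
    p * credence_in_truth(h) for h, p in prior.items()
)

print("P(H = shoes):", p_shoes)
print("E(P(H)):     ", expected_credence_in_truth)
```

Here `p_shoes` never changes, while `credence_in_truth` returns a different number depending on which hypothesis turns out to be true; that dependence is what makes $P (H)$ random.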
($P (H = h_{i})$ is roughly “the credence I have that ‘the driver wore shoes’ is true,” while $P (h_{i})$ is “the credence I have that the driver wore shoes,” so they’re equal, and semantically equivalent if you’re a deflationist about truth.)
Now suppose I find the driver’s great-great-granddaughter on Discord, and I ask her what she thinks her great-great-grandfather wore on his feet when he drove the car for the first time. I don’t know what her response will be, so I denote it with the random variable $D$. Then $P (H | D)$ is the credence I assign to the correct hypothesis after I hear whatever she has to say.
So $E (P (H = h_{i} | D)) = P (H = h_{i})$ is equivalent to $E (P (h_{i} | D)) = P (h_{i})$ and means “I shouldn’t expect my credence in ‘the driver wore shoes’ to change after I hear the great-great-granddaughter’s response,” while $E (P (H | D)) \geq E (P (H))$ means “I should expect my credence in whatever is the correct hypothesis about the driver’s footwear to increase when I get the great-great-granddaughter’s response.”
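Both statements can be checked numerically on a toy model. The sketch below is only an illustration under assumed numbers: the prior, the three possible answers, and the likelihood table $P (D | H)$ are all invented, and none of them come from the article.

```python
# Hypothetical prior credences P(H = h) (assumed values).
prior = {"shoes": 0.7, "socks": 0.2, "barefoot": 0.1}

# Hypothetical likelihoods P(D = d | H = h): how likely each answer is
# under each hypothesis (assumed values; each row sums to 1).
likelihood = {
    "shoes":    {"says shoes": 0.8, "says socks": 0.1, "says barefoot": 0.1},
    "socks":    {"says shoes": 0.3, "says socks": 0.6, "says barefoot": 0.1},
    "barefoot": {"says shoes": 0.2, "says socks": 0.2, "says barefoot": 0.6},
}
answers = ["says shoes", "says socks", "says barefoot"]

# Joint P(H = h, D = d), marginal P(D = d), and posterior P(H = h | D = d).
joint = {(h, d): prior[h] * likelihood[h][d] for h in prior for d in answers}
p_answer = {d: sum(joint[h, d] for h in prior) for d in answers}
posterior = {(h, d): joint[h, d] / p_answer[d] for h in prior for d in answers}

# E(P(H = h_i | D)): average the posterior in a FIXED hypothesis over answers.
expected_posterior = {
    h: sum(p_answer[d] * posterior[h, d] for d in answers) for h in prior
}

# E(P(H)): average the prior credence in the TRUE hypothesis over hypotheses.
expected_prior_in_truth = sum(prior[h] * prior[h] for h in prior)

# E(P(H | D)): average the posterior credence in the TRUE hypothesis over
# (hypothesis, answer) pairs.
expected_posterior_in_truth = sum(
    joint[h, d] * posterior[h, d] for h in prior for d in answers
)

for h in prior:
    print(h, "prior:", prior[h], "expected posterior:", expected_posterior[h])
print("E(P(H)):    ", expected_prior_in_truth)
print("E(P(H | D)):", expected_posterior_in_truth)
```

With these numbers the expected posterior in each fixed hypothesis matches its prior up to floating-point error, while the expected credence in the true hypothesis goes up after hearing the answer.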
I think there are two sources of confusion here. First, $H$ was not explicitly defined as “the true hypothesis” in the article. I had to infer that from the English translation of the inequality,
“In English the theorem says that the probability we should expect to assign to the true value of H after observing the true value of D is greater than or equal to the expected probability we assign to the true value of H before observing the value of D,”
and confirm with the author in private. Second, I remember seeing my probability theory professor use sloppy shorthand, and I initially interpreted $P (H)$ as a sloppy shorthand for $P (H = h_{i})$. Neither of these would have been a problem if I were more familiar with this area of study, but many people are less familiar than I am.