One way to describe this is to note that choosing the action that maximises the expectation of value is not the same as choosing the action that can be expected to produce the most value. So choosing p=0.53 maximises our expectations, not our expected production of value.
It doesn’t seem to want to let me edit the comment above, but I could have explained this more clearly. The figure (8p-6p^2)/(p+1) is actually a weighted mean of Ex and Ey, where these are the expected values at X and Y respectively. Specifically, this value is:
(1*Ex+p*Ey)/(1+p)
Now, the expected value calculated from the planning-optimal decision is just Ex. We shouldn’t be surprised that the weighted mean is quite a different value.
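To make the difference concrete, here is a rough numerical sketch. It assumes the usual absent-minded driver payoffs (0 for exiting at X, 4 for exiting at Y, 1 for continuing past both), which are not restated in this comment; the function names are made up for illustration. Under those assumptions Ex = 4p-3p^2 and Ey = 4-3p, and the weighted mean reduces to the (8p-6p^2)/(p+1) figure.

```python
# Assumed payoffs (standard absent-minded driver setup, not stated in the comment):
# exit at X -> 0, exit at Y -> 4, continue past both -> 1.
# p is the probability of continuing at an intersection.

def ex(p):
    """Expected value at X: continue once, then exit at Y or continue again."""
    return 4 * p * (1 - p) + 1 * p * p  # = 4p - 3p^2

def ey(p):
    """Expected value at Y: exit for 4, or continue for 1."""
    return 4 * (1 - p) + 1 * p  # = 4 - 3p

def weighted_mean(p):
    """(1*Ex + p*Ey)/(1+p), which equals (8p - 6p^2)/(p + 1) under these payoffs."""
    return (1 * ex(p) + p * ey(p)) / (1 + p)

def argmax_on_grid(f, steps=100_000):
    """Brute-force maximiser of f over a uniform grid on [0, 1]."""
    return max((i / steps for i in range(steps + 1)), key=f)

p_weighted = argmax_on_grid(weighted_mean)  # maximises the weighted mean
p_planning = argmax_on_grid(ex)             # maximises Ex, the planning value

print(f"weighted mean is maximised near p = {p_weighted:.3f}")
print(f"Ex is maximised near p = {p_planning:.3f}")
```

With these assumed payoffs the weighted mean peaks at roughly p=0.53 while Ex peaks at roughly p=0.67, so the two quantities really are maximised by different choices of p.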