Your calculations aren’t quite right. You’re treating EU(action) as though it were a probability value (like P(action)). EU(action) would be more logically written E(utility | action), which itself is an integral over utility * P(utility | action) for utility∈(-∞,∞), which, due to the linearity of * and integrals, does have all the normal identities, like E(utility | action) = E(utility | action, e) * P(e | action) + E(utility | action, ¬e) * P(¬e | action).
In this case, if you do expand that out, using p<<1 for the probability of an error, which is independent of your action, and assuming E(utility|action1,error) = E(utility|action2,error), you get E(utility | action) = E(utility | error) * p + E(utility | action, ¬error) * (1 - p). Or for the difference between two actions, EU1 - EU2 = (EU1' - EU2') * (1 - p) where EU1', EU2' are the expected utilities assuming no errors.
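The decomposition above can be sketched numerically. The values below are illustrative placeholders, not from the discussion; the point is just that with E(utility | error) independent of the action, the error term cancels in the difference, which comes out scaled by (1 - p):

```python
# Error-probability decomposition sketch (illustrative numbers):
# E(u | a) = E(u | error) * p + E(u | a, ¬error) * (1 - p),
# with E(u | error) assumed identical for both actions.

p = 0.01            # probability of a fatal reasoning error, p << 1
eu_error = 0.0      # E(utility | error): same for both actions by assumption

eu1_clean = 10.0    # E(utility | action1, ¬error), i.e. EU1'
eu2_clean = 4.0     # E(utility | action2, ¬error), i.e. EU2'

eu1 = eu_error * p + eu1_clean * (1 - p)
eu2 = eu_error * p + eu2_clean * (1 - p)

# EU1 - EU2 = (EU1' - EU2') * (1 - p): the error term cancels,
# so the ranking of actions is unchanged by the error possibility.
assert abs((eu1 - eu2) - (eu1_clean - eu2_clean) * (1 - p)) < 1e-12
```

Since (1 - p) > 0, whichever action wins ignoring errors still wins once this class of error is accounted for.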
Anyway, this seems like a good model for “there’s a superintelligent demon messing with my head” kind of error scenarios, but not so much for the everyday kind of math errors. For example, if I work out in my head that 51 is a prime number, I would accept an even odds bet on “51 is prime”. But, if I knew I had made an error in the proof somewhere, it would be a better idea not to take the bet, since less than half of numbers near 50 are prime.
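The 51-is-prime example checks out; here is a quick verification (the "near 50" window is an arbitrary choice of mine):

```python
# Verify the bet reasoning: 51 is not prime (51 = 3 * 17), and fewer than
# half of the numbers near 50 are prime, so conditional on "my proof had an
# error somewhere," an even-odds bet on primality is a losing bet.

def is_prime(n: int) -> bool:
    """Trial division, sufficient for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

print(is_prime(51))  # False: 51 = 3 * 17

window = range(40, 61)  # an arbitrary "near 50" window
primes = [n for n in window if is_prime(n)]
print(primes)                        # [41, 43, 47, 53, 59]
print(len(primes) / len(window))     # ≈ 0.24, well under 1/2
```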
Right, I didn’t quite work all the math out precisely, but at least the conclusion was correct. This model is, as you say, exclusively for fatal logic errors; the sorts where the law of non-contradiction doesn’t hold, or something equally unthinkable, such that everything you thought you knew is invalidated. It does not apply in the case of normal math errors for less obvious conclusions (well, it does, but your expected utility given no errors of this class still has to account for errors of other classes, where you can still make other predictions).