The most obvious explanation for this is that utility is not a linear function of response time: the algorithm taking 20 s is very, very bad, and losing 25 ms on average is worthwhile to ensure that this never happens. Consider that if the algorithm is just doing something immediately profitable with no interactions with anything else (e.g. producing some cryptocurrency), the first algorithm is clearly better (assuming you are just trying to maximize expected profit), since on the rare occasions when it takes 20 s, you just have to wait almost 200 times as long for your unit of profit. This suggests that the only reason the second algorithm is typically preferred is that most programs do have to interact with other things, and an extremely long response time will break everything. I don’t think any more convoluted decision-theoretic reasoning is needed to justify this.
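For concreteness, here is a minimal sketch of that expected-profit argument under linear utility. The specific numbers (a ~100 ms typical response, a 0.1% chance of the 20 s spike, one unit of profit per response) are assumptions chosen to roughly match the ratios above, not figures from the original post:

```python
# Minimal sketch of the expected-profit comparison under linear utility.
# Illustrative assumptions (not figures from the original post):
#   - algorithm A: ~100 ms typical response, but 20 s with probability p
#   - algorithm B: no spikes, but 25 ms slower than A on average
#   - each completed response yields one unit of profit, nothing downstream cares
p = 0.001            # assumed probability of A's 20 s outlier
typical_ms = 100.0   # assumed typical response time for A (so 20 s is ~200x longer)
spike_ms = 20_000.0  # the 20 s worst case

mean_a = (1 - p) * typical_ms + p * spike_ms  # A's average response time
mean_b = mean_a + 25.0                        # B loses 25 ms on average

# With linear utility, profit per hour is just proportional to 1 / mean latency.
for name, mean_ms in (("A (spiky)", mean_a), ("B (consistent)", mean_b)):
    print(f"{name}: mean {mean_ms:.1f} ms, ~{3_600_000 / mean_ms:.0f} units/hour")
# A wins whenever its mean is lower, however ugly its worst case looks.
```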
True, but even in cases where it won’t break everything, this is still valued. Consistency is a virtue even when inconsistency won’t break anything. And this valuation clearly breaks down in the extreme case where it becomes Caul, but I can’t come up with a compelling reason why it should break down.
My best guess: The factor being valued here is the variance. Low variance increases utility generally, because predictability enables better expected-utility calculations for other connected decisions. There is no hard limit on how much this can matter relative to the average case, but as the average cases diverge, so that the low-variance version becomes worse than a greater and greater fraction of the high-variance cases, it remains technically rational while its implicit prior approaches an insane prior such as that of Caul or Perry.
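A toy way to see that divergence, assuming a simple mean-minus-k-standard-deviations scoring rule (the rule and all the numbers here are illustrative assumptions, not anything from the discussion above):

```python
# A toy version of the mean-variance tradeoff sketched above.
# Scoring rule: score = mean - k * std ("penalize unpredictability by k sigmas").
# All numbers and the rule itself are illustrative assumptions.
import random
import statistics

random.seed(0)

# High-variance option: mean ~100, standard deviation ~40.
risky = [random.gauss(100.0, 40.0) for _ in range(100_000)]
mean_r = statistics.fmean(risky)
std_r = statistics.pstdev(risky)

for k in (0.5, 1.0, 2.0, 3.0):
    # A certain payoff above this threshold beats the risky option under the rule.
    threshold = mean_r - k * std_r
    beaten = sum(x > threshold for x in risky) / len(risky)
    print(f"k={k:.1f}: a sure payoff of ~{threshold:.0f} is enough to win, "
          f"even though the risky option exceeds it {beaten:.0%} of the time")
# As k grows, the rule stays internally consistent ("technically rational"),
# but it ends up favoring sure outcomes that lose to almost every realization
# of the alternative -- the Caul/Perry-flavored extreme.
```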
I think this would imply that for an unbounded perfect Bayesian there is no value to low variance beyond what a nonlinear utility function already captures, but that for bounded reasoners there is some cutoff past which conceding average-case utility for the sake of predictability is worthwhile on balance.