However, I think it is reasonable to at least add a calibration requirement: there should be no way to systematically correct estimates up or down as a function of the expected value.
Why is this important? If the thing with the highest score is always the best action to take, why does it matter if that score is an overestimate? Utility functions are fictional anyway, right?
At a very high level, as a first-pass approximation, I think the right way to think of this is as a sort of unit test: even if we can’t directly see a reason why systematically incorrect estimates would cause problems in an AI design, this is an obvious enough desideratum that we should by default assume a system which breaks it is bad, unless we can prove otherwise.
Closer to the object level—yes, the highest-scoring action is the correct action to take, and if you model miscalibration as a single, monotonic function applied as the last step before deciding, then it can’t change any decisions. But if miscalibration can affect any intermediate steps, then this doesn’t hold. As a simple example: suppose the AI is deciding whether to pay to preserve its access to a category of options which it knows are highly subject to Regressional Goodhart.
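To make the difference between the two cases concrete, here is a minimal sketch with made-up numbers. Everything in it is an illustrative assumption rather than part of any particular AI design: the `overestimate` distortion (a strictly increasing doubling), the normal-normal shrinkage model in `shrink`, and the payoff figures. With equal signal and noise variance, the calibrated value of an option is half its raw score, so taking the raw score at face value is exactly the `overestimate` distortion applied to the calibrated value.

```python
def argmax(scores):
    """Return the key with the highest score."""
    return max(scores, key=scores.get)


def overestimate(x):
    """Hypothetical miscalibration: strictly increasing, doubles the value."""
    return 2.0 * x


def shrink(raw, prior_mean=0.0, signal_var=1.0, noise_var=1.0):
    """Calibrated estimate under an assumed normal-normal model.

    The raw score is pulled toward the prior mean, which is the
    correction for Regressional Goodhart on a noisy proxy.
    """
    weight = signal_var / (signal_var + noise_var)
    return prior_mean + weight * (raw - prior_mean)


def should_pay(category_value, cost):
    """Pay to keep access only if the category is worth more than the cost."""
    return category_value - cost > 0.0


# Case 1: the distortion is applied once, as the last step before deciding.
# A strictly increasing function preserves the ranking, so the choice is the same.
calibrated_scores = {"a": 1.0, "b": 3.0, "c": 2.0}
distorted_scores = {k: overestimate(v) for k, v in calibrated_scores.items()}
final_step_choice_unchanged = argmax(calibrated_scores) == argmax(distorted_scores)

# Case 2: the same distortion hits an intermediate estimate, which is then
# compared against a cost that is known exactly.
raw_best = 8.0                          # raw score of the best-looking option
cost = 5.0                              # price of preserving access
calibrated_value = shrink(raw_best)     # 4.0 after correcting for regression
pay_if_calibrated = should_pay(calibrated_value, cost)
pay_if_miscalibrated = should_pay(overestimate(calibrated_value), cost)

print(final_step_choice_unchanged)  # True
print(pay_if_calibrated)            # False
print(pay_if_miscalibrated)         # True
```

In the first case the overestimate is harmless, because only the ordering matters. In the second, the inflated number gets combined with another quantity before the decision, so the same monotonic distortion flips the choice from declining to paying.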