A simplified version of the argument here:

1. The utility function isn't up for grabs.
2. Therefore, we need unbounded utility.
3. Oops! If we allow unbounded utility, we can get non-convergence in our expectation.
4. Since we've already established that the utility function is not up for grabs, let's try and modify the probability to fix this!
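To make step 3 concrete, here is a minimal sketch (my own illustration, using the standard St. Petersburg-style construction rather than an example from the original discussion): give the n-th outcome probability 2^−n and utility 2^n, so every term of the expected-utility sum contributes exactly 1 and the partial sums grow without bound.

```python
# Minimal sketch (illustrative assumption, not from the original discussion):
# an unbounded utility function where the n-th outcome has probability 2**-n
# and utility 2**n, so each term of the expectation contributes exactly 1.

def partial_expected_utility(num_outcomes: int) -> float:
    """Sum of probability * utility over the first num_outcomes outcomes."""
    return sum((2.0 ** -n) * (2.0 ** n) for n in range(1, num_outcomes + 1))

for n in (10, 100, 1000):
    # Each partial sum equals n, so the series diverges: this gamble has no
    # well-defined finite expected utility.
    print(n, partial_expected_utility(n))
```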
My response to this is that the probability distribution is even less up for grabs. The utility, at least, is explicitly there to reflect our preferences. If we see that a utility function is causing our agent to take the wrong actions, then it makes sense to change it to better reflect the actions we wish our agent to take.
The probability distribution, on the other hand, is a map that should reflect the territory as well as possible! It should not be modified on account of badly-behaved utility computations.
This may be taken as an argument in favor of modifying the utility function; Sniffnoy makes a case for bounded utility in another comment.
It could alternatively be taken as a case for modifying the decision procedure. Perhaps neither the probability nor the utility are “up for grabs”, but how we use them should be modified.
One (somewhat crazy) option is to take the median expectation rather than the mean expectation: we judge actions by computing the highest utility score that we have at least a 50% chance of meeting or beating, rather than by computing the average. This makes the computation insensitive to extreme (high or low) outcomes with small probabilities. Unfortunately, it also makes the computation insensitive to extreme (high or low) outcomes with 49% probabilities: it would prefer a gamble with a 49% probability of utility −3^^^3 and a 51% probability of utility +1 to a gamble with a 51% probability of utility 0 and a 49% probability of +3^^^3.
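A quick sketch of that failure mode (my own illustration; the gambles are the two just described, with a large finite number standing in for 3^^^3, which is far too large to represent as a float):

```python
# Sketch of the median rule above (my own illustration). A gamble is a list
# of (probability, utility) pairs; the median is taken as the smallest
# utility at or below which at least 50% of the probability mass lies.

def median_utility(gamble):
    cumulative = 0.0
    for probability, utility in sorted(gamble, key=lambda pair: pair[1]):
        cumulative += probability
        if cumulative >= 0.5:
            return utility

BIG = 1e9  # stand-in for 3^^^3, purely for illustration

gamble_a = [(0.49, -BIG), (0.51, 1.0)]  # 49% disaster, 51% tiny gain
gamble_b = [(0.51, 0.0), (0.49, BIG)]   # 51% nothing, 49% huge gain

# Prints 1.0 and 0.0: the median rule prefers gamble_a, blind to both
# 49%-probability tails.
print(median_utility(gamble_a), median_utility(gamble_b))
```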
But perhaps there are more well-motivated alternatives.
> If we see that a utility function is causing our agent to take the wrong actions, then it makes sense to change it to better reflect the actions we wish our agent to take.
If the agent defines its utility indirectly in terms of the designer's preference, a disagreement between the agent's utility function and the designer's preference about how to evaluate a decision doesn't easily indicate that the designer's evaluation is more accurate; and if it's not, then the designer should defer to the agent's judgment instead of adjusting its utility.
> The probability distribution, on the other hand, is a map that should reflect the territory as well as possible! It should not be modified on account of badly-behaved utility computations.
Similarly, if the agent is good at building its map, it might have a better map than the designer, so a disagreement is not easily resolved in favor of the designer. On the other hand, there can be a bug in the agent's world-modeling code, in which case it should be fixed! And similarly, if there is a bug in the agent's indirect utility definition, it too should be fixed. The arguments seem analogous to me, so why would preference be more easily debugged than the world model?
> My response to this is that the probability distribution is even less up for grabs.
Really? In practice I have a great deal of uncertainty about both my utility function and my probability estimates. Accurate probability estimates require the ability to accurately model the world, and this seems incredibly hard in general. It’s not at all clear to me that instrumental rationality means trusting your current probability estimates if you have reason to believe that future evidence will drastically change them or that they’re corrupted for some other reason (even an otherwise flawlessly designed AI has to worry about cosmic rays flipping the bits in its memory or, Omega forbid, its source code).
I am definitely not saying “trust your current probability estimates”.
What I’m saying is that probability should reflect reality as closely as possible, whereas utility should reflect preferences as closely as possible.
Modifying the preference function in an ad hoc way to get the right behavior is a bad idea, but modifying our expectation about how reality might actually be is even worse. The probability function should be modified exclusively in response to considerations about how reality might be. The utility function should be modified exclusively in response to considerations about our preferences.