Ok, I think we need to distinguish several things:
1. In general, U vs V and U−1000 vs V give different results when comparing utility functions; there should be some sort of normalisation process before any utility functions are compared.
2. Within a compound utility function, the AI will simply choose the branch whose utility is easiest to satisfy (see the sketch below).
3. Is there a normalisation procedure that would also normalise between the branches of a compound utility function? If we pick a normalisation for comparing distinct utilities, it might also serve to normalise between branches of compound utilities.
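A minimal sketch of points 1 and 2, with purely illustrative numbers (the utility values and forcing cost below are made up, not from this discussion): an AI maximising a compound utility "U if X, else V" steers towards whichever branch it can satisfy most cheaply, and translating U by a constant flips that choice even though U and U−1000 rank outcomes identically.

```python
# Illustrative only: utilities and the cost of forcing X are made-up numbers.

def chosen_branch(u_if_x, v_if_not_x, cost_of_forcing=0.1):
    """Which branch an AI maximising 'U if X, else V' will try to force."""
    value_force_x = u_if_x - cost_of_forcing
    value_force_not_x = v_if_not_x - cost_of_forcing
    return "force X" if value_force_x >= value_force_not_x else "force not-X"

print(chosen_branch(u_if_x=5, v_if_not_x=3))         # force X
# Translate U by -1000: same ranking of outcomes, opposite behaviour.
print(chosen_branch(u_if_x=5 - 1000, v_if_not_x=3))  # force not-X
```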
Note that IRL is invariant to translating a possible utility function by a constant. So this kind of normalization doesn’t have to be baked into the algorithm.
This is true.
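For concreteness, here is a minimal sketch of that invariance, assuming a Boltzmann-rational model of the human (an assumption of this sketch, not something specified above): adding a constant to a candidate utility leaves the predicted action probabilities, and hence the IRL likelihood, unchanged.

```python
import numpy as np

def boltzmann_policy(action_values, beta=1.0):
    """Action probabilities for a Boltzmann-rational agent."""
    z = beta * np.asarray(action_values, dtype=float)
    z -= z.max()                         # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

values = [1.0, 2.0, 4.0]                 # action values under some utility U
shifted = [v - 1000.0 for v in values]   # same utility, translated by -1000

# Identical likelihoods: IRL cannot distinguish U from U - 1000.
print(boltzmann_policy(values))
print(boltzmann_policy(shifted))
```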
The most natural normalization procedure is to look at whether (and how) the human tries to affect the event X (as I said in the second part of my comment). If the human never tries to affect X either way, then the AI should normalize the utility functions so that it has no incentive to affect X either.
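As a hedged sketch of what that could look like (the constant-offset scheme below is one possible reading, not a definitive version of the proposal, and the numbers are illustrative): translate each branch of the compound utility so that, from the AI's point of view, forcing X and forcing not-X are equally attractive, leaving it with no incentive to push X either way.

```python
# One possible reading of the normalization, with illustrative numbers:
# shift the X-branch by a constant so that the AI's best achievable value
# is the same whether it forces X or forces not-X.

def normalize_branches(u_if_x, v_if_not_x):
    """Translate the X-branch so both branches are equally attainable."""
    offset = v_if_not_x - u_if_x
    return u_if_x + offset, v_if_not_x

u, v = normalize_branches(u_if_x=5 - 1000, v_if_not_x=3)
assert u == v  # forcing X and forcing not-X now look equally good,
               # so the AI gains nothing by manipulating X
```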