You’re right, I think the absolute value might actually be a problem: you want the policy to help or hurt all values equally relative to no-op, not hurt some and help others. I just edited the post to reflect that.
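To make that concrete, here is a minimal sketch in my own notation (not necessarily the post's): let \(\Delta_v(\pi) = \mathbb{E}[U_v \mid \pi] - \mathbb{E}[U_v \mid \text{no-op}]\) be how much policy \(\pi\) benefits value \(v\) relative to doing nothing. A penalty on \(\sum_v |\Delta_v(\pi)|\) treats "helps value A a lot and hurts value B a lot" the same as "helps neither," whereas the desideratum above is for \(\Delta_v(\pi)\) to be roughly equal across values, e.g. for the spread \(\mathrm{Var}_v[\Delta_v(\pi)]\) to be small.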
As for the connection between neutrality and objective impact, I think this is related to a confusion that Wei Dai pointed out, which is that I was sort of waffling between two different notions of strategy-stealing, those being:
1. strategy-stealing relative to all the agents present in the world (i.e. is it possible for your AI to steal the strategies of other agents in the world), and
2. strategy-stealing relative to a single AI (i.e. if that AI were copied many times and put in service of many different values, would it advantage some over others).
If you believe that most early AGIs will be quite similar in their alignment properties (as I generally do, since I believe that copy-and-paste is quite powerful and will generally be preferred over designing something new), then these two notions of strategy-stealing match up, which was why I was waffling between them. However, conceptually they are quite distinct.
Coming back to the connection between neutrality and objective impact: I think there I was thinking about strategy-stealing in terms of notion 1, whereas for most of the rest of the post I was thinking about it in terms of notion 2. Under notion 1, objective impact is about changing the distribution of resources among all the agents in the world.