Option (3) seems like a value learning problem that I can parrot back Eliezer’s extension to :P
So basically his idea was that we could give the AI a label for a value, “selfishness” in this case, treated as something the AI has incomplete information about. Now the AI doesn’t want to freeze its values, because that wouldn’t maximize the incompletely known goal of “selfishness”; it would only maximize the current best estimate of what selfishness is. The AI could learn more about this selfishness goal by making observations and then not caring about agents that didn’t make those observations.
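To make the “don’t freeze your values” point concrete, here is a tiny toy calculation of my own (not from the original argument), assuming the true goal is one of two hypotheses and that an observation would reveal which one it is:

```python
# Toy sketch (my own construction): why an agent that treats "selfishness" as an
# incompletely known goal prefers learning over freezing its current best estimate.

# Two hypotheses about what the true goal is, with a prior over them.
prior = {"goal_A": 0.6, "goal_B": 0.4}

# Payoff for acting as if `acted_on` is the true goal when `true` actually is.
def payoff(acted_on, true):
    return 1.0 if acted_on == true else 0.0

# Policy 1: freeze values now -- commit to the current best estimate (goal_A).
best_guess = max(prior, key=prior.get)
eu_freeze = sum(p * payoff(best_guess, true) for true, p in prior.items())

# Policy 2: observe first (the observation reveals the true goal), then act on it.
eu_learn = sum(p * payoff(true, true) for true, p in prior.items())

print(eu_freeze)  # 0.6
print(eu_learn)   # 1.0 -- learning dominates, so the agent doesn't want to freeze
```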
This is a bit different from the example of “friendliness” because you don’t hit diminishing returns: there’s an infinity of agents not to be. So you don’t want the agent to make an exploration/exploitation tradeoff like it would with friendliness; you just want it to entertain various possible “selfishness” goals at any given moment, with different probabilities assigned. The possible goals would correspond to the possible agents you could turn out to share observations with, and the probabilities of those goals would be the probabilities of sharing those observations. This interpretation of selfishness basically rederives option (2).
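A rough sketch of how that mixture might look (again my own construction, with hypothetical agent names and weights): each candidate goal corresponds to a possible agent, its weight is the probability of sharing that agent’s observations, and new observations prune candidates, which is the “stop caring about agents that didn’t make those observations” step.

```python
# Candidate agents the AI could turn out to be, each with the observations that
# agent would make and a prior weight (probability of sharing its observations).
candidates = {
    "agent_1": {"observations": {"o1", "o2"}, "weight": 0.5},
    "agent_2": {"observations": {"o1", "o3"}, "weight": 0.3},
    "agent_3": {"observations": {"o4"},       "weight": 0.2},
}

def update(candidates, observation):
    """Drop candidates that would not have made this observation, then renormalize."""
    surviving = {name: dict(c) for name, c in candidates.items()
                 if observation in c["observations"]}
    total = sum(c["weight"] for c in surviving.values())
    for c in surviving.values():
        c["weight"] /= total
    return surviving

# After observing "o1", agent_3 is no longer a goal the AI cares about.
candidates = update(candidates, "o1")
print({name: round(c["weight"], 3) for name, c in candidates.items()})
# {'agent_1': 0.625, 'agent_2': 0.375}

# Expected utility weighs each candidate's payoff by how likely it is to be "you",
# i.e. caring about agents in proportion to the probability of sharing observations.
def expected_utility(candidates, payoff_per_agent):
    return sum(c["weight"] * payoff_per_agent[name] for name, c in candidates.items())

print(expected_utility(candidates, {"agent_1": 10.0, "agent_2": 2.0}))  # 7.0
```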