Eliezer proposes assigning the AI a utility function of:...
This is a bit misleading; in the article he describes it as “one seemingly obvious patch” and then in the next paragraph says “This patch would not actually work”.
True, but note that he elaborates and comes up with a patch to the patch (namely, having W refer to a class of events that would be expected to happen within the Universe's expected lifespan, rather than one that wouldn't). So he still seems to support the basic idea, although he probably intended just to get the ball rolling with the concept rather than to conclusively solve the problem.
Oops, forgot about that. You’re right, he didn’t rule that out.
Is there a reason you don’t list his “A deeper solution” here? (Or did I miss it?) Because it trades off against capabilities? Or something else?
Mainly for brevity, but also because it seems to involve quite a drastic change in how the reward function/model works as a whole, so it doesn't seem particularly likely to be implemented.