Eliezer proposes assigning the AI a utility function of:...
This is a bit misleading; in the article he describes it as “one seemingly obvious patch” and then in the next paragraph says “This patch would not actually work”.
True, but note that he elaborates and comes up with a patch to the patch (namely, having W refer to a class of events that would be expected to happen within the Universe's expected lifespan, rather than one that wouldn't). So he still seems to support the basic idea, although he probably intended just to get the ball rolling with the concept rather than to conclusively solve the problem.
Oops, forgot about that. You’re right, he didn’t rule that out.
Is there a reason you don’t list his “A deeper solution” here? (Or did I miss it?) Because it trades off against capabilities? Or something else?
Mainly for brevity, but also because it seems to involve quite a drastic change in how the reward function/model works as a whole, so it doesn't seem particularly likely to be implemented.