So my feeling is that, in order to actually implement an AI that does not cause bad kinds of high impact, we would need to make progress on value learning.
Optimizing for a ‘slightly off’ utility function might be catastrophic, and therefore the margin for error in value learning could be narrow. However, it seems plausible that if your impact measure used slightly incorrect utility functions to define the auxiliary set, this would not cause a similar error. Thus, it seems intuitive to me that making impact measures work would require less progress on value learning than a full solution would.
From the AUP paper: “one of our key findings is that AUP tends to preserve the ability to optimize the correct reward function even when the correct reward function is not included in the auxiliary set.”
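Concretely (and speaking roughly, since the exact scaling term varies across versions of the paper), the penalty AUP adds to the primary reward depends only on how an action changes the agent's attainable value for each auxiliary reward, relative to doing nothing. Here is a minimal sketch of that structure; the names (`q_aux`, `noop`, `lam`) are illustrative, not the paper's code:

```python
# Minimal sketch of an AUP-style penalty: penalize actions that change the
# agent's ability to optimize each auxiliary reward, relative to a no-op.
# `q_aux` is a list of (assumed already-learned) Q-functions, one per
# auxiliary reward; `r` is the primary reward function; `noop` is the
# do-nothing action; `lam` trades off task reward against the penalty.

def aup_reward(r, s, a, q_aux, noop, lam=0.1):
    """Primary reward minus a penalty for shifting attainable utility."""
    penalty = sum(abs(q(s, a) - q(s, noop)) for q in q_aux) / len(q_aux)
    return r(s, a) - lam * penalty
```

Because the auxiliary rewards only enter through this penalty term, getting them slightly wrong perturbs how cautious the agent is rather than what it is optimizing for, which is one way to read the robustness result quoted above.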
I appreciate this clarification, but when I wrote my comment, I hadn’t read the original AUP post or the paper, since I assumed this sequence was supposed to explain AUP starting from scratch (so I didn’t have the idea of an auxiliary set when I wrote my comment).
It is meant to explain AUP starting from scratch, so no worries! To clarify, although I agree with Matthew’s comment, I’ll later explain how value learning (or progress therein) is unnecessary for the approach I think is most promising.