For example, if we output a sequence of bits that are fed into an actuator, then I can treat each bit as an action. We could also apply the concept to actions at higher or lower levels of granularity; the idea is to apply it at all levels (and to make it explicit at the lowest level at which it is practical to do so, in the same way we might make goal-directed behavior explicit at the lowest level where doing so is practical).
I do not understand how anything you said relates to the weakness of your argument that I’ve pointed out. Namely, that you’ve simply moved the values complexity problem somewhere else. All your reply is doing is handwaving that issue, again.
Human beings can’t endorse actions per se without context and implied goals. And the AI can’t simply iterate over all possible actions randomly to see what works without having some sort of model that constrains what it’s looking for. Based on what I can understand of what you’re proposing, ISTM the AI would just wander around doing semi-random things, and not actually do anything useful for humans, unless Hugh has some goal(s) in mind to constrain the search.
And the AI has to be able to model those goals in order to escape the problem that the AI is now no smarter than Hugh is. Indeed, if you can simulate Hugh, then you might as well just have an em. The “AI” part is irrelevant.
I wrote a follow-up partly addressing the issue of actions vs. outcomes. (Or at least, covering one technical issue I omitted from the original post for want of space.)
I agree that Hugh must reason about how well different actions satisfy Hugh’s goals, and the AI must reason about (or make implicit generalizations from) these judgments. Where am I moving the values complexity problem? The point was to move it into the AI’s predictions about what actions Hugh would approve of.
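To make the shape of that proposal concrete, here is a minimal, purely hypothetical sketch of an approval-directed action choice. The function `predict_approval` and the toy actions are my inventions, not anything from the actual proposal; the point is only that the agent maximizes predicted approval rather than an explicitly represented goal, so the values-complexity burden sits inside the approval predictor.

```python
def predict_approval(action: str) -> float:
    """Toy stand-in for a learned model of Hugh's judgments.

    In the real proposal this would be a trained predictor; here it is a
    hard-coded table purely for illustration.
    """
    toy_scores = {"report_status": 0.9, "self_modify": 0.1, "do_nothing": 0.5}
    return toy_scores.get(action, 0.0)


def choose_action(candidate_actions):
    # The agent never consults an explicit goal; it simply picks the
    # candidate action whose predicted approval is highest. All of the
    # "values complexity" lives inside predict_approval.
    return max(candidate_actions, key=predict_approval)


print(choose_action(["report_status", "self_modify", "do_nothing"]))
```

Under this sketch, actions like "self_modify" lose simply because the predictor scores them low, which is the sense in which failure modes are meant to be screened out by predicted disapproval rather than by a hand-coded utility function.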
What part of the argument in particular do you think I am being imprecise about? There are particular failure modes, like “deceiving Hugh” or especially “resisting correction”, which I would expect this procedure to avoid. I see no reason why the system would resist correction, for example. I don’t see how this is due to confusion about outcomes vs. actions.