I think you’ve moved all the complexity into distinguishing between “outcome” and “action”.
IOW, taboo those terms and try to write the same proposal, because right now ISTM that you’re relying on an intuitive appeal to human concepts of the difference, rather than being precise.
Even at this level, you’re leaving out that Hugh doesn’t really approve of actions per se—Hugh endorses actions in situations as contributing to some specific, salient goal or value. If Arthur says, “I want to move my foot over here”, it doesn’t matter how many hours Hugh thinks it over, it’s not going to mean anything in particular...
Even if it’s the first step in a larger action of “walk over there and release the nanovirus”. ;-)
For example, if we output a sequence of bits which are fed into an actuator, then I can treat each bit as an action. We could also apply the concept to actions at a higher or lower level of granularity; the idea is to apply it at all levels (and to make it explicit at the lowest level at which it is practical to do so, in the same way we might make goal-directed behavior explicit at the lowest level where doing so is practical).
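To make the bit-level picture concrete, here is a minimal sketch (the function and variable names are mine and purely illustrative, not part of any existing system): the agent emits one bit at a time, and for each candidate bit it consults a stand-in for its prediction of Hugh’s approval, given the bits already emitted.

```python
# Minimal sketch: treat each output bit as an "action" and pick the bit
# that the (hypothetical) approval predictor scores highest in context.
from typing import Callable, List

def emit_bits(n_bits: int,
              predicted_approval: Callable[[List[int], int], float]) -> List[int]:
    """Build an output bit-by-bit; predicted_approval(context, bit) stands in
    for the AI's estimate of how much Hugh would approve of appending `bit`
    given the bits emitted so far."""
    context: List[int] = []
    for _ in range(n_bits):
        best_bit = max((0, 1), key=lambda b: predicted_approval(context, b))
        context.append(best_bit)
    return context

# Toy usage: a stand-in predictor that "approves" of alternating bits.
toy_predictor = lambda ctx, b: 1.0 if (not ctx or ctx[-1] != b) else 0.0
print(emit_bits(8, toy_predictor))  # e.g. [0, 1, 0, 1, 0, 1, 0, 1]
```

The same loop works at coarser granularity by swapping bits for higher-level candidate actions.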
I do not understand how anything you said relates to the weakness of your argument that I’ve pointed out: namely, that you’ve simply moved the values complexity problem somewhere else. All your reply does is handwave that issue again.
Human beings can’t endorse actions per se without context and implied goals. And the AI can’t simply iterate over all possible actions randomly to see what works without having some sort of model that constrains what it’s looking for. Based on what I can understand of what you’re proposing, ISTM the AI would just wander around doing semi-random things, and not actually do anything useful for humans, unless Hugh has some goal(s) in mind to constrain the search.
And the AI has to be able to model those goals in order to escape the problem that the AI is now no smarter than Hugh is. Indeed, if you can simulate Hugh, then you might as well just have an em. The “AI” part is irrelevant.
I wrote a follow-up partly addressing the issue of actions vs. outcomes. (Or at least, covering one technical issue I omitted from the original post for want of space.)
I agree that Hugh must reason about how well different actions satisfy Hugh’s goals, and the AI must reason about (or make implicit generalizations about) these judgments. Where am I moving the values complexity problem? The point was to move it into the AI’s predictions about what actions Hugh would approve of.
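To be concrete about where that complexity ends up, here is a minimal sketch (all names and data are hypothetical): the AI fits a simple predictor to Hugh’s past ratings of (situation, action) pairs and then picks whichever candidate action it predicts Hugh would rate highest, so Hugh’s goals live implicitly in the learned predictor rather than in an explicit valuation of outcomes.

```python
# Minimal sketch: Hugh's goals are captured implicitly by a predictor
# fit to his past judgments, which the AI then maximizes over candidates.
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Judgment = Tuple[str, str, float]  # (situation, action, Hugh's rating)

def fit_approval_model(judgments: List[Judgment]) -> Dict[Tuple[str, str], float]:
    """Average Hugh's past ratings for each (situation, action) pair."""
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for situation, action, rating in judgments:
        buckets[(situation, action)].append(rating)
    return {key: mean(vals) for key, vals in buckets.items()}

def choose_action(model: Dict[Tuple[str, str], float],
                  situation: str, candidates: List[str]) -> str:
    """Pick the candidate with the highest predicted approval
    (unseen pairs default to 0.0, i.e. no evidence of approval)."""
    return max(candidates, key=lambda a: model.get((situation, a), 0.0))

history = [("door is locked", "pick the lock", 0.1),
           ("door is locked", "ask for the key", 0.9)]
model = fit_approval_model(history)
print(choose_action(model, "door is locked", ["pick the lock", "ask for the key"]))
```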
What part of the argument in particular do you think I am being imprecise about? There are particular failure modes, like “deceiving Hugh” or especially “resisting correction” which I would expect to avoid via this procedure. I see no reason why the system would resist correction, for example. I don’t see how this is due to confusion about outcomes vs. actions.