The resulting agent is supposed to be trying to help H get what it wants, but won’t generally encode most of H’s values directly (it will only encode them indirectly as “what the operator wants”).
I agree that Ajeya’s description in that paragraph is problematic (though I think the descriptions in the body of the post were mostly fine); I’ll probably correct it.
Then I’m not sure I understand how the scheme works. If all questions about values are punted to the single living human at the top, won’t that be a bottleneck for any complex plan?