Love the post. One possibility I think is worth considering w.r.t. the discussion about human paperclippers/inter-human compatibility:
Humans may not be as misaligned as our highly idiosyncratic value theories might suggest, if a typical individual’s value theory is really mainly what she uses to justify/explain/rationalize her current local intuitions and actions, without actually driving those actions as much as we think. A fooming human might then be more likely to simply update her theories so that they again become compatible with what her underlying, more basic intuitions dictate.
So the basic instincts that today give us locally quite compatible practical aims might still keep us more compatible with one another than our individual explicit and more abstract value theories, i.e. rationalizations, would seem to suggest.
I think there are some observations that might point in that direction. Give humans a new technology, and initially some will call it the devil’s tool to be abstained from, but ultimately we all converge on using it, updating our theories and beliefs along the way.
Survey people on whether it’s okay to actively put one life in acute danger for the sake of saving ten, and you’ll find them strongly diverging on the topic, based on their abstract value theories. Put them in the corresponding leadership position, where that moral question becomes a regular, real choice that has to be made, and you might observe them acting much more homogeneously, according to the more fundamental and pragmatic instincts.
(I think Jonathan Haidt’s The Righteous Mind would also support some of this.)
In that case, the Yudkowskian AI alignment challenge may keep a bit more of its specialness in comparison to the human paperclipper challenge.