I agree that there’s a better, crisper version of this that has those more distinct.
I’m not sure if the end product, for most people, should keep them distinct because by default humans seem to use blurry clusters of concepts to simplify things into something manageable.
But I think if you’re aiming to be a robust agent, or to build a robustly agentic organization, there is something valuable about keeping these crisply separate so you can reason about them well. (You’ve previously mentioned that this is analogous to the friendly AI problem, and I agree.) I think it’s a good project for many people in the rationalsphere to have undertaken to deepen our understanding, even if it turns out not to be practical for the average person.
The “different masters” thing is a special case of the problem of accepting feedback (i.e., learning from approval/disapproval or reward/punishment) from approval functions that conflict with each other or with your goals. Multiple humans trying to do the same or compatible things with you aren’t “different masters” in this sense, since the same logical-decision-theoretic perspective (with some noise) is instantiated in both.
But also, there’s all sorts of gathering data from others’ judgment that doesn’t fit the accountability/commitment paradigm.