I don’t know why this is downvoted so much without an explanation. The problem arising from the interaction with sub-agents is real, even if already known, but G0W51 may not know that.
I suspect because the Big Wall Of Text writing style is off-putting. Perhaps, more specifically, because it’s annoying to slog one’s way through a Big Wall Of Text without an exciting new insight or something as payoff.
(I didn’t downvote it. I’m just speculating.)
Do you happen to know of any good places to read about this issue? I have yet to look into what others have thought of it.
I’ve seen not-especially-theoretical discussion about this problem for human organizations, though mostly from the point of view of lower status people complaining that they’re being given impossible, incomprehensible, and/or destructive commands.
This points at another serious problem—you need communication (even if mostly not forceful communication) to flow up the hierarchy as well as down the hierarchy.
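A toy sketch of what that two-way flow might look like (everything here is hypothetical, just to illustrate the point): a supervisor that only talks downward can’t tell an impossible order from a failing sub-agent, while an upward feasibility report lets it revise the command.

```python
# Hypothetical sketch: commands flow down, feasibility reports flow up.

class SubAgent:
    def __init__(self, capacity):
        self.capacity = capacity  # most work this sub-agent can actually do

    def receive(self, demand):
        # Downward channel: the command arrives here.
        if demand > self.capacity:
            # Upward channel: report the impossible command instead of
            # silently failing or blindly attempting it.
            return {"ok": False, "reason": f"capacity {self.capacity} < demand {demand}"}
        return {"ok": True, "done": demand}

class Supervisor:
    def __init__(self, workers):
        self.workers = workers

    def delegate(self, demand):
        for worker in self.workers:
            ask = demand
            report = worker.receive(ask)
            if not report["ok"]:
                # Without this branch the hierarchy only talks downward,
                # and impossible or destructive commands go uncorrected.
                ask = worker.capacity  # revise the order using the feedback
                report = worker.receive(ask)
            print(report)

Supervisor([SubAgent(5), SubAgent(10)]).delegate(8)
# -> {'ok': True, 'done': 5}   (revised after the upward report)
# -> {'ok': True, 'done': 8}
```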
You’re pointing at a failure mode (I have a JOB! I wanna do my JOB!) which is quite real, but there are equal and opposite failure modes. For example, obsessive-compulsive disorder would be an example of lower-level functions insisting on taking charge. Eating disorders are (at least in some cases) examples of higher-level functions overriding competent lower-level functions.
What I’m concluding from this is that if Friendliness is to work, it has to pervade the hierarchy of agents.
Remember that making humans in organizations cooperate is rather different from making the many parts of an AI cooperate with each other, because people in organizations can’t (feasibly) be reprogrammed to have aligned values, but AI HLAs can.
True. The real issue is that if you give a lower-level action the same goals and utility functions as the higher ones, you’ve lost all the benefit of having HLAs!
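To make that concrete, here’s a minimal sketch (names and numbers invented purely for illustration) of why a hierarchy only pays off when lower levels optimize something cheaper and more local than the global utility: if every level re-scored the full utility over raw primitives, the search would be no smaller than a flat agent’s.

```python
# Hypothetical sketch: an HLA hierarchy only helps if lower levels
# optimize something simpler than the global utility.

from itertools import product

PRIMITIVES = ["a", "b", "c"]

def global_utility(plan):
    # Stand-in for an expensive top-level utility over a full plan.
    return plan.count("a") - abs(len(plan) - 4)

# Flat agent: scores the full utility on every primitive sequence.
def flat_search(depth=4):
    candidates = ("".join(seq)
                  for n in range(1, depth + 1)
                  for seq in product(PRIMITIVES, repeat=n))
    return max(candidates, key=global_utility)

# Hierarchical agent: each high-level action (HLA) expands itself by a
# fixed local rule; only the *composition* of HLAs is scored globally.
HLAS = {"gather": "aa", "build": "bc", "rest": "c"}

def hierarchical_search():
    # Full utility is consulted on len(HLAS)**2 = 9 compositions,
    # not the 120 primitive sequences the flat agent scores.
    pairs = product(HLAS, repeat=2)
    best = max(pairs, key=lambda p: global_utility(HLAS[p[0]] + HLAS[p[1]]))
    return HLAS[best[0]] + HLAS[best[1]]

print(flat_search(), hierarchical_search())
# If each HLA instead re-ran global_utility over all primitive sequences,
# the hierarchy would repeat the flat agent's entire search at every
# level, which is the "lost all the benefit" point above.
```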