I’ve seen not-especially-theoretical discussion of this problem in human organizations, though mostly from the point of view of lower-status people complaining that they’re being given impossible, incomprehensible, and/or destructive commands.
Remember that making humans in organizations cooperate is rather different from making the many parts of an AI cooperate with each other, because people in organizations can’t (feasibly) be reprogrammed to have aligned values, but AI HLAs can.
What I’m concluding from this is that if Friendliness is to work, it has to pervade the hierarchy of agents.
True. The real issue is that if you give a lower-level agent the same goals and utility functions as the higher ones, you’ve lost all the benefit of having HLAs!
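To make that tradeoff concrete, here is a minimal toy sketch (everything in it, from `full_utility` to `Subgoal`, is a hypothetical illustration, not anyone’s actual design): only the planner consults the global utility function, and each subagent gets a narrow subgoal. That narrowness is exactly the benefit that disappears if each subagent is handed the full utility function instead.

```python
from dataclasses import dataclass

# Hypothetical toy model of a hierarchy of agents; all names here are
# illustrative assumptions, not part of the original discussion.
WorldState = dict

def full_utility(world: WorldState) -> float:
    # The top-level utility function. Evaluating it requires the whole
    # world state, which is exactly what subagents should NOT need.
    return world.get("widgets", 0) - 2.0 * world.get("waste", 0)

@dataclass
class Subgoal:
    # A narrow delegated task: a small, cheaply checkable interface.
    key: str
    target: int

def narrow_subagent(goal: Subgoal, world: WorldState) -> WorldState:
    # Pursues only its subgoal. It never sees full_utility, so it can
    # be much simpler than the agent that delegated to it.
    world[goal.key] = max(world.get(goal.key, 0), goal.target)
    return world

def planner(world: WorldState) -> WorldState:
    # Only the planner consults full_utility; it translates the global
    # objective into cheap subgoals. If each subagent were instead
    # given full_utility itself, every subagent would face the parent's
    # entire optimization problem, and the hierarchy would buy nothing.
    for goal in (Subgoal("widgets", 10), Subgoal("waste", 0)):
        world = narrow_subagent(goal, world)
    return world

print(full_utility(planner({})))  # -> 10.0
```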