If multiple local agents have a common goal system and share information (keeping in mind Aumann Agreement, and that instrumental values are information like any other and won’t tend to get promoted to terminal values in clean architectures), why can’t you consider the set of them as a single decision-making agent on long enough timescales?
The only problem I can see is that one local agent might choose a policy that works only if another agent doesn't choose some particular policy (which that agent then goes ahead and chooses). However, I can't imagine that this wouldn't be noticed and factored in in advance.
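A minimal sketch of that clash, in Python (the resource names and the tie-breaking rule are hypothetical, just to make the scenario concrete): two agents with identical goals and identical information each claim one of two equally valued resources, and the shared goal is served only if they claim different ones. Running the same deterministic policy on the same information makes them collide; modeling each other's procedure in advance and agreeing on a tie-break restores single-agent behavior.

```python
def naive_policy(shared_info):
    # Both agents run the same deterministic argmax on the same
    # information, so with tied values both pick "A" and collide.
    values = shared_info["resource_values"]
    return max(values, key=values.get)

def coordinated_policy(shared_info, agent_id):
    # Factoring the clash in "in advance": each agent can predict the
    # other's procedure, so a shared tie-breaking rule (here, indexing
    # the ranked options by agent ID) makes the pair act as one agent.
    values = shared_info["resource_values"]
    ranked = sorted(values, key=values.get, reverse=True)
    return ranked[agent_id % len(ranked)]

shared_info = {"resource_values": {"A": 1.0, "B": 1.0}}

naive = [naive_policy(shared_info) for _ in range(2)]
coordinated = [coordinated_policy(shared_info, i) for i in range(2)]

print("naive choices:", naive)              # ['A', 'A'] -> collision
print("coordinated choices:", coordinated)  # ['A', 'B'] -> one effective agent
```

The second policy is exactly the "noticed and factored in in advance" move: nothing about the agents' goals or information changes, only the shared convention for breaking symmetry.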