Decomposing Negotiating Value Alignment between multiple agents
Let’s say we want two agents to come to an agreement on living with each other. This seems pretty complex to specify: they agree to take each other’s values into account (somewhat), not to destroy each other (with some level of confidence), etc.
Neither initially has total dominance over the other. (This implies that neither is corrigible to the other.)
A good first step for these agents is to share their values with each other. While this could be intractably complex, values are probably compact/finite and can eventually be transmitted in some form.
I think this decomposes pretty clearly into ontology transmission and value assignment.
Ontology transmission is communicating one agent’s ontology of objects/concepts to another. Value assignment is then communicating the relative or comparative values assigned to the different elements of that ontology.
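To make the decomposition concrete, here is a minimal sketch of what the two steps might look like as data structures. This is my own illustration, not something from the post; every class, field, and value below is an assumption.

```python
# Toy sketch (illustrative only). All names and numbers are assumptions.
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Ontology:
    """The set of objects/concepts an agent distinguishes."""
    concepts: frozenset[str]

@dataclass
class ValueAssignment:
    """Relative values over elements of an ontology (concept -> weight)."""
    weights: dict[str, float]

    def restricted_to(self, ontology: Ontology) -> ValueAssignment:
        """Keep only valuations of concepts the receiving agent's ontology contains."""
        return ValueAssignment(
            {c: w for c, w in self.weights.items() if c in ontology.concepts}
        )

# Step 1: ontology transmission -- agent A sends the concepts it distinguishes.
a_ontology = Ontology(frozenset({"food", "territory", "offspring"}))

# Step 2: value assignment -- agent A communicates relative values over those concepts.
a_values = ValueAssignment({"food": 0.2, "territory": 0.5, "offspring": 0.3})

# Agent B can only interpret values attached to concepts it has actually received.
b_ontology = Ontology(frozenset({"food", "territory"}))
print(a_values.restricted_to(b_ontology).weights)  # {'food': 0.2, 'territory': 0.5}
```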
I think there are LOTS of examples of organisms that cooperate or cohabit without any explicit ontology or conscious valuation. Even in humans, a whole lot of the negotiation is not legible. The spoken/written part is mostly signaling and lies, with a small amount of codifying behavioral expectations at a very coarse grain.
"A good first step for these agents is to share their values with each other."
This is a strong assertion that I do not believe is justified.
If you are an agent with this view, then I can take advantage of you by sending you an altered version of my values, such that the Nash equilibrium (or equilibria) of the altered game are all more favorable to me than the Nash equilibria of the original game.
(You can mitigate this to an extent by requiring that both parties precommit to their values… in which case I predict what your values will be and use that prediction instead, committing to a version of my values altered according to it. Not perfect, but still arguably better.)
(Of course, this has other issues if the other agent is also doing this...)
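As a toy illustration of the misreporting concern above (my own sketch, with invented payoff numbers, checking only pure-strategy equilibria): in a battle-of-the-sexes-style negotiation, declaring altered values can leave the declared game with only the equilibrium the misreporter prefers.

```python
# Toy sketch (illustrative only): misreporting payoffs to prune the declared
# game's Nash equilibria. Payoff matrices are invented; only pure-strategy
# equilibria are checked.
from itertools import product

def pure_nash_equilibria(row_payoffs, col_payoffs):
    """Return all pure-strategy Nash equilibria of a bimatrix game.

    row_payoffs[i][j], col_payoffs[i][j] are the payoffs when row plays i and
    column plays j.
    """
    n_rows, n_cols = len(row_payoffs), len(row_payoffs[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        row_ok = all(row_payoffs[i][j] >= row_payoffs[k][j] for k in range(n_rows))
        col_ok = all(col_payoffs[i][j] >= col_payoffs[i][k] for k in range(n_cols))
        if row_ok and col_ok:
            equilibria.append((i, j))
    return equilibria

# A battle-of-the-sexes-style negotiation: strategies 0 and 1 are two conventions.
row_true = [[2, 0],
            [0, 1]]
col_true = [[1, 0],
            [0, 2]]
print(pure_nash_equilibria(row_true, col_true))      # [(0, 0), (1, 1)]

# Row "shares" altered values: it claims convention 1 is worthless to it and
# that mismatches where it plays 0 are tolerable. The declared game's only
# pure equilibrium is then (0, 0).
row_declared = [[2, 1],
                [0, 0]]
print(pure_nash_equilibria(row_declared, col_true))  # [(0, 0)]
```

Under the row agent's true payoffs, the surviving equilibrium (0, 0) is its best outcome, whereas the honestly declared game also admitted (1, 1), which it likes less.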