Assume you’re playing as agent A and assume you don’t have a parent agent. You’re trying to coordinate with agent B. You want to not be exploitable, even if agent B has a patent that picked B’s source code adversarially. Consider this a very local/isolated puzzle (this puzzle is not about trying to actually coordinate with all possible parents instead).
Oh then no, that’s obviously not possible. The parent can choose agent B to be a rock with “green” painted on it. The only way to coordinate with a rock is to read what’s painted on it.
Agent B wants to coordinate with you instead of being a rock; the question isn’t “can you always coordinate”, it’s “is there any coordination mechanism robust to adversarially designed counterparties”.
Trivially, you can coordinate with agents with identical architecture, that are different only in the utility functions, by picking the first bit of a hash of the question you want to coordinate on.
Oh, then I’m still confused. Agent B can want to coordinate with A but still be effectively a rock because they are guaranteed to pick the designer’s preferred option no matter what they see. Since agent A can analyze B’s source code arbitrarily powerfully they can determine this, and realize that the only option (if they want to coordinate) is to go along with that.
A’s algorithm can include “if my opponent is a rock, defect” but then we have different scenarios based on whether B’s designer gets to see A’s source code before designing B.
Assume you’re playing as agent A and assume you don’t have a parent agent. You’re trying to coordinate with agent B. You want to not be exploitable, even if agent B has a patent that picked B’s source code adversarially. Consider this a very local/isolated puzzle (this puzzle is not about trying to actually coordinate with all possible parents instead).
Oh then no, that’s obviously not possible. The parent can choose agent B to be a rock with “green” painted on it. The only way to coordinate with a rock is to read what’s painted on it.
Agent B wants to coordinate with you instead of being a rock; the question isn’t “can you always coordinate”, it’s “is there any coordination mechanism robust to adversarially designed counterparties”.
Trivially, you can coordinate with agents with identical architecture, that are different only in the utility functions, by picking the first bit of a hash of the question you want to coordinate on.
Oh, then I’m still confused. Agent B can want to coordinate with A but still be effectively a rock because they are guaranteed to pick the designer’s preferred option no matter what they see. Since agent A can analyze B’s source code arbitrarily powerfully they can determine this, and realize that the only option (if they want to coordinate) is to go along with that.
A’s algorithm can include “if my opponent is a rock, defect” but then we have different scenarios based on whether B’s designer gets to see A’s source code before designing B.