I’m still very confused about the scenario. Agent A and B and their respective environments may have been designed as a proxy by adversarial agents C and D respectively? Both C and D care about coordinating with each other by more than they care about having the sky colour match their preference? A can simulate B + environment, but can’t simulate D (and vice versa)? Presumably this means that D can no longer affect B or B’s environment, otherwise A wouldn’t be able to simulate.
Critical information: Did either C or D know the design of the other’s proxy before designing their own? Did they both know the other’s design and settle on a mutually-agreeable pair of designs?
Assume you’re playing as agent A and assume you don’t have a parent agent. You’re trying to coordinate with agent B. You want to not be exploitable, even if agent B has a patent that picked B’s source code adversarially. Consider this a very local/isolated puzzle (this puzzle is not about trying to actually coordinate with all possible parents instead).
Oh then no, that’s obviously not possible. The parent can choose agent B to be a rock with “green” painted on it. The only way to coordinate with a rock is to read what’s painted on it.
Agent B wants to coordinate with you instead of being a rock; the question isn’t “can you always coordinate”, it’s “is there any coordination mechanism robust to adversarially designed counterparties”.
Trivially, you can coordinate with agents with identical architecture, that are different only in the utility functions, by picking the first bit of a hash of the question you want to coordinate on.
Oh, then I’m still confused. Agent B can want to coordinate with A but still be effectively a rock because they are guaranteed to pick the designer’s preferred option no matter what they see. Since agent A can analyze B’s source code arbitrarily powerfully they can determine this, and realize that the only option (if they want to coordinate) is to go along with that.
A’s algorithm can include “if my opponent is a rock, defect” but then we have different scenarios based on whether B’s designer gets to see A’s source code before designing B.
I’m still very confused about the scenario. Agent A and B and their respective environments may have been designed as a proxy by adversarial agents C and D respectively? Both C and D care about coordinating with each other by more than they care about having the sky colour match their preference? A can simulate B + environment, but can’t simulate D (and vice versa)? Presumably this means that D can no longer affect B or B’s environment, otherwise A wouldn’t be able to simulate.
Critical information: Did either C or D know the design of the other’s proxy before designing their own? Did they both know the other’s design and settle on a mutually-agreeable pair of designs?
Assume you’re playing as agent A and assume you don’t have a parent agent. You’re trying to coordinate with agent B. You want to not be exploitable, even if agent B has a patent that picked B’s source code adversarially. Consider this a very local/isolated puzzle (this puzzle is not about trying to actually coordinate with all possible parents instead).
Oh then no, that’s obviously not possible. The parent can choose agent B to be a rock with “green” painted on it. The only way to coordinate with a rock is to read what’s painted on it.
Agent B wants to coordinate with you instead of being a rock; the question isn’t “can you always coordinate”, it’s “is there any coordination mechanism robust to adversarially designed counterparties”.
Trivially, you can coordinate with agents with identical architecture, that are different only in the utility functions, by picking the first bit of a hash of the question you want to coordinate on.
Oh, then I’m still confused. Agent B can want to coordinate with A but still be effectively a rock because they are guaranteed to pick the designer’s preferred option no matter what they see. Since agent A can analyze B’s source code arbitrarily powerfully they can determine this, and realize that the only option (if they want to coordinate) is to go along with that.
A’s algorithm can include “if my opponent is a rock, defect” but then we have different scenarios based on whether B’s designer gets to see A’s source code before designing B.