Suppose an agent is thinking about whether to cooperate in a Prisoner’s Dilemma. In the counterfactual where it cooperates, it might naturally deduce that other agents like it would also cooperate. Therefore we could hand it a world with weird physics, and see whether in the counterfactual where it cooperates, it can deduce more about that world. Then it has presumably found agents like itself in that world.
Prisoner’s Dilemma? Counterfactual? Agent? Cooperation? We’re talking about starting from low-level physics; there isn’t even a built-in place to introduce these very-high-level concepts!
The agent I’m talking about is separate from your physics-based world. It’s from toy setups like Robust Cooperation in the Prisoner’s Dilemma. If it can reason about statements like “If my algorithm returns that I cooperate, then I get 3 utility”, then there may be a property p for which it can prove “If my algorithm returns that I cooperate, then this strange hypothetical-physics-based world has property p” but not “This strange hypothetical-physics-based world has property p”. That would indicate that the strange world contains agents about whom that premise carries information, so we can use modal combatants as agent detectors.
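To make the criterion concrete, here is a minimal propositional toy of my own construction (not from the Robust Cooperation paper): the names, the tiny “world”, and the use of finite model checking as a stand-in for proof search in a real formal theory are all assumptions for illustration. The world contains an agent B that mirrors A’s choice, and a property p that follows from B cooperating; p is not settled outright, but it is settled under the premise that A cooperates.

```python
from itertools import product

# Toy illustration of the agent-detection criterion (hypothetical construction).
# "Provable" is stood in by "true in every model of the world's constraints".
VARS = ["A_coop", "B_coop", "p"]

def models(constraints):
    """Yield every truth assignment over VARS satisfying all constraints."""
    for values in product([False, True], repeat=len(VARS)):
        m = dict(zip(VARS, values))
        if all(c(m) for c in constraints):
            yield m

def entails(constraints, statement):
    """True iff the statement holds in every model of the constraints."""
    return all(statement(m) for m in models(constraints))

# The strange world: B imitates A, and B cooperating implies p.
# Nothing in the world pins down A's action on its own.
world = [
    lambda m: m["B_coop"] == m["A_coop"],   # B imitates A
    lambda m: (not m["B_coop"]) or m["p"],  # B_coop -> p
]

# Unconditionally, p is not entailed...
print(entails(world, lambda m: m["p"]))                       # False
# ...but under the premise "A cooperates", it is.
print(entails(world, lambda m: (not m["A_coop"]) or m["p"]))  # True
```

The gap between the two queries is exactly the signal described above: the premise about A’s own output buys extra conclusions about the world only because the world contains something (B) whose behavior is logically tied to A.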