Suppose an agent is thinking about whether to cooperate in a Prisoner’s Dilemma. In the counterfactual where it cooperates, it might naturally deduce that other agents like it would also cooperate. Therefore we could hand it a world with weird physics, and see whether in the counterfactual where it cooperates, it can deduce more about that world. Then it has presumably found agents like itself in that world.
Prisoner’s Dilemma? Counterfactual? Agent? Cooperation? We’re talking about starting from low-level physics; there isn’t even a built-in place to introduce these very-high-level concepts!
The agent I’m talking about is separate from your physics-based world. It’s from toy setups like Robust Cooperation in the Prisoner’s Dilemma. If it can reason about statements like “If my algorithm returns that I cooperate, then I get 3 utility”, then there may be a property p for which it can prove “If my algorithm returns that I cooperate, then this strange hypothetical-physics-based world has property p” but not “This strange hypothetical-physics-based world has property p”. That would indicate that the strange world contains agents about whom that premise carries information, so we can use modal combatants as agent detectors.
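To make the criterion concrete, here is a minimal propositional toy of my own construction (not from the Robust Cooperation paper): the names, the tiny “world”, and the use of finite model checking as a stand-in for proof search in a real formal theory are all assumptions for illustration. The world contains an agent B that mirrors A’s choice, and a property p that follows from B cooperating; p is not settled outright, but it is settled under the premise that A cooperates.

```python
from itertools import product

# Toy illustration of the agent-detection criterion (hypothetical construction).
# "Provable" is stood in by "true in every model of the world's constraints".
VARS = ["A_coop", "B_coop", "p"]

def models(constraints):
    """Yield every truth assignment over VARS satisfying all constraints."""
    for values in product([False, True], repeat=len(VARS)):
        m = dict(zip(VARS, values))
        if all(c(m) for c in constraints):
            yield m

def entails(constraints, statement):
    """True iff the statement holds in every model of the constraints."""
    return all(statement(m) for m in models(constraints))

# The strange world: B imitates A, and B cooperating implies p.
# Nothing in the world pins down A's action on its own.
world = [
    lambda m: m["B_coop"] == m["A_coop"],   # B imitates A
    lambda m: (not m["B_coop"]) or m["p"],  # B_coop -> p
]

# Unconditionally, p is not entailed...
print(entails(world, lambda m: m["p"]))                       # False
# ...but under the premise "A cooperates", it is.
print(entails(world, lambda m: (not m["A_coop"]) or m["p"]))  # True
```

The gap between the two queries is exactly the signal described above: the premise about A’s own output buys extra conclusions about the world only because the world contains something (B) whose behavior is logically tied to A.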