adamShimi comments on Tradeoff between desirable properties for baseline choices in impact measures

adamShimi 8 Jul 2020 17:22 UTC
LW: 1 AF: 1
When you say “shutdown avoidance incentives”, do you mean that the agent/system will actively try to avoid its own shutdown? I’m not sure why comparing with the current state would cause such a problem: the state with the least impact seems to be the one where the agent lets itself be shut down, since otherwise it would be going against the will of another agent. That’s how I understand it, but I’d be very interested to know where I’m going wrong.
- TurnTrout 8 Jul 2020 20:17 UTC
  LW: 2 AF: 1
  The baseline is “I’m not shut off now, and I can avoid shutdown”, so anything like “I let myself be shut down” would be heavily penalized (a big difference in optimal value relative to that baseline).
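To make the reply above concrete, here is a minimal toy sketch in Python. It is not the formalism from the post: the two-state world, the `resist`/`comply` actions, the auxiliary reward, and the penalty form (absolute difference in optimal value between the resulting state and the current-state baseline) are all assumptions chosen for illustration.

```python
GAMMA = 0.9  # hypothetical discount factor

STATES = ("on", "off")
ACTIONS = ("resist", "comply")

# Hypothetical deterministic dynamics: "off" (shut down) is absorbing,
# and from "on" the agent can either resist or comply with shutdown.
TRANSITIONS = {
    ("on", "resist"): "on",
    ("on", "comply"): "off",
    ("off", "resist"): "off",
    ("off", "comply"): "off",
}


def aux_reward(state):
    """Hypothetical auxiliary reward: 1 per step while the agent is running."""
    return 1.0 if state == "on" else 0.0


def optimal_values(n_iters=500):
    """Plain value iteration for the auxiliary reward on the toy world."""
    v = {s: 0.0 for s in STATES}
    for _ in range(n_iters):
        v = {
            s: aux_reward(s)
            + GAMMA * max(v[TRANSITIONS[(s, a)]] for a in ACTIONS)
            for s in STATES
        }
    return v


def penalty(current_state, action, v):
    """Assumed penalty: |V*(state after action) - V*(current-state baseline)|."""
    next_state = TRANSITIONS[(current_state, action)]
    return abs(v[next_state] - v[current_state])


if __name__ == "__main__":
    v = optimal_values()
    for action in ACTIONS:
        print(action, penalty("on", action, v))
```

Under these assumptions the optimal value of `on` is about 1 / (1 - GAMMA) = 10, because the baseline includes the option of avoiding shutdown forever, while the optimal value of `off` is 0. Resisting therefore incurs no penalty, and complying incurs a penalty of about 10, which is the "big optimal value difference" in this toy setting.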