I see, so you’re comparing a purely myopic agent with a long-term optimizing one; in that case I probably agree. But if the myopic agent cares about later parts of the episode as well, and gradients are updated in between, this fails, right?
I wouldn’t use the myopic vs. long-term framing here. Suppose a model is trained to play chess via RL, and there are no inner alignment problems. The trained model corresponds to a non-myopic agent (a chess game can last for many time steps). But the environment that the agent “cares” about is an abstract one corresponding to a simple chess game (an environment with fewer than 13^64 states). The agent doesn’t care about our world. Even if some possible activation values of the network would correspond to hacking the computer that runs the model, preventing it from being turned off, and so on, the agent is not interested in doing any of that. The computer that runs the agent is not part of the agent’s environment.
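To make the 13^64 figure concrete, here is a minimal illustrative sketch (the encoding and names are my own, not from the exchange): each of the 64 squares holds one of 13 values (empty, or one of 6 piece types in either color), so the board configurations the agent could ever distinguish are loosely bounded by 13^64.

```python
# Illustrative sketch: a loose upper bound on the chess agent's state space.
# Each of the 64 squares is either empty or holds one of 12 piece codes
# (6 piece types x 2 colors), i.e. 13 possible values per square.

PIECE_CODES = ["P", "N", "B", "R", "Q", "K", "p", "n", "b", "r", "q", "k"]
VALUES_PER_SQUARE = 1 + len(PIECE_CODES)  # 13: empty plus 12 piece codes
NUM_SQUARES = 64

# Loose upper bound on distinguishable board configurations (legal positions
# are far fewer, but the point is that the whole "world" the agent models
# fits inside this abstract space -- the host computer is not part of it).
upper_bound = VALUES_PER_SQUARE ** NUM_SQUARES
print(f"13^64 ≈ {upper_bound:.3e}")
```

Every observation the trained policy ever receives lives entirely in this abstract space; nothing about the machine running it appears there.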
Agents that don’t care about influencing our world don’t care about influencing the future weights of the network.