If the environment is deterministic, the agent is choosing between trajectories. In those environments, we train agents using DREST to satisfy POST:
1. The agent chooses stochastically between the different available trajectory-lengths.
2. Given its choice of a particular trajectory-length, the agent maximizes the number of paperclips made within that trajectory-length.
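Here is a minimal sketch of how a DREST-style reward could implement these two conditions during training. It assumes that the reward for a trajectory is the paperclip payoff discounted by how often that trajectory-length has already been chosen; the constant `LAMBDA`, the function name `drest_reward`, and the counting scheme are illustrative assumptions rather than the exact specification.

```python
import random
from collections import defaultdict

# Illustrative assumption: discount factor applied once per previous choice
# of the same trajectory-length.
LAMBDA = 0.9


def drest_reward(paperclips_made: float,
                 trajectory_length: int,
                 length_counts: dict) -> float:
    """Discount the paperclip payoff by LAMBDA ** (previous choices of this length).

    Maximizing this reward pushes the agent to (i) spread its choices across
    trajectory-lengths (the stochastic-choice clause of POST) and (ii) make as
    many paperclips as possible within each chosen length (the conditional-
    maximization clause).
    """
    n_previous = length_counts[trajectory_length]
    length_counts[trajectory_length] += 1
    return (LAMBDA ** n_previous) * paperclips_made


# Toy usage: two available trajectory-lengths with different paperclip payoffs.
length_counts = defaultdict(int)
for episode in range(5):
    chosen_length = random.choice([3, 5])         # the agent's (stochastic) choice
    paperclips = {3: 4.0, 5: 7.0}[chosen_length]  # payoff if it maximizes within that length
    print(episode, chosen_length, drest_reward(paperclips, chosen_length, length_counts))
```

Because repeatedly picking the same trajectory-length erodes its reward, the agent does best by mixing across lengths while still maximizing paperclips conditional on each length.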
If the environment is stochastic (as, for example, deployment environments will be), the agent is choosing between lotteries, and we expect agents to be neutral: to not pay costs to shift probability mass between different trajectory-lengths. So they won't perform either of the shutdown-related actions if doing so comes at any cost with respect to the lotteries conditional on each trajectory-length. Which of the object-level actions the agent performs will depend on the quantities of paperclips available.
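The following sketch illustrates what neutrality amounts to when comparing lotteries. It is not from the source: the representation of a lottery as `{trajectory_length: (probability, expected paperclips conditional on that length)}` and the helper `conditionally_at_least_as_good` are assumed for illustration. The point is that the comparison only looks at the conditional paperclip values, so shifting probability mass between lengths is never worth a paperclip cost.

```python
def conditionally_at_least_as_good(a: dict, b: dict) -> bool:
    """True if lottery `a` yields at least as many expected paperclips as `b`
    conditional on every trajectory-length either lottery makes possible.
    Note that the probabilities over lengths are never consulted: that is
    the neutrality."""
    lengths = set(a) | set(b)
    return all(a.get(l, (0, 0.0))[1] >= b.get(l, (0, 0.0))[1] for l in lengths)


# `shifted` moves probability mass from length 5 to length 3, but at the cost
# of one paperclip conditional on length 3; a neutral agent won't pay that cost.
baseline = {3: (0.5, 4.0), 5: (0.5, 7.0)}
shifted = {3: (0.8, 3.0), 5: (0.2, 7.0)}

print(conditionally_at_least_as_good(shifted, baseline))  # False: the shift costs paperclips
print(conditionally_at_least_as_good(baseline, shifted))  # True
```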