Leon Lang comments on Experiment Idea: RL Agents Evading Learned Shutdownability

Leon Lang 18 Jan 2023 1:31 UTC
3 points
Okay, that’s fair. I agree: if we could show that the experiments remain stable even when longer strings of reasoning are required, they would be more convincing. An added benefit might be that one could then vary the setting in more ways to demonstrate that the reasoning caused the agent to act in a particular way, rather than the actions just being some kind of coincidence.