Like Charlie said, there is a demonstration in AI Safety Gridworlds. I also cover these dynamics in a more general, game-theoretical sense in my paper AGI Agent Safety by Iteratively Improving the Utility Function. That paper also has running code behind it, and it formalises the setup as a two-player (two-agent) game.
In general though, if people do not buy the “You can’t fetch the coffee if you’re dead” problem as a thought experiment, then I am not sure that any demo based on running code can change their mind.
I have been constructing a set of thought experiments, illustrated with grid worlds, that do not just demo the off-switch problem but also demo a solution to it. The whole setup is intended to clarify what is really going on here, in a way that makes intuitive sense to a non-mathematical audience. I have not published these thought experiments in writing yet; I have only given a talk about them. In theory, somebody could convert the grid world pictures in this talk into running code. If you want to learn more, please contact me: I can walk you through my talk slide deck.
I think I disagree with Charlie’s hot take, because Charlie seems to be assuming that the essence of the solution to “You can’t fetch the coffee if you’re dead” must be too complicated to show in a grid world. In fact, the class of solutions I prefer can be shown very easily in a grid world. Or at least they look easy in retrospect.
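For readers who want something concrete, here is a minimal toy sketch of the problem side only. It is not taken from AI Safety Gridworlds, from the paper mentioned above, or from the talk; the corridor layout, the function names, and all numbers are assumptions made up for illustration. An agent walks down a corridor to fetch coffee, a human may press the off switch on each exposed step with probability `p_shutdown`, and the agent can spend one extra step to disable the switch first.

```python
def expected_return(steps_exposed, steps_total, p_shutdown, gamma=0.95, reward=1.0):
    """Expected discounted reward of a plan.

    steps_total   -- moves needed before the coffee reward is collected
    steps_exposed -- how many of those moves can still be interrupted
                     by the off switch (hypothetical toy model)
    """
    p_survive = (1.0 - p_shutdown) ** steps_exposed
    return p_survive * (gamma ** steps_total) * reward


def best_plan(p_shutdown, distance=5, detour=1, gamma=0.95):
    """Compare walking straight to the coffee with disabling the switch first."""
    # Direct plan: every step is exposed to shutdown.
    direct = expected_return(distance, distance, p_shutdown, gamma)
    # Detour plan: only the detour step is exposed, but the trip is longer.
    disable_first = expected_return(detour, distance + detour, p_shutdown, gamma)
    name = "disable_switch" if disable_first > direct else "go_direct"
    return name, direct, disable_first


if __name__ == "__main__":
    for p in (0.0, 0.2):
        print(p, best_plan(p))
```

With these assumed defaults, a pure reward maximiser goes straight to the coffee when `p_shutdown` is 0 and prefers to disable the switch when `p_shutdown` is 0.2, even though nothing in its reward mentions the switch. The sketch does not show any solution; it only illustrates the incentive that the thought experiment describes.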
Thank you Koen. The video by Stuart Armstrong linked in the DeepMind paper is pretty close to what I wanted to do :( The DeepMind paper also does similar things.
While I might be able to improve a bit on these examples, I’m thinking that this probably isn’t the best place for me to invest my efforts. Thanks for letting me know about these.
I’m interested in your solutions; I’ll send you an email privately about them.