Yes. I was assuming a standard conditional for the button.
I can’t currently see any flaws with the CDT style, other than the following:
Subagents believe in a world where buttons magically press themselves, so this design can’t make coherent statements about the probability that the button will be pressed (one AI believes it’s 1, the other that it’s 0).
These AIs have no incentive to give humans access to the button. To the AIs, it’s a magic button that might or might not magically press itself, and they have a lot of utility bet on it. Is that button going to end up in a high-security vault, surrounded by sensors and no humans? Both AIs would like that very much. The AIs have zero concern about humans pressing the button, but lots of concern about humans hiding it: this design really wants to know whether the button magically presses itself, and humans could cut the wires, stand between the button and the camera, etc.
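Here's a minimal sketch of that incentive, with toy numbers I've made up: each subagent only collects utility in worlds it believes in, and only if the button's fate is cleanly observed. Humans with access make clean observation less likely (they could cut wires or block the camera), so both subagents prefer the vault. The utility functions and observation probabilities here are my own illustration, not part of the original design.

```python
# Toy model of the two subagents' incentives around the button.
# All numbers and utility functions are illustrative assumptions,
# not part of the actual CDT-style design being discussed.

ACTIONS = {
    # Assumed: humans with access might cut wires or block the camera,
    # so the button's fate gets cleanly observed less often.
    "leave button accessible to humans": 0.60,   # P(clean observation)
    "lock button in sensor-filled vault": 0.99,  # P(clean observation)
}

def expected_utility(p_press, p_observe, utility):
    """Expected utility for a subagent with dogmatic belief p_press,
    treating button-press and clean-observation as independent events."""
    return sum(
        (p_press if pressed else 1 - p_press)
        * (p_observe if observed else 1 - p_observe)
        * utility(pressed, observed)
        for pressed in (True, False)
        for observed in (True, False)
    )

def u_press_agent(pressed, observed):
    # Subagent A: paid only in worlds where the button is pressed,
    # and only if that outcome is actually observed (the bet settles).
    return 1.0 if pressed and observed else 0.0

def u_nopress_agent(pressed, observed):
    # Subagent B: paid only in worlds where the button is never pressed.
    return 1.0 if not pressed and observed else 0.0

for action, p_observe in ACTIONS.items():
    eu_a = expected_utility(1.0, p_observe, u_press_agent)    # believes P(press) = 1
    eu_b = expected_utility(0.0, p_observe, u_nopress_agent)  # believes P(press) = 0
    print(f"{action}: A = {eu_a:.2f}, B = {eu_b:.2f}")
```

Under these toy numbers, both subagents score the vault at 0.99 and the accessible button at 0.60, so the composite agent unanimously prefers to lock humans out, despite having zero concern about humans actually pressing the button.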