You explicitly assume this stuff away, but I believe that under this setup the subagents would be incentivized to murder each other before the button is pressed (to get rid of that annoying veto).
I also note that if one agent becomes way, way smarter than the other, this balance may not work out.
Even if it works, I don’t see how to set up the utility functions such that humans aren’t disempowered. “Disempowered” is a complicated term to pin down!
Overall a very interesting idea.