Without necessarily contradicting anything you’ve said, what is the specific property you hope to gain by abandoning agents with utility functions, and how does it help us build an agent that will prevent Sam Altman from running something with a wrapper mind? Is it corrigibility, chill, or something different? You’ve mentioned multiple times that humans change their values (which I think happens in fewer cases than you suggest, but that’s beside the point). What kind of process do you hope a superintelligence could have for changing values that would make it safer?