Ah, yeah. That’s why I’m not very hopeful about AI alignment. I don’t think anyone’s even defined the problem in a useful way.
Neither humans as a class nor most humans as individuals HAVE preferences, as preferences are conceived today, that an AI could fulfill or even be compatible with. We MAY have mental frameworks that let our preferences evolve to survive well in an AI-containing world.
The overarching problem is figuring out human preferences so that AI can fulfill them. We’re all on the same page that humans aren’t VNM-consistent.
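To make "not VNM-consistent" concrete: one way the von Neumann–Morgenstern axioms fail is intransitivity. The sketch below (a hypothetical toy example, not anyone's actual preference data) shows that cyclic pairwise preferences like A > B, B > C, C > A can't be represented by any utility function, which is exactly what an AI trying to "fulfill" such preferences would need.

```python
from itertools import permutations

# Hypothetical cyclic preferences: prefers[(x, y)] == True means x is preferred to y.
prefers = {
    ("A", "B"): True, ("B", "C"): True, ("C", "A"): True,
    ("B", "A"): False, ("C", "B"): False, ("A", "C"): False,
}

def representable_by_utility(options, prefers):
    """Return True if some total ranking (i.e. some utility assignment)
    agrees with every stated pairwise preference."""
    for ranking in permutations(options):
        rank = {opt: i for i, opt in enumerate(ranking)}  # lower index = higher utility
        if all(prefers[(x, y)] == (rank[x] < rank[y]) for (x, y) in prefers):
            return True
    return False

# The cycle A > B > C > A admits no consistent utility function.
print(representable_by_utility(("A", "B", "C"), prefers))  # False
```

With only three options this brute-forces all six rankings; the point is just that the cycle defeats every one of them, so "maximize the human's utility function" is underdefined before the AI even starts.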