I think that in order to model this as disempowering in general, you need a model of human irrationality: if you instead model humans as rational utility maximizers, we wouldn’t make the major, simple, avoidable mistakes that we would need protection from.
But modelling human irrationality seems like a difficult and ill-posed problem, which contains most of the difficulty of the alignment problem.
The difficulty this leads to in practice is what to do when writing “empowerment” into your AI’s utility function: how do you specify that it is humans at human-level rationality who must be empowered, rather than ideal utility maximizers?
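For concreteness, one common formalization of empowerment (not necessarily the one intended here) is the channel capacity between an agent’s next n actions and the state they lead to:

$$\mathfrak{E}(s_t) \;=\; \max_{p(a_t^{n})} I\!\left(A_t^{n};\, S_{t+n} \mid s_t\right)$$

where $A_t^{n}$ is an n-step action sequence and $S_{t+n}$ the resulting state. Everything here depends on whose action repertoire and whose transition model define the channel, which is exactly where the question of human-level rationality versus ideal maximization bites.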
My comment began as a discussion of why practical agents are not really utility argmaxers (due to the optimizer’s curse).
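(A minimal sketch of that point, with made-up numbers: if you rank options by noisy value estimates and take the argmax, the estimate of whatever you pick will on average exceed its true value.)

```python
import random

# Toy illustration of the optimizer's curse (all parameters are made up):
# an agent ranks options by noisy value estimates and picks the argmax,
# so the chosen option's estimate is biased upward relative to its true value.
random.seed(0)

n_options, n_trials, noise = 20, 10_000, 1.0
gap_sum = 0.0
for _ in range(n_trials):
    true_values = [random.gauss(0.0, 1.0) for _ in range(n_options)]
    estimates = [v + random.gauss(0.0, noise) for v in true_values]
    best = max(range(n_options), key=lambda i: estimates[i])
    gap_sum += estimates[best] - true_values[best]

print(f"average (estimate - true) for the chosen option: {gap_sum / n_trials:.2f}")
# Positive in expectation: argmaxing over noisy estimates systematically
# overestimates the value of whatever it picks.
```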
You do not need to model human irrationality, and it is generally a mistake to do so.
Consider a child who doesn’t understand that the fence is there to prevent them from falling down the stairs. It would be a mistake to optimize for the child’s empowerment using their limited, irrational world model. It is correct to use the AI’s more powerful world model for computing empowerment, which results in putting up the fence (or the equivalent) in situations where the AI models that as preventing the child’s death or disability.
Likewise for the other scenarios.
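To make the fence example concrete, here is a toy sketch; the gridworld, the dynamics, and the child’s random policy are all invented for illustration, and the empowerment measure is just the crude reachable-states count that works for deterministic environments:

```python
import math
import random

# Toy gridworld (everything here is made up for illustration):
# positions 0..3 are a hallway; position 3 is the top of the stairs.
# Without a fence, stepping "right" at position 3 leads to an absorbing
# "injured" state; with a fence, that step is simply blocked.
INJURED = "injured"

def step(state, action, fence):
    if state == INJURED:
        return INJURED
    if action == "right":
        if state == 3:
            return 3 if fence else INJURED
        return state + 1
    return max(state - 1, 0)

def empowerment(state, horizon, fence):
    """n-step empowerment under the AI's (accurate) model: log2 of the number
    of distinct states reachable by some action sequence of length `horizon`."""
    frontier = {state}
    for _ in range(horizon):
        frontier = {step(s, a, fence) for s in frontier for a in ("left", "right")}
    return math.log2(len(frontier))

def avg_child_empowerment(fence, episodes=2000, length=30, horizon=3, seed=0):
    """Average empowerment along trajectories of a child who wanders randomly
    (a crude stand-in for the child's limited, irrational policy)."""
    rng = random.Random(seed)
    total = count = 0
    for _ in range(episodes):
        s = 0
        for _ in range(length):
            s = step(s, rng.choice(("left", "right")), fence)
            total += empowerment(s, horizon, fence)
            count += 1
    return total / count

print("avg child empowerment, no fence:", round(avg_child_empowerment(False), 2))
print("avg child empowerment, fence:   ", round(avg_child_empowerment(True), 2))
# The fence wins: under the AI's model, falling leads to an absorbing state with
# zero empowerment, so blocking the fall preserves the child's future options,
# even though the child's own world model contains no notion of falling at all.
```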