My comment began as a discussion of why practical agents are not really utility argmaxers (due to the optimizer’s curse).
You do not need to model human irrationality, and it is generally a mistake to do so.
Consider a child who doesn’t understand that a fence is there to prevent them from falling down the stairs. It would be a mistake to optimize for the child’s empowerment using their limited, irrational world model. The correct approach is to compute empowerment using the AI’s more powerful world model, which results in putting up the fence (or an equivalent safeguard) in situations where the AI predicts that doing so prevents the child’s death or disability.
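To make the comparison concrete, here is a toy sketch (the state names, option counts, probabilities, and the option-counting proxy for empowerment are all hypothetical, chosen only to illustrate the point): under the child's world model, which doesn't represent the fall risk at all, the fence looks like a pure loss of options, while under the AI's model, which does represent the risk of an absorbing injury state, the fence maximizes expected empowerment.

```python
from math import log2

# Hypothetical option counts for each end condition; "injured" is treated as
# nearly absorbing, closing off almost all future options.
OPTIONS = {"healthy_no_fence": 16, "healthy_fence": 12, "injured": 1}

def empowerment(state):
    # Crude proxy: log2 of the number of futures still open from this state.
    return log2(OPTIONS[state])

def expected_empowerment(p_fall, fence):
    # Expected empowerment given the modeled probability of falling down the stairs.
    healthy = "healthy_fence" if fence else "healthy_no_fence"
    return p_fall * empowerment("injured") + (1 - p_fall) * empowerment(healthy)

# Child's world model: the fall risk isn't represented (p_fall = 0), so the
# fence only removes options and looks like a pure empowerment loss.
print("child model, no fence:", expected_empowerment(p_fall=0.0, fence=False))  # 4.00
print("child model, fence:   ", expected_empowerment(p_fall=0.0, fence=True))   # 3.58

# AI's world model: unfenced stairs carry a real chance of an absorbing
# "injured" state, so the fence wins on expected empowerment.
print("AI model, no fence:   ", expected_empowerment(p_fall=0.3, fence=False))  # 2.80
print("AI model, fence:      ", expected_empowerment(p_fall=0.0, fence=True))   # 3.58
```

The exact numbers don't matter; the point is only that the ranking of "fence" versus "no fence" flips depending on whose world model the empowerment calculation uses.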
Likewise for the other scenarios.