Then it seems to me that judging the agent’s purity of intentions is also a deep problem. At least for humans it is. For example, a revolutionary may only want to overthrow the unjust hierarchy, but then succeed and end up in power. So they didn’t consciously try to gain power, but maybe evolution gave them some behaviors that happen to gain power, without the agent explicitly encoding “the desire for power” at any level.
I think this is not such a big problem if we have the assumed level of mechanistic interpretability.