It seems like evaluating a human's AU depends on how you model the human. There's a "black box" sense, where you can substitute literally any policy for the human's when calculating AU for different objectives, and a "transparent box" sense, in which you have to choose from a distribution of predicted human behaviors.
The former is closer to what I think you mean by “hasn’t changed the humans’ AU,” but I think it’s the latter that an AI cares about when evaluating the impact of its own actions.
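To make the distinction concrete, here is a minimal sketch of the two calculations in a toy one-step setting. Everything in it (the action names, payoff numbers, and predicted policy) is invented purely for illustration and isn't drawn from the original discussion:

```python
import numpy as np

# Toy setup: a one-step decision problem where the "human" picks one action,
# and each objective assigns a payoff to each action.
actions = ["stay", "go_left", "go_right"]

# Payoff table: objective -> payoff per action.
payoffs = {
    "reach_goal": np.array([0.0, 1.0, 0.2]),
    "stay_safe":  np.array([1.0, 0.3, 0.6]),
}

def black_box_au(payoff):
    # "Black box" sense: the human's policy can be replaced with anything,
    # so attainable utility is the best payoff any action achieves.
    return float(payoff.max())

def transparent_box_au(payoff, predicted_policy):
    # "Transparent box" sense: the human's behavior is drawn from a predicted
    # distribution, so AU is the utility the human is actually expected to attain.
    return float(payoff @ predicted_policy)

# Hypothetical prediction: the model thinks this human almost always stays put.
predicted_policy = np.array([0.9, 0.05, 0.05])

for objective, payoff in payoffs.items():
    print(objective,
          "| black-box AU:", black_box_au(payoff),
          "| transparent-box AU:", round(transparent_box_au(payoff, predicted_policy), 3))
```

In this toy case the black-box AU for "reach_goal" stays at 1.0 no matter what the human is predicted to do, while the transparent-box AU tracks the predicted behavior, which is the sense that seems relevant when the AI asks how its own actions change what the human will actually attain.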
I’m discussing a philosophical framework for understanding low impact. I’m not prescribing how the AI actually accomplishes this.