However, I am broadly suspicious of AUP agents choosing plans which involve almost maximally offensive components, even accounting for the fact that they could try to do so surreptitiously.
I guess I don’t have good intuitions of what an AUP agent would or wouldn’t do. Can you share yours, like give some examples of real goals we might want to give to AUP agents, and what you think they would and wouldn’t do to accomplish each of those goals, and why? (Maybe this could be written up as a post since it might be helpful for others to understand your intuitions about how AUP would work in a real-world setting.)
I’m not sure whether this belongs in the desiderata, since we’re talking about whether temporary object level bad things could happen. I think it’s a bonus to think that there is less of a chance of that, but not the primary focus of the impact measure.
Why not? I’ve usually seen people talk about “impact measures” as a way of avoiding side effects, especially negative side effects. It seems intuitive that “object level bad things” are negative side effects even if they are temporary, and ought to be a primary focus of impact measures. It seems like you’ve reframed “impact measures” in your mind to be a bit different from this naive intuitive picture, so perhaps you could explain that a bit more (or point me to such an explanation)?
Sounds good. I’m currently working on a long sequence walking through my intuitions and assumptions in detail.