The AI cures cancer, but it keeps the cure a secret, because telling us would create the side effect of us curing a bunch of people.
Yes, if we told it to develop a cure, it might avoid letting us cure people to minimize impact (although I think there are even less benign failure modes that would be more likely to occur).
Regarding the second framing: perhaps a side effect minimizer using a naive counterfactual would do that, yes. The problem with viewing “manipulation” as high-impact is robustly defining manipulation. There are heavy value connotations tied up with “free will” there.
The way I would put it: because the naive counterfactual plus whitelisting penalizes side effects regardless of who causes them, the agent tries to stop other people from doing things that could lead to side effects, enforcing the impact measure on all actors. This is obviously terrible. Assuming agency allows for a solution* like the one I outline here.
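To make the contrast concrete, here is a minimal Python sketch, not the actual whitelisting proposal: the names (`Transition`, `WHITELIST`, `naive_penalty`, `agent_attributed_penalty`) are mine, and it assumes the actor behind each change is observable. It compares a penalty that counts every non-whitelisted change in the world, which rewards the agent for policing other actors, against one that counts only changes attributable to the agent itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    before: str   # object class before the change, e.g. "vase"
    after: str    # object class after the change, e.g. "broken vase"
    actor: str    # who caused it: "agent" or "human" (assumed observable here)


# Hypothetical whitelist of allowed class changes.
WHITELIST = {("dirty dish", "clean dish")}


def non_whitelisted(t: Transition) -> bool:
    return (t.before, t.after) not in WHITELIST


def naive_penalty(observed: list[Transition]) -> int:
    """Penalize every non-whitelisted change in the world, whoever caused it.
    This is the failure mode above: the agent is rewarded for stopping
    *people* from making changes."""
    return sum(non_whitelisted(t) for t in observed)


def agent_attributed_penalty(observed: list[Transition]) -> int:
    """Penalize only changes attributable to the agent itself, leaving other
    actors' autonomous choices outside the impact measure."""
    return sum(non_whitelisted(t) for t in observed if t.actor == "agent")


world = [
    Transition("vase", "broken vase", actor="agent"),  # agent's own side effect
    Transition("tumor", "no tumor", actor="human"),    # humans curing people
]

print(naive_penalty(world))             # 2: incentive to prevent the humans' cure
print(agent_attributed_penalty(world))  # 1: only the agent's own breakage counts
```

The second function is only a stand-in for "assuming agency": actually attributing changes to actors is the hard part, and this toy just assumes that attribution comes for free.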