Hmm, I see some problems here.
By looking for manipulation on the basis of counterfactuals, you're at the mercy of your ability to find such counterfactuals, and that ability can itself be manipulated so that you notice neither the object-level counterfactuals that would make you suspect manipulation nor the counterfactuals about your counterfactual reasoning that would make you suspect manipulation. This seems an insufficiently robust way to detect manipulation, or even to define it, since the mechanism for detecting it can itself be manipulated into not noticing what would otherwise have been considered manipulation.
Perhaps my point is to express a general doubt that we can cleanly detect manipulation outside the context of human behavioral norms. I suspect the cognitive machinery that implements those norms is malleable enough that it can be manipulated into not noticing what it would previously have thought was manipulation. Nor is it clear this is always bad, since in some cases we might be mistaken, in some sense, about what is really manipulative, although this glosses over the point that it's not clear what it means to be mistaken about normative claims.
OK, but there's a difference between "here's a definition of manipulation so waterproof that you couldn't break it even if you optimized against it with arbitrarily large optimization power" and "here's my current best way of thinking about manipulation." I was presenting the latter, because it helps me be less confused than if I had just stuck to my previous gut-level, intuitive understanding of manipulation.
Edit: Put otherwise, I was replying more to your point (1) than your point (2) in the original comment. Sorry for the ambiguity!