How do you define manipulation?
I think manipulation is one of those concepts that makes sense in human terms but not in objective terms (I think the same of low impact, corrigibility, etc.).
Therefore I’m using manipulation to mean “looks like manipulative behaviour to humans”; I don’t think we can do much better than that.