I liked how Rhys's definition of manipulation specifically included the requirement that the target ends up with lower utility.
This means that something like “manipulation = affecting the human’s brain in a way that will reduce their expected utility” does not classify all communication as manipulation.
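Roughly, and in my own notation (not the post's): if $U_H$ is the human's utility and $b_a$, $b_{\neg a}$ are the human's mental states with and without the agent's influence $a$, then $a$ is manipulative only if

$$\mathbb{E}[U_H \mid b_a] < \mathbb{E}[U_H \mid b_{\neg a}].$$

Honest, informative communication typically raises the human's expected utility, so it is not caught by this condition.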
While the SVP may help in this two-human scenario, it may not work in multi-human scenarios. Is it the case that, the more humans there are, the more the AI can afford to manipulate some proportion of them, since it is learning about human preferences from many sources? That is: to be sure enough of the preferences of the human it is aligned to, an AI agent needs to observe some number X of un-manipulated humans, but beyond X it has an incentive to poison the additional humans. Of course, the SVP still helps compared to the (original) incentive to manipulate all other humans, but it may not go far enough in a multi-human scenario.
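To make this concrete, here is a toy model (entirely my own assumptions, not from the post): say the agent gets information value $V(n)$ from observing $n$ un-manipulated humans, with $V$ concave (diminishing returns on preference information), and gets some fixed gain $g > 0$ from manipulating a human. Then it prefers to manipulate the $(n+1)$-th human whenever

$$V(n+1) - V(n) < g,$$

and since $V$ is concave, this threshold is crossed at some finite X, after which every additional human is worth more to the agent manipulated than observed.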