While the SVP may help in this two-human scenario, it may not work in multi-human scenarios. Is it the case that the more humans there are, the more I can afford to manipulate some proportion of them, because I am learning about human preferences from many sources? That is, to be sure enough of the preferences of the human that an AI agent is aligned to, the agent needs to observe X un-manipulated humans; but beyond X there is an incentive to poison the additional humans. Of course, the SVP still helps compared to the (original) incentive to manipulate all other humans, but it may not go far enough in a multi-human scenario.
That’s true (and I hadn’t considered it!) -- there’s also a social-dilemma-type situation in the case with many potential manipulators, since if any one of them manipulates, then no one can get value from observing the target’s actions.
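To make the first point concrete, here is a minimal toy sketch (my own illustrative assumptions, not anything from the SVP setup itself): suppose the information gained from observing un-manipulated humans saturates, while the gain from manipulating each extra human is roughly constant. Then the optimal number of humans left un-manipulated stops growing with the population size, so the *proportion* manipulated rises as the population grows.

```python
# Toy model (illustrative assumptions only): the manipulator's payoff is
#   info_value(k) + c * (N - k)
# where k is the number of humans left un-manipulated, N is the total number
# of humans, c is a fixed per-human gain from manipulation, and info_value
# has diminishing returns (a saturating log curve here). Once the marginal
# information drops below c, leaving more humans un-manipulated stops paying.

import math

def info_value(k: int, scale: float = 10.0) -> float:
    """Diminishing-returns value of observing k un-manipulated humans."""
    return scale * math.log(1 + k)

def best_split(N: int, c: float = 0.5) -> int:
    """Number of humans k* left un-manipulated that maximises total payoff."""
    return max(range(N + 1),
               key=lambda k: info_value(k) + c * (N - k))

for N in (5, 20, 100):
    k_star = best_split(N)
    print(f"N={N:4d}  un-manipulated k*={k_star:3d}  "
          f"manipulated fraction={(N - k_star) / N:.2f}")
```

With these made-up payoff shapes, k* plateaus (around 19 here) regardless of N, so for N=5 nothing is manipulated, but for N=100 most of the population is -- which is the "beyond X there is an incentive to poison the additional humans" worry in miniature.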