If so, then it seems your argument should conclude that even non-powerful algorithms are likely to be manipulative.
I’d conclude that most algorithms used today have the potential to be manipulative; but they may not be able to find the manipulative behaviour, given their limited capabilities.
Is this such a common practice that we can expect “almost every powerful algorithm” to involve it somehow?
No. That was just one example I constructed, one of the easiest to see. But I can build examples in many different situations. I’ll admit that “thinking longer term” is something that makes manipulation much more likely; genuinely episodic algorithms seem much harder to make manipulative. But we have to be sure the algorithm is episodic, and that there is no outer-loop optimisation going on.
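To illustrate that last caveat with a toy sketch (none of this is from the post; every function name and constant below is made up): even when the inner learning rule only ever sees within-episode reward, an outer loop that selects candidates by performance aggregated across episodes is itself optimising for cross-episode effects, which is exactly the kind of long-term pressure under which manipulative behaviour becomes findable.

```python
import random

# Toy sketch: the *inner* learning rule looks episodic (each update uses only
# the current episode's reward), but the *outer* selection loop compares
# candidates by a cross-episode average, quietly reintroducing optimisation
# pressure on long-term effects. All names and numbers are made up.

def run_episode(policy_param: float) -> float:
    """Within-episode reward of a (one-parameter) toy policy."""
    return policy_param + random.gauss(0.0, 0.1)

def train_episodically(policy_param: float, episodes: int = 50) -> float:
    """Episodic inner loop: each update depends only on this episode's reward."""
    for _ in range(episodes):
        reward = run_episode(policy_param)
        policy_param += 0.01 * reward  # no term reaches across episodes
    return policy_param

def outer_loop_selection(candidates: list[float]) -> float:
    """Outer-loop optimisation: keep whichever candidate's trained policy
    scores best averaged over many later episodes (a cross-episode criterion)."""
    def long_run_score(p: float) -> float:
        trained = train_episodically(p)
        return sum(run_episode(trained) for _ in range(20)) / 20
    return max(candidates, key=long_run_score)

if __name__ == "__main__":
    random.seed(0)
    print("outer loop kept candidate:", outer_loop_selection([0.0, 0.5, 1.0]))
```

The same structure shows up whenever hyperparameter search or model selection is scored on long-run metrics rather than on the episode the learner actually saw.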
I’d conclude that most algorithms used today have the potential to be manipulative; but they may not be able to find the manipulative behaviour, given their limited capabilities.
I’d suspect that’s right, but I don’t think your title has the appropriate epistemic status. I think people in general should be more careful about for-all quantifiers with respect to alignment work. You use the technical term “almost every”, but you did not prove that the set of “powerful” algorithms which are not “manipulative” has measure zero. There’s also “would be” instead of “seems” (I think if you made this change, the title would be fine). I think it’s vitally important we use the correct epistemic markers; otherwise we can end up with research predicated on obvious-seeming hunches stated as fact.
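To spell out the technical reading (a sketch; the measure $\mu$ on the space $\mathcal{P}$ of “powerful” algorithms is an assumption, since the post doesn’t fix one): read literally, the title asserts

$$\mu\bigl(\{\, a \in \mathcal{P} \;:\; a \text{ is not manipulative} \,\}\bigr) = 0,$$

and that is what a proof of the for-all-style claim would have to establish.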
Not that I disagree with your suspicion here.
Rephrased the title and the intro to make this clearer.