It is not clear to me to what extent this was part of the “training shoulder advisors” exercise, but to me, possibly the most important part of it is keeping the advisors at a distance from your own thinking. In particular, my impression is that alignment research has, on average, been harmed by too many people “training their shoulder Eliezers” and those shoulder advisors pushing them to think in a crude version of Eliezer’s ontology.
I chose the “train a shoulder advisor” framing specifically to keep my/Eliezer’s models separate from the participants’ own models. And I do think this worked pretty well—I’ve had multiple conversations with a participant where they say something, I disagree with it, and then they say “yup, that’s what my John model said”—implying that they did in fact disagree with their John model. (That’s not quite direct evidence of maintaining a separate ontology, but it’s adjacent.)