It is not clear to me to what extent this was part of the “training shoulder advisors” exercise, but to me, possibly the most important part of it is keeping the advisors at a distance from your own thinking. In particular, my impression is that alignment research has, on average, been harmed by too many people “training their shoulder Eliezers” and those shoulder advisors pushing them to think in a crude version of Eliezer’s ontology.
I chose the “train a shoulder advisor” framing specifically to keep my/Eliezer’s models separate from the participants’ own models. And I do think this worked pretty well—I’ve had multiple conversations with a participant where they say something, I disagree with it, and then they say “yup, that’s what my John model said”—implying that they did in fact disagree with their John model. (That’s not quite direct evidence of maintaining a separate ontology, but it’s adjacent.)