I chose the “train a shoulder advisor” framing specifically to keep my/Eliezer’s models separate from the participants’ own models. And I do think this worked pretty well—I’ve had multiple conversations with a participant where they say something, I disagree with it, and then they say “yup, that’s what my John model said”—implying that they did in fact disagree with their John model. (That’s not quite direct evidence of maintaining a separate ontology, but it’s adjacent.)