They have a strong belief that in order to do good alignment research, you need to be good at “consequentialist reasoning,” i.e. model-based planning that allows creatively figuring out paths to achieve goals.
I think this is a misunderstanding, and that approximately zero MIRI-adjacent researchers hold this belief (that good alignment research must be the product of good consequentialist reasoning). What seems more true to me is that they believe that better understanding consequentialist reasoning—e.g., where to expect it to be instantiated, what form it takes, how/why it “works”—is potentially highly relevant to alignment.
I think you might be misunderstanding Jan’s understanding. A big crux in this whole discussion between Eliezer and Richard seems to be: Eliezer believes any AI capable of doing good alignment research—at least good enough to provide a plan that would help humans make an aligned AGI—must be good at consequentialist reasoning in order to generate good alignment plans. (I gather from Nate’s notes in that conversation plus various other posts that he agrees with Eliezer here, though I’m not certain.) I strongly doubt that Jan just mistook MIRI’s focus on understanding consequentialist reasoning for a belief that alignment research requires being a consequentialist reasoner.
I think you’re right—thanks for this! It makes sense now that I recognise the quote was in a section titled “Alignment research can only be done by AI systems that are too dangerous to run”.