“We think it works like this”
Who is “we”? Is it:
only you and your team?
the entire Apollo Research org?
the majority of mechinterp researchers worldwide?
some other group/category of people?
Also, this definitely deserves to be turned into a high-level post, if you end up finding the time/energy/interest to make one.
[Coming at this a few months late, sorry. This comment by @Steven Byrnes sparked my interest in this topic once again]
Ngl, I find everything you’ve written here a bit… baffling, Seth. Your writing in particular, and your exposition of your thoughts on AI risk more generally, do not use evolutionary analogies, but this only means that posts and comments criticizing analogies with evolution (sample: 1, 2, 3, 4, 5, etc.) are just not aimed at you or your reasoning. I greatly enjoy reading your writing and pondering the insights you bring up, but you are simply not even close to being the most publicly salient proponent of “somewhat high P(doom)” within the AI alignment community. It makes perfect sense, from the perspective of those who disagree with you (or with other, more hardcore “doomers”) on the bottom-line question of AI risk, to focus their public discourse primarily on responding to the arguments brought up by the subset of “doomers” who are both most salient and most extreme in their views, namely the MIRI cluster centered around Eliezer Yudkowsky, Nate Soares, and Rob Bensinger.
And when you turn to MIRI and the views that its members have espoused on these topics, I am very surprised to hear that “The arguments for misgeneralization/mis-specification stand on their own” and are not ultimately based on analogies with evolution.
But anyway, to hopefully settle this once and for all, let’s go through all the examples that immediately pop into my head when I think of this, shall we?
From the section on inner & outer alignment of “AGI Ruin: A List of Lethalities”, by Yudkowsky (I have removed the original emphasis and added my own):
From “A central AI alignment problem: capabilities generalization, and the sharp left turn”, by Nate Soares, which, by the way, quite literally uses the exact phrase “The central analogy”; as before, emphasis is mine:
From “The basic reasons I expect AGI ruin”, by Rob Bensinger:
From “Niceness is unnatural”, by Nate Soares:
From “Superintelligent AI is necessary for an amazing future, but far from sufficient”, by Nate Soares:
From the Eliezer-edited summary of “Ngo and Yudkowsky on alignment difficulty”, by… Ngo and Yudkowsky:
From “Comments on Carlsmith’s ‘Is power-seeking AI an existential risk?’”, by Nate Soares:
From “Soares, Tallinn, and Yudkowsky discuss AGI cognition”, by… well, you get the point:
From “Humans aren’t fitness maximizers”, by Soares:
From “Shah and Yudkowsky on alignment failures”, by the usual suspects:
From the comments on “Late 2021 MIRI Conversations: AMA / Discussion”, by Yudkowsky:
From Yudkowsky’s appearance on the Bankless podcast (full transcript here):
At this point, I’m tired, so I’m logging off. But I would bet a lot of money that I could find at least 3x as many examples like these if I had the energy to keep going. As Alex Turner put it, it seems clear to me that, for a very large portion of the “classic” alignment arguments about inner & outer alignment problems, at least in the form espoused by MIRI, the argumentative bedrock ultimately rests on little more than analogies with evolution.