Very little effort has gone into understanding which AIs make good successors, even relative to the effort that has gone into understanding alignment. Moreover, it’s a separate problem that may independently turn out to be much easier or harder.
My sense is that ‘good successors’ are basically AIs who are aligned not on the question of preferences, but on the question of meta-preferences; that is, rather than asking “do I want that?”, I ask “could I imagine wanting that by changing only non-essential facts?”. The open philosophical question under that framing is “what facts are essential?”, which I don’t pretend to have a good answer to.
It’s not obvious to me that this is consistent with your view of what a ‘good successor’ is. It seems like the possibilities are: it’s consistent but the set of essential facts is very small (like whether or not it would participate in the universe shuffle); it’s consistent but the set of essential facts is large (like whether or not it has some instantiation of a list of virtues, even if that instantiation is very different from our own); it’s consistent but my framing is less helpful (because it places too much emphasis on my imagination rather than on the essential facts, or something); or it’s inconsistent (because there are successors that seem good even though you couldn’t imagine wanting what they want without changing essential facts).