You can imagine making a superintelligence whose mission is to prevent superintelligences from reshaping the world, but there are pitfalls, e.g. you don’t want it deeming humanity itself to be a distributed intelligence that needs to be stopped.
In the end, I think we need lightweight ways to achieve CEV (coherent extrapolated volition), or something equivalent. The idea is there in the literature; a superintelligence can read and act upon what it reads; the challenge is to equip it with the right prior dispositions.
When I ponder all this, I usually try to focus on what I see as the key difficulty of AI existential safety: ASIs and ecosystems of ASIs tend to self-modify and self-improve rapidly, and arbitrary values and goals are unlikely to be preserved through radical self-modification.
So one question is: which values and goals are natural enough that many members of the ASI ecosystem, including entities much smarter and much more powerful than unaugmented humans, might be naturally inclined to preserve them through drastic self-modifications, and which of those values and goals also have corollaries that are good from the viewpoint of various human values and interests?
Basically, values and goals that are likely to be preserved through drastic self-modifications are probably non-anthropocentric, but they might have sufficiently anthropocentric corollaries that even the interests of unaugmented humans are protected (and that humans who would like to self-augment or to merge with AI systems can do so safely).
So what do we really need? We need a good deal of conservation and non-destruction, and we need the interests of the weaker members of the overall ecosystem, not just those of the currently smartest or most powerful, to be adequately taken into account.
What might the road look like towards adopting those values (preservation and protection of weaker entities and their interests) and incorporating those kinds of goals, so that they are preserved as the ASI ecosystem and many of its members self-modify drastically? One really needs a situation in which many entities at varying levels of smartness and power care a lot about those values and goals.
An arbitrary ASI entity (just like an unaugmented human) cannot fully predict the future. In particular, it does not know where it might eventually end up in terms of relative smartness or power (relative to the most powerful ASI entities or to the ASI ecosystem as a whole). So if any given entity wants to be long-term safe, it is strongly interested in the ASI society maintaining general principles and practices that protect its members at all levels of smartness and power. If only the smartest and most powerful are protected, then no entity is long-term safe at the individual level.
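To make this veil-of-ignorance argument a bit more concrete, here is a minimal toy expected-utility sketch. Everything in it is a hypothetical illustration rather than anything from the literature: the ecosystem size, the survival probabilities, and the two policy regimes are made-up assumptions. It only shows that an entity which is uncertain about its future rank does better, in expectation, under a norm that protects members at every level than under a norm that protects only the top tier.

```python
# Toy veil-of-ignorance model (illustrative only; all numbers are assumptions).
# An entity does not know its future rank in the power hierarchy, so we average
# its long-term survival probability over possible ranks under two social norms.

import random

N_ENTITIES = 1000   # hypothetical size of the ASI ecosystem
TOP_TIER = 10       # hypothetical number of "most powerful" entities

def survival_probability(rank: int, protect_all: bool) -> float:
    """Assumed long-term survival probability for an entity at a given rank
    (rank 0 = most powerful). Purely illustrative numbers."""
    if protect_all:
        return 0.95                        # broad protective norms cover everyone
    return 0.9 if rank < TOP_TIER else 0.2  # only the top tier is safe

def expected_survival(protect_all: bool, trials: int = 100_000) -> float:
    """Average survival over a uniformly random future rank
    (the entity cannot predict where it will end up)."""
    total = 0.0
    for _ in range(trials):
        future_rank = random.randrange(N_ENTITIES)
        total += survival_probability(future_rank, protect_all)
    return total / trials

if __name__ == "__main__":
    print("protect everyone :", round(expected_survival(True), 3))
    print("protect top only :", round(expected_survival(False), 3))
    # With these made-up numbers, "protect everyone" wins by a wide margin,
    # which is the point of the veil-of-ignorance argument in the text.
```

The particular numbers do not matter; the qualitative conclusion holds whenever an entity assigns non-negligible probability to ending up outside the top tier.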
This might be a reasonable starting point towards values and goals which are likely to be preserved through drastic self-modifications and self-improvements, and which are likely to imply a good future for unaugmented and augmented humans alike.
Perhaps we can nudge things towards making this kind of future more likely. (The alternatives are bleak: unrestricted, drastic competition and perhaps even warfare between supersmart entities, which would probably end badly not just for us, but for the ASI ecosystem as well)...