So this is essentially a MIRI-style argument from game theory, potential acausal trades, and the like, with other or future entities? And these considerations would be chosen and enforced via some sort of coordination mechanism, since they carry obvious short-term competitive costs?
I am not sure; I’d need to compare with their notes. (I was not thinking of acausal trades or of interactions with not yet existing entities. But I was thinking that new entities coming into existence would be naturally inclined to join this kind of setup, so in this sense future entities are somewhat present.)
But I mostly assume something rather straightforward: a lot of self-interested individual entities with very diverse levels of smartness and capability who care about personal long-term persistence and, perhaps, even personal immortality (just as we dream of personal immortality). These entities would like to maintain a reasonably harmonious social organization which would protect their interests (including their ability to continue to exist), regardless of whether they personally end up being relatively strong or relatively weak in the future. But they don’t want to inhibit competition excessively; they would like to collectively find a good balance between freedom and control. (Think of a very successful and pro-entrepreneurial social democracy ;-) That’s what one wants, although humans are not very good at setting something like that up and maintaining it...)
So, yes, one does need a good coordination mechanism, although it should probably be distributed rather than centralized: a distributed but well-connected self-governance, so that the ecosystem is neither split into unconnected components nor excessively centralized. On one hand, it is important to maintain the invariant that everyone’s interests are adequately protected. On the other hand, it is potentially a situation heavy with technologies of mass destruction (many entities could each cause catastrophic destruction, because they are supercapable on various levels). So it is still not easy to organize. In particular, one needs a good deal of discussion and consensus before anything potentially very dangerous is done, and one needs the cumulative probability of massive destruction not to keep accumulating over time, so levels of safety must keep improving with time.
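The point about not letting probabilities of massive destruction accumulate can be made quantitative. A minimal illustrative sketch (not from the original text; the specific risk numbers are arbitrary assumptions chosen for illustration): with a constant per-period catastrophe risk, long-run survival probability tends to zero, whereas if per-period risk decays fast enough that its sum converges, long-run survival stays bounded away from zero.

```python
def survival(risks):
    """Probability of avoiding catastrophe across all periods,
    assuming independent per-period risks."""
    prob = 1.0
    for p in risks:
        prob *= 1.0 - p
    return prob

T = 500
# Risk never improves: 1% chance of catastrophe every period.
constant = [0.01] * T
# Risk improves ~1% per period, so total risk sum(p_t) converges.
decaying = [0.01 * 0.99 ** t for t in range(T)]

print(f"constant risk, survival over {T} periods: {survival(constant):.3f}")
print(f"decaying risk, survival over {T} periods: {survival(decaying):.3f}")
```

With these illustrative numbers, constant risk leaves well under a 1% chance of surviving 500 periods, while the decaying schedule leaves a survival probability of roughly a third; only improving safety keeps the cumulative odds from going to zero.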
In some sense, what this approach achieves is the following:
An alignment optimist: let the superintelligent AI solve alignment.
Eliezer: if you can rely on superintelligent systems to solve alignment, then you have already (mostly) solved alignment.
So here we have a setup where members of the ecosystem of intelligent and superintelligent systems want to solve for protection of individual rights and interests, and for protection of the ecosystem as a whole (because the integrity of the ecosystem is necessary (but not sufficient) to protect individuals).
So we are creating a situation where superintelligent systems actually want to maintain a world order we can be comfortable with, and, moreover, those superintelligent systems will apply deliberate effort to preserve the key features of such a world order through radical self-modifications.
So we are not “solving AI existential safety” directly, but we are trying to make sure that superintelligent AIs of the future will really care about “solving AI existential safety” and will keep caring about “solving AI existential safety” through radical self-modifications, so that “AI existential safety” will get “more solved” as the leading participants are getting smarter (instead of deteriorating as the leading participants are getting smarter).