I have tried various versions of this explanation over the last year and a half. I have yet to see a good response, but it clearly is not getting through to many people either.
I thought my response in Zvi AI #57 was reasonably good: https://www.lesswrong.com/posts/5Dz3ZrwBzzMfaucrH/ai-57-all-the-ai-news-that-s-fit-to-print#ckYsqx2Kp6HTAR22b

But let me try again. Perhaps you have not seen that one. And perhaps you’ll see this one and respond...
You have a trilemma; the following three cannot all hold at once:

1. Free competition between entities for resources or control.
2. Entities that can outcompete humans.
3. Humans surviving or remaining in control.
Well, “free competition between entities for resources or control” has to go out of the window in a world with superintelligent entities. The community of superintelligent entities needs to self-regulate in a reasonable fashion, otherwise they are likely to blow the fabric of reality to bits and destroy themselves and everything else.
The superintelligent entities have to find some reasonable ways to self-organize, and those ways to self-organize have to constitute a good compromise between freedom and collective control.

Here I elaborate on this further: https://www.lesswrong.com/posts/WJuASYDnhZ8hs5CnD/exploring-non-anthropocentric-aspects-of-ai-existential.
Joscha Bach says the key to safe AI is to make the AIs conscious, because consciousness is what we care about and we had better hope it cares about this fact.
Yes, in that write-up I just referenced, “Exploring non-anthropocentric aspects of AI existential safety”, I also tried to address this via the route of making the AIs sentient. And I still think this makes a lot of sense.
But given that we have such a poor understanding of sentience, and no reliable way to tell which systems are sentient and which are not, it is good to consider an approach that does not depend on that. That is the approach I am considering in my comment to your AI #57, which I reference above. It is an approach based on the rights of individuals, without specifying whether those individuals are sentient.
I’ll quote the key part here, with added emphasis:
I am going to only consider the case where we have plenty of powerful entities with long-term goals and long-term existence which care about their long-term goals and long-term existence. This seems to be the case which Zvi is considering here, and it is the case we understand best, because we also live in a reality with plenty of powerful entities (ourselves, some organizations, etc.) with long-term goals and long-term existence. So this is an incomplete consideration: it only includes the scenarios where powerful entities with long-term goals and long-term existence retain a good fraction of overall available power.
So what do we really need? What are the properties we want the world to have? We need a good deal of conservation and non-destruction, and we need the interests of the weaker members of the overall ecosystem, not just the currently smartest or most powerful ones, to be adequately taken into account.

Here is how we might be able to have a trajectory where these properties are stable, despite all the drastic changes of a self-modifying and self-improving ecosystem.
An arbitrary ASI entity (just like an unaugmented human) cannot fully predict the future. In particular, it does not know where it might eventually end up in terms of relative smartness or relative power (relative to the most powerful ASI entities or to the ASI ecosystem as a whole). So if any given entity wants to be long-term safe, it is strongly interested in the ASI society having general principles and practices of protecting its members on various levels of smartness and power. If only the smartest and most powerful are protected, then no entity is long-term safe on the individual level.
This might be enough to produce an effective counterweight to unrestricted competition (just as human societies have mechanisms against unrestricted competition). Basically, smarter-than-human entities at all levels of power are likely to be interested in the overall society having general principles and practices of protecting its members at various levels of smartness and power, and that is why they will care enough to keep the overall society self-regulating and enforcing these principles.
This is not yet the solution, but I think this is pointing in the right direction...
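To make the incentive in the quoted passage concrete, here is a minimal toy calculation; the ranks, probabilities, and survival numbers below are made-up assumptions purely for illustration, not part of the original argument. The point is just that an entity uncertain about its future relative power can expect to do better under a norm that protects everyone than under a norm that protects only the strongest.

```python
# Toy sketch, not a model: made-up survival chances by future power rank
# under two candidate norms of the ASI society.
survival_if_only_top_protected = {"top": 0.95, "middle": 0.30, "bottom": 0.05}
survival_if_all_protected = {"top": 0.90, "middle": 0.85, "bottom": 0.80}

# The entity does not know which rank it will eventually occupy
# (uniform uncertainty here, purely for simplicity).
p_rank = {"top": 1 / 3, "middle": 1 / 3, "bottom": 1 / 3}

def expected_survival(survival_by_rank):
    """Expected long-term survival chance under a given norm."""
    return sum(p_rank[r] * survival_by_rank[r] for r in p_rank)

print(expected_survival(survival_if_only_top_protected))  # ~0.43
print(expected_survival(survival_if_all_protected))       # ~0.85
```

Under these (entirely illustrative) numbers, the universal-protection norm looks better from behind the veil of uncertainty, even though it is slightly worse for whoever ends up on top.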
Note that this approach is not based on unrealistic things, like control of super-smart entities by less smart entities, or forcing super-smart entities to adopt the values and goals of less smart entities. Yes, those unrealistic approaches are unlikely to work well, and are likely to backfire.
It is a more symmetric approach, based on equal rights and on individual rights.
So this is essentially a MIRI-style argument from game theory and potential acausal trades and such with potential other or future entities? And that these considerations will be chosen and enforced via some sort of coordination mechanism, since they have obvious short-term competition costs?
I am not sure; I’d need to compare with their notes. (I was not thinking of acausal trades or of interactions with entities that do not yet exist. But I was thinking that new entities coming into existence would be naturally inclined to join this kind of setup, so in this sense future entities are somewhat present.)
But I mostly assume something rather straightforward: a lot of self-interested individual entities with very diverse levels of smartness and capabilities who care about personal long-term persistence and, perhaps, even personal immortality (just as we dream of personal immortality). These entities would like to maintain a reasonably harmonious social organization which would protect their interests (including their ability to continue to exist), regardless of whether they personally end up being relatively strong or relatively weak in the future. But they don’t want to inhibit competition excessively; they would like to collectively find a good balance between freedom and control. (Think of a very successful and pro-entrepreneurial social democracy ;-) That’s what one wants, although humans are not very good at setting something like that up and maintaining it...)
So, yes, one does need a good coordination mechanism, although it should probably be a distributed one rather than a centralized one: distributed but well-connected self-governance, so that the ecosystem is neither split into unconnected components nor excessively centralized. On the one hand, it is important to maintain the invariant that everyone’s interests are adequately protected. On the other hand, it is potentially a situation heavy with technologies of mass destruction (a lot of entities can each potentially cause catastrophic destruction, because they are supercapable at various levels). So it is still not easy to organize. In particular, one needs a good deal of discussion and consensus before anything potentially very dangerous is done, and one needs to avoid accumulating the probability of massive destruction over time, so one needs levels of safety to keep improving with time.
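The last point can be made quantitative with a small toy computation (the numbers are made up for illustration): if every period carries the same nonzero chance of catastrophe, the chance of surviving many periods decays toward zero, whereas if the per-period risk keeps shrinking fast enough, long-run survival stays bounded away from zero. Hence the need for levels of safety to keep improving.

```python
# Toy sketch with made-up numbers: long-run survival under constant vs.
# improving per-period catastrophe risk.
periods = 10_000

# Constant 1% chance of catastrophe per period.
survival_constant = 1.0
for t in range(1, periods + 1):
    survival_constant *= 1 - 0.01

# Per-period risk that keeps improving, e.g. 1% / t**2.
survival_improving = 1.0
for t in range(1, periods + 1):
    survival_improving *= 1 - 0.01 / t**2

print(survival_constant)   # ~2e-44, i.e. essentially certain doom
print(survival_improving)  # ~0.98, i.e. survival bounded away from zero
```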
In some sense, all that this approach achieves is the following:
An alignment optimist: let the superintelligent AI solve alignment.
Eliezer: if you can rely on superintelligent systems to solve alignment, then you have already (mostly) solved alignment.
So here we have a setup where members of the ecosystem of intelligent and superintelligent systems want to solve for the protection of individual rights and interests, and for the protection of the ecosystem as a whole, because the integrity of the ecosystem is necessary (though not sufficient) to protect individuals.

We are thus creating a situation where superintelligent systems actually want to maintain a world order we can be comfortable with, and, moreover, will apply effort to preserve the key features of such a world order through radical self-modifications.

So we are not “solving AI existential safety” directly; instead, we are trying to make sure that the superintelligent AIs of the future will really care about “solving AI existential safety”, and will keep caring about it through radical self-modifications, so that “AI existential safety” gets “more solved”, rather than deteriorating, as the leading participants get smarter.