I think that a recent tweet thread by Michael Nielsen and the quoted one by Emmett Shear represent genuine progress towards making AI existential safety more tractable.
Michael Nielsen observes, in particular:
As far as I can see, alignment isn’t a property of an AI system. It’s a property of the entire world, and if you are trying to discuss it as a system property you will inevitably end up making bad mistakes
Since AI existential safety is a property of the whole ecosystem (and is, really, not too drastically different from World existential safety), the ecosystem as a whole should be the starting point, rather than the stand-alone properties of any particular AI system.
Emmett Shear writes:
Hopefully you’ve validated whatever your approach is, but only one of these is stable long term: care. Because care can be made stable under reflection, people are careful (not a coincidence, haha) when it comes to decisions that might impact those they care about.
And Zvi responds:
Technically I would say: Powerful entities generally caring about X tends not to be a stable equilibrium, even if it is stable ‘on reflection’ within a given entity. It will only hold if caring more about X provides a competitive advantage against other similarly powerful entities, or if there can never be a variation in X-caring levels between such entities that arises other than through reflection, and also reflection never causes reductions in X-caring despite this being competitively advantageous. Also note that variation in what else you care about to what extent is effectively variation in X-caring.
Or more bluntly: The ones that don’t care, or care less, outcompete the ones that care.
Even the best case scenarios here, when they play out the ways we would hope, do not seem all that hopeful.
That all, of course, sets aside the question of whether we could get this ‘caring’ thing to operationally work in the first place. That seems very hard.
Let’s now consider this in light of what Michael Nielsen is saying.
I am going to consider only the case where we have plenty of powerful entities with long-term goals and long-term existence which care about their long-term goals and long-term existence. This seems to be the case Zvi is considering here, and it is the case we understand best, because we already live in a reality with plenty of powerful entities (ourselves, some organizations, etc.) with long-term goals and long-term existence. So this is an incomplete consideration: it only covers the scenarios where powerful entities with long-term goals and long-term existence retain a good fraction of the overall available power.
So what do we really need? What are the properties we want the World to have? We need a good deal of conservation and non-destruction, and we need the interests of the weaker members of the overall ecosystem, not only those of the currently smartest or most powerful members, to be adequately taken into account.
Here is how we might be able to have a trajectory where these properties remain stable despite all the drastic changes in a self-modifying and self-improving ecosystem.
An arbitrary ASI entity (just like an unaugmented human) cannot fully predict the future. In particular, it does not know where it might eventually end up in terms of relative smartness or relative power (relative to the most powerful ASI entities or to the ASI ecosystem as a whole). So if any given entity wants to be safe in the long term, it is strongly interested in the ASI society having general principles and practices for protecting its members at various levels of smartness and power. If only the smartest and most powerful are protected, then no entity is safe in the long term on the individual level.
This might be enough to produce an effective counterweight to unrestricted competition (just as human societies have mechanisms against unrestricted competition). Basically, smarter-than-human entities at all levels of power are likely to be interested in the overall society having general principles and practices for protecting its members at various levels of smartness and power, and that is why they would care enough to keep the overall society self-regulating and enforcing these principles.
This is not yet the solution, but I think this is pointing in the right direction...