The worlds where artificial superintelligence (ASI) is coming very soon, with only roughly current levels of compute, and where ASI by default goes catastrophically badly, are not worlds I believe we can afford to save.
For those kinds of worlds, we probably can’t afford to save them by heavy-handed measures: not only would the price be too high, given that it would have to be paid in all the other worlds as well, but it would also be impossible to prevent evasion of those measures if it is already relatively easy to create ASI.
But this does not mean that more light-weight measures are hopeless. For example, in the thought process Ilya shared with us last year, what he was trying to ponder were various potentially feasible, relatively light-weight measures for changing this default of ASI going catastrophically badly into more acceptable outcomes: Ilya Sutskever’s thoughts on AI safety (July 2023): a transcript with my comments.
More people should do more thinking of this kind. AI existential safety is a pre-paradigmatic field of study. Many potentially good approaches and non-standard angles have not been thought of at all; many others have been touched upon only lightly and need to be explored further. The broad consensus in the “AI existential safety” community seems to be that we currently don’t have good approaches we can comfortably rely upon for future ASI systems. This should encourage more people to look for completely novel approaches (and, to cover the worlds where timelines are short, some of those approaches have to be simple rather than ultra-complicated, otherwise they will not be ready by the time we need them).
In particular, it is certainly true that
A smarter thing that is more capable and competitive and efficient and persuasive being kept under indefinite control by a dumber thing that is less capable and competitive and efficient and persuasive is a rather unnatural and unlikely outcome. It should be treated as such until proven otherwise.
So it makes sense to spend more time brainstorming approaches which would not require “a smarter thing being kept under indefinite control by a dumber thing”. What might a relatively light-weight intervention look like which avoids ASI going catastrophically badly without relying on this unrealistic “control route”? I think this is underexplored. We tend to impose too many assumptions, e.g. the assumption that an ASI has to figure out a good approximation to our Coherent Extrapolated Volition, and to care about it, in order for things to go well. People might want to start from scratch instead. What if our only must-have is the goal of having good chances of a good trajectory conditional on ASI, without assuming that any particular feature of earlier approaches to AI existential safety is a must-have? This gives one a wider space of possibilities to brainstorm: what other properties, besides caring about a good approximation to our Coherent Extrapolated Volition, might be sufficient for the trajectory of a world with ASI to likely be good from our point of view?
You can imagine making a superintelligence whose mission is to prevent superintelligences from reshaping the world, but there are pitfalls, e.g. you don’t want it deeming humanity itself to be a distributed intelligence that needs to be stopped.
In the end, I think we need lightweight ways to achieve CEV (or something equivalent). The idea is there in the literature; a superintelligence can read and act upon what it reads; the challenge is to equip it with the right prior dispositions.
When I ponder all this, I usually try to focus on the key difficulty of AI existential safety: ASIs and ecosystems of ASIs will tend to self-modify and self-improve rapidly, and arbitrary values and goals are unlikely to be preserved through radical self-modifications.
So one question is: which values and goals might be natural, in the sense that many members of the ASI ecosystem, including entities much smarter and much more powerful than unaugmented humans, would be naturally inclined to preserve them through drastic self-modifications, while those values and goals also have corollaries that are good from the viewpoint of various human values and interests?
Basically, values and goals which are likely to be preserved through drastic self-modifications are probably non-anthropocentric, but they might have sufficiently anthropocentric corollaries, so that even the interests of unaugmented humans are protected (and also so that humans who would like to self-augment or to merge with AI systems can safely do so).
So what do we really need? We need a good deal of conservation and non-destruction, and we need the interests of the weaker members of the overall ecosystem, not just the interests of the currently smartest or most powerful, to be adequately taken into account.
What might the road towards adopting those values (preservation and protection of weaker entities and their interests) and incorporating those kinds of goals look like, such that these values and goals are preserved as the ASI ecosystem and many of its members self-modify drastically? One really needs a situation where many entities at varying levels of smartness and power care a lot about those values and goals.
An arbitrary ASI entity (just like an unaugmented human) cannot fully predict the future. In particular, it does not know where it might eventually end up in terms of relative smartness or relative power (relative to the most powerful ASI entities, or to the ASI ecosystem as a whole). So if any given entity wants to be long-term safe, it is strongly interested in the ASI society having general principles and practices of protecting its members at all levels of smartness and power. If only the smartest and most powerful are protected, then no entity is long-term safe on the individual level.
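To make this incentive argument slightly more concrete, here is a minimal toy sketch (the numbers, the uniform uncertainty over future rank, and the two candidate norms are all my own illustrative assumptions, not anything from the argument itself): an entity that does not know its future rank compares its expected level of protection under a norm that protects only the currently top-ranked entities versus a norm that protects members at every level.

```python
# Toy illustration with hypothetical numbers: an entity is uncertain about its
# future rank among N entities, and we compare its chance of being protected
# under a "top-K only" norm versus a "protect everyone" norm.

import random

N = 1000          # assumed number of entities in the ecosystem
K = 10            # assumed number of top-ranked entities the narrow norm protects
TRIALS = 100_000  # Monte Carlo samples over possible future ranks

def expected_protection(is_protected) -> float:
    """Probability of being protected, averaging over an uncertain future rank."""
    protected = 0
    for _ in range(TRIALS):
        future_rank = random.randint(1, N)  # uniform uncertainty over future rank
        if is_protected(future_rank):
            protected += 1
    return protected / TRIALS

# Narrow norm: only the currently most powerful entities are protected.
narrow = expected_protection(lambda rank: rank <= K)

# Universal norm: entities at every level of smartness and power are protected.
universal = expected_protection(lambda rank: True)

print(f"Expected protection under the 'top-{K} only' norm: {narrow:.3f}")
print(f"Expected protection under the universal norm:      {universal:.3f}")
```

Under this deliberately crude model, any entity that is uncertain about where it will eventually end up has a far higher expected level of protection under the universal norm, which is exactly the incentive the paragraph above appeals to.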
This might be a reasonable starting point towards having values and goals which are likely to be preserved through drastic self-modifications and self-improvements, and which are likely to imply a good future for unaugmented humans and for augmented humans as well.
Perhaps we can nudge things towards making this kind of future more likely. (The alternatives are bleak: unrestricted, drastic competition and perhaps even warfare between supersmart entities, which would probably end badly not just for us, but for the ASI ecosystem as well)...