One thing to keep in mind is that “getting it right on the first try” is a good framing only if one actually intends to create an AI system that would take over the world (which is a very risky proposition).
If one is not aiming for that, and instead thinks in terms of making sure AI systems don’t try to take over the world as one of their safety properties, then things are somewhat different:
on the one hand, one needs to avoid the catastrophe not just on the first try but on every try, which is a much higher bar (see the toy calculation after this list);
on the other hand, one needs to ponder the collective dynamics of the AI ecosystem (and of the AI-human ecosystem); things become rather non-trivial in the absence of a dominant actor.
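To make the “much higher bar” concrete, here is a minimal toy calculation. It is purely illustrative: the independence assumption and the numbers are mine, not claims about any real system. If each deployment or “try” independently avoids catastrophe with probability 1 − ε, then surviving N tries has probability (1 − ε)^N ≈ e^(−εN), which decays exponentially in N.

```python
# Toy model: probability of avoiding catastrophe across many independent "tries".
# Assumptions (mine, for illustration only): each try fails with the same small
# probability eps, and tries are independent. Real AI deployments are neither
# identical nor independent, so treat this as a sketch of the scaling, not a forecast.

import math

def survival_probability(eps: float, n_tries: int) -> float:
    """Probability that none of n_tries independent tries ends in catastrophe."""
    return (1.0 - eps) ** n_tries

for eps in (0.001, 0.01):
    for n in (1, 100, 1000):
        p = survival_probability(eps, n)
        approx = math.exp(-eps * n)  # exponential approximation for small eps
        print(f"eps={eps:<6} n={n:<5} P(no catastrophe) = {p:.4f}  (~{approx:.4f})")
```

Under these toy assumptions, even a 1% per-try risk leaves only about a 0.004% chance of getting through a thousand tries unscathed, which is why “every try” is such a demanding standard compared to “the first try”.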
When we ponder the questions of AI existential safety, we should consider both models (“singleton” vs “multi-polar”).
It is traditional for the AI alignment community to focus mostly on the “single AI” scenario, but since avoiding a singleton takeover is usually considered one of the goals, we should also pay more attention to the multi-polar track, which is the default fallback in the absence of a singleton takeover. (At some point I scribbled some notes reflecting my thoughts on the multi-polar track: Exploring non-anthropocentric aspects of AI existential safety.)
But many people hope that our collaborations with emerging AI systems, thinking together with them about all these issues, will lead to more insights and, perhaps, to different fruitful approaches (assuming that we have enough time to take advantage of this stronger joint thinking power, that is, assuming that capabilities develop at a reasonable pace, without rapid blow-ups). So there is reason for hope in this sense...