I think that until recently, I’d been consistently more pessimistic than Eliezer about AI existential safety. Here’s a 2004 SL4 post, for example, where I tried to argue against MIRI (SIAI at the time) trying to build a safe AI (and I argued similarly again in 2011). I’ve made my own list of sources of AI risk that’s somewhat similar to this list. But it seems to me that there are still various “outs” from certain doom, such that my probability of a good outcome is closer to 20% (maybe a range of 10-30% depending on my mood) than to 1%.
Human thought partially exposes only a partially scrutable outer surface layer. Words only trace our real thoughts. Words are not an AGI-complete data representation in its native style. The underparts of human thought are not exposed for direct imitation learning and can’t be put in any dataset. This makes it hard and probably impossible to train a powerful system entirely on imitation of human words or other human-legible contents, which are only impoverished subsystems of human thoughts; unless that system is powerful enough to contain inner intelligences figuring out the humans, and at that point it is no longer really working as imitative human thought.
One of the biggest “outs” I see is that it turns out to be not that hard “to train a powerful system entirely on imitation of human words or other human-legible contents”, and we (e.g., a relatively responsible AI lab) train such a system and then use it to differentially accelerate AI safety research. I definitely think that it’s very risky to rely on such black-box human imitations for existential safety, and that a competent civilization would be pursuing other plans where it could end up with greater certainty of success, but it seems there’s something like a 20% chance that this just works out anyway.
To explain my thinking a bit more: human children have to learn how to think human thoughts through “imitation of human words or other human-legible contents”. It’s possible that they can only do this successfully because their genes contain certain key ingredients that enable human thinking, but it also seems possible that children are just implementations of some generic imitation learning algorithm, in which case our artificial learning algorithms (once they become advanced/powerful enough) won’t be any worse at learning to think like humans. I don’t know how to rule out the latter possibility with very high confidence. Eliezer, if you do, can you please explain your reasoning more?
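To make the “generic imitation learning” framing concrete, here is a minimal toy sketch (my own illustration, not something from Eliezer’s list or from my earlier posts) of what training “entirely on imitation of human words or other human-legible contents” amounts to in ML terms: estimate a next-token distribution from human-written text, then generate by sampling from it. The tiny corpus and the bigram counter below are stand-ins for the web-scale data and large networks actually used; the open question above is whether scaling this kind of surface-level imitation recovers the underparts of human thought or only their outer trace.

```python
# Toy illustration of "imitation of human words": fit a next-token model
# to human-written text, then generate by sampling from it.
# A bigram counter stands in for the large networks used in practice;
# the corpus and function names are hypothetical, chosen for brevity.
import random
from collections import defaultdict, Counter

corpus = (
    "human children learn to think by imitating human words . "
    "a system trained only on words imitates the surface of thought . "
    "the question is whether surface imitation recovers the thinking underneath ."
)
tokens = corpus.split()

# "Training" = estimating P(next token | current token) from the human text.
counts = defaultdict(Counter)
for cur, nxt in zip(tokens, tokens[1:]):
    counts[cur][nxt] += 1

def sample_next(token):
    """Sample a next token in proportion to how often humans wrote it."""
    nxts = counts.get(token)
    if not nxts:
        return random.choice(tokens)  # fallback for unseen tokens
    choices, weights = zip(*nxts.items())
    return random.choices(choices, weights=weights, k=1)[0]

def imitate(start="human", length=12):
    """Generate text purely by imitating the statistics of the corpus."""
    out = [start]
    for _ in range(length):
        out.append(sample_next(out[-1]))
    return " ".join(out)

print(imitate())
```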