I don’t think the way you imagine perspective inversion captures the typical ways of arriving at, e.g., a 20% doom probability. For example, I believe there are multiple good things which could happen/be true and would decrease p(doom), and I put some weight on them:
- we discover some relatively short description of something like “harmony and kindness”, and this works as an alignment target
- enough of morality is convergent
- AI progress helps with human coordination (possibly in a costly way, e.g. via a warning shot)
- massively scaling alignment efforts alongside AI power is convergent, and these efforts solve some of the more obvious problems
I would expect doom to prevail conditional on only small efforts to avoid it, but I do think the actual efforts will be substantial, and this moves the chances to ~20–30%. (Also, I think most of the risk comes from our inability to deal with complex systems of many AIs and from the economy decoupling from humans; I expect single-single alignment to be solved well enough to prevent single-system takeover by default.)
Thanks for this comment. I’d be generally interested to hear more about how one could get to 20% doom (or less).
The list you give above is cool but doesn’t do it for me; going down the list I’d guess something like:
1. 20% likely (honesty seems like the best bet to me) because we have so little time left, but even if it happens we aren’t out of the woods yet because there are various plausible ways we could screw things up. So maybe overall this is where 1/3rd of my hope comes from.
2. 5% likely? Would want to think about this more. I could imagine myself being very wrong here actually, I haven’t thought about it enough. But it sure does sound like wishful thinking.
3. This is already happening to some extent, but the question is, will it happen enough? My overall “humans coordinate to not build the dangerous kinds of AI for several years, long enough to figure out how to end the acute risk period” is where most of my hope comes from; it accounts for the remaining 2/3rds, basically. So I suppose I can say 20% likely.
4. What does this mean?
I would be much more optimistic if I thought timelines were longer.