I agree that log-odds space is the right way to think about how close probabilities are. However, my epistemic situation right now is basically this:
“It sure seems like Doom is more likely than Safety, for a bunch of reasons. However, I feel sufficiently uncertain about stuff, and humble, that I don’t want to say e.g. 99% chance of doom, or even 90%. I can in fact imagine things being OK, in a couple different ways, even if those ways seem unlikely to me. … OK, now if I imagine someone having the flipped perspective, and thinking that things being OK is more likely than doom, but being humble and thinking that they should assign at least 10% credence (but less than 20%) to doom… I’d be like: ‘What are you smoking? What world are you living in, where it seems like things will be fine by default but there are a few unlikely ways things could go badly, instead of a world where it seems like things will go badly by default but there are a few unlikely ways things could go well? I mean, I can see how you’d think this if you weren’t aware of how short timelines to ASI are, or if you hadn’t thought much about the alignment problem...’”
If you think this is unreasonable, I’d be interested to hear it!
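As a quick illustration of the log-odds framing above (my own arithmetic, not part of either comment): the logit transform stretches out probabilities near 0 and 1, so updates that look small on the probability scale can be large in log-odds.

$$\operatorname{logit}(p)=\ln\frac{p}{1-p},\qquad \operatorname{logit}(0.1)\approx-2.20,\;\; \operatorname{logit}(0.2)\approx-1.39,\;\; \operatorname{logit}(0.8)\approx 1.39,\;\; \operatorname{logit}(0.9)\approx 2.20 \quad\text{(illustrative values only)}$$

On this scale, moving from 10% to 20% doom is the same size of update (≈0.8 nats) as moving from 80% to 90%, and someone at 20% sits ≈2.8 nats away from someone at 80%, even though both positions can sound “uncertain”.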
I don’t think the way you imagine the perspective inversion captures the typical ways of arriving at e.g. a 20% doom probability. For example, I believe there are multiple good things which could happen/be true and would decrease p(doom), and I put some weight on them:
- we discover some relatively short description of something like “harmony and kindness”, and this works as an alignment target
- enough of morality is convergent
- AI progress helps with human coordination (possibly in a costly way, e.g. via a warning shot)
- it’s convergent to massively scale alignment efforts with AI power, and these efforts solve some of the more obvious problems
I would expect doom to prevail conditional on only small efforts to avoid it, but I do think the actual efforts will be substantial, and this moves the chances to ~20–30%. (Also, I think most of the risk comes from not being able to deal with complex systems of many AIs and with the economy decoupling from humans; I expect single-single alignment to be solved well enough to prevent takeover by a single system by default.)
Thanks for this comment. I’d be generally interested to hear more about how one could get to 20% doom (or less).
The list you give above is cool but doesn’t do it for me; going down the list I’d guess something like:
1. 20% likely (honesty seems like the best bet to me), given how little time we have left; and even if it happens we aren’t out of the woods yet, because there are various plausible ways we could still screw things up. So maybe overall this is where 1/3rd of my hope comes from.
2. 5% likely? Would want to think about this more. I could imagine myself being very wrong here, actually; I haven’t thought about it enough. But it sure does sound like wishful thinking.
3. This is already happening to some extent, but the question is: will it happen enough? My overall “humans coordinate to not build the dangerous kinds of AI for several years, long enough to figure out how to end the acute risk period” is where most of my hope comes from; it’s the remaining 2/3rds, basically. So I guess I can say 20% likely. (See the rough arithmetic after this list.)
4. What does this mean?
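Reading those fractions literally (my own back-of-the-envelope arithmetic, not something stated explicitly above): if route 3 is ~20% likely and supplies ~2/3 of the hope, while route 1 supplies the remaining ~1/3, the implied totals would be roughly

$$\underbrace{0.20}_{\text{route 3}}\approx\tfrac{2}{3}\,P(\text{ok})\;\Rightarrow\;P(\text{ok})\approx 0.30,\qquad \underbrace{P(\text{ok})-0.20}_{\text{route 1}}\approx 0.10,\qquad P(\text{doom})\approx 0.70 \quad\text{(rough inference, not a stated figure)}$$

which is consistent with the quoted stance of “doom more likely than not, but well short of 90%”.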
I would be much more optimistic if I thought timelines were longer.