Jan_Kulveit comments on Adumbrations on AGI from an outsider

Jan_Kulveit 25 May 2023 10:10 UTC
10 points
4
As a minor nitpick, 70% likely and 20% are quite close in logodds space, so it seems odd you think what you believe is reasonable and something so close is “very unreasonable”.
- Daniel Kokotajlo 25 May 2023 18:12 UTC
  9 points
  5
  I agree that logodds space is the right way to think about how close probabilities are. However, my epistemic situation right now is basically this:
  
  “It sure seems like Doom is more likely than Safety, for a bunch of reasons. However, I feel sufficiently uncertain about stuff, and humble, that I don’t want to say e.g. 99% chance of doom, or even 90%. I can in fact imagine things being OK, in a couple of different ways, even if those ways seem unlikely to me. … OK, now if I imagine someone having the flipped perspective, and thinking that things being OK is more likely than doom, but being humble and thinking that they should assign at least 10% credence (but less than 20%) to doom… I’d be like ‘what are you smoking? What world are you living in, where it seems like things will be fine by default but there are a few unlikely ways things could go badly, instead of a world where it seems like things will go badly by default but there are a few unlikely ways things could go well? I mean I can see how you’d think this if you weren’t aware of how short timelines to ASI are, or if you hadn’t thought much about the alignment problem...’”
  
  If you think this is unreasonable, I’d be interested to hear it!
  - Jan_Kulveit 29 May 2023 15:19 UTC
    7 points
    2
    I don’t think the way you imagine perspective inversion captures the typical ways of arriving at e.g. 20% doom probability. For example, I do believe that there are multiple good things which can happen or be true and would decrease p(doom), and I put some weight on them:
    - we do discover some relatively short description of something like “harmony and kindness”; this works as an alignment target
    - enough of morality is convergent
    - AI progress helps with human coordination (could be in a costly way, e.g. a warning shot)
    - it’s convergent to massively scale alignment efforts with AI power, and these solve some of the more obvious problems
    
    I would expect doom to prevail conditional on only small efforts to avoid it, but I do think the actual efforts will be substantial, and this moves the chances to ~20-30%. (Also, I think most of the risk comes from not being able to deal with complex systems of many AIs and the economy decoupling from humans, and I expect single-single alignment to be solved sufficiently to prevent single-system takeover by default.)
    - Daniel Kokotajlo 31 May 2023 20:38 UTC
      4 points
      0
      Thanks for this comment. I’d be generally interested to hear more about how one could get to 20% doom (or less).
      
      The list you give above is cool but doesn’t do it for me; going down the list I’d guess something like:
      1. 20% likely (honesty seems like the best bet to me) because we have so little time left, but even if it happens we aren’t out of the woods yet because there are various plausible ways we could screw things up. So maybe overall this is where 1/3rd of my hope comes from.
      2. 5% likely? Would want to think about this more. I could imagine myself being very wrong here actually, I haven’t thought about it enough. But it sure does sound like wishful thinking.
      3. This is already happening to some extent, but the question is, will it happen enough? My overall “humans coordinate to not build the dangerous kinds of AI for several years, long enough to figure out how to end the acute risk period” is where most of my hope comes from. I guess it’s the remaining 2/3rds basically. So, I guess I can say 20% likely.
      4. What does this mean?
      
      I would be much more optimistic if I thought timelines were longer.
- nicholashalden 25 May 2023 10:26 UTC
  3 points
  0
  This seems to violate common sense. Why would you think about this in log space? 99% and 1% are identical in if(>0) space, but they have massively different implications for how you think about a risk (just like 20% and 70% do!)
  - Jan_Kulveit 25 May 2023 11:28 UTC
    11 points
    3
    It’s a much more natural way to think about it (cf. e.g. E. T. Jaynes, Probability Theory, examples in Chapter IV)
    
    In this specific case of evaluating hypotheses, the distance in logodds space indicates the strength of the evidence you would need to see to update. A small distance implies you don’t need that much evidence to update between the positions (note that 0.7 and 0.2 are closer together than 0.9 and 0.99). If you need only a small amount of evidence to update, it is easy to imagine that some other observer, as reasonable as you, has accumulated a bit or two somewhere you haven’t seen.
    
    Because working in logspace is way more natural, it is almost certainly also what our brains do—the “common sense” is almost certainly based on logspace representations.
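
For anyone who wants to check the distances mentioned above, here is a minimal sketch in Python. The helper names `logit_bits` and `evidence_gap_bits` are made up for illustration, and measuring in bits (log base 2) is just one convenient choice of unit; nothing here comes from the thread beyond the probabilities themselves.

```python
import math


def logit_bits(p: float) -> float:
    """Log-odds of probability p, measured in bits (log base 2)."""
    return math.log2(p / (1.0 - p))


def evidence_gap_bits(p: float, q: float) -> float:
    """Distance between two probabilities in log-odds space, in bits.

    Under Bayes' rule in odds form, this is the log2 of the likelihood
    ratio needed to move from one credence to the other.
    """
    return abs(logit_bits(p) - logit_bits(q))


# The pairs discussed in the thread.
gap_70_20 = evidence_gap_bits(0.7, 0.2)
gap_90_99 = evidence_gap_bits(0.9, 0.99)
gap_99_01 = evidence_gap_bits(0.99, 0.01)

for label, gap in [
    ("0.70 vs 0.20", gap_70_20),
    ("0.90 vs 0.99", gap_90_99),
    ("0.99 vs 0.01", gap_99_01),
]:
    # 2 ** gap is the likelihood ratio that bridges the two credences.
    print(f"{label}: {gap:.2f} bits (likelihood ratio ~{2 ** gap:.1f})")
```

Under these assumptions the 0.7 vs 0.2 gap comes out a little above 3 bits, slightly smaller than the 0.9 vs 0.99 gap, while 0.99 vs 0.01 is roughly four times as far apart.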