You may be right that the recent pessimistic takes aren’t representative of the field as a whole… but I think you also may be wrong. I say, instead of speculating about it, let’s do some surveys! I am curious.
I for one think that existential risk from AI is quite high in the next decade or so—maybe like 50% or more. I don’t know what you mean by “can’t see this path leading to high existential risk” but if by “high” you mean “10% or more” then I’d bet that most people in AI safety disagree with you and if you mean “around 50% or more” then there’s a substantial minority who disagree with you.
Here’s a survey, though not for the next decade.

I’d bet at even odds that a survey sent to the same population would have < 50% respondents claiming >= 10% probability to the question:
How likely do you think that, by the end of 2032, fewer than a million humans are alive, primarily as a result of AI systems not doing/optimizing what the people deploying them wanted/intended?
(I’ve replaced “the overall value of the future will be drastically less than it could have been” with “fewer than a million humans are alive” because the former is not a discrete event that happens by 2032. That said, it is clearly a stronger condition, which I don’t really like; I’d be keen on good operationalizations of the former.)
Thanks for the survey. A few nitpicks:
- The survey you mention is ~1y old (May 3-May 26, 2021). I would expect those researchers to have updated based on the scaling-laws trend continuing with Chinchilla, PaLM, Gato, etc. (Metaculus at least did update significantly, though one could argue that people taking the survey at CHAI, FHI, DeepMind, etc. would be less surprised by the recent progress.)
- I would prefer the question to mention “1M humans alive on the surface of the earth” to avoid counting people surviving inside “mine shafts” or on Mars/the Moon (similar to the Bryan Caplan / Yudkowsky bet).
For any of those (supposedly) > 50% of respondents claiming a < 10% probability, I am happy to take a 1:10 odds $1000 bet on:
“by the end of 2032, fewer than a million humans are alive on the surface of the earth, primarily as a result of AI systems not doing/optimizing what the people deploying them wanted/intended”
Where, similar to Bryan Caplan’s bet with Yudkowsky, I get paid $1000 now, and at the end of 2032 I pay it back, adding $100.
(Given inflation and interest, this seems like a bad deal for the one giving the money now, though I find it hard to predict 10y inflation and I do not want the extra pressure of investing those $1000 for 10y. If someone has another deal in mind that sounds more interesting, do let me know here or by DM.)
To make the bet fair, the size of the bet would be the equivalent of the 2032 value of $1000 worth of the SPY ETF bought today ($400.09 at the May 16 close). And to mitigate the issue of not being around to receive the money, I would receive a payment of $1000 now. If I lose, I give back whatever $1000 of SPY ETF bought today is worth in 2032, adding 10% to that value.
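To make those terms concrete, here is a minimal sketch of the repayment formula as described above; the function name and the 2032 SPY price in the example are purely hypothetical placeholders, not predictions.

```python
# Sketch of the proposed bet's cash flows, assuming the terms described above:
# the doom-bettor receives $1000 now; if doom does not occur, at the end of
# 2032 they repay the then-current value of $1000 of SPY bought at $400.09,
# plus 10%.

STAKE_NOW = 1000.0   # paid to the doom-bettor today
SPY_TODAY = 400.09   # SPY close on May 16, from the comment above

def repayment_if_no_doom(spy_price_2032: float) -> float:
    """Amount repaid at the end of 2032 if doom does not occur."""
    spy_growth = spy_price_2032 / SPY_TODAY
    return STAKE_NOW * spy_growth * 1.10

# Example with a hypothetical 2032 price of $800 (purely illustrative):
print(round(repayment_if_no_doom(800.0), 2))  # -> 2199.51
```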
This seems like a terrible deal even if I’m 100% guaranteed to win; I could do way better than a ~1% rate of return per year (e.g. by buying Treasury bonds). You’d have to offer > $2000 before it seemed plausibly worth it.
(In practice I’m not going to take you up on this even then, because the time cost of handling the bet is too high. I’d be a lot more likely to accept if there were a reliable third-party service that I strongly expected to still exist in 10 years, that would deal with remembering to follow up in 10 years’ time, and that would guarantee to pay out even if you reneged or went bankrupt, etc.)
Note: I updated the parent comment to take into account interest rates.
In general, the way to mitigate the trust issue would be to use an escrow, though when betting on doom-ish scenarios there would be little benefit in having $1000 in escrow if I “win”.
For anyone reading this who also thinks that it would need to be > $2000 to be worth it, I am happy to give $2985 at the end of 2032, i.e. an additional 10% on top of $1000 compounded at the average annual return of the S&P 500 (1.1 * (1.105^10 * 1000)), if that sounds less risky than the SPY ETF bet.
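For reference, a quick check of the arithmetic behind that $2985 figure, using the 10.5% average annual return assumed in the formula above:

```python
# $1000 compounded for 10 years at an assumed 10.5% average annual
# S&P 500 return, with an extra 10% added on top of the result.
principal = 1000.0
assumed_annual_return = 0.105
years = 10

compounded = principal * (1 + assumed_annual_return) ** years  # ~2714.08
payout = 1.1 * compounded
print(round(payout, 2))  # -> 2985.49, i.e. roughly the $2985 quoted above
```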
Thanks! OK, happy to bet. FWIW I’m not confident I’ll win; even odds sounds good to me. :)
I don’t like that operationalization though; I prefer the original. I don’t think the discrete event thing is much of a problem, but if it is, here are some suggestions to fix it:
“The overall value of the future is drastically less than it could have been, and by 2032 there’s pretty much nothing we AI-risk-reducers can do about it—we blew our chance, it’s game over.”
Or:
“At some point before 2032 a hypothetical disembodied, uninfluenced, rational version of yourself observing events unfold will become >90% confident that the overall value of the future will be drastically less than it could have been.”
I definitely like the second operationalization better. That being said I think that is pretty meaningfully different and I’m not willing to bet on it. I was expecting timelines to be a major objection to your initial claim, but it’s totally plausible that accumulating additional evidence gets people to believe in doom before doom actually occurs.
Also we’d need someone to actually run the survey (I’m not likely to).
I guess when you say “>= 10% x-risk in the next decade” you mean >= 10% chance that our actions don’t matter after that. I think it’s plausible a majority of the survey population would say that. If you also include the conjunct “and our actions matter between now and then” then I’m back to thinking that it’s less plausible.
How about we do a lazy bet: Neither of us runs the survey, but we agree that if such a survey is run and brought to our attention, the loser pays the winner?
The difficulty with this is that we don’t get to pick the operationalization. Maybe our meta-operationalization can be “<50% of respondents claim >10% probability of X, where X is some claim that strongly implies AI takeover or other irreversible loss of human control / influence of human values, by 2032.” How’s that sound?
...but actually I guess my credences aren’t that different from yours here, so it’s maybe not worth our time to bet on. I actually have very little idea what the community thinks; I was just pushing back against the OP, who seemed to be asserting a consensus without evidence.
Sure, I’m happy to do a lazy bet of this form. (I’ll note that if we want to maintain the original point we should also require that the survey happen soon, e.g. in the next year or two, so that we avoid the case where someone does a survey in 2030 at which point it’s obvious how things go, but I’m also happy not putting a time bound on when the survey happens since given my beliefs on p(doom by 2032) I think this benefits me.)
$100 at even odds?
Deal! :)