Though the world this points at is pretty scary (a powerful AI system ready to go, only held back by the implementors buying safety concerns), the intervention does seem cheap and good.
I wonder whether 1 will be easy. I think it relies on the first AI systems being made by one of a small selection of easily identifiable orgs.
By scary, do you mean (or mean to imply) unlikely?
I think that if AI happens soon (<10 years) it’ll likely happen at an org we already know about, so 1 is feasible. If AI doesn’t happen soon, all bets are off and 1 will be very difficult.
No. Sorry, I suspect starting with “Though” was confusing. I think I meant ‘this seems like one of the harder worlds to get a win in, but given that world, this seems like a good intervention’.
I think I have an intuition that (a) we may only win if we stop things from getting as bad as this situation, and (b) extra expected utility is most cheaply purchased by plans that condition on worlds that are not this bad.
I dunno whether that’s true though. I haven’t thought about it a bunch.
Interesting. I’d love to hear more about the sorts of worlds conditioned on in your (b). For my part, the worlds I described in the original post seem both the most likely and also not completely hopeless—maybe with a month of extra effort we can actually come up with a solution, or else a convincing argument that we need another month, etc. Or maybe we already have a mostly-working solution by the time The Talk happens and with another month we can iron out the bugs.
I just wanted to say that this is a good question, but I’m not sure I know the answer yet.
Worlds that appear most often in my musings (but I'm not sure they're likely enough to count) are:
- an aligned group getting a decisive strategic advantage
- safety concerns being clearly demonstrated and part of mainstream AI research
  - perhaps general reasoning about agents and intelligence improves, and we can apply these techniques to AI designs
  - perhaps things contiguous with alignment concerns cause failures in capable AI systems early on
- a more alignable paradigm overtaking ML
  - this seems like a fantasy
  - could be because ML gets bottlenecked or a different approach makes rapid progress
Thanks, that was an illuminating answer. I feel like those three worlds are decently likely, but that, if those worlds occur, purchasing additional expected utility in them will be hard, precisely because things will be so much easier. For example, if safety concerns are part of mainstream AI research, then safety research won't be neglected anymore.
You can purchase additional EU by pumping up their probability as well. EDIT: I know I originally said to condition on these worlds, but I guess that's not what I actually do. Instead, I think I condition on not-doomed worlds.
Ah, that sounds much better to me. Yeah, maybe the cheapest EU lies in trying to make these worlds more likely. I doubt we have much control over which paradigms overtake ML, and I think that the intervention I'm proposing might help make the first and second kinds of world more likely (because maybe, with a month of extra time to analyze their system, the relevant people will become convinced that the problem is real).
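(A rough way to frame the two levers in this exchange, with framing and notation that are mine rather than the commenters': expected utility decomposes over worlds, so a plan can buy extra EU either through the per-world utility terms or through the probability terms.)

\[
\mathbb{E}[U \mid \text{plan}] = \sum_{w} P(w \mid \text{plan}) \, U(\text{plan}, w)
\]

Conditioning on a set of worlds and doing better there targets the \(U(\text{plan}, w)\) factors, while "pumping up their probability" targets the \(P(w \mid \text{plan})\) factors; on this reading, the proposed intervention mostly buys utility within the hard world, and, per the last comment, may also shift some probability toward the first two worlds.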