paulfchristiano comments on AMA: Paul Christiano, alignment researcher

paulfchristiano 30 Apr 2021 19:22 UTC
LW: 49 AF: 15
As an aside, I think that the possibility of “work doesn’t matter” is typically way more important than “work was net bad,” at least once you are making a serious effort to do something good rather than bad for the world (I agree that for the “average” project in the world the negative impacts are actually pretty large relative to the positive impacts).
EAs/rationalists often focus on the chance of a big downside clawing back value. I think that makes sense to think seriously about, and sometimes it’s a big deal, but most of the time the quantitative estimates just don’t seem to add up at all to me and I think people are making a huge quantitative error. I’m not sure exactly where we disagree, I think a lot of it is just that I’m way more skeptical about the ability to incidentally change the world a huge amount—I think that changing the world a lot usually just takes quite a bit of effort.
I guess in some sense I agree that the downside is big for normal butterfly-effect-y reasons (probably 50% of well-intentioned actions make the world worse ex post), so it’s also possible that I’m just answering this question in a slightly different way.
My big caveat is that I think the numbers typically come out different (and the prior presumption can be different) when you are trying to e.g. grab political power or influence, or doing something that undermines other people’s plans / is deliberately designed to hurt them. I don’t think these are the main times EAs end up worrying about this though, and of course in particular my research isn’t really trying to fight anyone or grab power.
- DanielFilan 1 May 2021 3:41 UTC
  LW: 6 AF: 3
  I guess I feel like we’re in a domain where some people were like “we have concretely-specifiable tasks, intelligence is good, what if we figured out how to create artificial intelligence to do those tasks”, which is the sort of thing that someone trying to do good for the world would do, but had some serious chance of being very bad for the world. So in that domain, it seems to me that we should keep our eyes out for things that might be really bad for the world, because all the things in that domain are kind of similar.
  
  That being said, I agree that the possibility that the work doesn’t matter is more important once you’re making a thoughtful effort to do good. But I see much more effort and thought going into addressing that part, so the occasional nudge to consider negative impacts seems appropriate to me.
  - paulfchristiano 1 May 2021 16:44 UTC
    LW: 7 AF: 4
    I think it’s good to sometimes meditate on whether you are making the world worse (and get others’ advice), and I’d more often recommend it for crowds other than EA and certainly wouldn’t discourage people from doing it sometimes.
    I’m sympathetic to arguments that you should be super paranoid in domains like biosecurity, since it honestly does seem asymmetrically easier to make things worse rather than better. But when people talk about it in the context of e.g. AI or policy interventions or gathering better knowledge about the world that might also have some negative side-effects, I often feel like there’s little chance that the predictable negative effects they are imagining loom large in the cost-benefit unless the whole thing is predictably pointless. Which isn’t a reason not to consider those effects, just a push-back against the conclusion (and a heuristic push-back against the state of affairs where people are paralyzed by the possibility of negative consequences based on kind of tentative arguments).
    For advancing or deploying AI I generally have an attitude like “Even if actively trying to push the field forward full-time I’d be a small part of that effort, whereas I’m a much larger fraction of the stuff-that-we-would-be-sad-about-not-happening-if-the-field-went-faster, and I’m not trying to push the field forward,” so while I’m on board with being particularly attentive to harms if you’re in a field you think can easily cause massive harms, in this case I feel pretty comfortable about the expected cost-benefit unless alignment work isn’t really helping much (in which case I have more important reasons not to work on it). I would feel differently about this if pushing AI faster was net bad on e.g. some common-sense perspective on which alignment was not very helpful, but I feel like I’ve engaged enough with those perspectives to be mostly not having it.
    - Beth Barnes 4 May 2021 18:12 UTC
      LW: 16 AF: 8
      “Even if actively trying to push the field forward full-time I’d be a small part of that effort”
      I think conditioning on something like ‘we’re broadly correct about AI safety’ implies ‘we’re right about some important things about how AI development will go that the rest of the ML community is surprisingly wrong about’. In that world we’re maybe able to contribute as much as a much larger fraction of the field, due to being correct about some things that everyone else is wrong about.
      I think your overall point still stands, but it does seem like you sometimes overestimate how obvious things are to the rest of the ML community.