Pre-hindsight: 100 years from now, it is clear that your research has been net bad for the long-term future. What happened?
As an aside, I think that the possibility of “work doesn’t matter” is typically way more important than “work was net bad,” at least once you are making a serious effort to do something good rather than bad for the world (I agree that for the “average” project in the world the negative impacts are actually pretty large relative to the positive impacts).
EAs/rationalists often focus on the chance of a big downside clawing back value. I think that makes sense to think seriously about, and sometimes it’s a big deal, but most of the time the quantitative estimates just don’t seem to add up at all to me and I think people are making a huge quantitative error. I’m not sure exactly where we disagree; I think a lot of it is just that I’m way more skeptical about the ability to incidentally change the world a huge amount. I think that changing the world a lot usually just takes quite a bit of effort.
I guess in some sense I agree that the downside is big for normal butterfly-effect-y reasons (probably 50% of well-intentioned actions make the world worse ex post), so it’s also possible that I’m just answering this question in a slightly different way.
My big caveat is that I think the numbers typically come out different (and the prior presumption can be different) when you are trying to e.g. grab political power or influence, or doing something that undermines other people’s plans / is deliberately designed to hurt them. I don’t think these are the main times EAs end up worrying about this though, and of course in particular my research isn’t really trying to fight anyone or grab power.
I guess I feel like we’re in a domain where some people were like “we have concretely-specifiable tasks, intelligence is good, what if we figured out how to create artificial intelligence to do those tasks”, which is the sort of thing that someone trying to do good for the world would do, but had some serious chance of being very bad for the world. So in that domain, it seems to me that we should keep our eyes out for things that might be really bad for the world, because all the things in that domain are kind of similar.
That being said, I agree that the possibility that the work doesn’t matter is more important once you’re making a thoughtful effort to do good. But I see much more effort and thought going into trying to address that part, such that the occasional nudge to consider negative impacts seems appropriate to me.
I think it’s good to sometimes meditate on whether you are making the world worse (and get others’ advice), and I’d more often recommend it for crowds other than EA and certainly wouldn’t discourage people from doing it sometimes.
I’m sympathetic to arguments that you should be super paranoid in domains like biosecurity since it honestly does seem asymmetrically easier to make things worse rather than better. But when people talk about it in the context of e.g. AI or policy interventions or gathering better knowledge about the world that might also have some negative side-effects, I often feel like there’s little chance that the predictable negative effects they are imagining loom large in the cost-benefit unless the whole thing is predictably pointless. Which isn’t a reason not to consider those effects, just a push-back against the conclusion (and a heuristic push-back against the state of affairs where people are paralyzed by the possibility of negative consequences based on kind of tentative arguments).
For advancing or deploying AI I generally have an attitude like “Even if actively trying to push the field forward full-time I’d be a small part of that effort, whereas I’m a much larger fraction of the stuff-that-we-would-be-sad-about-not-happening-if-the-field-went-faster, and I’m not trying to push the field forward,” so while I’m on board with being particularly attentive to harms if you’re in a field you think can easily cause massive harms, in this case I feel pretty comfortable about the expected cost-benefit unless alignment work isn’t really helping much (in which case I have more important reasons not to work on it). I would feel differently about this if pushing AI faster was net bad on e.g. some common-sense perspective on which alignment was not very helpful, but I feel like I’ve engaged enough with those perspectives to be mostly not having it.
“Even if actively trying to push the field forward full-time I’d be a small part of that effort”
I think conditioning on something like ‘we’re broadly correct about AI safety’ implies ‘we’re right about some important things about how AI development will go that the rest of the ML community is surprisingly wrong about’. In that world we’re maybe able to contribute as much as a much larger fraction of the field, due to being correct about some things that everyone else is wrong about.
I think your overall point still stands, but it does seem like you sometimes overestimate how obvious things are to the rest of the ML community.
Some plausible and non-exhaustive options, in roughly descending order of plausibility:
I crowd out other people who would have done a better job of working on alignment (either by being better or just by being more). People feel like in order to be taken seriously they have to engage with Paul’s writing and ideas and that’s annoying. Or the space seems like a confused mess with sloppy standards in part because of my influence. Or more charitably maybe they are more likely to feel like it’s “under control.” Or maybe I claim ideas and make it harder for others to get credit even if they would have developed the ideas further or better (or even end up stealing the credit for others’ ideas and disincentivizing them from entering the field).
I convincingly or at least socially-forcefully argue for conclusions that turn out to be wrong (and maybe I should have understood as wrong) and so everyone ends up wronger and makes mistakes that have a negative effect. I mean ex post I think this kind of thing is pretty likely in some important cases (if I’m 80-20 and convince people to update in my favor, I still think there’s a 20% chance that I pushed people in the wrong direction, and across many issues this is definitely going to happen).
I contribute to social cover for irresponsible projects that want to pretend they are contributing to alignment, making it harder for the world to coordinate to block such projects.
I convince people to be less worried about alignment and therefore undermine investment in alignment.
What I describe as “alignment” actually significantly hastens the arrival of catastrophically risky AI, either because these techniques are needed even to build any AI systems that have a big impact on the world, or because they hold out the promise of letting the developer actually benefit from AI and so incentivize more development or deployment.
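To put rough numbers on the 80-20 point in the second option above: here is a minimal sketch of how fast "at least one wrong push" becomes near-certain across many issues. It assumes the issues are independent and that each call is right with the same probability, neither of which is claimed in the original; the function name is made up for illustration.

```python
def prob_at_least_one_wrong(confidence: float, n_issues: int) -> float:
    """Chance that at least one of n_issues calls turns out wrong.

    Assumes (for illustration only) that the calls are independent and
    that each one is right with probability `confidence`.
    """
    return 1.0 - confidence ** n_issues


if __name__ == "__main__":
    # 0.8 is the "80-20" case; the issue counts are arbitrary examples.
    for n in (1, 5, 10, 20):
        print(n, round(prob_at_least_one_wrong(0.8, n), 3))
```

Under those assumptions the chance is 20% for a single issue and already close to 90% by ten issues, which is the sense in which this is "definitely going to happen" somewhere.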