Some plausible and non-exhaustive options, in roughly descending order of plausibility:
I crowd out other people who would have done a better job of working on alignment (either by being better or just by being more). People feel like in order to be taken seriously they have to engage with Paul’s writing and ideas and that’s annoying. Or the space seems like a confused mess with sloppy standards in part because of my influence. Or more charitably maybe they are more likely to feel like it’s “under control.” Or maybe I claim ideas and make it harder for others to get credit even if they would have developed the ideas further or better (or even end up stealing the credit for others’ ideas and disincentivizing them from entering the field).
I convincingly or at least socially-forcefully argue for conclusions that turn out to be wrong (and that maybe I should have recognized as wrong), and so everyone ends up wronger and makes mistakes that have a negative effect. I mean, ex post I think this kind of thing is pretty likely in some important cases (if I'm 80-20 and convince people to update in my favor, I still think there's a 20% chance that I pushed people in the wrong direction, and across many issues this is definitely going to happen).
I contribute to social cover for irresponsible projects that want to pretend they are contributing to alignment, making it harder for the world to coordinate to block such projects.
I convince people to be less worried about alignment and therefore undermine investment in alignment.
What I describe as “alignment” actually significantly hastens the arrival of catastrophically risky AI—either because these techniques are needed even to build any AI systems that have a big impact on the world, or because they hold out promise of letting the developer actually benefit from AI and so incentivize more development or deployment.