Not that you said otherwise, but just to be clear: it is not the case that most capabilities researchers at DeepMind or OpenAI hold beliefs similar to those of people at EleutherAI (that alignment is very important to work on). I would not expect it to go well if you said “it seems like you guys are speeding up the deaths of everyone on the planet” at DeepMind.
Obviously there are other possible strategies; I don’t mean to say that nothing like this could ever work.
Completely understood here. It’d be different for OpenAI, and more different still for DeepMind; we’d have to tailor the outreach. But I would like to try experimenting.
An institution could A/B test interventions like these. It can also talk to people more than once.
We can’t take this for granted: when A tells B that B’s views are inconsistent, the standard response (as far as I can tell) is for B to default in one direction (often the one favored by their status quo), make that direction their consistent view, and then double down every time they’re pressed.
It’s possible that we have ~1 shot per person at convincing them.
it seems like you guys are speeding up the deaths of everyone on the planet
I’ve found over the years that people only ever get really angry at you for saying curious things if they think those ideas might be true. Maybe we’re already half-way there at DeepMind!