Could you explain your model here of how outreach to typical employees becomes net negative?
The path of: [low level OpenAI employees think better about x-risk → improved general OpenAI reasoning around x-risk → improved decisions] seems high EV to me.
I think the obvious way this becomes net negative is if the first (unstated) step in the causal chain is actually false: [People who don’t have any good ideas for making progress on alignment try to ‘buy time’ by pitching people who work at big ML labs on AI x-risk → low-level OpenAI employees think better about x-risk]

A concern of mine, especially when this kind of untargeted outreach is framed as “the thing to do if you can’t make technical progress”, is that the actual outcome will often not be [low-level OpenAI employees think better about x-risk], but something more like [low-level employees’ suspicion that the “AI doomer crowd” doesn’t really know what it’s talking about is reinforced], or [low-level employees now think worse about x-risk].