I think the point about them not engaging with critics is also a bit too harsh. Here is the DeepMind alignment team’s response to concerns raised by Yudkowsky. I am not saying that their response is flawless or even correct, but it is a response nonetheless; they are engaging with this work. The DeepMind alignment team also seemed to engage with concerns raised by critics in their (relatively) recent work.
I don’t disagree that it is good of the DeepMind alignment team to engage with arguments on LessWrong, but I don’t think a few researchers at an org engaging with these arguments meets the basic standard here. The first post explicitly says it doesn’t represent the leadership, and my sense is that the leadership has avoided engaging publicly with critics on that subject, and that the people involved lack the political power to push the leadership into open debate.
That said, I do concede the point that DeepMind has generally been more cautious than OpenAI and Anthropic, and did not start the race to build potentially omnicidal machines (they were first; it was OpenAI and Anthropic who entered as major competitors).