I have found it fruitful to argue this case back and forth with you, thank you for explaining and defending your perspective.
I will restate my overall position and invite you to do the same; then I propose that we consider this 40+ comment thread concluded for now.
———
The comment of yours that (to me) started this thread was the following.
If the default path is AI’s taking over control from humans, then what is the current plan in leading AI labs? Surely all the work they put in AI safety is done to prevent exactly such scenarios. I would find it quite hard to believe that a large group of people would vigorously do something if they believed that their efforts will go to vain.
I primarily wish to argue that, given the general lack of accountability for developing machine learning systems in worlds where indeed the default outcome is doom, it should not be surprising to find out that there is a large corporation (or multiple) doing so. One should not assume that the incentives are aligned – anyone who is risking omnicide-level outcomes via investing in novel tech development currently faces no criminal penalties, fines, or international sanctions.
Given the current intellectual elite scene where a substantial number of prestigious people care about extinction level outcomes, it is also not surprising that glory-seeking companies have large departments focused on ‘ethics’ and ‘safety’ in order to look respectable to such people. Separately from any intrinsic interest, it has been a useful political chip for enticing a great deal of talent from scientific communities and communities interested in ethics to work for them (not dissimilar to how Sam Bankman-Fried managed to cause a lot of card-carrying members of the Effective Altruist philosophy and scene to work very hard to help build his criminal empire by talking a good game about utilitarianism, veganism, and the rest).
Looking at a given company’s plan for preventing doom, and noticing that it does not check out, should not be followed by an assumption of adequacy and good incentives (“surely this company would not exist, nor do work on AI safety, if it did not have a strong plan; I must be mistaken”). I believe that there is no good plan and that these companies would exist regardless of whether a good plan existed or not. Given the lack of accountability, and my belief that alignment is clearly unsolved and that we fundamentally do not know what we’re doing, I believe the people involved are getting rich risking all of our lives, and there is (currently) no justice here.
We have agreed on many points, and from the outset I believe you felt my position had some truth to it (e.g. “I do get that point that you are making, but I think this is a little bit unfair to these organizations.”). I will leave you to outline whichever overall thoughts and remaining points of agreement or disagreement that you wish.
Sure, it sounds like a good idea! Below I will write my thoughts on your overall summarized position.
———
“I primarily wish to argue that, given the general lack of accountability for developing machine learning systems in worlds where indeed the default outcome is doom, it should not be surprising to find out that there is a large corporation (or multiple) doing so.”
I do think that I could maybe agree with this if it were one small corporation. In your previous comment you suggested that you are describing not an intentional contribution to omnicide, but a bit of rationalization. I don’t think I would agree that so many people working on AI are successfully engaged in that rationalization, or that it would be enough to keep them doing it. A big factor is that, in the case of their failure, they personally (and all of their loved ones) would suffer the consequences.
“It is also not surprising that glory-seeking companies have large departments focused on ‘ethics’ and ‘safety’ in order to look respectable to such people.”
I don’t disagree with this, because it seems plausible that one of the reasons for creating safety departments is ulterior. However, I believe that this reason is probably not the main one, and that AI safety labs are producing genuinely good research papers. To take Anthropic as an example, I’ve seen safety papers that got the LessWrong community excited (at least judging by upvotes). Like this one.
“I believe that there is no good plan and that these companies would exist regardless of whether a good plan existed or not… I believe that the people involved are getting rich risking all of our lives and there is (currently) no justice here”
For the reasons that I mentioned in my first paragraph, I would probably disagree with this. Relatedly, while I do think wealth in general can be somewhat motivating, I also think that AI developers are aware that all their wealth would mean nothing if AI kills everyone.
———
Overall, I am really happy with this discussion. Our disagreements came down to a few points, and we agree on quite a few issues. I am similarly happy to conclude this big comment thread.