Satron comments on Sabotage Evaluations for Frontier Models

Satron 17 Nov 2024 8:36 UTC
1 point
0
To modify my example to include an accountability mechanism that's also similar to real life: the King takes exactly the same vaccines as everyone else. So if he messes up with the chemicals, he also dies.

I believe a similar accountability mechanism works in our real-world case. If CEOs build unsafe AI, they and everyone they value in this life die. This seems like a really good incentive for them not to build unsafe AI.

At the end of the day, voluntary commitments such as debating with critics are not as strong, in my opinion. Imagine that they agree with you and go to the debate. Without the incentive of "if I mess up, everyone dies", the CEOs could just go right back to doing what they were doing. As far as I know, voluntary debates have few (if any) actual legal mechanisms to hold CEOs accountable.
- Ben Pace 17 Nov 2024 18:45 UTC
  6 points
  4
  If a medicine literally kills everyone who takes it within a week of taking it, sure, it will not get widespread adoption amongst thousands of people.
  
  If the medicine has bad side effects for 1 in 10 people and no upsides, or it only kills people 10 years later, and at the same time there is some great financial deal the ruler can make for himself by accepting this trade with the neighboring nation that is offering the vaccines, then yes, I think that could easily be enough pressure for a human being to rationalize that actually the vaccines are good.
  
  The relevant question is rarely ‘how high stakes is the decision’. The question is what is in need of rationalizing, how hard is it to support the story, and how strong are the pressures on the person to do that. Typically when the stakes are higher, the pressures on people to rationalize are higher, not lower.
  
  Politicians often enact policies that make the world worse for everyone (including themselves) while thinking they’re doing their job well, due to the various pressures and forces on them. The fact that it arguably increases their personal chance of death isn’t going to stop them, especially when they can easily rationalize it away because it’s abstract. In recent years politicians in many countries enacted terrible policies during a pandemic that extended the length of the pandemic (there were no challenge trials, there was inefficient handing out of vaccines, there were false claims about how long the pandemic would last, there were false claims about mask effectiveness, there were forced lockdown policies that made no sense, etc). These policies hurt people and messed up the lives of ~everyone in the country I live in (the US), which includes the politicians who enacted them and all of their families and friends. Yet this was not remotely sufficient to cause them to snap out of it.
  
  What is needed to rationalize AI development when the default outcome is doom? Here’s a brief attempt:
  - A lot of people who write about AI are focused on current AI capabilities and have a hard time speculating about future AI capabilities. Talk with these people. This helps you keep the downsides in far mode and the upsides in near mode (which helps because current AI capabilities are ~all upside, and pose no existential threat to civilization). The downsides can be pushed further into far mode with phrases like ‘sci-fi’ and ‘unrealistic’.
  - Avoid arguing with people or talking with people who have thought a great deal about this and believe the default outcome is doom and the work should be stopped (e.g. Hinton, Bengio, Russell, Yudkowsky, etc). This would put pressure on you to keep those perspectives alive while making decisions, which would cause you to consider quitting.
  - Instead of acknowledging that we fundamentally don't know what we're doing, focus on the idea that other people are going to plough ahead. Then you can say that you have a better chance than them, rather than admitting that neither of you has a good chance.
  This puts you into a mental world where you’re basically doing a good thing and you’re not personally responsible for much of the extinction-level outcomes.
  
  Intentionally contributing to omnicide is not what I am describing. I am describing a bit of rationalization in order to receive immense glory, and that leading to omnicide-level outcomes 5-15 years down the line. This sort of rationalizing why you should take power and glory is frequent and natural amongst humans.
  - Satron 17 Nov 2024 20:57 UTC
    2 points
    −1
    I don't think it would necessarily be easy to rationalize that a vaccine that negatively affects 10% of the population and has no effect on the other 90% is good. It seems possible to rationalize that it is good for you (if you don't care about other people), but quite hard to rationalize that it is good in general.
    Given how few politicians die as a result of their policies (at least in the Western world), the increase in their chance of death does seem negligible (compared to something that presumably increases this risk by a lot). Most of the bad policies I have in mind don't seem to happen during human catastrophes (like pandemics).
    However, I suspect that the main point of your last comment was that potential glory can, in principle, be one of the reasons why people rationalize things, and if that is the case, then I can broadly agree with you!
    - Ben Pace 17 Nov 2024 21:37 UTC
      5 points
      0
      I have found it fruitful to argue this case back and forth with you, thank you for explaining and defending your perspective.
      I will restate my overall position and invite you to do the same; then I propose that we consider this 40+ comment thread concluded for now.
      ———
      The comment of yours that (to me) started this thread was the following.
      If the default path is AI’s taking over control from humans, then what is the current plan in leading AI labs? Surely all the work they put in AI safety is done to prevent exactly such scenarios. I would find it quite hard to believe that a large group of people would vigorously do something if they believed that their efforts will go to vain.
      I primarily wish to argue that, given the general lack of accountability for developing machine learning systems in worlds where indeed the default outcome is doom, it should not be surprising to find out that there is a large corporation (or multiple) doing so. One should not assume that the incentives are aligned – anyone who is risking omnicide-level outcomes via investing in novel tech development currently faces no criminal penalties, fines, or international sanctions.
      Given the current intellectual elite scene where a substantial number of prestigious people care about extinction level outcomes, it is also not surprising that glory-seeking companies have large departments focused on ‘ethics’ and ‘safety’ in order to look respectable to such people. Separately from any intrinsic interest, it has been a useful political chip for enticing a great deal of talent from scientific communities and communities interested in ethics to work for them (not dissimilar to how Sam Bankman-Fried managed to cause a lot of card-carrying members of the Effective Altruist philosophy and scene to work very hard to help build his criminal empire by talking a good game about utilitarianism, veganism, and the rest).
      Looking at a given company's plan for preventing doom and noticing that it does not check out should not be followed by an assumption of adequacy and good incentives, i.e. "surely this company would not exist, nor do work on AI safety, if it did not have a strong plan, so I must be mistaken." I believe that there is no good plan and that these companies would exist regardless of whether a good plan existed or not. Given the lack of accountability, and my belief that alignment is clearly unsolved and we fundamentally do not know what we're doing, I believe the people involved are getting rich risking all of our lives and there is (currently) no justice here.
      We have agreed on many points, and from the outset I believe you felt my position had some truth to it (e.g. “I do get that point that you are making, but I think this is a little bit unfair to these organizations.”). I will leave you to outline whichever overall thoughts and remaining points of agreement or disagreement that you wish.
      - Satron 18 Nov 2024 9:49 UTC
        3 points
        0
        Sure, it sounds like a good idea! Below I will write my thoughts on your overall summarized position.
        
        ———
        
        “I primarily wish to argue that, given the general lack of accountability for developing machine learning systems in worlds where indeed the default outcome is doom, it should not be surprising to find out that there is a large corporation (or multiple) doing so.”
        
        I do think that I could maybe agree with this if it were one small corporation. In your previous comment you suggested that you are describing not an intentional contribution to omnicide, but a bit of rationalization. I don't think I would agree that so many people working on AI are successfully engaged in that bit of rationalization, or that it would be enough to keep them doing it. The big factor is that if they fail, they personally (and all of their loved ones) will suffer the consequences.
        
        “It is also not surprising that glory-seeking companies have large departments focused on ‘ethics’ and ‘safety’ in order to look respectable to such people.”
        
        I don't disagree with this, because it seems plausible that one of the reasons for creating safety departments is ulterior. However, I believe that this reason is probably not the main one and that AI labs' safety teams are producing genuinely good research papers. To take Anthropic as an example, I've seen safety papers that got the LessWrong community excited (at least judging by upvotes), like this one.
        
        “I believe that there is no good plan and that these companies would exist regardless of whether a good plan existed or not… I believe that the people involved are getting rich risking all of our lives and there is (currently) no justice here”
        
        For the reasons I mentioned in my first paragraph, I would probably disagree with this. Relatedly, while I do think wealth in general can be somewhat motivating, I also think that AI developers are aware that all their wealth would mean nothing if AI kills everyone.
        
        ———
        
        Overall, I am really happy with this discussion. Our disagreements came down to a few points, and we agree on quite a few issues. I am similarly happy to conclude this big comment thread.