I don’t think that money alone would’ve convinced the CEOs of big companies to run this enterprise. Altman and Amodei both have families. If they don’t care about their own families, then they at least care about themselves. After all, we are talking about scenarios where these guys would die the same deaths as the rest of us. No amount of hoarded money would save them. They would have little motivation to do any of this if they believed they would die as a result of their own actions. And that’s not mentioning all of the other researchers working at their labs. Anthropic and OpenAI alone have almost 2,000 employees between them. Do they all not care about their own and their families’ well-being?
I’m not quite sure how to explain that I think a mass of people can do something that they each know on some level is the wrong thing and will hurt them later, but I believe it is common. I think it is partly a mistake to think of a mass of people as having the sum of the agency of all the people involved, or even the maximum.
I think it is easier than you believe to simply not think about far-away dangers that one can say one is not really responsible for. Does every trader involved in the ’08 financial crisis take personal responsibility for it? Does every voter for a politician who ultimately turns out to be corrupt take personal responsibility for it? Do all the tens of thousands of people involved in various genocides take personal responsibility for stopping them as soon as they see them coming? I think it is very easy for people to erect a Cartesian boundary between themselves and the levers of power. People are often aware that they are doing the wrong thing. I broke my diet two days ago and regret it, and on some level I knew I’d end up regretting it. And that was a situation I had complete agency over. The more indirectness there is, the more things are in far-mode, the less people take action on them or feel like they can do anything about them today.
I agree it is not money alone. These people get to work alongside some of the most innovative and competent people of our age, connect with extremely prestigious journalists and institutions, be invited into the halls of power in senior parts of government, and build systems mankind has never seen. All of this is further incentive to find a good rationalization (rather than to stay home and not do that).
I think that these examples are quite different. People who voted for a president who ultimately turned out to be corrupt most likely expected that the most probable scenario wouldn’t be totally horrible for them personally. And most of the time they are right: corrupt presidents are bad for countries, but you can still live happily under their rule. People would at least expect to retain whatever valuable things they already have.
With AI the story is different. If a person thinks that the most likely scenario is the one where everyone dies, then they would expect not only to win nothing, but to lose everything they have. And the media are actively talking about it, so it’s not as if they are unaware of the possibility. Even if they don’t see themselves as personally responsible for everything, they surely don’t want to suffer the consequences of everyone dying.
This is not a case of someone doing something that they know is wrong but that benefits them. If they truly believe that we are all going to die, then it’s a case of doing something they know is wrong and that actively hurts them, and supposedly hurts them far more than any scenario that has played out so far (since we are all still alive).
So I believe that most people working on it actually believe that we are going to be just fine. Whether this belief is warranted is another question.
Yes, it is very rare to see someone unflinchingly kill themselves and their family. But all of this glory and power is sufficient to get someone to do a little self-deception, a little looking away from the unpleasant thoughts, and that is often sufficient to cause people to make catastrophic decisions.
“Who can really say how the future will go? Think of all the good things that might happen! And I like all the people around me. I’m sure they’re working as hard as they can. Let’s keep building.”
That depends on how bad things are perceived to be. They might be more optimistic than is warranted, but probably not by that much. I don’t believe that so many people could lower their worries by, say, 40% (just my opinion). Not to mention all of the people in government who monitor and approve the models.
What probability do you think the ~25,000 Enron employees assigned to it collapsing and being fraudulent? Do you think it was above 10%? As another case study, here’s a link to one of the biggest recent examples of people having their numbers way off.
Re: government, I will point out that a non-zero number of the people in these governmental departments who are on track to be the ones approving those models had secret agreements with one of the leading companies to never criticize it, so I’d take some of their supposed reliability as external auditors with a grain of salt.
I do think that in order for a government department to blatantly approve an unsafe model, you would need to have secret agreements with a lot of people. I currently haven’t seen any evidence of widespread corruption in that particular department.
And it’s not as if I am being unfair. You are basically saying that because some people might’ve had some secret agreements, we should take their supposed trustworthiness as external auditors with a grain of salt. I could make a similar argument about a lot of external auditors.
I don’t believe that last claim. I believe there is no industry where external auditors are known to have secret legal contracts making them liable for damages for criticizing the companies they regulate. Or if there is, it’s probably in a nation rife with corruption (e.g. some African countries, or perhaps Russia).
Having had some shady deals in the past isn’t evidence that there are currently shady deals going on between government committees and AI companies on the scale that we are talking about.
If there is no evidence of that happening in our particular case (on the necessary scale), then I don’t see why I can’t make a similar claim about other auditors who likewise have a less-than-ideal history.
I’m having a hard time following this argument. To be clear, I’m saying that while certain people were in regulatory bodies in the US & UK governments, they actively had secret legal contracts not to criticize the leading industry player, else (presumably) they could be sued for damages. This is not about past shady deals; this is about current people, during their current tenure, having been corrupted.
I haven’t heard of any such corrupt deals with OpenAI or Anthropic concerning governmental oversight of AI technology on a scale that would make me worried. Do you have any links to articles about government employees (who are responsible for oversight) recently signing secret contracts with OpenAI or Anthropic that would prohibit them from giving real feedback on a big enough scale to be concerning?
Unfortunately a fair chunk of my information comes from non-online sources, so I do not have links to share.
I do think that in order for a government department to blatantly approve an unsafe model, you would need to have secret agreements with a lot of people.
Corruption is rarely blatant or overt. See this thread for what I believe to be an example of the CEO of RAND misleading a Senate committee about his beliefs regarding the existential threat posed by AI. See this discussion about a time when an AI company (Conjecture) attempted to get critical comments about another AI company (OpenAI) taken down from LessWrong. I am not proposing a large conspiracy; I am describing lots of small bits of corruption and failures of integrity summing to a system failure.
There will be millions of words of regulatory documents, and it is easy for things to slip such that some particular model class is not considered worth evaluating, or such that the consequences of a failed evaluation are pretty weak.
Looking at the two examples that you gave me, I see a few issues. I wouldn’t really say that saying “I don’t know” once is necessarily a lie; if anything, I could find such an answer somewhat more honest in some contexts. Beyond that, both examples are of a very different scope and scale. Saying IDK to a committee, or trying to take down someone’s comment on an internet forum, is definitely not on the same scale as an elaborate scheme of tricking or bribing/silencing multiple government employees who have access to your model. But even setting all that aside, these two examples are only tangential to the topic of governmental oversight of OpenAI or Anthropic and don’t necessarily provide direct evidence.
I can believe that you genuinely have information from private sources, but without any way for me to verify it, I am fine leaving this one at that.