My intuition also tells me that the distinction might just lack the necessary robustness. I do wonder if Buck’s intuition is different. In any case, it would be very interesting to know his opinion.
Satron
I also wonder what Buck thinks about CoTs becoming less transparent (especially in light of recent o1 developments).
Great post, very clearly written. Going to share it in my spaces.
Sure, it sounds like a good idea! Below I will write my thoughts on your overall summarized position.
———
“I primarily wish to argue that, given the general lack of accountability for developing machine learning systems in worlds where indeed the default outcome is doom, it should not be surprising to find out that there is a large corporation (or multiple) doing so.”
I do think that I could maybe agree with this if it were one small corporation. In your previous comment you suggested that you are describing not the intentional contribution to the omnicide, but the bit of rationalization. I don’t think I would agree that that many people working on AI are successfully engaged in that bit of rationalization, or that it would be enough to keep them doing it. The big factor is that in the case of their failure, they personally (and all of their loved ones) will suffer the consequences.
“It is also not surprising that glory-seeking companies have large departments focused on ‘ethics’ and ‘safety’ in order to look respectable to such people.”
I don’t disagree with this, because it seems plausible that one of the reasons for creating safety departments is ulterior. However, I believe that this reason is probably not the main one and that AI safety labs are producing genuinely good research papers. To take the example of Anthropic, I’ve seen safety papers that got the LessWrong community excited (at least judging by upvotes). Like this one.
“I believe that there is no good plan and that these companies would exist regardless of whether a good plan existed or not… I believe that the people involved are getting rich risking all of our lives and there is (currently) no justice here”
For the reasons that I mentioned in my first paragraph, I would probably disagree with this. Relatedly, while I do think wealth in general can be somewhat motivating, I also think that AI developers are aware that all their wealth would mean nothing if AI kills everyone.
———
Overall, I am really happy with this discussion. Our disagreements came down to a few points, and we agree on quite a few issues. I am similarly happy to conclude this big comment thread.
I don’t think it would necessarily be easy to rationalize that a vaccine that negatively affects 10% of the population and has no effect on the other 90% is good. It seems possible to rationalize that it is good for you (if you don’t care about other people), but quite hard to rationalize that it is good in general.
Given how few politicians die as a result of their policies (at least in the Western world), the increase in their chance of death does seem negligible (compared to something that presumably increases this risk by a lot). Most bad policies that I have in mind don’t happen during human catastrophes (like pandemics).
However, I suspect that the main point of your last comment was that potential glory can, in principle, be one of the reasons for why people rationalize stuff, and if that is the case, then I can broadly agree with you!
Looking at the two examples that you gave me, I can see a few issues. I wouldn’t really say that saying “I don’t know” once is necessarily a lie. If anything, I could find such an answer somewhat more honest in some contexts. Other than that, there is also the issue of both examples being of a very different scope and scale. Saying IDK to the committee and trying to take down someone’s comment on an internet forum are definitely not on the same scale as the elaborate scheme of tricking or bribing/silencing multiple government employees who have access to your model. But even with all that aside, these two examples are only tangential to the topic of governmental oversight over OpenAI or Anthropic and don’t necessarily provide direct evidence.
I can believe that you genuinely have information from private sources, but without any way for me to verify it, I am fine leaving this one at that.
To modify my example to include an accountability mechanism that is also similar to real life: the King takes exactly the same vaccines as everyone else. So if he messed up with the chemicals, he also dies.
I believe a similar accountability mechanism works in our real-world case. If CEOs build unsafe AI, they and everyone they value in this life die. This seems like a really good incentive for them not to build unsafe AI.
At the end of the day, voluntary commitments such as debating with critics are not as strong, in my opinion. Imagine that they agree with you and go to the debate. Without the incentive of “if I mess up, everyone dies”, the CEOs could just go right back to doing what they were doing. As far as I know, voluntary debates have few (if any) actual legal mechanisms to hold CEOs accountable.
[Question] What (if anything) made your p(doom) go down in 2024?
“If a model trained on synthetic data is expected to have good performance out of distribution (on real-world problems) then I think that it would also be expected to have high performance at assessing whether it’s in a simulation.”
Noosphere89, you have marked this sentence with a “disagree” emoji. Would you mind expanding on that? I think it is a pretty important point and I’d love to see why you disagree with Ben.
What about the following red team strategy:
Give an honest assessment of the suspicion level, unless there is a very well hidden backdoor, in which case give a low score. Also, only create backdoors if it is possible to hide them well.
Wouldn’t this defeat the self-checking strategy?
I haven’t heard of any such corrupt deals with OpenAI or Anthropic concerning governmental oversight over AI technology on the scale that would make me worried. Do you have any links to articles about government employees (who are responsible for oversight) recently signing secret contracts with OpenAI or Anthropic that would prohibit them from giving real feedback on a big enough scale to make it concerning?
I will try to provide another similar analogy. Let’s say that a King got tired of his people dying from diseases, so he decided to try a novel method of vaccination.
However, some people were really concerned about that. As far as they were concerned, the default outcome of injecting viruses into people’s bodies is death. And the King wants to vaccinate everyone, so these people create a council of great and worthy thinkers who, after thinking for a while, come up with a list of reasons why vaccines are going to doom everyone.
However, some other great and worthy thinkers (let’s call them “hopefuls”) come to the council and give reasons to think that the aforementioned reasons are mistaken. Maybe they have done their own research, which seems to vindicate the King’s plan or at least undermine the council’s arguments.
And now imagine that the King comes down from the castle, points his finger at the hopefuls giving arguments for why the arguments proposed by the council are wrong, says “yeah, basically this”, and then turns around and goes back to the castle. To me it seems like the King’s official endorsement of the arguments proposed by the hopefuls doesn’t really change the ethicality of the situation, as long as the King is acting according to the hopefuls’ plan.
Furthermore, imagine if one of the hopefuls who came to argue with the council was actually the King undercover, and he gave exactly the same arguments as the people before him. This still, IMO, doesn’t change the ethicality of the situation.
Then we are actually broadly in agreement. I just think that instead of CEOs responding to the public, having anyone on their side (the side of AI alignment being possible) respond is enough. Just as an example that I came up with: if a critic says that some detail is a reason why AI will be dangerous, I do agree that someone needs to respond to the argument. But I would be fine with it being someone other than the CEO.
That’s why I am relatively optimistic about Anthropic hiring the guy who has been engaged with critics’ arguments for years.
I similarly don’t see the need for any official endorsement of the arguments. For example, if a critic says that such-and-such technicality will prevent us from building safe AI, and someone responds with reasons why it will not (maybe this particular one is unlikely by default, for various reasons), then that technicality will just not prevent us from building safe AI. I don’t see a need for a CEO to officially endorse the response.
There is a different type of technicality which you actively need to work against. But even in this case, as long as someone has found a way to combat it, and the relevant people in your company are aware of the solution, it is fine by me.
Even if there are technicalities that can’t be solved in principle, they should be evaluated and discussed by technical people (like they are on LessWrong, for example).
I am definitely not saying that I can pinpoint an exact solution to AI alignment, but there have been attempts promising enough that leading skeptics (like Yudkowsky) have said “Not obviously stupid on a very quick skim. I will have to actually read it to figure out where it’s stupid. (I rarely give any review this positive on a first skim. Congrats.)”
Whether companies actually follow promising alignment techniques is an entirely different question. But having CEOs officially endorse such solutions as opposed to relevant specialists evaluating and implementing them doesn’t seem strictly necessary to me.
Having had some shady deals in the past isn’t evidence that shady deals on the scale we are talking about are currently going on between government committees and AI companies.
If there is no evidence for that happening in our particular case (on the necessary scale), then I don’t see why I can’t make a similar claim about other auditors who similarly had less than ideal history.
I think we then just fundamentally disagree about the ethical role of a CEO in the company. I believe that it is to find and gather people who are engaged with the critics’ arguments (like that guy from this forum who was hired by Anthropic). If you have people on your side who are able to engage with the arguments, then this is good enough for me. I don’t see the role of the CEO as publicly engaging with critics’ arguments, even in the moral sense. In the moral sense, my requirements would actually be even weaker: IMO, it would be enough just to have people broadly on your side (optimists, for example) engage with the critics.
I do think that in order for a government department to blatantly approve an unsafe model, it would require secret agreements with a lot of people. I currently haven’t seen any evidence of widespread corruption in that particular department.
And it’s not like I am being unfair. You are basically saying that because some people might’ve had some secret agreements we should take their supposed trustworthiness as external auditors with a grain of salt. I can make a similar argument about a lot of external auditors.
I do disagree with your proposed standard. It is good enough for me that the critics’ arguments are engaged with by someone on your side. Going there personally seems unnecessary. After all, if the goal is to build safe AI, you personally knowing a niche technical solution isn’t necessary, as long as you have people on your team who are aware of publicly produced solutions as well as internal ones.
And there is engagement with people like Yudkowsky from the optimistic side. There are at least proposed solutions to the problems he is presenting. Just because Sam Altman personally didn’t invent or voice them doesn’t really make him unethical (I accept that he may be personally unethical for a different reason).
The founders of the companies accept money for the same reason any other business accepts money. You can build something genuinely good for humanity while making yourself richer at the same time. This has happened many times already (Apple phones for example).
I concede that the founders of the companies didn’t personally and publicly engage with the arguments of people like Yudkowsky, but that’s a really high bar. Historically, CEOs aren’t usually known for getting into technical debates. For that reason, they create security councils that monitor the perceived threats.
And it’s not like there was no response whatsoever from people who are optimistic about AI. There were plenty of responses. I believe that arguments matter, not the people who make them. If a successful argument has been made, then I don’t see a need for CEOs to repeat it. And I certainly don’t think that just because CEOs don’t go to debates, that makes them unethical.
Given the lack of response, should I assume the answer is “no”?