I have no doubt at all (and of course he would disagree) that Yudkowsky would do the same in Sam’s situation. What Yudkowsky would miss, in his potential disagreement, is the way in which leading OpenAI for eight years shapes your sense of control over the situation.
Maybe, maybe not. It is difficult to predict people’s reactions to hypothetical situations. Yes, strong temptations exist, but they are not as binding as the laws of physics; there are people out there doing all kinds of things that most people wouldn’t do.
And even where human behavior is quite predictable, it seems to me that the usual pattern is not “virtually everyone would do X in situation Y” but rather “the kind of people who wouldn’t do X are very unlikely to end up in situation Y”, either because they don’t want to, or because actions very similar to X are already required to get halfway to situation Y.
Imagine that you were personally in charge of the Manhattan Project, most of your friends had turned into foes, and they were all off trying to complete the project before you did, within the same country as you.
I think the analogy would be even better if it hypothetically turned out that building a nuke is actually surprisingly simple once you understand the trick, so your former friends are leaving to start their own private nuclear armies, because that is obviously far more profitable than earning a salary as a scientist.
He needs to believe that AI will simply remain tool AI (that the safety problem isn’t real), and that humanity will immediately and rather harmoniously use it to solve all of its own problems.
Very speculatively, he might be sacrificing the Tegmark universes that he already considers doomed anyway, in return for having more power in the ones where it turns out that there is a reason for the AI to remain a tool.
(That would, of course, be behavior comparable to playing quantum-suicide Russian roulette.)
Okay, this was quite interesting!