The consensus among alignment researchers is that if AGI were developed right now it would be almost certainly a negative.
Is that true? I thought that I had read Yudkowsky estimating that the probability of an AGI being unfriendly was 30% and that he was working to bring that 30% to 0%. If alignment researchers are convinced that this is more like 90+%, I agree that the argument becomes much more convincing.
I agree that these two questions are the cruxes in our positions.
That's not Yudkowsky's current position. https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy describes his current view, and in the comments you can see the views of other people at MIRI.
Yudkowsky is at 99+% that AGI developed right now would kill humanity.
April Fools!
Also, look at his bet with Bryan Caplan. He’s not joking.
And also, Jesus, everyone! Gradient descent is just, like, a deadly architecture. When I think about current architectures, they make Azathoth look smart and cuddly. There's nothing friendly in there, even if we can get cool stuff out of them right now.
I don't even know anymore what it is like to not see it this way. Does anyone have a good argument that current ML techniques can be kept from having a deadly range of action?
Probably not; Eliezer addressed this in Q6 of the post, and while his answer there is a little ambiguous, I think his interactions with commenters, who overwhelmingly took the post seriously, basically prove that it was serious; see in particular this interaction.
(But can we not downvote everyone into oblivion just for drawing the obvious conclusion without checking?)
I first heard Eliezer describe “dying with dignity” as a strategy in October 2021. I’m pretty sure he really means it.
I am not sure if he's given another number explicitly, but I'm almost positive that Yudkowsky does not believe that. The probability that an AGI will end up being aligned "by default" is epsilon. Maybe he said at one point that, given alignment efforts, there was a 30% chance that AGI would be what destroys the world if it's developed, but that doesn't sound like him to me either.
You should read the most recent post he made on the subject; it's extraordinarily pessimistic about our future. He mentions multiple times that he thinks the probability of success here needs to be measured in log-odds. He very sarcastically uses the April Fools' framing at the end as a sort of ambiguity shield, but I don't think anybody believes he isn't being serious.
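(A rough gloss of "measured in log-odds", my own paraphrase rather than anything stated in the post: the log-odds of a probability p is

\[ \operatorname{logit}(p) = \ln \frac{p}{1 - p}, \]

so a survival probability of p = 0.001 sits at roughly −6.9, and doubling it to p = 0.002 only raises that to about −6.2. On that scale, doubling a tiny chance of success is visible progress even though the absolute probability barely moves.)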
I'm not convinced that the odds mentioned in that post are meant to be taken literally, given that it's an April Fools' post, as opposed to just metaphorically, pointing in a direction.
He does also mention in that post that in the past he thought the odds were 50%, so perhaps I’m just remembering an old post from sometime between the 50% days and the epsilon days.
The most optimistic view I've heard recently is Vanessa Kosoy claiming a 30% chance of pulling it off. I'm not sure where the consensus would be, but I read MIRI as 'almost certain doom'. And I can't speak for Eliezer, but if he ever thought there was any hope that AGI might be aligned 'by chance', that thought is well concealed in everything he's written for the last 15 years.
What he did once think was that it might be possible, with heroic effort, to solve the alignment problem.
There is no reason why my personal opinion should matter to you, but here it is: "We are fucked beyond hope. There is no way out. The only question is when."
I don’t know what his earliest writing may have said, but his writing in the past few years has definitely not assigned anywhere near as high a probability as 70% to friendly AI.
Even if he had, and it were true, do you think a 30% chance of killing every human in existence (and possibly all future life in the universe) is in any way a sane risk to take? Is it even sane at 1%?
I personally don’t think advancing a course of action that has even an estimated 1% chance of permanent extinction is sane. While I have been interested in artificial intelligence for decades and even started my PhD study in the field, I left it long ago and have quite deliberately not attempted to advance it in any way. If I could plausibly hinder further research, I would.
Even alignment research seems akin to theorizing a complicated way of poking a sleeping dragon-god prophesied to eat the world, in such a manner that it will wake up friendly instead, rather than just not poking it at all and making sure that nobody else does either, regardless of how tempting the wealth in its hoard might be.
Even many of the comparatively good outcomes in which superintelligent AI faithfully serves human goals seem likely to be terrible in practice.
It's worth it to poke the dragon with a stick if you have only a 28% chance of making it destroy the world while the person who's planning to poke it tomorrow has a 30% chance. If we can stop those people some other way, then great, but I'm not convinced that we can.
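(Spelling out the arithmetic, with those same illustrative numbers and the added assumption that poking first fully preempts tomorrow's poke:

\[ P(\text{doom} \mid \text{you poke today}) = 0.28 < 0.30 = P(\text{doom} \mid \text{they poke tomorrow}), \]

a net reduction in expected risk of 0.02, and only under that preemption assumption.)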
It doesn’t help at all in the case where the research you’re doing makes it significantly more likely that they will be equipped with stronger sticks and have greater confidence in poking the dragon tomorrow.