I think Yudkowskian foom is just implausible: the arguments in favour of it aren’t particularly strong, they rest on questionable assumptions, and the model of an intelligence explosion in IEM (Intelligence Explosion Microeconomics) is bad.
I have very strong objections to it, and I’ve often raised them in online discussions (admittedly I haven’t written up a long-form treatment that I endorse, but that’s due to generalised executive dysfunction).
It’s not at all the case that there are no counterarguments. I’ve engaged with Yudkowsky’s arguments for foom and found them lacking.
And there are several people in the LW/AI safety community who find Yudkowskian foom unlikely or implausible; some of them have debated him at length.
Conditioning on that particular scenario is just unwarranted. If I don’t expect engaging with it to be useful, I won’t want to engage with it much, and I’m satisfied with others not engaging with it much either. I think you just have to accept that people are legitimately unpersuaded here.
I’m trying to do AI Safety research, and I don’t think Yudkowskian foom is a plausible scenario, I don’t think that’s what failure looks like, and I don’t expect trying to address it to be all that useful.
I endorse people who buy into Yudkowskian foom working on alleviating risk under those scenarios, for epistemic pluralism reasons, but if I personally think such work is a waste of time, then it doesn’t make sense for me to privilege it or to condition my strategy on that (to me) implausible scenario.
If I think that Sam is adopting an implausible and suspiciously rosy picture, then I should say so, right? And if Sam hasn’t made arguments that address the worries, then it’s at least among the top hypotheses that he’s just not taking them seriously, right? My original comment said that (on the basis of the essay, and lack of linked arguments). It sounds like you took that to mean that anyone who doesn’t think fast surprising takeoff is likely, must not understand the arguments. That’s not what I said.
I’m confused here, since while I definitely agree that AGI companies have terrible incentives for safety, I don’t see how this undermines DragonGod’s key point, exactly.
A better example of the problem with incentives is the incentive to downplay alignment difficulties.
What do you think DragonGod’s key point is? They haven’t argued against fast takeoff here. (Which is fine.) They seem to have misunderstood me as saying that no one who understands fast takeoff arguments would disagree that fast takeoff is likely, and then they’ve been defending their right to know about fast takeoff arguments and disagree that it’s likely.
I think a key point of DragonGod’s here is that the majority of the effort should go to scenarios that are likely to happen, and while fast takeoff deserves some effort, at this point it’s a mistake to expect Sam Altman to condition heavily on fast takeoff; not conditioning on it doesn’t make him irrational or ruled by incentives.
It does if he hasn’t engaged with the arguments.
I think making claims without substantiating them, including claims that contradict claims others have made, is a more virtuous move than calling (acceptance of) other claims unwarranted. If the relevant reasoning hasn’t been published, it’s invisible whether a claim is unwarranted in the sense of genuinely lacking even a secret/illegible good justification, and publishing that reasoning is infeasible for many informal-theory-laden claims, like those found in philosophy and forecasting.
It’s unclear to me:
(1) what you consider the Yudkowskian argument for FOOM to be, and
(2) which of the premises in that argument you find questionable.
A while ago, I tried to (badly) summarise my objections:
https://www.lesswrong.com/posts/jdLmC46ZuXS54LKzL/why-i-m-sceptical-of-foom
There’s a lot that post doesn’t capture (or only captures poorly), but I’m unable to write a good post that captures all my objections well.
I mostly rant about particular disagreements as the need arises (usually in Twitter discussions).
To (2): (a) simulators are not agents, (b) mesa-optimizers are still “aligned”.
(a) As the amazing https://astralcodexten.substack.com/p/janus-simulators post argues, a utility function is the wrong way to think about intelligence; humans themselves don’t have any utility function, even the most rational ones.
(b) The only example of mesa-optimization we have is evolution, and even that succeeds at alignment; people:
still want to have kids for the sake of having kids;
carry out evolution’s biggest objective (thrive and proliferate) quite well, even “outside the training distribution”.
Yes, there are local counterexamples, but if we look at the causes and consequences: we’re at 8 billion already, effectively destroying or enslaving all the other DNA replicators.
I am curious whether you think self-amplifying intelligence is possible or not.
In the post below, I outline what I think is a grounded, constructible RSI algorithm using current techniques:
https://www.lesswrong.com/posts/Aq82XqYhgqdPdPrBA/?commentId=Mvyq996KxiE4LR6ii
In this post I cite the papers I drew from to construct the method: https://www.lesswrong.com/posts/Aq82XqYhgqdPdPrBA/full-transcript-eliezer-yudkowsky-on-the-bankless-podcast?commentId=3AJiGHnweC7z52D6v
I am not claiming this will be the method used; I am claiming it is obviously achievable, and that something at least this good will very likely be tried by current AI labs within 3-5 years, conditional on sufficient funding. (If a large LLM costs $2 million to train, each AGI candidate would probably take around $10 million to train, though I expect many AGI candidates will reuse modules from failed candidates to lower the cost. So a search over 1,000 candidates would cost roughly $10 billion, maybe a bit less. Easily possible if AI labs can show revenue in the next 3-5 years.)
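To make that arithmetic explicit, here is a minimal back-of-envelope sketch. The $2M, ~$10M, and 1,000-candidate figures are the ones given above; the module-reuse discount is an illustrative assumption of mine, not a number from the linked posts.

```python
# Back-of-envelope version of the cost estimate above. The LLM cost, per-candidate
# cost, and candidate count come from the comment; the reuse discount is an
# illustrative assumption only.

LLM_TRAINING_COST = 2_000_000                 # assumed cost to train one large LLM (USD)
COST_PER_CANDIDATE = 5 * LLM_TRAINING_COST    # ~$10M per AGI candidate, per the comment
NUM_CANDIDATES = 1_000                        # size of the candidate search

naive_total = COST_PER_CANDIDATE * NUM_CANDIDATES
print(f"Naive search cost: ${naive_total:,}")             # $10,000,000,000

# Reusing modules from failed candidates lowers the average cost; a 20% saving
# (purely illustrative) gives "a bit less" than $10B.
reuse_discount = 0.20
print(f"With module reuse: ${naive_total * (1 - reuse_discount):,.0f}")
```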
I do not think this will foom overall, and in the comment thread I explain why, but the intelligence component is self-amplifying. It would foom if compute, accurate scientific data, and robotics were all available in unlimited quantities.
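A toy model of that claim, assuming a fixed self-amplification factor and that each improvement step consumes compute, data, and robot time. All rates, caps, and budgets below are made up for illustration; this is not the commenter’s model or a forecast.

```python
# Toy model: the "intelligence" term compounds (self-amplifying), but every
# improvement step consumes compute, scientific data, and robot time. With
# finite inputs the loop stalls; with effectively unlimited inputs it runs away.
# The growth factor and budgets are made-up numbers for illustration only.

def self_amplify(compute: float, data: float, robots: float, steps: int = 100) -> float:
    intelligence = 1.0
    for _ in range(steps):
        step_cost = intelligence                  # smarter systems attempt costlier improvements
        if min(compute, data, robots) < step_cost:
            break                                 # any missing input halts the loop
        compute -= step_cost
        data -= step_cost
        robots -= step_cost
        intelligence *= 1.5                       # assumed self-amplification factor
    return intelligence

print(self_amplify(100.0, 100.0, 100.0))          # bounded inputs: growth saturates early
print(self_amplify(1e30, 1e30, 1e30))             # "unlimited" inputs: growth runs away
```

The only point of the sketch is that the same self-amplifying loop either saturates or explodes depending entirely on whether its physical inputs are bounded.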