I’m as surprised by your summary as you were by the outcome.
You saw Tegmark and Bengio having strong arguments from years of alignment research on their side. I saw Bengio making a few good points that nevertheless ignore all of that (LLMs work much better than most expected, so the timeline to adapt is shorter; oil companies knew about anthropogenic warming, so we need regulation; we can’t prepare if we don’t first acknowledge the risk), and Tegmark repeating obvious strawmen (appeal to authority, reversing the burden of proof, etc.) that made me regret they invited him. You saw LeCun and Mitchell having far weaker arguments. I saw them making the best points (AIs will likely help with other existential risks, foom/paperclip are incoherent bullshit, intelligence seems to negatively correlate with power-tripping), including an audacious concrete prediction by LeCun (that LLMs will soon become obsolete). We’ll see about that.
In the end, and contrary to the public, I did update toward Bengio (mostly because the other side only made points I already knew and agreed with). But your own post made me update toward LW being a failure of rational thinking, i.e. an echo chamber that weakens your ability to evaluate reality, at least on this topic.
But your own post made me update toward LW being a failure of rational thinking, i.e. an echo chamber that weakens your ability to evaluate reality, at least on this topic.
I don’t see you giving strong arguments for this. It reminds me of the way Melanie Mitchell argued: “This is all ungrounded speculation”, without giving any supporting arguments for this strong claim.
Concerning the “strong arguments” of LeCun/Mitchell you cite:
AIs will likely help with other existential risks
Yes, but that’s irrelevant to the question of whether AI may pose an x-risk in itself.
foom/paperclip are incoherent bullshit
Nobody argued for foom, although whether this is “incoherent bullshit” remains to be seen. The orthogonality thesis is obviously true, as demonstrated by humans every day.
intelligence seems to negatively correlate with power-tripping
I can’t see any evidence for that. The smartest people may not always be the ones in power, but the smartest species on earth definitely is. Instrumental goals, including power-seeking, are a logical necessity for any rational agent.
That’s the kind of sentence I see as evidence that your assessment is biased.
Yes, my assessment is certainly biased, I admitted as much in the post. However, I was referring to your claim that LW (in this case, me) was “a failure of rational thinking”, which sounds a lot like Mitchell’s “ungrounded speculation” to my ears.
Of course she gave supporting arguments; you just refuse to hear them
Could you name one? Not one of Mitchell’s arguments, but a supporting argument for the claim that AI x-risk is just “ungrounded speculation” despite decades of alignment research and lots of papers demonstrating various failures in existing AIs?
In other words, you side with Tegmark in insisting on taking the question literally, without noticing that both LeCun and Mitchell admit the risk isn’t zero
I do side with Tegmark. LeCun compared the risk to an asteroid x-risk, which Tegmark quantified as 1:100,000,000. Mitchell refused to give a number, but it was obvious that she would have put it even below that. If that were true, I’d agree that there is no reason to worry. However, I don’t think it is true. I don’t have a specific estimate, but it is certainly above 1% IMO, high enough to worry about in any case.
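Just to make explicit how far apart these estimates are (a rough back-of-the-envelope using only the two numbers above):

$$\frac{10^{-2}}{10^{-8}} = 10^{6}$$

The disagreement is not about decimals but about six orders of magnitude, which is why I think the literal question is worth taking seriously.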
As for the style and tone of this exchange, instead of telling me that I’m not listening/not seeing Mitchell’s arguments, it would be helpful if you could tell me what exactly I don’t see.
Thank you for your reply and the clarifications! To briefly comment on your points concerning the examples for blind spots:
superintelligence does not magically solve physical problems
I and everyone I know on LessWrong agree.
evolution doesn’t believe in instrumental convergence
I disagree. Evolution is all about instrumental convergence IMO. The “goal” of evolution, or rather the driving force behind it, is reproduction. This leads to all kinds of instrumental goals, like developing methods for food acquisition, attack and defense, impressing the opposite sex, etc. “A chicken is an egg’s way of making another egg”, as Samuel Butler put it.
the orthogonality thesis amounts to claiming that holding incoherent values has no impact on intelligence
I’m not sure what you mean by “incoherent”. Intelligence tells you what to do, not what to want. Even complicated constructs of seemingly “objective” or “absolute” values in philosophy are really based on the basic needs we humans have, like being part of a social group or caring for our offspring. Some species of octopuses, for example, which are not social animals, might find the idea of caring for others and helping them when in need ridiculous if they could understand it.
the more intelligent human civilization becomes, the gentler we are
I wish that were so. We have invented some mechanisms to keep power-seeking and deception in check, so we can live together in large cities, but this carries only so far. What I currently see is a global deterioration of democratic values. In terms of the “gentleness” of the human species, I can’t see much progress since the days of Buddha, Socrates, and Jesus. The number of violent conflicts may have decreased, but their scale and brutality have only grown worse. The way we treat animals in today’s factory farms certainly doesn’t speak for general human gentleness.
Could you name one reason (not from Mitchell) for questioning the validity of the many works on AI x-risk?
Thanks for that. However, my definition of “intelligence” would be “the ability to find solutions for complex decision problems”. It’s unclear whether the ability of slime molds to find the shortest path through a maze or organize in seemingly “intelligent” ways has anything to do with intelligence, although the underlying principles may be similar.
I haven’t read the article you linked in full, but at first glance, it seems to refer to consciousness, not intelligence. Maybe that is a key to understanding the difference in thinking between me, Melanie Mitchell, and possibly you: If she assumes that for AI to present an x-risk, it has to be conscious in the way we humans are, that would explain Mitchell’s low estimate for achieving this anytime soon. However, I don’t believe that. To become uncontrollable and develop instrumental goals, an advanced AI would probably need what Joseph Carlsmith calls “strategic awareness”: a world model that includes the AI itself, which it can use in planning to achieve its goals. That is nothing like human experience, emotions, or “qualia”. Arguably, GPT-4 may display early signs of this kind of awareness.
Those are all important points and I’d be glad to discuss them. However, I’m also noticing a wave of downvotes, so maybe we should take this semi-private with whoever signals they want to read more? Or do you think I should just ignore that and go forward with my answers? Both are fine with me, but I’d like to follow your lead since you know the house better.
I’ve received my fair share of downvotes, see for example this post, which got 15 karma out of 24 votes. :) It’s a signal, but not more than that. As long as you remain respectful, you shouldn’t be discouraged from posting your opinion in comments even if people downvote it. I’m always for open discussions as they help me understand how and why I’m not understood.
Yes, that’s literally the problem I’m seeing. You’re not saying you disagree. You’re saying you can’t see any reason for this position (apart from ignorance and stupidity, I’d guess 😉).
I’d actually agree with Karl von Wendt here, in that the orthogonality thesis is almost certainly right. Where I disagree with Karl von Wendt and LW in general is in how much evidence this buys us. It’s a claim that it’s possible to have an AI that has essentially any goal while still being very intelligent. And this is almost certainly true, because any adversarial example is a win for the orthogonality thesis. The problem is that this also makes it very, very weak evidence for or against AI risk, and you shouldn’t shift your priors much.
In other words, you side with Tegmark in insisting on taking the question literally, without noticing that both LeCun and Mitchell admit the risk isn’t zero, and without noticing that Bengio explained he would side with Mitchell if he thought the timelines were decades to centuries rather than maybe a few years.
Let’s try an analogy outside of this topic: if I reject nuclear plants or vaccines because I insist nobody can prove they carry zero risk, I’m making a mistake, because I fail to weigh the risks they would help me avoid.
Hm, I think the point is that even if AI helps with other existential risks, we need it not to also be very existentially risky in itself, unless something weird happens.
To be clear, I do think AI isn’t very risky, but the point is that unless AI turns out not to be very existentially risky, it’s probably too risky to use, unless the other existential risks are so high that the gamble is worth it.
Yes and no. As for superintelligence, there’s a motte-and-bailey game between the explicit definition (superintelligence sounds reasonable because it just means “at least as good as a team of the best humans, plus speed”) and LW’s actual usage of the term in practice (superintelligence as something that can solve all physical problems in negligible time).
The points of disagreement I have with LWers on AI existential risk are mostly invariant to how capable AIs and superintelligences turn out to be, though how they get those capabilities can matter, so I’m trying to avoid relying on capability limitations for my disagreements on AI extinction/existential risk.
For orthogonality, LW’s use of the term in practice is “an intelligence [would likely] jump to arbitrary values, whatever values it started from”. Am I right that this is, in disguise, your own reason for saying we shouldn’t update based on the orthogonality thesis?
Not really. The issue is that even accepting the orthogonality thesis is still compatible with a wide range of observations, and in particular is compatible with the view, à la Yann LeCun, that the AI safety problem is mostly a non-problem in practice: even if it’s possible to get an AI that values inhuman goals while still being very intelligent, we can optimize it fairly easily so that in practice we don’t have to deal with rogue values in a very smart AI. In essence, the thesis isn’t narrow enough, which is why we shouldn’t update much without other assumptions.
In essence, it only claims that this is a possible outcome, but under that standard logical omniscience is possible, too, and even infinite computation is possible, yet we correctly don’t devote many resources to them. It doesn’t make any claim about its likelihood; remember that very clearly.
Could you say more? I happen to count adversarial examples as a (weak) argument against the orthogonality thesis, because they’re not random but look like an objective property of the dataset. What’s your own reasoning here?
I’m willing to concede this point, but from my perspective the orthogonality thesis was talking about all possible intelligences, and I suspected that it was very difficult to ensure that the values of an AI couldn’t be, say, paper-clip maximization.
Keep in mind that the orthogonality thesis is a really weak claim in terms of evidence, at least as I interpreted it, so it’s not very surprising that it’s probably true. This also means it’s not enough to change our priors. That’s the problem I have with the orthogonality thesis and instrumental convergence assumptions: they don’t give enough evidence to justify AI risk from a skeptical prior, even assuming they’re true.
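To spell out why such a weak claim barely moves the needle (this is just the odds form of Bayes’ rule applied to my point above, not anything from the debate itself): if the orthogonality thesis is roughly as expected under “AI x-risk is real” as under “AI x-risk is negligible”, then observing that it holds leaves the posterior odds essentially where the prior odds were:

$$\frac{P(\text{risk}\mid \text{OT})}{P(\neg\text{risk}\mid \text{OT})} \;=\; \underbrace{\frac{P(\text{OT}\mid \text{risk})}{P(\text{OT}\mid \neg\text{risk})}}_{\approx\,1} \cdot \frac{P(\text{risk})}{P(\neg\text{risk})}$$

A likelihood ratio close to 1 is the formal version of “not enough to change our priors”, even if the thesis itself is true.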