Very useful post, thanks. While the ‘talking past each other’ is frustrating, the ‘not necessarily disagreeing’ suggests the possibility of establishing surprising areas of consensus. And it might be interesting to explore further what exactly that consensus is. For example:
Yann suggested that there was no existential risk because we will solve it
I’m sure the air of paradox here (because you can’t solve a problem that doesn’t exist) is intentional, but if we drill down, should we conclude that Yann actually agrees that there is an existential risk (just that the probabilities are lower than other estimates, and less worth worrying about)? Yann sometimes compares the situation to the risk of designing a car without brakes, but if the car is big enough that crashing it would destroy civilization that still kinda sounds like an existential risk.
I’m also not sure the fire department analogy helps here—as you note later in the paper Yann thinks he knows in outline how to solve the problem and ‘put out the fires’, so it’s not an exogenous view. It seems like the difference between the fire chief who thinks their job is easy vs the one who thinks it’s hard, though everyone agrees fires spreading would be a big problem.
Yeah, like I said, I don’t think that one in particular was a major dynamic, just one thing I thought worth mentioning. I think one could rephrase what they said slightly to get a very similar disagreement minus the talking-past-each-other.
Like, for example, if every human jumped off a cliff simultaneously, that would cause extinction. Is that an “x-risk”? No, because it’s never going to happen. We don’t need any “let’s not all simultaneously jump off a cliff” activist movements, or any “let’s not all simultaneously jump off a cliff” laws, or any “let’s not all simultaneously jump off a cliff” fields of technical research, or anything like that.
That’s obviously a parody, but Yann is kinda in that direction regarding AI. I think his perspective is: We don’t need activists, we don’t need laws, we don’t need research. Without any of those things, AI extinction is still not going to happen, just because that’s the natural consequence of normal human behavior and institutions doing normal stuff that they’ve always done.
I think “Yann thinks he knows in outline how to solve the problem” is maybe giving the wrong impression here. I think he thinks the alignment problem is just a really easy problem with a really obvious solution. I don’t think he’s giving himself any credit. I think he thinks anyone looking at the source code for a future human-level AI would be equally capable of making it subservient with just a moment’s thought. His paper didn’t really say “here’s my brilliant plan for how to solve the alignment problem”, the vibe was more like “oh and by the way you should obviously choose a cost function that makes your AI kind and subservient” as a side-comment sentence or two. (Details here)
Re how hard alignment is, I suspect it’s harder on my median world than Yann LeCun thinks, but probably a lot easier than most LWers think, and in particular I assign some probability mass to the hypothesis that in practice, AI alignment of superhuman AI is a non-problem.
This means that while I disagree with Yann LeCun, and in general find my side (anti-doomerism/accelerationism) to be quite epistemically unsound at best, I do suspect that there are object level reasons to believe the alignment of superhuman systems is way easier than most LWers think, including you.
To give the short version: instrumental convergence is probably going to be constrained for pure capabilities reasons. In particular, unconstrained instrumental goals learned via RL usually fail, or at best produce something useless, which means it will be reasonably easy to add constraints to instrumental goals, and the usual squiggle-maximizer thought experiment basically collapses because the AI won't have the capabilities. This implies we are dealing with an easier problem, since we aren't starting from zero; in its strongest form, it transforms alignment into a non-adversarial problem like nuclear safety, which we solved rather well.
Or in slogan form: "Fewer constraints on instrumental goals are usually not good for capabilities, and the human case is probably a result of just not caring about efficiency, plus time scales. We should expect AI systems to have more constraints on instrumental goals, for both capability and alignment reasons."