Yeah, like I said, I don’t think that one in particular was a major dynamic, just one thing I thought worth mentioning. I think one could rephrase what they said slightly to get a very similar disagreement minus the talking-past-each-other.
Like, for example, if every human jumped off a cliff simultaneously, that would cause extinction. Is that an “x-risk”? No, because it’s never going to happen. We don’t need any “let’s not all simultaneously jump off a cliff” activist movements, or any “let’s not all simultaneously jump off a cliff” laws, or any “let’s not all simultaneously jump off a cliff” fields of technical research, or anything like that.
That’s obviously a parody, but Yann is kinda in that direction regarding AI. I think his perspective is: We don’t need activists, we don’t need laws, we don’t need research. Without any of those things, AI extinction is still not going to happen, just because that’s the natural consequence of normal human behavior and institutions doing normal stuff that they’ve always done.
I think “Yann thinks he knows in outline how to solve the problem” is maybe giving the wrong impression here. I think he thinks the alignment problem is just a really easy problem with a really obvious solution. I don’t think he’s giving himself any credit. I think he thinks anyone looking at the source code for a future human-level AI would be equally capable of making it subservient with just a moment’s thought. His paper didn’t really say “here’s my brilliant plan for how to solve the alignment problem”; the vibe was more like “oh and by the way you should obviously choose a cost function that makes your AI kind and subservient”, as a side-comment sentence or two. (Details here)
Re how hard alignment is, I suspect it’s harder in my median world than Yann LeCun thinks, but probably a lot easier than most LWers think; in particular, I assign some probability mass to the hypothesis that, in practice, alignment of superhuman AI is a non-problem.
This means that while I disagree with Yann LeCun, and in general find my side (anti-doomerism/accelerationism) to be epistemically unsound at best, I do suspect there are object-level reasons to believe that aligning superhuman systems is far easier than most LWers, including you, think.
To give the short version: instrumental convergence will probably be constrained for pure capabilities reasons; in particular, pursuing unconstrained instrumental goals via RL usually fails, or at best produces something useless. This means it will be reasonably easy to add constraints to instrumental goals, and the usual squiggle-maximizer thought experiment basically collapses, because such a system won’t have the capabilities. This implies we are dealing with an easier problem, since we aren’t starting from zero, and in the strongest form it transforms alignment into a non-adversarial problem like nuclear safety, which we handle rather well.
Or in slogan form: “Fewer constraints on instrumental goals is usually not good for capabilities, and the human case is probably a result of just not caring about efficiency, plus time scales. We should expect AI systems to have more constraints on instrumental goals, for both capability and alignment reasons.”
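To make the “add constraints to instrumental goals” idea concrete, here’s a toy sketch (my own illustration, not anything from the comment above, and not how any real lab trains systems): one simple way to constrain instrumental behavior is to shape the reward so that accumulating generic resources or optionality beyond what the task needs is actively penalized. All the names (`task_reward`, `resources_held`, etc.) are hypothetical.

```python
def constrained_reward(task_reward: float,
                       resources_held: float,
                       resources_needed: float,
                       penalty_weight: float = 1.0) -> float:
    """Task reward minus a penalty for holding more resources than the task requires.

    Under this shaping, a policy that completes the task while grabbing only what it
    needs scores higher than one that hoards resources "just in case".
    """
    excess = max(0.0, resources_held - resources_needed)
    return task_reward - penalty_weight * excess


if __name__ == "__main__":
    # Two candidate policies that both finish the task (task_reward = 1.0):
    # one grabs only what it needs, the other grabs everything it can reach.
    modest = constrained_reward(task_reward=1.0, resources_held=2.0, resources_needed=2.0)
    grabby = constrained_reward(task_reward=1.0, resources_held=10.0, resources_needed=2.0)
    print(modest, grabby)  # 1.0 vs -7.0: resource-grabbing is uncompetitive here
```

This is obviously a cartoon (real proposals along these lines, like impact penalties, are much subtler), but it illustrates the claim in slogan form: if the training signal already disfavors open-ended resource acquisition, the “unconstrained maximizer” failure mode has to fight the optimization pressure rather than ride it.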