Re how hard alignment is: in my median world I suspect it's harder than Yann LeCun thinks, but probably a lot easier than most LWers think, and in particular I assign some probability mass to the hypothesis that, in practice, aligning superhuman AI is a non-problem.
This means that while I disagree with Yann LeCun, and in general find my side (anti-doomerism/accelerationism) to be quite epistemically unsound at best, I do suspect there are object-level reasons to believe that aligning superhuman systems is way easier than most LWers, including you, think.
To give the short version: instrumental convergence will probably be constrained for pure capabilities reasons. In particular, pursuing unconstrained instrumental goals via RL usually fails, or at best produces something useless. This means it will be reasonably easy to add constraints to instrumental goals, and the usual squiggle-maximizer thought experiment basically collapses, because such a maximizer won't have the capabilities. This implies we are dealing with an easier problem, since we aren't starting from zero, and in the strongest form it basically becomes a non-adversarial problem like nuclear safety, which we have solved rather well.
Or in slogan form: "Fewer constraints on instrumental goals are usually not good for capabilities, and the human case is probably a result of not caring much about efficiency, plus time scales. We should expect AI systems to have more constraints on instrumental goals for both capability and alignment reasons."
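To make "adding constraints to instrumental goals" a bit more concrete, here is a minimal, purely illustrative sketch of one way it could look in RL: the training signal is task reward minus a weighted penalty on a proxy for instrumental behavior (here, grabbing "resource" cells the agent doesn't need in order to reach its goal). The gridworld, the penalty proxy, and the `LAMBDA` weight are all assumptions made for the sketch, not anything claimed above.

```python
# Illustrative sketch only: a toy gridworld where "instrumental resource grabs"
# are penalized alongside task reward. All names and numbers are assumptions.
import numpy as np

rng = np.random.default_rng(0)

GRID = 5                                      # 5x5 gridworld
GOAL = (4, 4)                                 # task: reach this cell
RESOURCES = {(2, 2), (3, 1)}                  # stand-in for unneeded resource acquisition
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
LAMBDA = 2.0                                  # weight on the instrumental-behavior penalty

def step(state, action):
    """Move in the grid; return next state, task reward, and instrumental penalty."""
    nxt = (min(max(state[0] + action[0], 0), GRID - 1),
           min(max(state[1] + action[1], 0), GRID - 1))
    task_reward = 1.0 if nxt == GOAL else -0.01    # small step cost encourages finishing
    penalty = 1.0 if nxt in RESOURCES else 0.0     # proxy for unconstrained acquisition
    return nxt, task_reward, penalty

def train(constrained, episodes=2000):
    """Tabular Q-learning on task reward, optionally minus the penalty term."""
    q = np.zeros((GRID, GRID, len(ACTIONS)))
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(50):
            # epsilon-greedy action selection
            a = rng.integers(len(ACTIONS)) if rng.random() < 0.1 else int(np.argmax(q[s]))
            nxt, r, c = step(s, ACTIONS[a])
            r_total = r - LAMBDA * c if constrained else r
            bootstrap = 0.0 if nxt == GOAL else 0.95 * np.max(q[nxt])
            q[s][a] += 0.1 * (r_total + bootstrap - q[s][a])
            s = nxt
            if s == GOAL:
                break
    return q

q_free = train(constrained=False)         # optimizes task reward only
q_constrained = train(constrained=True)   # task reward minus the instrumental penalty
```

The intent of the sketch is that the constrained agent routes around the resource cells while still reaching the goal, i.e. the unneeded instrumental behavior is trained away without giving up task capability; a real system would obviously need a much better-behaved proxy than a hand-picked set of cells.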