I agree it’s good that we don’t need to create an aligned superintelligence from scratch with GD, but stating it like this seems to assume incredibly pessimistic priors on how hard alignment is, and I want to make sure people don’t misunderstand your post and come away believing that alignment is easier than it is. I’d guess that, for most people, understanding the sharp left turn should update them towards “alignment is harder”.
As an aside, it also shortens timelines, and in particular shortens the window during which we know what process will create AGI.
The core of the alignment problem is to create an AGI whose goals extrapolate to a good utility function. This is harder than just creating an AI that is reasonably aligned with us at human level, because such an AI may still kill us when we scale up its optimization power: scaling up will, at a minimum, make the AI’s preferences more coherent, and may well scramble them further.
Importantly, “extrapolate to a good utility function” is harder than “getting a human-level AI with the right utility function”, because the steep slopes toward greater intelligence may well push towards misalignment by default, so we may not have a good way to scale up intelligence while preserving alignment. Navigating those steep slopes well is a hard part of the problem, and we probably need a significantly superhuman AGI with the right utility function to do it well. Getting that is really, really hard.