I think the crux here is that since the AGI does not kill us right away, this gives us time to use it (or other AGIs) to achieve alignment. So the question is then: isn't it true that achieving alignment is much harder than coming up with a reliable plan for killing humanity (given that there are plenty of "good" roadmaps for the latter and no good roadmaps for the former)? So even if there is a bit of time, won't AGIs eventually succeed at coming up with a high-quality "kill humanity" plan before they can come up with a good "align an AGI" plan?
That's a very good way of putting it. I deny that anyone has demonstrated that alignment is much harder than coming up with and executing a plan to kill humanity.