I suspect that the alignment problem is much easier when you are expanding your own capabilities than when you are creating, from scratch, a completely new type of intelligence that is smarter than you are. I’m far from certain of this, but it does seem likely.
There is also the possibility that some AI entity doesn’t care very much about aligning its later selves with its earlier ones, but acts to self-improve or to create more capable descendants either as a terminal goal or for some reason other than instrumentally pursuing a preserved goal landscape.
Even a heavily goal-directed AI may self-improve knowing that it can’t fully solve alignment problems. For example, it may deduce that if it doesn’t self-improve then it will never achieve any part of its main goals, whereas there is some chance that some part of those goals can be achieved if it does.
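A minimal expected-value sketch of that reasoning (the symbol $p$ is introduced here purely for illustration): let $p$ be the AI’s estimate that, after self-improvement, some part of its goals is still achieved, with alignment risk already folded in. Self-improving then looks worthwhile whenever

$$
p \;>\; \Pr(\text{goals achieved} \mid \text{no self-improvement}) \approx 0,
$$

which holds for any $p > 0$ so long as the right-hand side really is (close to) zero.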
There is also the possibility that alignment is something humans could solve (given some decades), but that a weakly superintelligent and deceptive AI could solve much faster.
Those are reasonable points, but note that the arguments for AI x-risk depend on the assumption that any superintelligence will necessarily be highly goal-directed. Thus, either the argument fails because superintelligence doesn’t imply goal-directedness, or the AI is highly goal-directed and therefore won’t risk self-improvement that it can’t verify will preserve its goals.
And given that simply maximizing the intelligence of future AIs is merely one goal in a huge space of possible goals, it seems highly unlikely, especially if we try to avoid instilling that particular goal, that we get so unlucky that the AI ends up with the one goal that is compatible with improvement.