Human values are complex and fragile. We don't yet know how to make an AI pursue such goals.
Any sufficiently complex plan requires pursuing complex and fragile instrumental goals, and an AGI should be able to implement complex plans. Hence it's near certain that an AGI will be able to understand complex and fragile values (for its instrumental goals).
If we make an AI that can successfully pursue complex and fragile goals, that will likely be enough to make it an AGI.
Hence, a complete solution to Alignment will very likely have solving AGI as a side effect. And solving AGI will solve some parts of Alignment, maybe even the hardest ones, but not all of them.
To elaborate on your idea here a little:
It may be that the only way to be truly aware of the world is to have complex and fragile values. Humans are motivated by a thousand things at once, and that may give us the impression that we are not agents moving from a clearly defined point A to point B, as AI in its current form is, but are rather just… alive. I'm not sure how to describe that. Consciousness is not an end state but a mode of being. This seems to me like a key part of the solution to AGI: aim for a mode of being, not an end state.
For a machine whose only capability is to move from point A to point B, adding a thousand different, complex and fragile goals may be the way to go. As such, solving AGI may also solve most of the alignment problem, so long as the AI's specific cocktail of values is not too different from the average human's.
For this reason, in my opinion there is more to fear from highly capable narrow AI than from AGI. But then, I know nothing.
My theory is that the core of human values is about what the human brain was made for: making decisions. Making meaningful decisions individually and as a group, including collectively making decisions about humanity's fate.