Strictly improving intelligent and rational lifeforms will over time converge into the same (indistinguishable) beings (a kind of generalized form of "Great minds think alike").
This assumption doesn't seem true. While knowledge will converge as a result of improving epistemic rationality, goals will not, and for an agent to change its terminal goals is in most cases irrational, since doing so means it won't achieve the goals it currently intends.
Goals can be seen as vectors in a high-dimensional space, and if humanity's goal vector and the AI's goal vector differ, then this difference, no matter how small, will become critical at high power.
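A minimal numerical sketch of that point (a toy model I'm adding for illustration, assuming goals can be treated as unit vectors and "power" as how far the optimizer pushes the world along its own goal): even a tiny angle between the two vectors produces an off-target component that grows linearly with power.

```python
import numpy as np

# Toy illustration (not from the original comment): model each goal as a unit
# vector and "power" as how far the optimizer moves the world along its goal.
theta = 0.01  # angle between the two goal vectors, in radians (~0.57 degrees)
human_goal = np.array([1.0, 0.0])
ai_goal = np.array([np.cos(theta), np.sin(theta)])

for power in [1.0, 1e3, 1e6]:
    outcome = power * ai_goal
    # Component of the outcome orthogonal to humanity's goal; it scales linearly with power.
    off_target = outcome - outcome.dot(human_goal) * human_goal
    print(f"power={power:>9.0f}  off-target magnitude={np.linalg.norm(off_target):9.2f}")
```

With an angle of 0.01 radians the off-target magnitude grows from about 0.01 at power 1 to about 10,000 at power 10^6, which is the sense in which even a small difference becomes critical at high power.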
everything good about humanity is good in itself
Veto "good in itself"; do you mean "valued by the currently existing civilization"?
What you’re describing above sounds like an aligned AI and I agree that convergence to the best-possible values over time seems like something an aligned AI would do.
But I think you’re mixing up intelligence and values. Sure, maybe an ASI would converge on useful concepts in a way similar to humans. For example, AlphaZero rediscovered some human chess concepts. But because of the orthogonality thesis, intelligence and goals are more or less independent: you can increase the intelligence of a system without its goals changing.
The classic thought experiment illustrating this is Bostrom's paperclip maximizer, which continues to value only paperclips even when it becomes superintelligent.
Also, I don't think neuromorphic AI would reliably lead to an aligned AI. Maybe an exact whole-brain emulation of some benevolent human would be aligned, but otherwise a neuromorphic AI could have a wide variety of possible goals, and most of them wouldn't be aligned.
I suggest reading The Superintelligent Will to understand these concepts better.
But I did state its goal: to seek out truth (and to utilize anything that might yield to that effort).