If NAMSI achieved a superhuman level of expertise in morality, how would we know? I consider our society to be morally superior to the one we had in 1960. People in 1960 would not agree with this assessment upon looking. If NAMSI agrees with us about everything, it’s not superhuman. So how do we determine whether its possibly-superhuman morality is superior or inferior?
If we’re measuring intelligence, we measure it relative to a known metric:
https://www.google.com/amp/s/www.scientificamerican.com/article/i-gave-chatgpt-an-iq-test-heres-what-i-discovered/%3Famp=true
Just as we measure intelligence based on fundamental attributes, we can do the same with morality. We seem to have generally agreed-upon moral principles, like not lying, not stealing, and not hurting others without good reason. So it seems we would measure morality based on how well NAMSI does in those areas. Just as there is no consensus on what intelligence is and how it should be measured, the same would apply to morality. But I believe we can still arrive at a useful working understanding of relative morality based on accepted moral principles.
Also, its proposals for how we could best solve alignment would probably make more sense to us.
I think intelligence is a lot easier than morality, here. There are agreed-upon moral principles like not lying, not stealing, and not hurting others, sure...but even those aren’t always stable across time. For instance, a couple of generations ago standard Western morality held that it was acceptable to hit your children; now standard Western morality says it’s not. If an AI trained to be moral said that actually, hitting children in some circumstances is a worthwhile tradeoff, that could mean that the AI is more moral than we are and we overcorrected, or it could mean that the AI is less moral than we are and is simply wrong.
And that’s just for the same values! What about how values change over the decades? If our moral AI says that Confucian obedience to parental authority is just, and that we Westerners are actually wrong about this, how do we know whether it’s correct?
Intelligence tests tend to have a quick feedback loop. The answer is right or wrong. If a Go-playing AI makes a move that looks bizarre but then wins the game, that’s indicative that it’s superior. Morality is more like long-term planning—if a policy-making AI suggests a strange policy, we have no immediate way to judge whether this is good or not, because we don’t have access to the ground truth of whether or not it works for a long time.
It’s similar with alignment. How do we know that a superhuman alignment solution would look reasonable to us instead of weird? (Also, for that matter, why would a more moral agent have better alignment solutions? Do you think that the blocker for good alignment solutions is that current alignment researchers are insufficiently virtuous to come up with correct solutions?)
Yes, I appreciate the complexities of morality when compared with intelligence, but it’s not something that we can in any way afford to ignore. It’s an essential part of alignment, and if we can get narrow ASI behind it, we may be able to sufficiently solve it before we arrive at AGI and full ASI.
I don’t think this is an intelligence vs morality matter. It seems that we need to apply AI intelligence much more directly to better understanding and solving moral questions that have thus far proved too difficult for humans. Another part of this is that we don’t need full consensus. All of the nations of the world have an extensive body of laws that not everyone agrees with but that are useful in ensuring the best welfare of their citizens. Naturally I’m not defending laws that disenfranchise various groups like women, but our system of laws shows that much can be done by agreeing upon various moral questions.
I think a lot of AI’s success with this will depend on logic and reasoning algorithms. For example, 99% of Americans eat animal products notwithstanding the suffering that those animals endure in factory farms. While there may not be consensus on the cruelty of this practice, the logic and reasoning behind it being terribly cruel could not be clearer.
Yes, I do believe that we humans need to ramp up our own morality in order to better understand what AI comes up with. Perhaps we need it to also help us do that.