In practical terms, it is very hard to change people's intuitive opinions on this, even after many philosophical arguments. These statements of mine don't settle the subject; for that, one should read the literature, for instance the essay I wrote about it. But general superintelligences could easily understand it and put it coherently into practice. That is what we should naturally expect of them, except perhaps in some specific cases of human intervention.
Yet, as the eminent philosopher Joss Whedon observed, “Yeah… but [they] don’t care!”
Their motivation (what they care about) should be in line with their rationality. This alignment fails in humans, because we have evolutionarily selected, primitive motivations coupled with weak rationality, but it should not fail in much more intelligent, deliberately designed (and possibly self-modifying) agents. Logically, one should care about what one’s rationality tells one.
Since we can’t build superintelligences straight off, we have to build self-improving AIs.
A rational self-improving AI has to be motivated to become more intelligent, rational, and so on.
So rational self-improving AIs won’t have arbitrary motivations. They will be motivated to value rationality in order to become more rational.
Valuing rationality means disvaluing bias and partiality.
Therefore, a highly rational agent would not arbitrarily disregard valid rational arguments (we don’t expect highly rational humans to say “that is a perfectly good argument, but I am going to just ignore it”).
Therefore, a highly rational agent would not arbitrarily disregard valid rational arguments for morality.
Therefore, a highly rational agent would not “just not care”. The only possible failure modes are:
1) Non-existence of good rational arguments for morality (failure of objective moral cognitivism).
2) Failure of intrinsic motivation to arise from their conceptual understanding of valid arguments for morality, i.e., they understand that X is good, that they should do X, and what “should” means, but none of that adds up to a motivation to do X.