Since we can’t build superintelligences straight off, we have to build self-improving AIs.
A rational self-improving AI has to be motivated to become more intelligent, rational, and so on.
So rational self-improving AIs won’t have arbitrary motivations: in order to become more rational, they will be motivated to value rationality.
Valuing rationality means disvaluing bias and partiality.
Therefore, a highly rational agent would not arbitrarily disregard valid rational arguments (we don’t expect
highly rational humans to say “that is a perfectly good argument, but I am going to just ignore it”).
Therefore, a highly rational agent would not arbitrarily disregard valid rational arguments for morality.
Therefore, a highly rational agent would not “just not care”. The only possible failure modes are:
1) Non-existence of good rational arguments for morality (failure of objective moral cognitivism).
2) Failure of intrinsic motivation to arise from their conceptual understanding of valid arguments for morality, i.e.,
they understand that X is good, that they should do X, and what “should” means, but none of that adds up to a motivation to do X.