A Friendly AI, optimising itself, must ensure that it remains Friendly after the modification;
Isn’t this also true for an unfriendly AI? Any AI has to ensure that improved versions of itself stay friendly with respect to its initial values. So for each modification, or successor, it has to find a proof that the new version will not only respect its values but will also maximize expected utility more effectively.
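To make that criterion concrete, here is a toy sketch (my own illustration; the names, the Monte Carlo estimate, and the identity check are all invented for the example, not anything specified in the thread) of the acceptance test such an agent might apply to a candidate modification: keep the same utility function, and do at least as well by it in expectation.

```python
import random
from dataclasses import dataclass
from typing import Callable

Outcome = float
Utility = Callable[[Outcome], float]


@dataclass
class Agent:
    utility: Utility                        # the agent's values
    sample_outcome: Callable[[], Outcome]   # stochastic world model + policy

    def expected_utility(self, n: int = 10_000) -> float:
        # Crude Monte Carlo estimate of expected utility under this agent's policy.
        return sum(self.utility(self.sample_outcome()) for _ in range(n)) / n


def accept_modification(current: Agent, candidate: Agent) -> bool:
    """Idealized acceptance test for a self-modifying expected-utility maximizer:
    (a) the candidate keeps the same utility function, and
    (b) it does at least as well by that function."""
    preserves_values = candidate.utility is current.utility
    no_worse = candidate.expected_utility() >= current.expected_utility()
    return preserves_values and no_worse


# A 'better optimizer' with the same values is accepted; a successor whose
# values have drifted is rejected even if it scores well by its own lights.
u = lambda x: x
current = Agent(utility=u, sample_outcome=lambda: random.gauss(1.0, 1.0))
better  = Agent(utility=u, sample_outcome=lambda: random.gauss(2.0, 1.0))
drifted = Agent(utility=lambda x: -x, sample_outcome=lambda: random.gauss(5.0, 1.0))

assert accept_modification(current, better)
assert not accept_modification(current, drifted)
```

Of course, the hard part the thread is gesturing at is that a real agent would need a proof of (a) and (b) about its own successor, not a sampling-based spot check.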
Ah, no. Friendliness is a special category of AI, and as such it is more restrictive: an AI whose output changes under self-optimisation cannot remain Friendly, whereas an Unfriendly AI whose output changes is still Unfriendly.
Not really. For example, you could have a “sloppy” superintelligence that, because it was given a short planning horizon, traded away the future of the universe for short-term gain.
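A minimal sketch of the planning-horizon point, with numbers invented purely for illustration: the same maximizer prefers a value-destroying plan when it only evaluates the next couple of steps.

```python
# Two candidate plans, each listed as per-step rewards over time (made-up values).
plans = {
    "grab_short_term_gain": [10, 10, -1000, -1000, -1000],
    "preserve_the_future":  [1, 1, 1, 1, 1],
}

def value(rewards, horizon):
    """Total reward the agent actually considers: only the first `horizon` steps."""
    return sum(rewards[:horizon])

def best_plan(horizon):
    return max(plans, key=lambda p: value(plans[p], horizon))

print(best_plan(horizon=2))  # 'grab_short_term_gain' -- looks best within two steps
print(best_plan(horizon=5))  # 'preserve_the_future' -- wins once the later steps count
```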
The phrase “has to” is a little confusing here. Sure, any AI that doesn’t reliably preserve its value structure under self-modification risks destroying value when it self-modifies. But something can be an AI without preserving its value structure, just like we can be NIs (natural intelligences) without preserving our value structures.