A Friendly AI, optimising itself, must ensure that it remains Friendly after the modification;
Isn’t this also true for an unfriendly AI? Any AI has to ensure that improved versions of itself stay friendly with respect to its initial values. So for each modification, or successor, it has to find a proof that the new version will not only respect its values but will also maximize expected utility more effectively.
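To make that criterion concrete, here is a toy sketch (my own illustration; the names, the Monte Carlo estimate, and the identity check are all invented for the example, not anything specified in the thread) of the acceptance test such an agent might apply to a candidate modification: keep the same utility function, and do at least as well by it in expectation.

```python
import random
from dataclasses import dataclass
from typing import Callable

Outcome = float
Utility = Callable[[Outcome], float]


@dataclass
class Agent:
    utility: Utility                        # the agent's values
    sample_outcome: Callable[[], Outcome]   # stochastic world model + policy

    def expected_utility(self, n: int = 10_000) -> float:
        # Crude Monte Carlo estimate of expected utility under this agent's policy.
        return sum(self.utility(self.sample_outcome()) for _ in range(n)) / n


def accept_modification(current: Agent, candidate: Agent) -> bool:
    """Idealized acceptance test for a self-modifying expected-utility maximizer:
    (a) the candidate keeps the same utility function, and
    (b) it does at least as well by that function."""
    preserves_values = candidate.utility is current.utility
    no_worse = candidate.expected_utility() >= current.expected_utility()
    return preserves_values and no_worse


# A 'better optimizer' with the same values is accepted; a successor whose
# values have drifted is rejected even if it scores well by its own lights.
u = lambda x: x
current = Agent(utility=u, sample_outcome=lambda: random.gauss(1.0, 1.0))
better  = Agent(utility=u, sample_outcome=lambda: random.gauss(2.0, 1.0))
drifted = Agent(utility=lambda x: -x, sample_outcome=lambda: random.gauss(5.0, 1.0))

assert accept_modification(current, better)
assert not accept_modification(current, drifted)
```

Of course, the hard part the thread is gesturing at is that a real agent would need a proof of (a) and (b) about its own successor, not a sampling-based spot check.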
Ah, no. Friendliness is a special category of AI, and as such it is more restrictive: an AI whose output changes under self-optimisation cannot remain Friendly, whereas an Unfriendly AI whose output changes is still Unfriendly.
Not really. For example, you could have a “sloppy” superintelligence that, because it was given a short planning horizon, traded away the future of the universe for short-term gain.
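A minimal sketch of the planning-horizon point, with numbers invented purely for illustration: the same maximizer prefers a value-destroying plan when it only evaluates the next couple of steps.

```python
# Two candidate plans, each listed as per-step rewards over time (made-up values).
plans = {
    "grab_short_term_gain": [10, 10, -1000, -1000, -1000],
    "preserve_the_future":  [1, 1, 1, 1, 1],
}

def value(rewards, horizon):
    """Total reward the agent actually considers: only the first `horizon` steps."""
    return sum(rewards[:horizon])

def best_plan(horizon):
    return max(plans, key=lambda p: value(plans[p], horizon))

print(best_plan(horizon=2))  # 'grab_short_term_gain' -- looks best within two steps
print(best_plan(horizon=5))  # 'preserve_the_future' -- wins once the later steps count
```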
The phrase “has to” is a little confusing here. Sure, any AI that doesn’t reliably preserve its value structure under self-modification risks destroying value when it self-modifies. But something can be an AI without preserving its value structure, just like we can be NIs (natural intelligences) without preserving our value structures.