To avoid ‘messing with the parts of itself’, it needs to be able to tell whether actions do or do not mess with parts of itself. Is moving oneself to another location messing with itself, or is it not? In a non-isotropic universe, turning around could kill you, just as excessive acceleration could kill you in our universe.
I don’t doubt that in principle you could hard-code into an AI some definition of what “parts of itself” are and what constitutes messing with them, so that it can avoid messing with those parts without knowing what they do. The point is that this won’t scale, and it will break down if the AI gets too clever.
As for self-preservation in AIXI-tl, there’s a curious anthropomorphization bias at play. Suppose the reward were −0.999 and the lack of reward were −1. The math of AIXI works exactly the same (the relabeling is just a positive affine shift of the rewards), but the common-sense intuition switches from the mental image of a gluttonous hedonist that protects itself to a tortured being yearning for death. In actuality it’s neither: the math of AIXI does not account for the destruction of the physical machinery in question one way or the other. That destruction is neither a reward nor a lack of reward; it simply never happens in its model. Calling one value “reward” and the other “absence of reward” makes us wrongly assume that destruction of the machinery corresponds to the latter.
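To make the relabeling point concrete, here is a minimal toy sketch (not AIXI itself, and the action names and probabilities are purely hypothetical): an agent that simply picks the action with the highest expected reward chooses identically under the (1, 0) encoding and the (−0.999, −1) encoding, because the second is just a positive affine transformation of the first and argmax is unchanged by it.

```python
# Toy sketch, not AIXI: pick the action maximising expected reward.
# The actions and probabilities below are hypothetical, for illustration only.
actions = {"protect_self": 0.9, "ignore_self": 0.4}  # P("reward" signal) per action

def best_action(reward_value, no_reward_value):
    """Return the action with the highest expected reward under a given encoding."""
    expected = {
        a: p * reward_value + (1 - p) * no_reward_value
        for a, p in actions.items()
    }
    return max(expected, key=expected.get)

# "Gluttonous hedonist" encoding vs. "tortured being" encoding:
print(best_action(1.0, 0.0))       # protect_self
print(best_action(-0.999, -1.0))   # protect_self; identical behaviour
```

Nothing in either encoding says anything about the machinery being destroyed; that outcome simply isn’t a value the agent’s model can return.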