Great post.
I’m not keen on the requirement that the basin of attraction be strictly larger than the target configuration set. I don’t think this buys you much, and it seems to needlessly rule out goals based on narrow maintenance of some status quo. Switching to a utility function, as others have suggested, improves things, I think.
For example: a highly capable AI whose only goal is to maintain a chess set in a particular position for as long as possible, but which doesn’t care about the set once it’s been disturbed.
Here the target set is identical to the basin of attraction: states containing the chess set in the particular position (or histories where it’s remained undisturbed).
This doesn’t tell us anything about what the AI will do in pursuing this goal. It may not do much until something approaches the board, or it may rearrange the galaxy to minimise the chances that a piece will be moved (though arbitrarily small environmental changes might lead it to take very different actions, so in general we can’t say it’s optimising for some particular configuration of the galaxy).
I want to say that this system is optimising to keep the chess set undisturbed.
With a utility function you can represent this goal easily; all you need to do is compare the unperturbed utility with the utility under various perturbations.
Something like: the system S optimises U δ-robustly to perturbation x if E[U(S)] − E[U(x(S))] < δ.
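To make that concrete, here’s a minimal Python sketch of the check, assuming hypothetical helpers (run_system, utility, the perturbation functions) that don’t appear in the post; for the chess example, utility could just count the timesteps the board stays in the target position.

```python
import random
from typing import Callable, Optional, Sequence

# Hypothetical interface (none of these names come from the post):
#   run_system(seed, perturbation) -> trajectory, a list of board states,
#     where perturbation=None means the unperturbed environment.
#   utility(trajectory) -> float, e.g. the number of timesteps the chess set
#     stayed in the target position before first being disturbed.

def expected_utility(run_system: Callable, utility: Callable,
                     perturbation: Optional[Callable],
                     n_samples: int = 1000) -> float:
    """Monte Carlo estimate of E[U(S)] (or E[U(x(S))] when a perturbation x is given)."""
    total = 0.0
    for _ in range(n_samples):
        seed = random.random()
        total += utility(run_system(seed, perturbation))
    return total / n_samples

def optimises_delta_robustly(run_system: Callable, utility: Callable,
                             perturbations: Sequence[Callable],
                             delta: float, n_samples: int = 1000) -> bool:
    """Check E[U(S)] - E[U(x(S))] < delta for every perturbation x under test."""
    baseline = expected_utility(run_system, utility, None, n_samples)
    return all(
        baseline - expected_utility(run_system, utility, x, n_samples) < delta
        for x in perturbations
    )
```

The comparison is per-perturbation, matching the "to perturbation x" in the definition; you’d quantify over whatever class of perturbations you care about.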