Planned summary for the Alignment Newsletter:

This post argues that a distinguishing factor of goal-directed policies is that they have low Kolmogorov complexity, relative to e.g. a lookup table that assigns a randomly selected action to each observation. It then relates this to quantilizers (AN #48) and <@mesa optimization@>(@Risks from Learned Optimization in Advanced Machine Learning Systems@).
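To make the comparison concrete, here is a minimal sketch (my own toy construction, not from the post): it uses zlib-compressed length as a crude stand-in for Kolmogorov complexity, describing the lookup-table policy by its full action table and the goal-directed policy by the text of its decision rule.

```python
import random
import zlib

# Toy setup: observations are integers 0..999, actions are single characters.
OBSERVATIONS = range(1000)
ACTIONS = "UDLR"

# Lookup-table policy: a randomly selected action for each observation.
# Its shortest description is essentially the table itself, which is
# close to incompressible.
random.seed(0)
lookup_table = {obs: random.choice(ACTIONS) for obs in OBSERVATIONS}
table_description = "".join(lookup_table[obs] for obs in OBSERVATIONS)

# Goal-directed policy: move toward a fixed goal state. Its description is
# the short rule below, whose length is independent of the number of
# observations.
goal_rule = "if obs < 500: return 'R' else: return 'L'"

print(len(zlib.compress(table_description.encode())))  # ~ size of the table
print(len(zlib.compress(goal_rule.encode())))          # ~ size of the rule
```

On this toy setup the compressed table is roughly an order of magnitude longer than the compressed rule, and the gap grows with the number of observations, which is the intuition the post relies on.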
Planned opinion:
This seems reasonable to me as an aspect of goal-directedness. Note that it is not a sufficient condition. For example, the policy that always chooses action A has extremely low complexity, but I would not call it goal-directed.
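Continuing the toy sketch above (same caveats about zlib as a proxy), the constant policy's description is even shorter than the goal-directed rule's, which is exactly why low complexity alone cannot be a sufficient condition:

```python
import zlib

# Constant policy: "always choose action A". Its description is shorter
# than the goal-directed rule's, yet the policy does nothing that looks
# like pursuing a goal in the world.
constant_rule = "return 'A'"
print(len(zlib.compress(constant_rule.encode())))  # shortest of the three
```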
The other members of the AISC group and I discussed the example you mention more than once. I agree with you that such an agent is not goal-directed, mainly because it doesn’t do anything to ensure that it will still be able to perform action A if adverse events happen.
It is still true that “action A” is a short description of that agent’s behaviour, and one could interpret action A as its goal, even though the agent is not good at pursuing it (“robustness” might be an apt term for what the agent lacks).
Maybe the criterion that removes this specific policy is locality? What I mean is that this policy has a goal defined only over its own output (which action it chooses), and thus a very local goal. Since the intuition of goals as short descriptions assumes that goals are “part of the world”, maybe it only applies to non-local goals.
I wouldn’t say goals as short descriptions are necessarily “part of the world”.
Anyway, locality definitely seems useful for making a distinction in this case.