Planned summary for the Alignment Newsletter:

This post argues that a distinguishing factor of goal-directed policies is that they have low Kolmogorov complexity, relative to e.g. a lookup table that assigns a randomly selected action to each observation. It then relates this to quantilizers (AN #48) and <@mesa optimization@>(@Risks from Learned Optimization in Advanced Machine Learning Systems@).
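To make the comparison concrete, here is a minimal sketch (my own toy construction, not from the post): it uses zlib-compressed length as a crude stand-in for Kolmogorov complexity, describing the lookup-table policy by its full action table and the goal-directed policy by the text of its decision rule.

```python
import random
import zlib

# Toy setup: observations are integers 0..999, actions are single characters.
OBSERVATIONS = range(1000)
ACTIONS = "UDLR"

# Lookup-table policy: a randomly selected action for each observation.
# Its shortest description is essentially the table itself, which is
# close to incompressible.
random.seed(0)
lookup_table = {obs: random.choice(ACTIONS) for obs in OBSERVATIONS}
table_description = "".join(lookup_table[obs] for obs in OBSERVATIONS)

# Goal-directed policy: move toward a fixed goal state. Its description is
# the short rule below, whose length is independent of the number of
# observations.
goal_rule = "if obs < 500: return 'R' else: return 'L'"

print(len(zlib.compress(table_description.encode())))  # ~ size of the table
print(len(zlib.compress(goal_rule.encode())))          # ~ size of the rule
```

On this toy setup the compressed table is roughly an order of magnitude longer than the compressed rule, and the gap grows with the number of observations, which is the intuition the post relies on.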
Planned opinion:
This seems reasonable to me as an aspect of goal-directedness. Note that it is not a sufficient condition. For example, the policy that always chooses action A has extremely low complexity, but I would not call it goal-directed.
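Continuing the toy sketch above (same caveats about zlib as a proxy), the constant policy's description is even shorter than the goal-directed rule's, which is exactly why low complexity alone cannot be a sufficient condition:

```python
import zlib

# Constant policy: "always choose action A". Its description is shorter
# than the goal-directed rule's, yet the policy does nothing that looks
# like pursuing a goal in the world.
constant_rule = "return 'A'"
print(len(zlib.compress(constant_rule.encode())))  # shortest of the three
```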
The other members of the AISC group and I discussed the example you mention more than once. I agree with you that such an agent is not goal-directed, mainly because it doesn’t do anything to ensure that it will still be able to perform action A if adverse events happen.
It is still true that “action A” is a short description of that agent’s behaviour, and one could interpret action A as its goal, even though the agent is not good at pursuing it (“robustness” might be an apt term for what the agent lacks).
Maybe the criterion that removes this specific policy is locality? What I mean is that this policy has a goal defined only over its own output (which action it chooses), and thus a very local goal. Since the intuition of goals as short descriptions assumes that goals are “part of the world”, maybe it only applies to non-local goals.
I wouldn’t say goals as short descriptions are necessarily “part of the world”.
Anyway, locality definitely seems useful for making a distinction in this case.