If B does not have a model (for instance, if it’s a Q-learning agent), then it can still learn this behaviour, without knowing anything about A, simply through trial and error.
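(A minimal sketch of what such model-free, trial-and-error learning could look like, in a hypothetical one-dimensional block-pushing world. The environment, rewards, and all names below are assumptions made for illustration, not part of the scenario being discussed.)

```python
# Tabular Q-learning sketch: a hypothetical 1-D world in which an agent
# like B learns, purely by trial and error and with no model of the world
# (or of A), to push a block into a hole.
import random

N_CELLS = 6          # block positions 0..5; the hole sits just past cell 5
HOLE = N_CELLS       # pushing the block off the right edge drops it in
ACTIONS = (-1, +1)   # push left, push right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}

def greedy(state):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

def step(state, action):
    """Move the block; reward 1 only when it drops into the hole."""
    nxt = max(0, state + action)
    if nxt >= HOLE:
        return None, 1.0            # terminal: block fell into the hole
    return nxt, 0.0

for episode in range(500):
    state = random.randrange(N_CELLS)        # block starts in a random cell
    for _ in range(200):                     # cap episode length
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        nxt, reward = step(state, action)
        best_next = 0.0 if nxt is None else max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        if nxt is None:
            break
        state = nxt

# After training, the greedy policy should push right (towards the hole)
# from every cell, even though the agent never had a model of the world.
print({s: greedy(s) for s in range(N_CELLS)})
```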
Sure, but somebody would presumably notice that B is learning to do something it is not intended to do before it manages to push all six blocks.
You might feel that B can perform this deception because it has some measure of autonomy, at least in its stunted world. We can construct models with even less autonomy. Suppose that there is another agent C, who has the same goal as B. B is now a very simple algorithm that just pushes a designated block towards the hole; C designates the block for it to push.
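(A rough sketch of this decomposition, again with hypothetical names and a toy world: C merely designates a block, and B merely pushes the designated block one step towards the hole. Neither piece, taken alone, looks like much of an agent.)

```python
# Hypothetical B/C decomposition: C (the "designator") only chooses which
# block to point B at; B (the "pusher") is a fixed routine that nudges the
# designated block one step towards the hole.

HOLE = 0  # blocks are pushed towards position 0, where the hole is

def push_towards_hole(blocks, block_id):
    """Agent B: a dumb subroutine that moves the designated block one step."""
    if blocks[block_id] > HOLE:
        blocks[block_id] -= 1

def designate_block(blocks):
    """Agent C: picks the block closest to the hole that hasn't fallen in yet."""
    remaining = {b: pos for b, pos in blocks.items() if pos > HOLE}
    return min(remaining, key=remaining.get) if remaining else None

blocks = {i: i + 3 for i in range(6)}        # six blocks, at positions 3..8
while (target := designate_block(blocks)) is not None:
    push_towards_hole(blocks, target)

print(blocks)  # every block has ended up at the hole
```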
I don’t think you can meaningfully consider B and C separate agents in this case. B is merely a low-level subroutine while C is the high-level control program.
Which is one of the reasons that concepts like “autonomy” are so vague.