What feels wrong to me is the implicit link drawn between goal-directedness and competence. A bad Go player will usually lose, but it doesn’t seem any less goal-directed to me than a stronger one that consistently wins.
Competence is thus not the whole story. It might be useful for computing goal-directedness; reaching some lower bound of competence might even be a necessary condition for goal-directedness (play badly enough and it becomes debatable whether you’re even trying to win). But when the two are forced together, I feel like something important is lost.
Competence, or knowledge? If something is unable to learn* the rules of the game, then even if it has winning as a goal, knowing that goal doesn’t help us make useful predictions (beyond ‘keeps playing’).
*The same goes if it simply hasn’t learned them yet—when watching a beginner (AlphaZero just starting out) play chess, you might say ‘what are you trying to do?’ (confusion) or even ‘no, you can’t do that’ (breaking the rules).
What does “all policies given by using RL” mean in this case? The easy answer is all policies resulting from taking any RL method and any initial conditions, and training for any amount of resources on the reward function of the goal. But not only is this really, really uncomputable, I’m not sure it’s well defined enough [(]what are “all methods of RL”?).
A good question. Unusually, it is an opening parenthesis that is missing, not a closing one.
The ghost of competence strikes back here, because I cannot really consider any amount of resources; if I did, then every policy would be maximally-focused for the goal, as it would be generated by taking the policy as an initial condition and using no resources at all.
Yes and no. A random initial policy could in theory already be maximally-focused—though probabilistically it wouldn’t be (for any non-trivial task, absent hardcoding*).
*Hardcoding isn’t always optimal either (relative to the goal); it is feasible for solved games with small solutions, though, like tic-tac-toe. Which is arguably still RL, just not on a computer.
lower bound on the amount of resources the RL algorithm has to use before the resulting policy is indeed maximally-focused.
Resources may be required to verify that a policy is maximally-focused. I’m not sure if things get to ‘maximally’ in practice—though superhuman performance in chess certainly seems to qualify as ‘goal-directed’ within that domain*, making the goal a useful predictor.
*What other goals are there in that domain, though?
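To make those quantifiers a bit more concrete, here is a toy, self-contained sketch of what a sampled, budgeted approximation of focus could look like. Everything in it is my own stand-in (the bandit environment, the epsilon-greedy learner, the total-variation distance, the particular budgets), not anything defined in the post; the point is just that restricting to a finite sample of methods, seeds, and strictly positive resource budgets sidesteps both the uncomputability and the zero-resource loophole, at the price of only ever getting an estimate.

```python
import random

ARMS = 3
REWARD = [0.1, 0.5, 0.9]  # the "goal": prefer arm 2

def train_bandit(steps, seed):
    """A tiny epsilon-greedy learner, standing in for 'an RL method run with a resource budget'."""
    rng = random.Random(seed)
    values, counts = [0.0] * ARMS, [0] * ARMS
    for _ in range(steps):
        arm = rng.randrange(ARMS) if rng.random() < 0.1 else max(range(ARMS), key=lambda a: values[a])
        counts[arm] += 1
        values[arm] += (REWARD[arm] - values[arm]) / counts[arm]  # running mean of observed reward
    best = max(range(ARMS), key=lambda a: values[a])
    return [1.0 if a == best else 0.0 for a in range(ARMS)]       # deterministic policy over arms

def distance(p, q):
    """Total variation distance between two action distributions."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))

def approximate_focus(candidate, budgets=(1_000, 10_000), seeds=range(3)):
    """1.0 = indistinguishable from some sampled RL-trained policy; 0.0 = far from all of them.
    Budgets are kept strictly positive, so the 'zero resources' degenerate case never appears."""
    dists = [distance(candidate, train_bandit(steps, seed))
             for steps in budgets for seed in seeds]
    return 1.0 - min(dists)

print(approximate_focus([0.0, 0.0, 1.0]))  # always picks the best arm: focus ~1.0
print(approximate_focus([1.0, 0.0, 0.0]))  # ignores the goal: focus ~0.0
```

The ‘lower bound on resources’ idea from the quote only shows up here as the crude choice to keep every budget strictly positive; where exactly that bound should sit is the open question.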
Or said another way, a non-trivial goal with a small but not negligible focus exhibits [1] goal-directedness [2] than a trivial goal with enormous focus.
[#] more
Even with all those uncertainties, I still believe focus is a step in the right direction. It trims down competence to the part that seems the most relevant to goal-directedness. That being said, I am very interested in any weakness of the idea, or any competing intuition.
From the view of knowledge, this may be easy to demonstrate as follows: show a better way, and the agent will use it (for simple tasks/improvements). But it’s easy to do this with people; with programs, not necessarily.
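For what it’s worth, here is the kind of toy test I have in mind, reusing the bandit stand-in from the sketch above; the names and the update rule are mine, purely illustrative.

```python
def adopts_demonstration(agent_values, demo_arm, true_reward):
    """Show the agent what a demonstrated arm actually pays, and check whether its
    greedy choice is at least as good afterwards. A goal-directed learner should
    switch when the shown way is genuinely better."""
    before = max(range(len(agent_values)), key=lambda a: agent_values[a])
    updated = list(agent_values)
    updated[demo_arm] = true_reward[demo_arm]   # incorporate the demonstrated knowledge
    after = max(range(len(updated)), key=lambda a: updated[a])
    return true_reward[after] >= true_reward[before]

# An agent stuck on arm 0, but able to take the demonstration on board, passes;
# one whose values are frozen (hardcoded) would have to be probed differently.
print(adopts_demonstration([0.1, 0.0, 0.0], demo_arm=2, true_reward=[0.1, 0.5, 0.9]))  # True
```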