A few thoughts:

I think rather than saying “The focus of S towards G is F”, I’d want to say something like “S is consistent with a focus F towards G”. In particular, any S is currently going to count as maximally focused towards many goals. Saying it’s maximally focusing on each of them feels strange. Saying its actions are consistent with maximal focus on any one of them feels more reasonable.
Maybe enough resource for all state values or state-action pair values to have been updated at least once?
This seems either too strict (if we’re directly updating state values), or not strict enough (if we’re indirectly updating).
E.g. if we have to visit all states in Go, that’s too strict: not because it’s intractable, but because once you’ve visited all those states you’ll be extremely capable. If we’re finding a sequence v(i) of value function approximations for Go, then it’s not strict enough: e.g. it might require only that for each state S we can find an N such that v(i)(S) ≠ v(j)(S) for some i, j < N.
I don’t yet see a good general condition.
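To make the contrast concrete, here’s a rough sketch of the two readings (purely illustrative on my part; the function names and toy data are made up, not anything from the post):

```python
# Illustrative only: two candidate readings of "trained non-trivially".
from itertools import combinations

def tabular_coverage(update_log, all_state_actions):
    """Strict reading: every state-action pair was updated at least once."""
    return set(all_state_actions) <= set(update_log)

def values_changed_somewhere(value_fns, states):
    """Weak reading: for each state S, some pair v_i, v_j in the sequence
    disagree on S (i.e. S's predicted value changed at some point)."""
    return all(
        any(vi(s) != vj(s) for vi, vj in combinations(value_fns, 2))
        for s in states
    )

# Toy example: three states, two actions, a very short training run.
states = ["s0", "s1", "s2"]
actions = ["a", "b"]
all_pairs = [(s, a) for s in states for a in actions]

update_log = [("s0", "a"), ("s1", "b")]     # pairs the run actually touched
value_fns = [lambda s: 0.0, lambda s: 0.1]  # v_0, v_1 differ on every state

print(tabular_coverage(update_log, all_pairs))       # False: the strict check
print(values_changed_somewhere(value_fns, states))   # True: the weak check
```

The first check fails for anything Go-sized unless the system is already extremely capable; the second passes for any approximator whose outputs drift at all.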
Another issue I have is with goals of the form “Do A or B”, and systems that are actually focused on A. I’m not keen on saying they’re maximally focused on “A or B”. E.g. I don’t want to say that a system that’s focused on fetching me bananas is maximally focused on the goal “Fetch me bananas or beat me at chess”.
Perhaps it’d be better to define G not as a set of states in one fixed environment, but as a function from environments to sets of states? (was this your meaning anyway? IIRC this is close to one of Michele’s setups)
This way you can say that my policy is focused if for any given environment, it’s close to the outcome of non-trivial RL training within that environment. (probably you’d define a system’s focus as 1/(max distance from Pol over all environments))
So in my example that would include environments with no bananas, and a mate-in-one position on the chess board.
This might avoid some of the issues with trivially maximally focused policies: they’d be maximally focused over RL training in some environments (e.g. those where goal states weren’t ever reached), but not over all. So by defining G over a suitably large class of environments, and taking a minimum over per-environment focus values, you might get a reasonable result.
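Roughly how I’d picture the environment-indexed version (my own sketch; train_rl and policy_distance are hypothetical stand-ins for “non-trivial RL training towards G(env)” and a distance between policies, and I’ve added a +1 so the score stays finite when the distance is zero):

```python
# Sketch only: focus of a system's policy towards an environment-indexed goal G.
def focus(system_policy, G, environments, train_rl, policy_distance):
    """G maps each environment to its set of goal states.  Per-environment
    focus is 1 / (1 + distance from the policy that non-trivial RL training
    towards G(env) would produce there); overall focus is the minimum over
    the environment class, i.e. 1 / (1 + max distance), as suggested above."""
    per_env = []
    for env in environments:
        trained = train_rl(env, goal_states=G(env))
        per_env.append(1.0 / (1.0 + policy_distance(system_policy, trained, env)))
    return min(per_env)
```

With a class that includes a bananaless world and a mate-in-one chess position, a pure banana-fetcher ends up far from the policy trained towards the disjunctive goal in the chess environment, so the minimum drags its focus towards that goal down.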
Typo: “valued 1 at states in and 0...” should be “valued 1 at states in G and 0...”
I think rather than saying “The focus of S towards G is F”, I’d want to say something like “S is consistent with a focus F towards G”. In particular, any S is currently going to count as maximally focused towards many goals. Saying it’s maximally focusing on each of them feels strange. Saying its actions are consistent with maximal focus on any one of them feels more reasonable.
Honestly, I don’t care much about the exact words; it’s the formalism behind them that matters to me. I personally don’t have any problem with saying that the system is maximally focused on multiple goals: I see focus as measuring “what proportion of my actions are coherent with trying to accomplish the goal”. But if many people find this weird, I’m okay with changing the phrasing.
E.g. if we have to visit all states in Go, that’s too strict: not because it’s intractable, but because once you’ve visited all those states you’ll be extremely capable. If we’re finding a sequence v(i) of value function approximations for Go, then it’s not strict enough: e.g. it might require only that for each state S we can find an N such that v(i)(S) ≠ v(j)(S) for some i, j < N.
I don’t yet see a good general condition.
Yes, as I mentioned in another comment, I’m not convinced anymore by this condition. And I don’t have a decent alternative yet.
Perhaps it’d be better to define G not as a set of states in one fixed environment, but as a function from environments to sets of states? (was this your meaning anyway? IIRC this is close to one of Michele’s setups)
This way you can say that my policy is focused if for any given environment, it’s close to the outcome of non-trivial RL training within that environment. (probably you’d define a system’s focus as 1/(max distance from Pol over all environments))
I like this idea, although I fail to see how it “solves” your problem with “A or B”. I think I get the intuition: in some environments, it will be easier to reach B than A. And if your system aims towards A instead of “A or B”, this might make it less focused towards “A or B” in these environments. But even then, the fact remains that ∀G′⊇G, the focus towards G′ is always greater than or equal to the focus towards G. This is why I stand by my measure of triviality, or more intuitively a weight inversely proportional to the size of the goal.
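As a toy illustration of why I want that weight (the numbers are made up, just to show the shape of the problem):

```python
# Made-up focus values that respect the monotonicity above (a superset goal
# never scores lower); "size" is a stand-in for |G|, the number of goal states.
goals = {
    "fetch bananas":                 {"size": 1,    "focus": 0.90},
    "fetch bananas or win at chess": {"size": 2,    "focus": 0.95},
    "reach any state at all":        {"size": 1000, "focus": 1.00},
}

for name, g in goals.items():
    weighted = g["focus"] / g["size"]   # hypothetical weight: 1 / |G|
    print(f"{name:31} raw={g['focus']:.2f} weighted={weighted:.4f}")
```

Raw focus can never prefer G over a superset G′; dividing by the size of the goal set (or any weight that shrinks as the goal gets more trivial) lets the narrower goal win again.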