Strong agree with the need for nuance. 'Model' is another word that has been getting horribly mangled a lot recently.
I think the more sensible uses of the word 'agent' I've come across are usually referring to the assemblage of a policy-under-training plus the rest of the shebang: learning method, exploration tricks of one kind or another, environment modelling (if any), planning algorithm (if any), etc. This seems more legit to me, though I still avoid using the word 'agent' wherever possible, for similar reasons (discussed here (footnote 6) and here).
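For concreteness, here's a minimal sketch of that 'assemblage' reading as a data structure. This is purely illustrative Python; all the names (Agent, learner, explore, world_model, planner) are hypothetical choices of mine, not any particular RL library's API:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class Agent:
    """The 'assemblage' reading: the policy is one component among several."""
    policy: Callable[[Any], Any]        # the policy under training: observation -> action
    learner: Callable[..., None]        # learning method, e.g. a TD or policy-gradient update
    explore: Callable[[Any], Any]       # exploration trick, e.g. epsilon-greedy wrapper around the policy
    world_model: Optional[Any] = None   # environment model, if any
    planner: Optional[Callable[..., Any]] = None  # planning algorithm, if any

    def act(self, observation):
        # Use model-based planning when both a planner and a model are present;
        # otherwise fall back to (exploratory) policy sampling.
        if self.planner is not None and self.world_model is not None:
            return self.planner(self.world_model, observation)
        return self.explore(observation)
```

On this reading the policy is just one field of the agent, which is exactly why using 'agent' for the policy alone invites confusion.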
Similarly to Daniel's response to 'reward is not the optimization target', I think you can be more generous in your interpretation of RL experts' words and read less error into them. That said, more care in communication and terminology would still be preferable, which is a takeaway I strongly endorse.
What other, more favorable interpretations might I consider?
Oh, I meant to refer to the rest of the comment ("the more sensible uses of the word 'agent' I've come across...") and taking that sort of reading as a kind of innocent until proven guilty.
I'll confess I was in a meeting yesterday where someone (a PhD student) made the obvious error of treating RL as a prerequisite for agentiness, perhaps (but not definitely) as a consequence of exactly the conflation you're referring to in this post. Several people in the room were able to clarify. The context was a crossover between a DL lab (the aforementioned PhD student's) and the safety research community in Oxford (me et al.).