Once you have a policy network and a sampling procedure, you can embody it in a system that samples the network repeatedly and hooks its I/O up to the appropriate environment and actuators. Usually this means hooking the policy into a simulated game environment (e.g. a Gym environment), but sometimes the embodiment is an actual robot in the real world.
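Concretely, the loop I mean looks something like the following sketch (hedged: policy_net and sample_action are placeholders for whatever network and sampling procedure you actually have, and the environment API shown is Gymnasium-style):

```python
# Minimal sketch of an "embodiment" loop: repeatedly sample the network and
# wire its I/O to an environment. policy_net and sample_action are placeholders.

import gymnasium as gym

def embody(policy_net, sample_action, env, num_episodes=1):
    """Repeatedly execute the policy against the environment's I/O channels."""
    for _ in range(num_episodes):
        observation, info = env.reset()
        done = False
        while not done:
            action = sample_action(policy_net, observation)  # network output -> actuator
            observation, reward, terminated, truncated, info = env.step(action)  # environment -> sensor
            done = terminated or truncated
    env.close()

# Usage with a trivial stand-in "network" (a random policy), just to show the wiring:
env = gym.make("CartPole-v1")
embody(policy_net=None, sample_action=lambda net, obs: env.action_space.sample(), env=env)
```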
I think using the term “agent” for the policy itself is actually a type error, and not just misleading. I think using the term to refer to the embodied system has the correct type signature, but I agree it can be misleading, for the reasons you describe.
OTOH, I do think modelling the outward behavior of such systems by regarding them as agents with black-box internals is often useful as a predictor, and I would guess that this modelling is the origin of the use of the term in RL.
But modelling outward behavior is very different from attributing that behavior to agentic cognition within the policy itself. I think it is unlikely that any current policy networks are doing (much) agentic cognition at runtime, but I wouldn’t necessarily count on that trend continuing. So moving away from the term “agent” proactively seems like a good idea.
Anyway, I appreciate posts like this that clarify / improve standard terminology. Curious whether you agree with my distinction about embodiment, and if so, whether you have a better term for the embodied system than “agent” or “embodiment”.
Claim: The embodied system is still not necessarily an agent, and in failure cases may not have the agency one expects of it. Any representation of what agency is needs to separate successful agency from the system that is claimed to have it.
Core reason: Agency is the property of pulling the future back in time; it’s when a system selects actions by conditioning on the future. Agency is when any object, even one not structured like a traditional agent, takes the shape of the future before the future does and thereby steers the future.
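To make that concrete, here is a toy sketch (purely illustrative; every name in it is invented): a system conditions on the future when its choice among actions depends on a prediction of what the future would be under each action, compared against a desired future.

```python
# Toy illustration: "conditioning on the future" means the chosen action is the
# one whose predicted outcome lands closest to a desired future, under some
# (here deliberately crude) forward model of the world.

def select_action_conditioned_on_future(state, actions, forward_model, desired_future):
    """Pick the action whose predicted outcome is closest to the desired future."""
    def gap(action):
        predicted_future = forward_model(state, action)  # what the world would become
        return abs(predicted_future - desired_future)
    return min(actions, key=gap)

# Example: steer a 1-D state toward a target of 10.
forward_model = lambda s, a: s + a      # crude world model
actions = [-1.0, 0.0, +1.0]
print(select_action_conditioned_on_future(7.0, actions, forward_model, 10.0))  # -> 1.0
```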
How I came to believe this confidently: this paper, which you have probably seen, but which I link as a PDF for reasons; for anyone reading this who hasn’t seen it, I’d very strongly encourage at least skimming it. If by chance you haven’t already read it in detail, my recommended reading order, if you have 20 minutes and already understand SCMs, would be {1. intro} → {appendix B} → {1.1 example, 1.2 other characterizations, 1.3 what do we consider} → skim/quick-index/first-pass {2. background, 3. algorithms, 3.1 MSCM, 3.2 labeled MCG} → read and ponder 3.3 & 3.4 and algorithms 1 and 2, then skim the assumptions in 3.5 and read algorithm 3. If you really want to get into it, you can then do several more passes to properly understand the algorithms.
This took me several days and multiple calls with friends, as I was new to SCMs. I’m abbreviating, so there isn’t an easy gloss of what I’m referring to without reading the paper; I can’t summarize it precisely, so I’m choosing not to summarize at all. Hopefully this isn’t new to @Max H, but on the off chance it is, this is my reply describing why I disagree.
Hadn’t seen the paper, but I think I basically agree with it, and with your claim.
I was mainly saying something even weaker: the policy itself is just a function, so it can’t be an agent. The thing that might or might not be an agent is an embodiment of the policy by repeatedly executing it in the appropriate environment, while hooked up to (real or simulated) I/O channels.
Interesting distinction. An agent that is asleep isn’t an agent, by this usage.
By the way, are you Max H of the space rock ai thingy?
Also, I didn’t mean for this distinction to be particularly interesting—I am still slightly concerned that it is so pedantic / boring / obvious that I’m the only one who finds it worth distinguishing at all.
I’m literally just saying, a description of a function / mind / algorithm is a different kind of thing than the (possibly repeated) execution of that function / mind / algorithm on some substrate. If that sounds like a really deep or interesting point, I’m probably still being misunderstood.
Well, a sleeping person is still an embodied system, with running processes and sensors that can wake the agent up. And the agent, before falling asleep, might arrange things such that they are deliberately woken up in the future under certain circumstances (e.g. setting an alarm, arranging a guard to watch over them during their sleep).
The thing I’m saying is not an agent is more like a static description of a mind: e.g. the source code of an AGI isn’t an agent until it is compiled and executed on some kind of substrate. I’m not a carbon (or silicon) chauvinist; I’m not picky about which substrate. But without some kind of embodiment and execution, you just have a mathematical description of a computation, the actual execution of which may or may not be computable or otherwise physically realizable within our universe.
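As a hedged illustration of that type distinction (the names here are invented, nothing canonical): the policy is a value you can write down, while the candidate for agency is the process that keeps evaluating it against real or simulated I/O.

```python
# Sketch of the description-vs-execution distinction (illustrative names only).
from typing import Callable

Observation = float
Action = int

# A policy is just a function: a static description of input -> output behavior.
Policy = Callable[[Observation], Action]

greedy_policy: Policy = lambda obs: 1 if obs > 0 else 0   # a value, not an agent

def run(policy: Policy, get_observation, act, steps: int) -> None:
    """The embodied system: repeatedly execute the policy against real or simulated I/O."""
    for _ in range(steps):
        obs = get_observation()   # sensors (real or simulated)
        act(policy(obs))          # actuators
```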
Nope, different person!
Okay, perhaps sleep doesn’t cut it. I was calling the unrun policy a sleeping AI, but perhaps suspended or stopped might be better words to generalize the unrun state of a system that would be agentic when you type
python inference.py
and hit enter on your command line.

I think the embodiment distinction is interesting and hadn’t thought of it before (note that I didn’t understand your point until reading the replies to your comment). I’m not yet sure if I find this distinction worth making, though. I’d refer to the embodied system as a “trained system” or—after reading your suggestion—an “embodiment.” Neither feels quite right to me, though.