Past Account comments on MDP models are determined by the agent architecture and the environmental dynamics

Past Account 29 May 2021 0:12 UTC
LW: 1 AF: -2
AF
[Deleted]
- TurnTrout 29 May 2021 14:54 UTC
  LW: 7 AF: 8
  AF Parent
  say we agree that our state abstraction needs to be model-irrelevant
  Why would we need that, and what is the motivation for “models”? The moment we give the agent sensors and actions, we’re done specifying the rewardless MDP (and its model).
  ETA: potential confusion—in some MDP theory, the “model” is a model of the environment dynamics. Eg in deterministic environments, the model is shown with a directed graph. i don’t use “model” to refer to an agent’s world model over which it may have an objective function. I should have chosen a better word, or clarified the distinction.
  a priori there should be skepticism that all tasks can be modeled with a specific state-abstraction.
  If, by “tasks”, you mean “different agent deployment scenarios”—I’m not claiming that. I’m saying that if we want to predict what happens, we:
  1. Consider the underlying environment (assumed Markovian)
  2. Consider different state/action encodings we might supply the agent.
  3. For each, fix a reward function distribution (what goals we expect to assign to the agent)
  4. See what the theory predicts.
  There’s a further claim (which seems plausible, but which I’m not yet making) that (2) won’t affect (4) very much in practice. The point of this post is that if you say “the MDP has a different model”, you’re either disagreeing with (1) the actual dynamics, or claiming that we will physically supply the agent with a different state/action encoding (2).
  I’d suspect this does generalize into a fragility/impossibility result any time the reward is given to the agent in a way that’s decoupled from the agent’s sensors which is really going to be the prominent case in practice. In conclusion, you can try to work with a variable/rewardless MDP, but then this argument will apply and severely limit the usefulness of the generic theoretical analysis.
  I don’t follow. Can you give a concrete example?
  What links here?
  - TurnTrout's comment on MDP models are determined by the agent architecture and the environmental dynamics by TurnTrout (30 May 2021 15:35 UTC; 7 points)
  - Past Account 29 May 2021 18:02 UTC
    LW: 1 AF: -2
    AF Parent
    [Deleted]
    - TurnTrout 29 May 2021 22:38 UTC
      LW: 7 AF: 8
      AF Parent
      I read your formalism, but I didn’t understand what prompted you to write it. I don’t yet see the connection to my claims.
      If so, I might try to formalize it.
      Yeah, I don’t want you to spend too much time on a bulletproof grounding of your argument, because I’m not yet convinced we’re talking about the same thing.
      In particular, if the argument’s like, “we usually express reward functions in some featurized or abstracted way, and it’s not clear how the abstraction will interact with your theorems” / “we often use different abstractions to express different task objectives”, then that’s something I’ve been thinking about but not what I’m covering here. I’m not considering practical expressibility issues over the encoded MDP: (“That’s also a claim that we can, in theory, specify reward functions which distinguish between 5 googolplex variants of red-ghost-game-over.”)
      If this doesn’t answer your objection—can you give me an english description of a situation where the objection holds? (Let’s taboo ‘model’, because it’s overloaded in this context)
      - Past Account 30 May 2021 13:21 UTC
        LW: -1 AF: -3
        AF Parent
        [Deleted]
        TurnTrout 30 May 2021 15:35 UTC
        LW: 7 AF: 8
        AF Parent
        I don’t understand your point in this exchange. I was being specific about my usage of model; I meant what I said in the original post, although I noted room for potential confusion in my comment above. However, I don’t know how you’re using the word.
        I don’t use the term model in my previous reply anyway.
        You used the word ‘model’ in both of your prior comments, and so the search-replace yields “state-abstraction-irrelevant abstractions.” Presumably not what you meant?
        I already pointed out a concrete difference: I claim it’s reasonable to say there are three alternatives while you claim there are two alternatives.
        That’s not a “concrete difference.” I don’t know what you mean when you talk about this “third alternative.” You think you have some knockdown argument—that much is clear—but it seems to me like you’re talking about a different consideration entirely. I likewise feel an urge to disengage, but if you’re interested in explaining your idea at some point, message me and we can set up a higher-bandwidth call.
        Past Account 31 May 2021 20:59 UTC
        1 point
        Parent
        [Deleted]