mattmacdermott comments on Ayn Rand’s model of “living money”; and an upside of burnout

mattmacdermott 2 Dec 2024 18:48 UTC
3 points
Under that definition you end up saying that what are usually called ‘model-free’ RL algorithms like Q-learning are model-based. E.g. in Connect 4, once you’ve learned that getting 3 in a row has a high value, you get credit for taking actions that lead to 3 in a row, even if you ultimately lose the game.

I think it is kinda reasonable to call Q-learning model-based, to be fair, since you can back out a lot of information about the world from the Q-values with little effort.
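To make the credit-assignment point concrete, here is a minimal Python sketch. It rests on made-up assumptions: a two-step toy episode (not real Connect 4), hypothetical state and action names, a hand-set learned value of 0.8 for the 3-in-a-row state, and arbitrary GAMMA and ALPHA. It contrasts the one-step Q-learning target, which bootstraps from learned values, with the sampled Monte Carlo return for the same episode.

```python
from collections import defaultdict

GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.5   # learning rate (assumed)


def q_learning_target(reward, next_state, q, actions, terminal):
    """One-step Q-learning target: bootstraps from the learned Q-values of next_state."""
    if terminal:
        return reward
    return reward + GAMMA * max(q[(next_state, a)] for a in actions)


def monte_carlo_return(rewards):
    """Sampled evaluation: discounted return actually obtained from the first step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + GAMMA * g
    return g


def q_update(q, state, action, target):
    """Move Q(state, action) a step of size ALPHA toward the given target."""
    q[(state, action)] += ALPHA * (target - q[(state, action)])


if __name__ == "__main__":
    # Hypothetical learned values: reaching 3 in a row already looks good.
    q = defaultdict(float)
    q[("three_in_a_row", "play_on")] = 0.8
    actions = ["play_on"]

    # Toy episode: start --make_three (r=0)--> three_in_a_row --play_on (r=-1)--> loss.
    td = q_learning_target(0.0, "three_in_a_row", q, actions, terminal=False)
    mc = monte_carlo_return([0.0, -1.0])
    print(f"Q-learning target for make_three: {td:+.2f}")   # positive: credit from the learned value
    print(f"Monte Carlo return for make_three: {mc:+.2f}")   # negative: the game was lost

    q_update(q, "start", "make_three", td)
    print(f"Q(start, make_three) after one update: {q[('start', 'make_three')]:+.2f}")
```

In this toy, the action that produced 3 in a row is pushed toward a positive value even though the episode ended in a loss, which is the sense in which Q-learning leans on a learned evaluation rather than on the sampled outcome.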
- abramdemski 2 Dec 2024 21:13 UTC
  2 points
  Ah, yeah, sorry. I do think about this distinction more than I think about the actual model-based vs model-free distinction as defined in ML. Are there alternative terms you’d use if you wanted to point out this distinction? Maybe policy-gradient vs … not policy-gradient?
  - mattmacdermott 3 Dec 2024 8:16 UTC
    1 point
    Not sure. I guess you also have to exclude policy gradient methods that make use of learned value estimates. “Learned evaluation vs sampled evaluation” is one way you could say it.
    
    Model-based vs model-free does feel quite appropriate; it's a shame the terms are used for a narrower kind of model in RL. Not sure if they're used in your sense in other contexts.
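The "learned evaluation vs sampled evaluation" split also cuts through policy-gradient methods. A minimal sketch, assuming the same kind of made-up two-step episode and a hypothetical critic with hand-set values: REINFORCE weights each action's log-probability gradient by the sampled return-to-go, while a one-step actor-critic uses a TD advantage built from learned state values. Only the per-step weights are computed here; the policy network and the gradient step are omitted.

```python
GAMMA = 0.9  # discount factor (assumed)


def reinforce_weights(rewards):
    """Sampled evaluation: each action is weighted by its actual return-to-go G_t."""
    weights, g = [], 0.0
    for r in reversed(rewards):
        g = r + GAMMA * g
        weights.append(g)
    return weights[::-1]


def actor_critic_weights(rewards, values):
    """Learned evaluation: one-step TD advantage r_t + GAMMA * V(s_{t+1}) - V(s_t).

    `values` has one more entry than `rewards`; the last entry is the value of
    the terminal state, which should be 0.
    """
    assert len(values) == len(rewards) + 1
    return [r + GAMMA * values[t + 1] - values[t] for t, r in enumerate(rewards)]


if __name__ == "__main__":
    rewards = [0.0, -1.0]      # make 3 in a row, then lose
    values = [0.5, 0.8, 0.0]   # hypothetical critic: V(start), V(three_in_a_row), V(terminal)
    print("REINFORCE weights:   ", [round(w, 2) for w in reinforce_weights(rewards)])
    print("actor-critic weights:", [round(w, 2) for w in actor_critic_weights(rewards, values)])
```

With these numbers, the first action gets a negative weight from REINFORCE (the game was lost) but a positive one from the critic (it reached a state the critic rates highly), which is why policy-gradient methods with learned value estimates fall on the "learned evaluation" side.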