In machine-learning terms, this is the difference between model-free learning (reputation based on success/failure record alone) and model-based learning (reputation can be gained for worthy failed attempts, or lost for foolish lucky wins).
Under that definition you end up saying that what are usually called ‘model-free’ RL algorithms like Q-learning are model-based. E.g. in Connect 4, once you’ve learned that getting 3 in a row has a high value, you get credit for taking actions that lead to 3 in a row, even if you ultimately lose the game.
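Concretely, the bootstrapped update is the whole story here. A minimal tabular sketch (illustrative, not tied to any real Connect 4 environment):

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99
Q = defaultdict(float)  # Q[(state, action)] -> learned value estimate

def td_update(state, action, reward, next_state, next_actions):
    # Bootstrapped target: credit flows from the *learned* value of the
    # next state, not from the eventual game outcome. If Q already rates
    # "three in a row" states highly, the action reaching one gets
    # reinforced immediately, even if this particular game is later lost.
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```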
I think it is kinda reasonable to call Q-learning model-based, to be fair, since you can back out a lot of information about the world from the Q-values with little effort.
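E.g. a Q-table already pins down a state-value function and a policy for free, and if you know the rewards, Q(s, a) − r ≈ γ·V(s′) starts to constrain the dynamics too. Quick sketch of the easy part:

```python
def implied_state_value(Q, state, actions):
    # V(s) implied by the Q-table: the value of acting greedily from s.
    return max(Q[(state, a)] for a in actions)

def implied_policy(Q, state, actions):
    # The greedy policy the Q-table encodes.
    return max(actions, key=lambda a: Q[(state, a)])
```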
Ah, yeah, sorry. I do think about this distinction more than I think about the actual model-based vs model-free distinction as defined in ML. Are there alternative terms you’d use if you wanted to point out this distinction? Maybe policy-gradient vs … not policy-gradient?
Not sure. I guess you also have to exclude policy gradient methods that make use of learned value estimates. “Learned evaluation vs sampled evaluation” is one way you could say it.
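Roughly, the per-step policy-gradient terms look like this (a hand-rolled sketch with plain floats, not any particular library's API):

```python
def reinforce_term(log_prob, sampled_return):
    # Sampled evaluation: score the action by the return actually
    # observed downstream of it; a lost game penalizes every move in it.
    return -log_prob * sampled_return

def actor_critic_term(log_prob, reward, v_next, v_now, gamma=0.99):
    # Learned evaluation: bootstrap from the critic's estimate of the
    # next state, so a move into a state the critic already rates highly
    # gets credit even if this particular episode ends in a loss.
    advantage = reward + gamma * v_next - v_now
    return -log_prob * advantage
```

So plain REINFORCE sits on the "sampled" side, while anything with a learned critic in the target sits on the "learned" side, regardless of whether it's called model-free.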
Model-based vs model-free does feel quite appropriate; a shame it's already used for a narrower kind of model in RL. Not sure whether it's used in your sense in other contexts.