I also wouldn’t be too surprised if, in some domains, RL leads to useful agents, provided all the individual actions are known to and doable by the model and RL teaches it how to sensibly string these actions together. This doesn’t seem too different from mathematical derivations.
I think that is exactly right.