Hmm, I’m not sure this is exactly what I want. I’d prefer not to assume too much about the environment dynamics. I’m not sure whether this is related to what you’re talking about, but here is one possible way to do model-based planning with an explicit reward function without assuming much about the environment dynamics: learn all the dynamics needed for planning in a model-free way (as MuZero does, with a learned latent model rather than a hand-specified one), except for the reward function, and then plug the reward function in explicitly.
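To make that a bit more concrete, here is a rough sketch of the shape I have in mind, not a real MuZero implementation. Everything in it is an assumption for illustration: `represent` and `learned_dynamics` are hypothetical stand-ins for trained networks (here just trivial hand-coded functions), `explicit_reward` is a made-up hand-written reward, and the planner is a plain exhaustive search over short action sequences rather than MuZero's tree search. It also assumes the explicit reward can be evaluated on the latent state, which is not guaranteed for a MuZero-style latent.

```python
import itertools
import numpy as np

ACTIONS = (-1.0, 0.0, 1.0)  # hypothetical discrete action set


def represent(observation):
    # Stand-in for a learned encoder h(o) -> latent state (assumption).
    return np.asarray(observation, dtype=float)


def learned_dynamics(latent, action):
    # Stand-in for a learned latent transition g(s, a) -> s' (assumption).
    # A real system would call a trained network here.
    return latent + action


def explicit_reward(latent):
    # Hand-written reward, NOT learned: prefer latents close to 3.0.
    return -abs(float(latent[0]) - 3.0)


def plan(observation, horizon=3):
    """Exhaustively score every action sequence of length `horizon`,
    rolling out the learned dynamics and summing the explicit reward.
    Returns (first action of the best sequence, its total reward)."""
    root = represent(observation)
    best_return, best_first = -np.inf, None
    for seq in itertools.product(ACTIONS, repeat=horizon):
        latent, total = root, 0.0
        for action in seq:
            latent = learned_dynamics(latent, action)
            total += explicit_reward(latent)
        if total > best_return:
            best_return, best_first = total, seq[0]
    return best_first, best_return
```

Calling `plan([0.0])` rolls the toy dynamics forward and picks the first action of the highest-scoring sequence. The point is only that the reward term comes from a function you wrote down, while the transition model is whatever was learned.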