Expected return in a particular environment/distribution, or not? If not, then you may be in a deployment context where you aren’t updating the weights anymore, and so there is no expected return.
I think you might be misunderstanding this? My take is that “return” is just the discounted sum of future rewards, which you can (in an idealized setting) think of as a mathematical function of the future trajectory of the system. So it’s still well-defined even when you aren’t updating weights.
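To make the "function of the trajectory" point concrete, here is a minimal sketch. It assumes a finite list of rewards and a discount factor; the name `gamma`, the helper `discounted_return`, and the example numbers are illustrative assumptions, not anything specified in this thread.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Discounted sum of rewards: sum over t of gamma**t * rewards[t].

    Hypothetical helper for illustration. It takes only the rewards along
    a trajectory; it never looks at model weights or whether they change.
    """
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Made-up rewards along some future trajectory (assumption for the example).
trajectory_rewards = [1.0, 0.0, 2.0]
print(discounted_return(trajectory_rewards, gamma=0.5))  # 1.0 + 0.0 + 0.25 * 2.0 = 1.5
```

Nothing in this computation depends on whether training is still happening, which is the sense in which the return stays well-defined at deployment.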