DanielFilan comments on What exactly is GPT-3′s base objective?

DanielFilan 11 Nov 2021 0:17 UTC
LW: 2 AF: 1
AF

Expected return in a particular environment/distribution? Or not? If not, then you may be in a deployment context where you aren’t updating the weights anymore and so there is no expected return

I think you might be misunderstanding this? My take is that “return” is just the discounted sum of future rewards, which you can (in an idealized setting) think of as a mathematical function of the future trajectory of the system. So it’s still well-defined even when you aren’t updating weights.