cfoster0 comments on Reward is not the optimization target

cfoster0 28 Jul 2022 15:32 UTC
3 points
I would disagree that it is an assumption. That same draft talks about the outsized role of self-supervised learning in determining the particular ordering and kinds of concepts that human desires latch onto. Learning from reinforcement is a core component of value formation (under shard theory), but not the only one.