Why should we think of base GPT as myopic, such that “non-myopic training” could remove that property? Training a policy to imitate traces of “non-myopic cognition” in the first place seems like a plausible way to create a policy that itself has “non-myopic cognition”. But this is exactly how GPT pretraining works.