(I may promote this to a full question)
Do we actually know what’s happening when you take an LLM trained on token prediction and fine-tune it via e.g. RLHF to get something like InstructGPT or ChatGPT? The more I think about the phenomenon, the more confused I feel.
Here is a short overview: https://openai.com/blog/instruction-following/
Do please promote to a full question; I also want to know the answer.
Done: https://www.lesswrong.com/posts/eywpzHRgXTCCAi8yt/what-s-actually-going-on-in-the-mind-of-the-model-when-we
Upvoted.