Specifically, after self-supervised pretraining, an LLM outputs exactly the thing that it expects to see. (After RLHF, that is no longer strictly true, but RLHF is just a fine-tuning step; most of the behavioral inclinations come from pretraining, IMO.)
Qualitatively, the differences between a purely predictively-trained LLM and one after RLHF seem quite large (e.g., see the comparison between GPT-3 and InstructGPT examples from OpenAI).
I was thinking: In pretraining they use 400,000,000,000,000 bits of information (or whatever) to sculpt the model from “every possible token string is equally likely” (or similar) to “typical internet text”. And then in RLHF they use 10,000 bits of information (or something) to sculpt the model from “typical internet text” to “annoyingly-chipper-bland-corporate-speak chatbot”. So when I say “most of the behavioral inclinations”, I guess I can quantify that as 99.99999999%? Or a different perspective is: I kinda feel like 10,000 bits (or 100,000, I don’t know what the number is) is kinda too small to build anything interesting from scratch, as opposed to tweaking the relative prominence of things that are already there. This isn’t rigorous or anything; I’m open to discussion.
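For what it's worth, the back-of-envelope percentage above can be checked with a few lines of Python. This is only a sketch of the arithmetic: both bit counts are the rough placeholder figures from the comment (not measured values), and the name `pretraining_share` is made up here for illustration.

```python
from fractions import Fraction

# Placeholder figure from the comment above ("or whatever"), not a measurement.
PRETRAINING_BITS = 400_000_000_000_000


def pretraining_share(rlhf_bits: int, pretraining_bits: int = PRETRAINING_BITS) -> Fraction:
    """Exact fraction of all training bits that come from pretraining."""
    return Fraction(pretraining_bits, pretraining_bits + rlhf_bits)


# Try both guesses for the RLHF bit count mentioned in the comment.
for rlhf_bits in (10_000, 100_000):
    share = pretraining_share(rlhf_bits)
    print(f"RLHF bits = {rlhf_bits:>7,}: pretraining share = {float(share) * 100:.10f}%")
```

Under these assumed numbers, the RLHF share is about 2.5e-11 of the total for 10,000 bits and about 2.5e-10 for 100,000 bits, so the "99.99999999%" figure is the right order of magnitude for the smaller guess and slightly generous for the larger one.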
(I absolutely agree that RLHF has very obvious effects on output.)