I was thinking: In pretraining they use 400,000,000,000,000 bits of information (or whatever) to sculpt the model from “every possible token string is equally likely” (or similar) to “typical internet text”. And then in RLHF they use 10,000 bits of information (or something) to sculpt the model from “typical internet text” to “annoyingly-chipper-bland-corporate-speak chatbot”. So when I say “most of the behavioral inclinations”, I guess I can quantify that as 99.99999999%? Or a different perspective is: I kinda feel like 10,000 bits (or 100,000, I don’t know what the number is) is kinda too small to build anything interesting from scratch, as opposed to tweaking the relative prominence of things that are already there. This isn’t rigorous or anything; I’m open to discussion.
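Just to make the back-of-envelope arithmetic explicit (both bit counts are the made-up placeholders above, not real measurements):

```python
# Rough ratio of "sculpting information" from pretraining vs. RLHF.
# Both numbers are placeholders; I don't know the real figures.
pretrain_bits = 4e14  # ~400 trillion bits (or whatever)
rlhf_bits = 1e4       # ~10,000 bits (or 100,000, who knows)

pretrain_share = pretrain_bits / (pretrain_bits + rlhf_bits)
print(f"{pretrain_share:.10%}")  # -> 99.9999999975%
```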
(I absolutely agree that RLHF has very obvious effects on output.)