Would adding some human-generated text of ‘inner monologuing’ to the dataset be a good way to do that, or is that already done? Obviously it’s done insofar as a sufficiently vast and diverse dataset invariably includes examples, but I mean more so a dedicated dataset focused on self-reasoning.
Upon finishing the previous sentence I decided that maybe that’s not such a good idea.
I think it would probably not work too well if you mean simply “dump some in like any other text”, because it would be diluted by the hundreds of billions of other tokens and much of it would be ‘wasted’ by being trained on while the model is too stupid to learn the inner-monologue technique. (Given that smaller models like 80b-esque models don’t inner-monologue while larger ones like LaMDA & GPT-3 do, presumably the inner-monologue capability only emerges in the last few bits of loss separating the 80b-esque and 200b-esque models and thus fairly late in training, at the point where the 200b-esque models pass the final loss of the 80b-esque models.) If you oversampled an inner-monologue dataset, or trained on it only at the very end (~equivalent to finetuning), or did some sort of prompt-tuning, then it might work. But compared to self-distilling, where you just run the model on the few-shot prompt + a bunch of questions & generate arbitrarily many samples to then finetune on, it would be expensive to collect that data, so why do so?
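To make the self-distillation point concrete, here is a minimal sketch of that loop. The prompt text and the `generate` call are hypothetical placeholders for whatever model API you have, not any particular codebase:

```python
# Minimal sketch of self-distillation: run the model over a few-shot
# inner-monologue prompt plus a pool of questions, sample as many completions
# as you like, and keep them as finetuning data. `generate` is a hypothetical
# placeholder for the model's sampling call.

FEW_SHOT_PROMPT = (
    "Q: <worked example>\n"
    "A: Let's think step by step. <monologue> So the answer is <answer>.\n\n"
    "Q: {question}\n"
    "A: Let's think step by step."
)

def generate(prompt: str) -> str:
    """Placeholder: sample one completion from the language model."""
    raise NotImplementedError

def self_distill(questions, n_samples=8):
    """Collect model-generated monologues to finetune on later."""
    finetune_data = []
    for q in questions:
        prompt = FEW_SHOT_PROMPT.format(question=q)
        for _ in range(n_samples):  # generate arbitrarily many per question
            finetune_data.append({"prompt": prompt,
                                  "completion": generate(prompt)})
    return finetune_data
```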
Personally, I think approaches like STaR (28 March 2022) will be important: bootstrap from weak chain-of-thought reasoners to strong ones by retraining on their successful inner monologues. They also implement “rationalization”: when the model fails a problem, it regenerates a monologue with the correct answer visible as a hint, and trains on that too.
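Roughly, one STaR iteration looks like the sketch below (as I understand the paper; `generate` and `finetune` are hypothetical placeholders, and the string-containment answer check and hint format are my own simplifications, not the authors’ code):

```python
# Rough sketch of one STaR iteration: keep monologues that reach the correct
# answer; for failures, regenerate with the answer shown as a hint
# ("rationalization"); then finetune on the collected monologues and repeat.

def generate(model, prompt: str) -> str:
    """Placeholder: sample a chain-of-thought completion from the model."""
    raise NotImplementedError

def finetune(model, examples):
    """Placeholder: finetune the model on (question, monologue) pairs."""
    raise NotImplementedError

def star_iteration(model, questions, answers):
    kept = []
    for q, gold in zip(questions, answers):
        monologue = generate(model, f"Q: {q}\nA: Let's think step by step.")
        if gold in monologue:                      # crude correctness check
            kept.append((q, monologue))
        else:
            # Rationalization: show the correct answer as a hint so even
            # failed problems can yield a usable training monologue.
            hinted = generate(model, f"Q: {q} (hint: the answer is {gold})\n"
                                     f"A: Let's think step by step.")
            if gold in hinted:
                kept.append((q, hinted))
    return finetune(model, kept)                   # bootstrap: retrain, repeat
```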
I don’t have much to add, but I did see this interesting project doing something similar with an “inner monologue”: it uses prompts to ask questions about the given input and progressively builds up its output by asking further questions and reasoning about the prompt itself. This video is an older demonstration but covers the concept quite well. Personally, I don’t think the system is well thought out in terms of alignment: the project is ultimately trying to create aligned AGI through prompts that serve certain criteria (reducing suffering, increasing prosperity, increasing understanding), which is a very simplified view of morality and human goals.
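As I understand it, the underlying pattern is just prompt chaining, something like this generic sketch (the `ask` call and the probe questions are illustrative assumptions, not the project’s actual code):

```python
# Generic prompt-chaining sketch: ask the model questions about the input,
# fold each answer back into the context, then produce a final answer.
# `ask` is a hypothetical placeholder for a completion-API call.

def ask(prompt: str) -> str:
    raise NotImplementedError  # placeholder for the model call

def inner_monologue(user_input: str) -> str:
    probes = [
        "What is being asked here?",
        "What facts or constraints seem relevant?",
        "What would a good answer need to cover?",
    ]
    context = f"Input: {user_input}\n"
    for probe in probes:
        answer = ask(f"{context}\n{probe}\n")
        context += f"\n{probe}\n{answer}\n"        # the monologue accumulates
    return ask(f"{context}\nFinal answer:")
```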