The pivotal gain-of-capability potential in LLMs is in learning skills that wouldn’t form on their own from self-supervised learning on readily available datasets, by effectively generating such datasets, either for specific skills or for everything all at once.
Until that happens, it probably doesn’t matter what LLMs are doing (even though the recent events are unabashed madness), since they wouldn’t be able to adapt to the situations that are not covered by generalization from the datasets. After that happens, they would be able to study all textbooks and research papers, leading to generation of new research, at which point access to shell would be the least of our concerns (if only ensuring absence of such access would be an option in this world).
> would likely be trained to be less chatty and less emotional

This poses the much more important risk of giving them inhumane misaligned personalities, which lowers the chances that they end up caring for us by at least a very tiny fraction.
> by effectively generating such datasets, either for specific skills or for everything all at once
Just to be clear, what you have in mind is something to the effect of chain-of-thought (where LLMs and people deliberate through problems instead of trying to get an answer immediately or in the next few tokens), but in a more roundabout fashion, where you make the LLM deliberate a lot and fine-tune the LLM on that deliberation so that its “in the moment” (aka next token) response is more accurate—is that right?
If so, how would you correct for the hallucinatory nature of LLMs? Do they even need to be corrected for?
Since this is a capabilities-only discussion, feel free to either not respond or take it private. I just found your claim interesting since this is the first time I encountered such an idea.
Chain-of-thought for particular skills, with corrections of mistakes, to produce more reliable/appropriate chains-of-thought where it’s necessary to take many steps, and to arrive at the answer immediately when it’s possible to form an intuition for doing that immediately. Basically doing your homework, for any topic where you are ready to find or make up and solve exercises, with some correction-of-mistakes and guessed-correctly-but-checked-just-in-case overhead, for as many exercises as it takes. The result is a dataset with enough worked exercises, presented in a form that lets SSL extract the skill of doing that thing more reliably, and calibrate how much chain-of-thought a given problem needs before it’s done correctly.
A sufficiently intelligent and coherent LLM character that doesn’t yet have a particular skill would be able to follow the instructions and complete such tasks for arbitrary skills it’s ready to study. I’m guessing ChatGPT is already good enough for that, but Bing Chat shows that it could become even better without new developments. Eventually there is a “ChatGPT, study linear algebra” routine that produces a ChatGPT that can do linear algebra (or a dataset for a pretrained GPT-N to learn linear algebra out of the box), after expending some nontrivial amount of time and compute, but crucially without any other human input/effort. And the same routine works for all other topics, not just linear algebra, provided they are not too advanced to study for the current model.
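For concreteness, here’s a minimal sketch of what such a “study X” routine could look like. Every name below is hypothetical and invented for illustration (the LLM is abstracted as a text-to-text callable, nothing here is a real API), and the keyword self-check is a crude stand-in for real verification machinery:

```python
# Hypothetical sketch only: none of these names are a real API.
# The LLM is abstracted as a text -> text callable.
from typing import Callable, List

LLM = Callable[[str], str]

def generate_study_dataset(llm: LLM, topic: str, n_exercises: int) -> List[str]:
    """Have an LLM character pose, solve, and self-check exercises on a topic,
    accumulating worked examples for later self-supervised fine-tuning."""
    dataset: List[str] = []
    for _ in range(n_exercises):
        # Find or make up an exercise the model is ready to attempt.
        exercise = llm(f"Pose one practice exercise on {topic}.")
        # Deliberate through it with an explicit chain-of-thought.
        solution = llm(f"Solve step by step, showing all reasoning:\n{exercise}")
        # The guessed-correctly-but-checked-just-in-case overhead.
        verdict = llm(f"Check this solution for mistakes:\n{exercise}\n{solution}")
        if "mistake" in verdict.lower():  # crude stand-in for real verification
            # Correction-of-mistakes overhead: fold the critique back in.
            solution = llm(f"Rewrite the solution, fixing:\n{verdict}\n{solution}")
        dataset.append(f"{exercise}\n\n{solution}")
    return dataset
```

Fine-tuning on the resulting worked exercises is what would then distill the long deliberation into more immediate, calibrated responses.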
So this is nothing any high schooler isn’t aware of, not much of a capability discussion. There are variants that look different and are likely more compute-efficient, or give other benefits at the expense of more misalignment risk (because they involve data further from human experience, they might produce something that’s less of a human imitation); this is just the obvious upper-bound-on-difficulty variant.

But also, this is the sort of capability idea that doesn’t destroy the property of LLM characters being human imitations, and more time doesn’t just help with alignment, but also with unalignable AGIs. LLM characters with humane personality are the only plausible-in-practice way to produce direct, if not transitive, alignment that I’m aware of. Something with the same alignment shortcomings as humans, but sufficiently different that it might still change things for the better.
I agree that recursive self-improvement can be very very bad; in this post I meant to show that we can get less-bad-but-still-bad behavior from only (LLM, REPL) combinations.
I’m saying a more specific/ominous thing than “recursive self-improvement”. It seems plausible that these days, it might only take a few years for a talented enthusiast with enough luck and compute to succeed in corralling agentic LLM characters into automatic generation of datasets that train a wide range of specified skills. This could start with a GPT-4-level pretrained model, with some supervised fine-tuning to put useful characters in control (let alone with RLAIF when that inevitably gets open-sourced), and some prompt engineering to cause the actual dataset generation. Or else, starting with characters like ChatGPT, better yet its impending GPT-4-backed successor and all the copycats, with giant 32K-token contexts, it might take merely prompt engineering, nothing more.
Top labs would do this better, faster, and more inevitably, with many more alternative techniques at their fingertips. Paths to generation of datasets for everything all at once (augmented pretraining) are less clear (and present a greater misalignment hazard), but lead to the same outcome more suddenly and comprehensively.
This is the salient implication of Bing Chat appearing to be even more intelligent than ChatGPT, likely sufficiently so to follow complicated requests and guidelines outlining skill-forming dataset generation, given an appropriate character that would mostly actually do the thing.