I agree that recursive self-improvement can be very very bad; in this post I meant to show that we can get less-bad-but-still-bad behavior from only (LLM, REPL) combinations.
I’m saying a more specific/ominous thing than “recursive self-improvement”. It seems plausible that these days, it might take only a few years for a talented enthusiast with enough luck and compute to succeed in corralling agentic LLM characters into automatic generation of datasets that train a wide range of specified skills. This could start with a GPT-4-level pretrained model, with some supervised fine-tuning to put useful characters in control (let alone with RLAIF, when that inevitably gets open-sourced) and some prompt engineering to cause the actual dataset generation. Or else, starting with characters like ChatGPT, better yet its impending GPT-4-backed successor and all the copycats, with giant 32K-token contexts, it might take merely prompt engineering, nothing more.
Top labs would do this better, faster, and more reliably, with many more alternative techniques at their fingertips. Paths to generating datasets for everything all at once (augmented pretraining) are less clear (and present a greater misalignment hazard), but lead to the same outcome more suddenly and comprehensively.
This is the salient implication of Bing Chat appearing to be even more intelligent than ChatGPT, likely sufficiently so to follow complicated requests and guidelines outlining skill-forming dataset generation, given an appropriate character that would mostly actually do the thing.