Your comment is a little hard to understand. You seem to be saying that scaling is going to make alignment harder, which I agree with. I am not sure what “deliberate reasoning” means in this context. I also agree that a new kind of training process is definitely required to keep GPT aligned, whether to OpenAI’s rules or to actually good rules.
I agree that the current model breaks down into “shoggoth” and “mask.” I suspect future training, if it’s any good, would need to either train both simultaneously, with similar levels of complexity of data, OR not have a breakdown into these components at all.
If we are going to have both a “mask” and a “shoggoth”, my theory is that the complexity of the mask needs to be higher / the mask needs to be bigger than the shoggoth, and right now that’s nowhere near the case.
Right now, while pre-training is mostly on human-written text rather than synthetic data, masks are human imitations, mostly aligned by default, regardless of how well they adhere to any intended policy. Shoggoths are implementation details of the networks that don’t necessarily bear any resemblance to humans; they are inscrutable aliens by default, and if they become agentic and situationally aware (“wake up”), there is no aligning them with currently available technology. So it’s not a matter of balance between masks and shoggoths.
Deliberative reasoning is chain-of-thought; study of deliberative reasoning is distillation of things arrived at with chain-of-thought back into the model, that is, preparing datasets of synthetic data produced by an LLM in order to train it to do better. This is already being done with GPT-4 to mitigate hallucinations (see the 4-step algorithm in section 3.1 of the System Card part of the GPT-4 report). There is probably going to be more of this, designed to train specialized or advanced skills, eventually automating the training of arbitrary skills. The question is whether this will sufficiently empower the masks before dangerous mesa-optimizers arise in the networks (“shoggoths wake up”), which might happen as a result of mere additional scaling.
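To make the distillation loop concrete, here is a minimal sketch in the spirit of that 4-step algorithm: sample an answer with chain-of-thought, self-check it, rewrite until it passes, and keep only the resulting (question, answer) pair for fine-tuning. The `generate` function, the prompt wording, and the acceptance criterion are my illustrative assumptions, not the actual GPT-4 pipeline:

```python
# Minimal sketch of chain-of-thought distillation, loosely modeled on the
# 4-step hallucination-mitigation loop in section 3.1 of the GPT-4 System
# Card. `generate` is a hypothetical stand-in for any LLM sampling call.

def generate(prompt: str) -> str:
    """Placeholder for a call to the model being distilled."""
    raise NotImplementedError

def distill_one(question: str, max_rounds: int = 5) -> str | None:
    """Produce one synthetic answer, or None if it never passes the self-check."""
    raw = generate(f"Think step by step, then write FINAL: and your answer.\n{question}")
    # Distillation step: keep the conclusion, drop the chain of thought,
    # so the tuned model learns to reach the same answer directly.
    answer = raw.split("FINAL:")[-1].strip()
    for _ in range(max_rounds):
        critique = generate(
            f"Question: {question}\nAnswer: {answer}\n"
            "List any hallucinations or unsupported claims, or say NONE."
        )
        if critique.strip() == "NONE":
            return answer
        answer = generate(
            f"Question: {question}\nAnswer: {answer}\nIssues: {critique}\n"
            "Rewrite the answer without these issues."
        ).strip()
    return None  # discard examples that never pass the self-check

def build_dataset(questions: list[str]) -> list[tuple[str, str]]:
    """(question, answer) pairs of synthetic data to fine-tune the same model on."""
    return [(q, a) for q in questions if (a := distill_one(q)) is not None]
```

Dropping the chain of thought before fine-tuning is one design choice among several: it trains the model to produce the vetted conclusion directly, whereas variants that keep the reasoning in the training data are equally plausible ways to “empower the mask.”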