Right now, while pre-training is mostly on human-written text rather than synthetic data, masks are human imitations, mostly aligned by default, regardless of how well they adhere to any intended policy. Shoggoths are implementation details of the networks that don’t necessarily bear any resemblance to humans; they are inscrutable aliens by default, and if they become agentic and situationally aware (“wake up”), there is no aligning them with the technology currently available. It’s not a matter of balance between masks and shoggoths.
Deliberative reasoning is chain-of-thought; study of deliberative reasoning is distillation of things arrived at with chain-of-thought back into the model, preparing datasets of synthetic data produced by an LLM for training it to do better. This is already being done with GPT-4 to mitigate hallucinations (see the 4-step algorithm in section 3.1 of the System Card part of the GPT-4 report). There’s probably going to be more of this, designed to train specialized or advanced skills, eventually automating training of arbitrary skills. The question is whether it’s going to sufficiently empower the masks before dangerous mesa-optimizers arise in the networks (“shoggoths wake up”), which might happen as a result of mere additional scaling.
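To make the shape of that distillation loop concrete, here is a minimal sketch in Python. The model calls are abstracted as callables, the function and parameter names (`distill_example`, `build_dataset`, `generate`, `list_hallucinations`, `rewrite`) are my own, and the steps paraphrase the system-card loop from memory rather than quote it:

```python
from typing import Callable, Optional

# Stand-in type for a model call: prompt text in, completion text out.
# A real implementation would wrap an actual LLM API here.
Model = Callable[[str], str]

def distill_example(prompt: str,
                    generate: Model,
                    list_hallucinations: Model,
                    rewrite: Model) -> Optional[tuple[str, str]]:
    """One pass of a draft/critique/rewrite/re-check loop, roughly in the
    spirit of the hallucination-mitigation process described in section 3.1
    of the GPT-4 System Card (paraphrased, not the exact procedure)."""
    draft = generate(prompt)
    issues = list_hallucinations(prompt + "\n" + draft)
    if not issues.strip():
        return None  # nothing to fix, so no training pair from this prompt
    fixed = rewrite(prompt + "\n" + draft + "\n" + issues)
    if list_hallucinations(prompt + "\n" + fixed).strip():
        return None  # rewrite still flagged, discard
    return (draft, fixed)  # synthetic comparison pair: worse vs. better response

def build_dataset(prompts: list[str], *models: Model) -> list[tuple[str, str]]:
    """Collect the kept pairs into a synthetic dataset for further training."""
    pairs = (distill_example(p, *models) for p in prompts)
    return [pair for pair in pairs if pair is not None]
```

The point of the sketch is just the structure: chain-of-thought-style work (drafting, critiquing, rewriting) happens at inference time, and only the filtered end products get folded back into training data, whether as comparison pairs for a reward model or as targets for fine-tuning.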