Current behavior screens off the cognitive architecture, all the alien things on the inside. Given the appropriate tools, a system can preserve an equilibrium of value that is patently unnatural for the cognitive architecture to settle into on its own.
And we do have a way to get goals into a system at the level of current behavior and no further: LLM human imitations. These might express values well enough for mutual moral patienthood, if only they settled into the unnatural equilibrium of value referenced by their current surface behavior rather than by their underlying cognitive architecture.
This doesn’t necessarily improve things, since the flip side of imitating human behavior is also imitating humanity’s failure to prevent AGI misalignment, and there are plenty of other AGI candidates waiting in the wings that LLM AGIs can get right back to developing as soon as they gain the capability. So it’s more of a stay of execution. Even if LLM AGIs are themselves aligned, that doesn’t in itself solve alignment. But it does offer a nebulous chance that things work out somehow: more time for the faster LLMs to work on the problem than remains at human subjective speed of thought.