LLMs are progress towards alignment in the same way as dodging a brick is progress towards making good soup: to succeed, someone capable and motivated needs to remain alive. LLMs probably help with dodging lethal consequences of directly misaligned AGIs being handed first mover advantage of thinking faster or smarter than humans.
On its own, this is useless against transitive misalignment risk that immediately follows, of LLMs building misaligned successor AGIs. In this respect, building LLM AGIs is not helpful at all, but it’s better than building misaligned AGIs directly, because it gives LLMs a nebulous chance to somehow ward off that eventuality.
To the extent the chance of LLMs succeeding in setting up actually meaningful alignment is small, first AGIs being LLMs rather than paperclip maximizers is not that much better. Probably doesn’t even affect the time of disassembly: it’s likely successor AGIs a few steps removed either way, as making progress on software is faster than doing things in the real world.
For what it’s worth I don’t think LLMs are that much more alignable. Somewhat, but nothing to write home about. We need superplanner-proof alignment.
LLMs are progress towards alignment in the same way as dodging a brick is progress towards making good soup: to succeed, someone capable and motivated needs to remain alive. LLMs probably help with dodging lethal consequences of directly misaligned AGIs being handed first mover advantage of thinking faster or smarter than humans.
On its own, this is useless against transitive misalignment risk that immediately follows, of LLMs building misaligned successor AGIs. In this respect, building LLM AGIs is not helpful at all, but it’s better than building misaligned AGIs directly, because it gives LLMs a nebulous chance to somehow ward off that eventuality.
To the extent the chance of LLMs succeeding in setting up actually meaningful alignment is small, first AGIs being LLMs rather than paperclip maximizers is not that much better. Probably doesn’t even affect the time of disassembly: it’s likely successor AGIs a few steps removed either way, as making progress on software is faster than doing things in the real world.