It might also be a crux for alignment, since scalable alignment schemes like IDA and Debate rely on “task decomposition”, which seems closely related to “planning” and “reasoning”. I’ve been wondering about the slow pace of progress of IDA and Debate. Maybe it’s part of the same phenomenon as the underwhelming results of AutoGPT and BabyAGI?
If that’s the case (which seems very plausible) then it seems like we’ll either get progress on both LLM-based AGI and IDA/Debate, or on neither. That seems like a relatively good situation; those approaches will work for alignment if & only if we need them (to whatever extent they would have worked in the absence of this consideration).
There are two other ways for things to go wrong, though:
AI capabilities research switches attention from LLMs (back) to RL. (There was a lot of debate in the early days of IDA about whether it would be competitive with RL, and part of that was about whether all the important tasks we want a highly capable AI to do could be broken down easily enough and well enough.)
The task decomposition part starts working well enough, but Eliezer’s (and others’) concern about “preserving alignment while amplifying capabilities” proves valid.