I wonder what would happen if we “amplified” reasoning like this, as in HCH, IDA, Debate, etc.
Do we understand reasoning well enough to ensure that this class of errors can be avoided in AI alignment schemes that depend on human reasoning, or to ensure that this class of errors will be reliably self-corrected as the AI scales up?
This is not an “error” per se. It’s a baseline, outside-view argument presented in lay terms.