These are all arguments about the limit; whether or not they’re relevant depends on whether they apply to the regime of “smart enough to automate alignment research”.
Agreed. Are you aware of any work that attempts to answer this question? Does this work look like work on debate? (not rhetorical questions!)
My guess is that work likely to address this does not look like work on debate. Therefore my current position remains: don’t bother working on debate; work instead on understanding the fundamentals that might tell you when it’ll break.
The world won’t be short of debate schemes. It’ll be short of principled arguments for their safe application.