The need for another post explaining IDA and/or Debate is not that obvious to me, but this is a good one to point people to. My only caveat about the content is that it does not discuss why debate extends the reach of human supervision (the PSPACE, NP, and P part of the Debate paper, or just the intuition that verifying part of an argument is easier than verifying a full solution, which is in turn easier than finding a solution).
Even then, (∗) remains a problem. The only way to build an outer-aligned AI system is to build an outer-aligned AI system, and we can’t do it if we don’t know how to do it.
I would have preferred an explanation of why these easier conditions do not help with outer-alignment.
Before we begin, here are other possible resources to understand IDA:
The LessWrong/Alignment Forum sequence (written by Paul Christiano)
A video by Robert Miles (who makes lots of AI Alignment-relevant YouTube content)
Great of you to list other explanations and references, but it would be even better to explain why someone should read your explanation instead of any of those.
If it is not obvious why, imagine you had access to a slightly dumber version of yourself that ran at 10,000 times your speed. Any time you have a (sub-)question that does not require your full intellect, you can delegate it to this slightly dumber version and obtain an answer at once. This allows you to effectively think for longer than you otherwise could.

Nice concrete example
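To make that delegation picture concrete, here is a minimal toy sketch in Python (my own illustration, not from the post; the names fast_weak_helper and slow_strong_reasoner and the arithmetic task are made up for the example): the "slow, strong" reasoner keeps the high-level decomposition for itself and hands every sub-question that does not need its full intellect to a fast-but-weak helper.

```python
# A toy, hypothetical sketch of the delegation intuition above (not from the
# post): the "slow, strong" reasoner keeps the high-level decomposition for
# itself and hands every sub-question that does not need its full intellect
# to a fast-but-weak helper.

def fast_weak_helper(a, b):
    """Stands in for the 'slightly dumber, 10,000x faster' copy of yourself:
    it only handles small, fully specified sub-questions (here: adding two
    numbers), but it answers them essentially for free."""
    return a + b

def slow_strong_reasoner(numbers):
    """The 'you' in the example: it decides how to break the problem up and
    delegates each easy piece, so its own effort goes into combining a much
    smaller number of intermediate results."""
    if not numbers:
        return 0
    if len(numbers) % 2:               # pad so the pairing below stays simple
        numbers = numbers + [0]
    # Every easy sub-question is handed to the fast helper.
    partial_sums = [fast_weak_helper(a, b)
                    for a, b in zip(numbers[::2], numbers[1::2])]
    if len(partial_sums) == 1:
        return partial_sums[0]
    # Recurse on the (much smaller) list of intermediate results.
    return slow_strong_reasoner(partial_sums)

print(slow_strong_reasoner(list(range(10))))  # 45
```

Loosely speaking, the helper here plays the role the distilled model plays during amplification: the strong reasoner's own effort shrinks at each level because the cheap copies absorb the easy work.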
Great of you to list other explanations and references, but it would be even better to explain why someone should read your explanation instead of any of those.
My eternal gripe with academia is that no one explains things the way I want them explained. I personally found every existing explanation of IDA confusing. So, the reason to read this rather than any other resource is that, if you’re anything like me, you’ll have an easier time with this post than with any other. (I’m not sure for how many people this is true; maybe if you think more like Paul and less like me, the original sequence will make more sense.) Also, there are a bunch of resources on IDA, but not very much on Debate. And the post by Chi Nguyen hadn’t been published yet when I wrote this.
But you’re right that I’m not saying that in the post. Maybe I should. (Edit: I added a brief note in the intro.)
It’s also supposed to be an advertisement for my sequence on Factored Cognition, since the style is going to be quite similar. That is, I was worried that a sequence of original research could seem intimidating to people who don’t have a background in AI alignment, but I think anyone who found this post easy to understand won’t have trouble with the sequence, except perhaps with the math in the first two posts.
My only caveat about the content is that it does not discuss why debate extends the reach of human supervision (the PSPACE, NP, and P part of the Debate paper, or just the intuition that verifying part of an argument is easier than verifying a full solution, which is in turn easier than finding a solution).
Good point. I agree that should be mentioned explicitly. I’ll add something about it tomorrow.
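For readers who want the flavor of that intuition now, here is a toy sketch (my own example, not the Debate paper's actual construction; the sum-verification task and the function names are assumptions made up for illustration): a judge who can only check a single list element can still settle a dispute about the sum of a long list, because two debaters who disagree about the total can have their disagreement bisected down to one leaf the judge can verify.

```python
# A toy illustration (my own sketch, not the Debate paper's construction) of
# why debate extends the reach of a weak judge: a judge who can only verify
# ONE list element can still supervise claims about the sum of a whole list,
# because a dispute between two debaters can be bisected down to a single
# checkable leaf.

def honest_strategy(xs, claimed_total):
    """The honest debater's sub-claims are simply the true half-sums."""
    mid = len(xs) // 2
    return sum(xs[:mid]), sum(xs[mid:])

def lying_strategy(xs, claimed_total):
    """A liar must split its (false) total into sub-claims that still add up,
    so at least one half-claim is false; this liar hides the lie on the right."""
    mid = len(xs) // 2
    left = sum(xs[:mid])
    return left, claimed_total - left

def debate(xs, claim_a, claim_b, strategy_a, strategy_b):
    """Zoom in recursively on the half where the two debaters disagree.
    Returns 'A' or 'B', the debater whose claim survives."""
    if len(xs) == 1:
        # The judge's entire job: check one element against one claim.
        return 'A' if claim_a == xs[0] else 'B'
    mid = len(xs) // 2
    a_left, a_right = strategy_a(xs, claim_a)
    b_left, b_right = strategy_b(xs, claim_b)
    if a_left != b_left:
        return debate(xs[:mid], a_left, b_left, strategy_a, strategy_b)
    # If the debaters agree on the left half, they must disagree on the right.
    return debate(xs[mid:], a_right, b_right, strategy_a, strategy_b)

xs = list(range(1000))
true_total = sum(xs)
winner = debate(xs, true_total, true_total + 7, honest_strategy, lying_strategy)
print(winner)  # 'A' -- honesty wins, yet the judge never summed anything itself
```

The judge's work stays constant per dispute while the claims it can adjudicate cover the whole list; the PSPACE/NP/P comparison in the Debate paper is, roughly, the formal version of this kind of gap.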
For what it’s worth, I read/skimmed all of the listed IDA explanations and found this post to be the best explanation of IDA and Debate (and how they relate to each other). So thanks a lot for writing this!