Thanks for the thoughtful response.
A few thoughts:
If length is the issue, then replacing “leads” with “led” would reflect the reality.
I don’t have an issue with titles like “...Improving safety...” since they have a [this is what this line of research is aiming at] vibe, rather than a [this is what we have shown] vibe. Compare “curing cancer using x” to “x cures cancer”.
Also in that particular case your title doesn’t suggest [we have achieved AI control]. I don’t think it’s controversial that control would improve safety, if achieved.
I agree that this isn’t a huge deal in general—however, I do think it’s usually easy to fix: either a [name a process, not a result] or a [say what happened, not what you guess it implies] approach is pretty general.
Also agreed that improving summaries is more important. Quite hard to achieve given the selection effects: [x writes a summary on y] tends to select for [x is enthusiastic about y] and [x has time to write a summary]. [x is enthusiastic about y] in turn selects for [x misunderstands y to be more significant than it is].
Improving this situation deserves thought and effort, but seems hard. Great communication from the primary source is clearly a big plus (not without significant time cost, I’m sure). I think your/Buck’s posts on the control stuff are commendably clear and thorough.
I expect the paper itself is useful (I’ve still not read it). In general I’d like the focus to be on understanding where/how/why debate fails—both in the near-term cases, and the more exotic cases (though I expect the latter not to look like debate-specific research). It’s unsurprising that it’ll work most of the time in some contexts. Completely fine for [show a setup that works] to be the first step, of course—it’s just not the interesting bit.