The motivation seems trivial to me, which might be part of the problem.
Yeah, for a long time many people have been very confused about the motivation of Paul’s research, so I don’t think you’re typical in this regard. I think that due to this sequence, Alex Zhu’s FAQ, lots of writing by Evan, Paul’s post “What Failure Looks Like”, and more posts on LW by others, many more people now understand Paul’s work on a basic level, which was not at all the case 3 years ago. For example, you say “Most training procedures are obviously outer-misaligned”, but ‘outer-alignment’ was not a concept with a name or a write-up at the time this sequence was published.
I agree that this post talks about assuming human-level performance, whereas much of iterated amplification also relaxes that assumption. My sense is that if someone were to just read this sequence, it would still help them focus on the core of the problem, namely that being ‘helpful’ or ‘useful’ is not well-defined in the way many other tasks are, and help them realize the urgency of this task and why it’s possible to make progress now.
That makes me feel a bit like the student who thinks they can debate the professor after having researched the field for 5 minutes.
But it would actually be a really good sign if the area has become more accessible.
Hah!
Yeah, I think we’ve made substantial progress in the last couple of years.