Things this post does:
Explains what problem civilization is likely to face with the development of AI (i.e. we’ll be able to build AIs that solve well-defined tasks, but we don’t know how to well-define the task of being ‘useful’).
Connects this problem to a clear and simple open problem, one that is explicit about its assumptions and helps me understand the hard parts of fixing the above problem with AI development.
Provides an argument for the importance and urgency of working on this sort of technical research.
This gives real-world motivation and a direct problem whose solution would help with that real-world issue, and engages concretely with the bigger picture in a way other posts in the sequence don’t (e.g. “Clarifying AI Alignment” is a clarification and doesn’t explicitly motivate why the problem is important).
In general, especially at the time these posts were published, when I read them I felt like I understood a particular detail very clearly, but I did not understand the bigger picture of why that detail was interesting or why those particular assumptions should be salient; this post helped me understand that a great deal.
I’m not sure if the above helped? Does it make sense what experience I had reading the post? I’m also interested in whether other posts feel to you like they clearly motivate iterated amplification.
The motivation seems trivial to me, which might be part of the problem. Most training procedures are obviously outer-misaligned, so if we have one that may plausibly be outer-aligned (and might plausibly scale), that seems like an obvious reason to take it seriously. I felt like I totally got that as soon as I first understood what IDA is trying to do.
Does it make sense what experience I had reading the post?
It does, but it still leaves me with the problem that it doesn’t seem to be connected to the remaining sequence. IDA isn’t about how we take an already trained system with human-level performance and use it for good things; it’s about how we train a system from the ground up.
The real problem may be that I expect a post like this to closely tie into the actual scheme, so when it doesn’t I take it as evidence that I’ve misunderstood something. What this post talks about may just not be intended to be [the problem the remaining sequence is trying to solve].
The motivation seems trivial to me, which might be part of the problem.
Yeah, for a long time many people have been very confused about the motivation of Paul’s research, so I don’t think you’re typical in this regard. I think that due to this sequence, Alex Zhu’s FAQ, lots of writing by Evan, Paul’s post “What Failure Looks Like”, and more posts on LW by others, many more people now understand Paul’s work on a basic level, which was not at all the case 3 years ago. Like, you say “Most training procedures are obviously outer-misaligned”, and ‘outer-alignment’ was not a concept with a name or a write-up at the time this sequence was published.
I agree that this post talks about assuming human-level performance, whereas much of iterated amplification also relaxes that assumption. My sense is that if someone were to just read this sequence, it would still help them focus on the brunt of the problem, namely that being ‘helpful’ or ‘useful’ is not well-defined in the way many other tasks are, and help them realize the urgency of this task and why it’s possible to make progress now.
That makes me feel a bit like the student who thinks they can debate the professor after having researched the field for 5 minutes.
But it would actually be a really good sign if the area has become more accessible.
Hah!
Yeah, I think we’ve made substantial progress in the last couple of years.