So when you’re talking about ‘principles that carry over,’ are you talking about principles in alignment research that will remain useful across various breakthroughs in AI research, or are you thinking about principles within one of these two research programs that will remain useful across various breakthroughs within that research program?
Good question. Both.
alignment research can’t only be about modeling reality; it must also include some sort of plan for how to bring about a particular sort of future
Imagine that we’re planning a vacation to Australia. We need to plan flights, hotels, and a rental car. Now someone says “oh, don’t forget that we must include some sort of plan for how to get from the airport to the rental car center”. And my answer to that would usually be… no, I really don’t need to plan out how to get from the airport to the rental car center. That part is usually easy enough that we can deal with it on-the-fly, without having to devote significant attention to it in advance.
Just because a sub-step is necessary for a plan’s execution does not mean that sub-step needs to be significantly involved in the planning process, or even planned in advance at all.
Setting aside for the moment whether that’s a good analogy for the claim that “alignment research can’t only be about modeling reality”: what are the criteria for whether it’s a good analogy? In what worlds would it be a good analogy, and in what worlds would it not be?
The key question is: what are the “hard parts” of alignment? What are the rate-limiting steps? What are the steps which, once we solve those, we expect the remaining steps to be much easier? The hard parts are like the flights and hotel. The rest is like getting from the airport to the rental car center: that’s a problem which we expect will be easy enough that we don’t need to put much thought into it in advance (and shouldn’t bother to plan it at all until after we’ve figured out what flight we’re taking). If the hard parts of alignment are all about modeling reality, then alignment research can, in principle, be only about modeling reality.
My own main model for the “hard part” of alignment is in the first half of this video. (I’d been putting off bringing this up in the discussion on your Paradigm-Building posts, because I was waiting for the video to be ready.)