I liked this post a lot, and I think its title claim is true and important.
One thing I wanted to understand a bit better is how you’re invoking ‘paradigms’ in this post wrt AI research vs. alignment research. I think we can be certain that AI research and alignment research are not identical programs but that they will conceptually overlap and constrain each other. So when you’re talking about ‘principles that carry over,’ are you talking about principles in alignment research that will remain useful across various breakthroughs in AI research, or are you thinking about principles within one of these two research programs that will remain useful across various breakthroughs within that research program?
Another thing I wanted to understand better was the following:
This leaves a question: how do we know when it’s time to make the jump to the next paradigm? As a rough model, we’re trying to figure out the constraints which govern the world.
Unlike many of the natural sciences (physics, chemistry, biology, etc.) whose explicit goals ostensibly are, as you’ve said, ‘to figure out the constraints which govern the world,’ I think that one thing that makes alignment research unique is that its explicit goal is not simply to gain knowledge about reality, but also to prevent a particular future outcome from occurring—namely, AGI-induced X-risks. Surely a necessary component for achieving this goal is ‘to figure out the [relevant] constraints which govern the world,’ but it seems pretty important to note (if we agree on this field-level goal) that this can’t be the only thing that goes into a paradigm for alignment research. That is, alignment research can’t only be about modeling reality; it must also include some sort of plan for how to bring about a particular sort of future. And I agree entirely that the best plans of this sort would be those that transcend content-level paradigm shifts. (I daresay that articulating this kind of plan is exactly the sort of thing I try to get at in my Paradigm-building for AGI safety sequence!)
So when you’re talking about ‘principles that carry over,’ are you talking about principles in alignment research that will remain useful across various breakthroughs in AI research, or are you thinking about principles within one of these two research programs that will remain useful across various breakthroughs within that research program?
Good question. Both.
alignment research can’t only be about modeling reality; it must also include some sort of plan for how to bring about a particular sort of future
Imagine that we’re planning a vacation to Australia. We need to plan flights, hotels, and a rental car. Now someone says “oh, don’t forget that we must include some sort of plan for how to get from the airport to the rental car center”. And my answer to that would usually be… no, I really don’t need to plan out how to get from the airport to the rental car center. That part is usually easy enough that we can deal with it on the fly, without having to devote significant attention to it in advance.
Just because a sub-step is necessary for a plan’s execution does not mean that sub-step needs to be significantly involved in the planning process, or even planned in advance at all.
Setting aside for the moment whether or not that’s a good analogy for the claim that “alignment research can’t only be about modeling reality”, what are the criteria for deciding whether it’s a good analogy? In what worlds would it be a good analogy, and in what worlds would it not be?
The key question is: what are the “hard parts” of alignment? What are the rate-limiting steps? What are the steps which, once we solve those, we expect the remaining steps to be much easier? The hard parts are like the flights and hotel. The rest is like getting from the airport to the rental car center: that’s a problem which we expect will be easy enough that we don’t need to put much thought into it in advance (and shouldn’t bother to plan it at all until after we’ve figured out what flight we’re taking). If the hard parts of alignment are all about modeling reality, then alignment research can, in principle, be only about modeling reality.
My own main model for the “hard part” of alignment is in the first half of this video. (I’d been putting off bringing this up in the discussion on your Paradigm-Building posts, because I was waiting for the video to be ready.)