While I agree that the outer objective, the training data, and the prior should be considered together, I disagree that doing so makes the inner alignment problem dissolve into nothing more than manipulation of the search. In principle, if you could indeed ensure through a careful choice of these three components that there is only one global optimum, that every other local minimum is “bad” (meaning clearly high loss), and that your search process will always reach the global optimum, then I would agree that the inner alignment problem disappears.
But answering “what do we even want?” at this level of precision seems basically impossible. I expect that it’s pretty much equivalent to specifying exactly the result we want, which we are quite unable to do in general.
So my perspective is that the inner alignment problem appears because of inherent limits on our outer alignment capabilities. And in realistic settings, where we cannot rule out multiple very good local minima (a toy sketch of this is given below), the sort of reasoning underpinning the inner alignment discussion is the best approach we have for addressing such problems.
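To make the point concrete, here is a toy sketch (my own illustration, not something from the original discussion): a one-dimensional loss with two equally good global minima. The outer objective assigns both the same loss, so which one gradient descent returns is decided entirely by the initialization, i.e. by the search rather than by the objective or the data.

```python
# Toy loss f(x) = (x^2 - 1)^2 with two global minima, at x = +1 and x = -1.
# Both achieve zero loss, so the objective alone cannot distinguish them.

def loss(x):
    return (x**2 - 1) ** 2

def grad(x):
    return 4 * x * (x**2 - 1)

def gradient_descent(x0, lr=0.05, steps=500):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Same objective, same landscape, different initializations -> different optima,
# each with loss ~0. Which solution we get is a property of the search.
print(gradient_descent(0.5))   # converges near +1
print(gradient_descent(-0.5))  # converges near -1
```

The analogy is loose, of course: in a real training setup the “minima” are whole model behaviors rather than points on a line, but the structural issue is the same, namely that the outer objective can be indifferent between solutions we care very much to distinguish.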
That being said, I’m not sure how this view interacts with yours or Evan’s, or if this is a very standard use of the terms. But since that’s part of the discussion Abram is pushing, here is how I use these terms.