paulfchristiano comments on On how various plans miss the hard bits of the alignment challenge

paulfchristiano 13 Jul 2022 17:52 UTC
LW: 33 AF: 16
4
AF
I don’t feel like this is right (though I think this duality feels like a real thing that is important sometimes and is interesting to think about, so appreciated the comment).
ARC is spending its time right now (i) trying to write down concrete algorithms that solve ELK using heuristic arguments, and then trying to produce concrete examples in which they do the wrong thing, (ii) trying to write down concrete formalizations of heuristic arguments that have the desiderata needed for those algorithms to work, and trying to identify cases in which our algorithms don’t yet meet those desiderata or they may be unachievable. The output is just actual code which is purported to solve major difficulties in alignment.
And on the flip side, I spend a significant amount of my time looking at the algorithms we are proposing (and the bigger plans into which they would fit if successful) and trying to find the best arguments I can that these plans will fail.
I think that the disagreement is more about what kind of concreteness is possible or desirable in this domain.
Put differently: I’m not saying that Nate and Eliezer are vague about problems but concrete about solutions, I’m saying they are vague about everything. And I don’t think they are saying that I’m concrete about problems but vague about solutions, they would say that I’m concrete about parts of the solution/problem that don’t matter while systematically pushing all the difficulty into the parts I’m still vague about.
I do think “how well do we understand the problem” seems like a pretty big crux; that leads Nate and Eliezer to think that I’m avoiding the predictably-important difficulty, and it leads me to think that Nate and Eliezer need to get more concrete in order to have an accurate picture of what’s going on.
- Richard_Ngo 13 Jul 2022 18:12 UTC
  LW: 3 AF: 3
  0
  AF Parent
  Yeah, my comment was sloppily phrased; I agree with “I think that the disagreement is more about what kind of concreteness is possible or desirable in this domain.”