Ben Pace comments on Critical review of Christiano’s disagreements with Yudkowsky

Ben Pace 30 Dec 2023 1:01 UTC
6 points
3
Oh. Here’s a compressed gloss of this conversation as I understand it.
1. Eliezer: People should give up on producing alignment theory directly and instead should build augmented humans to produce aligned agents.
2. Vanessa: And how do you think the augmented humans will go about doing so?
3. Eliezer: I mean, the point here is that’s a problem for them to solve. Insofar as they need my help they have not augmented enough. (Ben reads into Eliezer’s comment: And insofar as you wish to elicit my approach, I have already written over a million words on this, a few marginal guesses ain’t gonna do much.)
4. Peter: It seems wrong to not share your guesses here; people should not have to build the foundations themselves without help from Eliezer.
5. Ben: Eliezer has spent many years writing up and helping build the foundations, so this comment seems very confusing given that context.
6. Said: But have Eliezer’s writings worked?
7. Ben: (I’m not quite sure what the implied relationship is of this question to the question of whether Eliezer has tried very hard to help build a foundation for alignment research, but I will answer the question directly.) To the incredibly high standard that ~nobody ever succeeds at, it has not succeeded. To the lower standard that scientists sometimes succeed at, it has worked a little but has not been sufficient.
My first guess was that you’re implicitly asking because, if it has not been successful, then you think Eliezer still ought to answer questions like Vanessa’s. I am not confident in this guess.
If you’re simply asking for clarification on whether Eliezer’s writing works to convey his philosophy and approach to the alignment problem, I have now told you two standards against which I would evaluate this question and how I think it does on those standards. Do you understand why I wrote what I wrote in my reply to Peter?
- Said Achmiz 30 Dec 2023 1:29 UTC
  7 points
  3
  Uh… I think you’re somewhat overcomplicating things. Again, you wrote (emphasis mine):
  
  If folks want to understand how Eliezer would tackle the problem they can read the over-one-million thoughtfully written words he has published across LessWrong and Arbital and in MIRI papers about how to solve hard problems in general and how to think about AI in particular.
  
  In other words, you wrote that if people want X, they can do Y. (Implying that doing Y will, with non-trivial probability, cause people to gain X.^[1]) My question was simply whether there exists anyone who has done Y and now, as a consequence, has X.
  
  There are basically two possible answers to this:
  
  “Yes, persons A, B, C have all done Y, and now, as a consequence, have X.”
  
  Or
  
  “No, there exists no such person who has done Y and now, as a consequence, has X.”
  
  (Which may be because nobody has done Y, or it may be because some people have done Y, but have not gained X.)
  
  There’s not any need to bring in questions of standards or trying hard or… anything like that.
  
  So, to substitute the values back in for the variables, I could ask:
  
  Would you say that you “understand how Eliezer would tackle the problem”?
  
  and:
  
  Have you “read the over-one-million thoughtfully written words he has published across LessWrong and Arbital and in MIRI papers about how to solve hard problems in general and how to think about AI in particular”?
  
  Likewise, the same question about Vanessa: does she “understand … etc.”, and has she “read the … etc.”?
  
  Again, what I mean by all of these words is just whatever you meant when you wrote them.
  [1] “Implying” in the Gricean sense, of course, not in the logical sense. Strictly speaking, if we drop all Gricean assumptions, we could read your original claim as being analogous to, e.g., “if folks want to grow ten feet tall, they can paint their toenails blue”, which is entirely true, and yet does not, strictly speaking, entail any causal claims. I assumed that this kind of thing is not what you meant, because… that would make your comment basically pointless.
  - Ben Pace 31 Dec 2023 6:51 UTC
    10 points
    7
    Ah, but I think you have assumed slightly too much in what I meant. I simply meant to say that if someone wants Eliezer’s help in their goal to “work out the foundations” to “a field of (MIRI-style) Agent Foundations”, the better way to understand Eliezer’s perspective on these difficult questions is to read the high-effort and thoughtful blog posts and papers he produced that are intended to communicate his perspective on these questions, rather than ask him for some quick guesses on how one could in principle solve the problems. I did not mean to imply that (either) way would necessarily work, as the goal itself is hard to achieve; I simply meant that one approach is clearly much more likely than the other to achieve that goal.
    (That said, again to answer your question, my current guess is that Nate Soares is an example of a person who read those works and then came to share a lot of Eliezer’s approach to solving the problems. Though I’m honestly not sure how much he would attribute to each of (a) reading the writing, (b) working directly with Eliezer, and (c) trying to solve the problem himself and coming up with similar approaches. I also think that Wei Dai at least understood enough to make substantial progress on an open research question in decision theory, and similar things can be said of Scott Garrabrant re: logical uncertainty, and of others at MIRI.)