I share the impression that the agent foundations research agenda seemed not that important. But that point doesn’t feel sufficient to argue that Eliezer’s pessimism about the current state of alignment research is just a face-saving strategy his brain tricked him into adopting. (I’m not saying you claimed that it is sufficient; probably a lot of other data points are factoring into your judgment.) MIRI has deprioritized agent foundations research for quite a while now. I also just think it’s extremely common for people to have periods where they work on research that eventually turns out to be not that important; the interesting thing is to see what happens when that becomes more apparent. I immediately trust people more if I see that they are capable of pivoting and owning up to past mistakes, and I could imagine that MIRI deserves a passing grade on this, even though I also have to say that I don’t know how exactly they nowadays view their prioritization decisions from 2017 and earlier.
I really like Vaniver’s comment further below:
For what it’s worth, my sense is that EY’s track record is best in 1) identifying problems and 2) understanding the structure of the alignment problem.
And, like, I think it is possible that you end up in situations where the people who understand the situation best end up the most pessimistic about it.
I’m very far from confident that Eliezer’s pessimism is right, but it seems plausible to me. Of course, some people might be in the epistemic position of having tried to hash out that particular disagreement on the object level and have concluded that Eliezer’s pessimism is misguided – I can’t comment on that. I’m just saying that based on what I’ve read, which is pretty much every post and comment on AI alignment on LW and the EA Forum, I don’t get the impression that Eliezer’s pessimism is clearly unfounded.
Everyone’s views look like they are suspiciously shaped to put themselves and their efforts into a good light. If someone believed that their work isn’t important or their strengths aren’t very useful, they wouldn’t do the work and wouldn’t cultivate the strengths. That applies to Eliezer, but it also applies to the people who think alignment will likely be easy. I feel like people in the latter group would likely be inconvenienced, too (in terms of the usefulness of their personal strengths, the connections they’ve built in the AI industry, or their past work), if alignment turned out not to be easy.
Just to give an example of the sorts of observations that make me think Eliezer/“MIRI” could have a point:
I don’t know what happened with a bunch of safety people leaving OpenAI but it’s at least possible to me that it involved some people having had negative updates on the feasibility of a certain type of strategy that Eliezer criticized early on here. (I might be totally wrong about this interpretation because I haven’t talked to anyone involved.)
I thought it was interesting when Paul noted that our civilization’s Covid response was a negative update for him on the feasibility of AI alignment. Kudos to him for noting the update, but also: Isn’t that exactly the sort of misprediction one shouldn’t be making if one confidently thinks alignment is likely to succeed? (That said, my sense is that Paul isn’t even at the most optimistic end of people in the alignment community.)
A lot of the work in the arguments for alignment being easy seems to me to be done by dubious analogies that assume that AI alignment is relevantly similar to risky technologies that we’ve already successfully invented. People seem insufficiently quick to get to the actual crux with MIRI, which makes me think they might not be great at passing the Ideological Turing Test. When we get to the actual crux, it’s somewhere deep inside the domain of predicting the training conditions for AGI, which feels like the sort of thing Eliezer might be good at thinking about. Other people might also be good at thinking about this, but then why do they often start their argument with dubious analogies to past technologies that seem to miss the point? [Edit: I may be strawmanning some people here. I have seen direct discussions about the likelihood of treacherous turns vs. repeated early warnings of alignment failure. I didn’t have a strong opinion either way, but it’s totally possible that some people feel like they understand the argument and confidently disagree with Eliezer’s view there.]
But that point doesn’t feel sufficient to argue that Eliezer’s pessimism about the current state of alignment research is just a face-saving strategy his brain tricked him into adopting. (I’m not saying you claimed that it is sufficient; probably a lot of other data points are factoring into your judgment.)
I get why you took that from my rant, but that’s not really what I meant. I’m more criticizing the “everything is doomed, but let’s not give concrete feedback to people” stance, and I think part of it comes from believing for so long (and maybe still believing) that their own approach was the only non-fake one. Also, just calling everyone else a faker is quite disrespectful and not helpful.
I also just think it’s extremely common for people to have periods where they work on research that eventually turns out to be not that important; the interesting thing is to see what happens when that becomes more apparent. I immediately trust people more if I see that they are capable of pivoting and owning up to past mistakes, and I could imagine that MIRI deserves a passing grade on this, even though I also have to say that I don’t know how exactly they nowadays view their prioritization decisions from 2017 and earlier.
MIRI does have some positive points for changing their minds, but also some negative points IMO for taking so long to change their mind. Not sure what the total is.
I’m very far from confident that Eliezer’s pessimism is right, but it seems plausible to me. Of course, some people might be in the epistemic position of having tried to hash out that particular disagreement on the object level and have concluded that Eliezer’s pessimism is misguided – I can’t comment on that. I’m just saying that based on what I’ve read, which is pretty much every post and comment on AI alignment on LW and the EA Forum, I don’t get the impression that Eliezer’s pessimism is clearly unfounded.
Here again, it’s not so much that I disagree with EY about there being problems in the current research proposals. I expect that some of the problems he would point out are ones I see too. I just don’t get the transition from “there are problems with all our current ideas” to “everyone is faking working on alignment and we’re all doomed”.
Everyone’s views look like they are suspiciously shaped to put themselves and their efforts into a good light. If someone believed that their work isn’t important or their strengths aren’t very useful, they wouldn’t do the work and wouldn’t cultivate the strengths. That applies to Eliezer, but it also applies to the people who think alignment will likely be easy. I feel like people in the latter group would likely be inconvenienced, too (in terms of the usefulness of their personal strengths, the connections they’ve built in the AI industry, or their past work), if alignment turned out not to be easy.
Very good point. That being said, many of the more prosaic alignment people changed their minds multiple times, whereas on these specific questions I feel EY and MIRI didn’t, except when forced by tremendous pressure, which makes me believe that this criticism applies more to them. But that’s one point where having some more knowledge of the internal debates at MIRI could make me change my mind completely.
I don’t know what happened with a bunch of safety people leaving OpenAI but it’s at least possible to me that it involved some people having had negative updates on the feasibility of a certain type of strategy that Eliezer criticized early on here. (I might be totally wrong about this interpretation because I haven’t talked to anyone involved.)
My impression from talking with people (but not having direct confirmation from the people who left) was far more that OpenAI was focusing the conceptual safety team on ML work and the other safety team on making sure GPT-3 was not racist, which was not the type of work they were really excited about. But I might also be totally wrong about this.
I thought it was interesting when Paul noted that our civilization’s Covid response was a negative update for him on the feasibility of AI alignment. Kudos to him for noting the update, but also: Isn’t that exactly the sort of misprediction one shouldn’t be making if one confidently thinks alignment is likely to succeed? (That said, my sense is that Paul isn’t even at the most optimistic end of people in the alignment community.)
I’m confused by your question, because what you describe sounds like a misprediction that makes sense? Also, I feel that in this case there’s a difference between solving the coordination problem of having people implement the solution or not engage in a race (which indeed looks harder in light of Covid management) and solving the technical problem, which is orthogonal to the Covid response.
My impression from talking with people (but not having direct confirmation from the people who left) was far more that OpenAI was focusing the conceptual safety team on ML work and the other safety team on making sure GPT-3 was not racist, which was not the type of work they were really excited about. But I might also be totally wrong about this.
Interesting! This is quite different from the second-hand accounts I heard. (I assume we’re touching different parts of the elephant.)