In a nutshell: I’m more optimistic than Yudkowsky[1], and I want to state what I think are the reasons for our different conclusions (I’m going to compare my own reasoning to my understanding of Yudkowsky’s reasoning, and the latter might be flawed).
Yudkowsky seems very pessimistic about alignment of anything resembling deep learning, and also believes that deep learning leads to TAI pretty soon. I’m both more optimistic about aligning deep learning and more skeptical of TAI soon.
Optimism about deep learning: There has been considerable progress in the theoretical understanding of deep learning. This understanding is far from complete, but the problem doesn’t seem intractable either. I think it is more likely than not that we will have a pretty good theory within a decade[2].
Skepticism of TAI soon: My own models of AGI include qualitative elements that current systems don’t have. It is possible that the gap will be closed soon, but it is also possible that the field will instead hit a new “AI winter”.
Yudkowsky seems to believe we are pretty far from a good theory of rational agents. I, on the other hand, have a model of what this theory will look like, and a concrete pathway towards constructing it.
These differences seem to be partly caused by different assumptions regarding which mathematical tools are appropriate. MIRI have been very gung-ho about using logic and causal networks. At the same time they mostly ignored learning theory. These (IMO biased) preconceptions about what the correct theory should look like, combined with failure to make sufficient progress, led to an overly pessimistic view of overall tractability.
MIRI’s recruiting was almost entirely targeted at the sort of people who would accept their pre-existing methods and assumptions. I suspect this created inertia and groupthink.
To be clear, I am deeply grateful to Yudkowsky and MIRI for the work they did and continue to do, not to mention funding my own work. I am voicing some criticism only because transparency is essential for success, as the OP makes clear.
[1] My rough estimate of the success probability is 30%, but I haven’t invested much effort into calibrating this.
[2] It is possible that doom will come sooner, but it doesn’t seem overwhelmingly likely.
I’ll remark in passing that I disagree with this characterization of events. We looked under some street lights where the light was better, because we didn’t think that others blundering around in the dark were really being that helpful—including because of the social phenomenon where they blundered around until a bad solution Goodharted past their blurry filters; we wanted to train people up in domains where wrong answers could be recognized as that by the sort of sharp formal criteria that inexperienced thinkers can still accept as criticism.
That was explicitly the idea at the time.
Thanks for responding, Eliezer.
I’m not sure whether you mean (i) that your research programme was literally a training exercise for harder challenges ahead, or (ii) that it was born of despair: looking under the street light had a better chance of success even though the keys were not especially likely to be there.
If you mean (i), then what made you give up on this plan? From my perspective, the training exercise played its role and has perhaps outlived its usefulness; why not move beyond it?
If you mean (ii), then why such pessimism from the get-go? I imagine you reasoning along the lines of: developing the theory of rational agency is a difficult problem with little empirical feedback in its early stages, hence it requires nigh-impossible precision of reasoning. But humanity actually has a decent track record with this type of question over the last century. VNM, game theory, the Church–Turing thesis, information theory, complexity theory, Solomonoff induction: all of these are examples of similar problems (creating a mathematical theory starting from an imprecise concept, without much empirical data to help) on which we made enormous progress. They also look like steps towards the theory of rational agents itself. So, we “just” need to add more chapters to this novel, not do something entirely unprecedented[1]. Maybe your position is that the previous parts were done by geniuses who are unmatched in our generation because of lost cultural DNA?
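(To make that pattern concrete with one of these examples, glossing over technical details: Solomonoff induction takes the imprecise idea of Occam’s razor and turns it into a definite mathematical object. Roughly, the prior weight of an observation string $x$ is $M(x) = \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)}$, summing over programs $p$ of length $\ell(p)$ whose output on a fixed universal machine $U$ begins with $x$, and prediction is just conditioning: $M(a \mid x) = M(xa)/M(x)$. This is the standard textbook formulation rather than anything original, but it is exactly the kind of “precise theory out of a vague concept” step I have in mind.)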
I think that the “street light” was truly useful for better defining several relevant problems (Newcomb-like decision problems, Vingean reflection, superrationality...), but it is not where the solutions are.
Another thing: IMO, a certain type of blundering in the dark is helpful. In practice, science often doesn’t progress in a straight line from problem to solution. People try all sorts of things, guided partly by concrete problems and partly by sheer curiosity; some of those work out, some don’t, and some lead to something entirely unexpected. As results accumulate, paradigms crystallize and it becomes clear which models were “True Names”[2] and which were blunders. And, yes, maybe we don’t have time for this. But I’m not so sure.
[1] That is, the theory of rational agency wouldn’t be unprecedented. The project of dodging AI risk as a whole certainly has some “unprecedentedness” about it.
[2] Borrowing the term from John Wentworth.
While we happen to be on the topic: can I ask whether (a) you’ve been keeping up with Vanessa’s work on infra-Bayesianism, and, if so, whether (b) you understand it well enough to have any thoughts on it? It sounds (and has sounded for quite a while) like Vanessa is proposing this as an alternative theoretical foundation for agency and updating, and she also appears to view it as significantly more promising than the work MIRI has been doing, as is apparent from, e.g., her remark above that she has “a model of what this theory will look like, and a concrete pathway towards constructing it”.
Ideally I (along with anyone else interested in this field) would be well-placed to evaluate Vanessa’s claims directly; in practice it seems that very few people are able to do so, and consequently infra-Bayesianism has received very little discussion on LW/AF (though my subjective impression is that those who have discussed it seem reasonably impressed by and enthusiastic about it).
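(For readers who haven’t looked at it, my own rough and possibly mistaken gloss: instead of committing to a single Bayesian prior, an infra-Bayesian agent keeps a convex set $\mathcal{C}$ of hypotheses and judges a policy $\pi$ by its worst-case expected utility over that set, something like $V(\pi) = \min_{\mu \in \mathcal{C}} \mathbb{E}_{\mu^{\pi}}[U]$, with the aim of getting learning-theoretic guarantees even when the true environment is too complex to appear exactly in the hypothesis space. The notation here is mine, not Vanessa’s, and I’d defer to her on whether this one-line summary is fair.)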
So, as long as one of the field’s founding members happens to be on LW handing out takes… could I ask for a take on infra-Bayesianism?
(You’ve stated multiple times that you find other people’s work unpromising, which by implication suggests that infra-Bayesianism is one of the things you find unpromising, but only if you’ve been paying enough attention to it to have an assessment. It seems like infra-Bayesianism has flown under the radar of a lot of people, though, so I’m hesitantly optimistic that it may have flown under yours as well.)