Well I’m in love. Thanks so much for doing this again, this was great, I learned a lot, yadda yadda yadda, let me talk about MIRI and Secrecy.
I don’t think we can give people money just because they say they are doing good things, because of the risk of abuse. There are many other reasons for not publishing anything. Some simple alternative hypotheses include “we failed to produce anything publishable” or “it is fun to fool ourselves into thinking we have exciting secrets” or “we are doing bad things and don’t want to get caught.” The fact that MIRI’s researchers appear intelligent suggests they at least think they are working on important and interesting problems, but history has many examples of talented reclusive teams spending years working on pointless stuff in splendid isolation.
And later on you say that it’s hard to donate to them “in good faith”, “given their lack of disclosure”.
Here’s my quick thoughts:
In the last 2-5 years I endorsed donating to MIRI (and still do), and my reasoning back then was always of the type “I don’t understand their technical research, but I have read a substantial amount of the philosophy and worldview that was used to successfully pluck that problem out of the space of things to work on, and think it is deeply coherent and sensible and it’s been surprisingly successful in figuring out that AI is an x-risk, and I expect to find it is doing very sensible things in places I understand less well.” Then, about a year ago, MIRI published the Embedded Agency sequence, and for the first time I thought “Oh, now I feel like I have an understanding of what the technical research is guided by, and what it’s about, and indeed, this makes a lot of sense.” My feelings have rarely been changed by reading ongoing research papers at MIRI, which were mostly just very confusing to me. They all seemed individually interesting, but I didn’t see the broader picture before Embedded Agency.
At no time has any important part of my reasoning been “I don’t understand their reasoning, but they’re situated inside a broader intellectual space that is running a number of checks of the basics of their work, and can evaluate the quality of the work, so I don’t have to evaluate the project myself.” I’ve never really thought there were experts in this space I could defer to; academia seemed terribly slow and confused about the problem. Similarly to how companies rarely argue that they shouldn’t exist while they’re making money, fields are very bad at arguing that they shouldn’t exist whilst the research is progressing, which is the sort of work that would be required to realise that AGI is by-default an existential catastrophe and that capabilities must be dropped whilst alignment work needs doing. It’s always been a question of whether I trust the epistemology of the leadership at MIRI to think clearly about these problems, and what I understand of how they think.
During the time that MIRI was non-non-disclosed (i.e. open), I don’t think there was substantially more checking of their work than there is now. Most of academia and industry has/had no interest. There were LWers like Paul Christiano and Wei Dai and Stuart Armstrong that engaged, and I’m sad that these people aren’t able to engage with the present work. But at the time Paul and Wei were basically saying “I don’t get why this is going to work out” so it’s not like MIRI could start getting much more negative feedback, unless Paul and Wei were going to say “This obviously fails and I can prove how,” which I assign little probability to.
Also, you mention hypotheses like “we failed to produce anything publishable” or “it is fun to fool ourselves into thinking we have exciting secrets” or “we are doing bad things and don’t want to get caught.” Again, I don’t think this is likely to be the cause of MIRI doing non-disclosed work, because they have had policies like this since long before now, including at times when there were real results. For example, when Scott Garrabrant came up with logical inductors, MIRI spent (according to this post) almost 6 months figuring out whether to disclose it. So I expect the reasoning to be fairly consistent with the same decisions in the past.
Note that Nate used the new non-disclosed research strategy to solve the initial formulation of the tiling agents problem, something he mentions here (search ‘tiling agents’). Now, he’s not published it yet, but, I mean, I definitely trust he’s telling the truth. Maybe it turned out to not be as useful as hoped (as with Logical Inductors), but it still seems like progress that’s similar in kind to the progress that happened before MIRI was non-disclosed by default.
Oh also, their work on decision theory has finally been published in a prestigious philosophy journal, so a number of highly skeptical outside views should now be less concerned. Overall it seems to me like from many such perspectives they’re clearly now in a stronger place than they were 2 years ago. They used to be confusing, have no results, and not be recognised by academia. Now they are less confusing, have some fascinating results, and are recognised by academia.
To be clear, I have many mixed feelings about MIRI being non-disclosed, and think it’s not very helpful for the broader intellectual environment and for recruitment and I broadly feel a bit confused about it. But “risk of abuse [of secrecy]” is not a factor I personally worry about with MIRI’s research progress. I feel similarly about the day-to-day and month-to-month work at MIRI as before they entered non-disclosure; I expect they’re largely chugging along, but with more direction and focus than 3-4 years ago.
Let me return to this section:
I don’t think we can give people money just because they say they are doing good things, because of the risk of abuse. There are many other reasons for not publishing anything. Some simple alternative hypotheses include “we failed to produce anything publishable” or “it is fun to fool ourselves into thinking we have exciting secrets” or “we are doing bad things and don’t want to get caught.” The fact that MIRI’s researchers appear intelligent suggests they at least think they are working on important and interesting problems, but history has many examples of talented reclusive teams spending years working on pointless stuff in splendid isolation.
Additionally, by hiding the highest quality work we risk impoverishing the field, making it look unproductive and unattractive to potential new researchers.
It’s funny to hear you suggest “What if they have no good research?” and “I’m irritated that they are hiding all of our best research!” in successive paragraphs as two reasons you’re concerned about secrecy ;-)
So, my current epistemic state is something like this: Eliezer and Benya and Patrick and others spent something like 4 MIRI-years hacking away at research, and I didn’t get it. Finally Scott and Abram made some further progress on it, and crystallised it into an explanation I actually felt I sorta get. And most of the time I spent trying to understand their work in the meantime was wasted effort on my part, and quite plausibly wasted effort on their part. I remember that time they wrote up a ton of formal-looking papers for the Puerto Rico conference, to be ready in case a field suddenly sprang up around them… but then nobody really read them or built on them. So I don’t mind if, in the intervening 3-4 years, they again don’t really try to explain what they’re thinking about to me, until a bit of progress and good explanation comes along. They’ll continue to write things about the background worldview, like Security Mindset, and Rocket Alignment, and Challenges to Christiano’s Capability Amplification Proposal, and all of the beautiful posts that Scott and Abram write, but overall focus on getting a better understanding of the problem by themselves.
(I’m open to hearing that other researchers at DM, OpenAI, and so on, feel like they used to get major value out of talking with MIRI and now are not able to; my current epistemic state is that core MIRI is still interested to talk with Paul about iterated amplification, talk with Critch about the stuff Critch does, and so on, about as much as when those folks worked at other orgs but MIRI was open. Which is to say I think they used to talk a little bit, and were often interesting but confusing, and now still talk a little bit, and are similarly interesting but confusing.)
In their annual write-up they suggest that progress was slower than expected in 2019. However I assign little weight to this, as I think most of the cross-sectional variation in organisation-reported subjective effectiveness comes from variance in how optimistic/salesy/aggressive they are, rather than actually indicating much about object-level effectiveness.
It took me 5 attempts to read that sentence. I think you’re saying “when different parts of orgs report how well they’re doing, the difference is primarily explained by how salesy they generally are”. To state the obvious, if Nate had said “Y’know what, it’s going great, everything’s fine, don’t worry about us!” then I’d’ve felt it was more obviously a bit worrying from the perspective of someone on the outside, whereas a somewhat negative report feels like a signal that isn’t obviously adversarial. I mean, maybe they’re playing one level higher, but I mostly think they care about saying literally true things, and hold themselves to high standards here.
That said, to empathise with your position a bit, I do think that if the quote by Nate hadn’t been in the most recent fundraiser post… it would have felt much more like I couldn’t coordinate with this (large!) part of MIRI, and I’d have to start acting as though they just didn’t exist. Like, maybe they’d be making the right choices for the right reasons, but it certainly wouldn’t likely be something I’d feel I could plan around for ~5 years, without a massive chance of being unpleasantly blindsided at the end by “Oh yeah, for the past few years we’ve increasingly realised this research is doomed”. MIRI’s annual self-reports and small number of explicit results have often been some of the central things I’ve had to go on regarding how well the research is going year-to-year, and losing those would feel pretty rough. I think I’d be pretty sad if a few clear results also weren’t written up to some extent in the intervening years (e.g. the tiling agents result).
I don’t want to underplay that I think the secrecy is currently pretty bad from the perspective of public discourse, and that most folks in x-risk are noticing all the true and important reasons for secrecy but not making plans to ensure that the public discourse will continue to work once they’ve stopped saying what they think out loud any more, especially in public discourse’s current fragile state. I’m not sure if people aren’t noticing the problems or just feel stretched too thin. I think this will substantially reduce the number of surprisingly good things that will happen and damage our long-term coordination. I should write more on that some time, when I have something clearer to say, which I don’t feel I do right now.
It took me 5 attempts to read that sentence. I think you’re saying “when different parts of orgs report how well they’re doing, the difference is primarily explained by how salesy they generally are”. To state the obvious, if Nate had said “Y’know what, it’s going great, everything’s fine, don’t worry about us!” then I’d’ve felt it was more obviously a bit worrying from the perspective of someone on the outside, whereas a somewhat negative report feels like a signal that isn’t obviously adversarial.
And now I’m having trouble interpreting your comment! :P
Your paraphrase of Larks’s sentence sounds plausible (though I am also confused about what he was trying to say), but I don’t see how your “To state the obvious...” responds to it. I didn’t take Larks to be saying that he finds Nate’s sentence worrying, but rather that he just doesn’t consider it much evidence of how much progress MIRI is making one way or the other.
Like, during the time that MIRI was non-non-disclosed (i.e. open), I don’t think there was substantially more checking of their work than there is now. Most of academia and industry has/had no interest. There were LWers like Paul Christiano and Wei Dai and Stuart Armstrong that engaged, and I’m sad that these people aren’t able to engage with the present work. But at the time Paul and Wei were basically saying “I don’t get why this is going to work out” so it’s not like MIRI could start getting much more negative feedback, unless Paul and Wei were going to say “This obviously fails and I can prove how,” which I assign little probability to.
However, the Open Philanthropy Project did have people evaluate MIRI’s work, and they ended up sending them a substantial donation because they liked the Logical Induction paper – unless I’m misremembering how it went down.
Yes, that’s right. OTOH I recall OpenPhil trying to evaluate MIRI’s work before the logical induction paper, and they thought it was pretty terrible. As I mentioned, I’d be pro MIRI writing up any / a few clear results over the next few years, for reasons like this.
There’s also a question of how much OpenPhil’s support mattered in your estimation of MIRI. I might write more on it later, but overall it’s not been a major factor for me.
See also My current thoughts on MIRI’s “highly reliable agent design” work by Daniel Dewey (Open Phil lead on technical AI grant-making).
From the “What do I think of HRAD?” section: