I don’t think we can give people money just because they say they are doing good things, because of the risk of abuse. There are many other reasons for not publishing anything. Some simple alternative hypotheses include “we failed to produce anything publishable” or “it is fun to fool ourselves into thinking we have exciting secrets” or “we are doing bad things and don’t want to get caught.” The fact that MIRI’s researchers appear intelligent suggests they at least think they are working on important and interesting problems, but history has many examples of talented reclusive teams spending years working on pointless stuff in splendid isolation.
Additionally, by hiding the highest quality work we risk impoverishing the field, making it look unproductive and unattractive to potential new researchers.
It’s funny to hear you suggest “What if they have no good research?” and “I’m irritated that they are hiding all of our best research!” in successive paragraphs as two reasons you’re concerned about secrecy ;-)
So, my current epistemic state is something like this: Eliezer and Benya and Patrick and others spent something like 4 MIRI-years hacking away at research, and I didn’t get it. Finally Scott and Abram made some further progress on it, and crystallised it into an explanation I actually felt I sorta got. And most of the time I spent trying to understand their work in the meantime was wasted effort on my part, and quite plausibly wasted effort on their part. I remember that time they wrote up a ton of formal-looking papers for the Puerto Rico conference, to be ready in case a field suddenly sprang up around them… but then nobody really read them or built on them. So I don’t mind if, in the intervening 3–4 years, they again don’t really try to explain what they’re thinking about to me, until a bit of progress and a good explanation come along. They’ll continue to write things about the background worldview, like Security Mindset, and Rocket Alignment, and Challenges to Christiano’s Capability Amplification Proposal, and all of the beautiful posts that Scott and Abram write, but overall focus on getting a better understanding of the problem by themselves.
(I’m open to hearing that other researchers at DM, OpenAI, and so on, feel like they used to get major value out of talking with MIRI and now are not able to; my current epistemic state is that core MIRI is still interested in talking with Paul about iterated amplification, talking with Critch about the stuff Critch does, and so on, about as much as back when those folks worked at other orgs and MIRI was still open. Which is to say I think they used to talk a little bit, and were often interesting but confusing, and now still talk a little bit, and are similarly interesting but confusing.)
In their annual write-up they suggest that progress was slower than expected in 2019. However I assign little weight to this, as I think most of the cross-sectional variation in organisations’ reported subjective effectiveness comes from variance in how optimistic/salesy/aggressive they are, rather than actually indicating much about object-level effectiveness.
It took me 5 attempts to read that sentence. I think you’re saying “when different parts of orgs report how well they’re doing, the difference is primarily explained by how salesy they generally are”. To state the obvious, if Nate had said “Y’know what, it’s going great, everything’s fine, don’t worry about us!” then I’d’ve felt it was more obviously a bit worrying from the perspective of someone on the outside, but especially a somewhat negative report feels like a not obviously adversarial signal. I mean, maybe they’re playing one level higher, but I mostly think they care about saying literally true things, and hold themselves to high standards here.
That said, to empathise with your position a bit, I do think that if the quote by Nate hadn’t been in the most recent fundraiser post… it would have felt much more like I couldn’t coordinate with this (large!) part of MIRI, and I’d have to start acting as though they just didn’t exist. Like, maybe they’d be making the right choices for the right reasons, but it certainly wouldn’t be something I’d feel I could plan around for ~5 years, without a massive chance of being unpleasantly blindsided at the end by “Oh yeah, for the past few years we’ve increasingly realised this research is doomed”. MIRI’s annual self-reports and small number of explicit results have often been some of the central things I’ve had to go on regarding how well the research is going year-to-year, and losing those would feel pretty rough. I think I’d also be pretty sad if a few clear results weren’t written up to some extent in the intervening years (e.g. the tiling agents result).
I don’t want to underplay that I think the secrecy is currently pretty bad from the perspective of public discourse. Most folks in x-risk are noticing all the true and important reasons for secrecy, but not making plans to ensure that public discourse will continue to work once they’ve stopped saying what they think out loud, especially given its current fragile state. I’m not sure if people aren’t noticing the problems or just feel stretched too thin. I think this will substantially reduce the number of surprisingly good things that happen and damage our long-term coordination. I should write more on that some time, when I have something clearer to say, which I don’t feel I do right now.
It took me 5 attempts to read that sentence. I think you’re saying “when different parts of orgs report how well they’re doing, the difference is primarily explained by how salesy they generally are”. To state the obvious, if Nate had said “Y’know what, it’s going great, everything’s fine, don’t worry about us!” then I’d’ve felt it was more obviously a bit worrying from the perspective of someone on the outside, but especially a somewhat negative report feels like a not obviously adversarial signal.
And now I’m having trouble interpreting your comment! :P
Your paraphrase of Larks’s sentence sounds plausible (though I am also confused about what he was trying to say), but I don’t see how your “To state the obvious...” responds to it. I didn’t take Larks to be saying that he finds Nate’s sentence worrying, but rather that he just doesn’t consider it much evidence of how much progress MIRI is making one way or the other.