I’d like to express my gratitude and excitement (and not just to you, Rob, though your work is included in this):
Deep thanks to everyone involved for having the discussion, writing it up and formatting it, and posting it on LW. I think this is some of the most interesting and potentially impactful material I’ve seen on AI alignment in a long while.
(My only thought is… why hasn’t a discussion like this occurred sooner? Or has it, and it just hasn’t made it to LW?)
I’m not sure why we haven’t tried the ‘generate and publish chatroom logs’ option before. If you mean more generally ‘why has MIRI waited until now to hash these things out with other x-risk people?’, my basic model is:
Syncing with others was a top priority for SingInst (2000-2012), and this resulted in stuff like the Sequences, the FOOM debate, Highly Advanced Epistemology 101 for Beginners, the Singularity Summits, etc. It (largely) doesn’t cover the same ground as current disagreements because people disagree about different stuff now.
‘SingInst’ becoming ‘MIRI’ in 2013 coincided with us shifting much more toward a focus on alignment research. That said, a number of factors kept us having a lot of non-research-y conversations with others, including: EA coalescing in 2012-2014; the wider AI alignment field starting in earnest with the release of Superintelligence (2014) and the Puerto Rico conference (2015); and Open Philanthropy starting in 2014.
Some of these conversations (and the follow-up reflections they prompted) ended up inspiring later publications, including some of the content on Arbital (mostly active 2015-2017), Inadequate Equilibria (published 2017, but mostly written around 2013-2015, I believe), etc.
My model is that we then mostly disappeared in 2018-2020 while we hunkered down to do research, continuing to have intermittent conversations and email exchanges with folks, but not sinking much time into syncing up. (I’ll note that a lot of non-MIRI EA leaders were very eager to sink loads of time into syncing up with MIRI, and it was entirely MIRI’s ‘sorry, we want to do research instead’ that kept this from happening during this period.)
So broadly I’d say ‘we did try to sync up a lot, but it turns out there’s a lot of ground to cover, and different individuals at different times have very different perspectives and cruxes’. At a certain point, (a) we’d transmitted enough of our perspective that we expected to be pretty happy with e.g. EA leaders’ sense of how to do broader field-building, academic outreach, etc.; and (b) we felt we’d plucked the low-hanging fruit and further syncing up would require a lot more focused effort, which seemed lower-priority than ‘make ourselves less confused about the alignment problem by working on this research program’ at the time.
I’m definitely not happy with others’ sense of how to do field-building, but it’s not like I thought I could fix that issue by spending the rest of my life trying to do it myself.
I’m not sure why we haven’t tried the ‘generate and publish chatroom logs’ option before.
My guess is that a lot of these conversations hinge on details that people are somewhat antsy about saying in public, and I suspect MIRI now thinks the value of “credible public pessimism” outweighs the cost of “gesturing towards things that seem powerful” on the margin, such that chatlogs like this look like a better idea than they would have seemed to the MIRI of 4 years ago. [Or maybe it was just “no one thought to try, because we had access to in-person conversations and those seemed much better, despite not generating transcripts.”]