It sounds like we are broadly on the same page about 1 and 2 (presumably partly because my list doesn’t focus on my spiciest takes, which might have generated more disagreement).
Here are some extremely rambling thoughts on point 3.
I agree that the interaction between AI and existing conflict is a very important consideration for understanding or shaping policy responses to AI, and that you should be thinking a lot about how to navigate (and potentially leverage) those dynamics if you want to improve how well we handle any aspect of AI. I was mostly trying to point to differences in “which problems related to AI are we trying to solve?” We could think about technical or institutional or economic approaches to (or aspects of) any problem.
With respect to “which problem are we trying to solve?”: I also think potential undesirable effects of AI on the balance of power are real and important, both because they affect our long-term future and because they will affect humanity’s ability to cope with problems during the transition to AI. I think that problem is at least somewhat less important than alignment, but will probably get much more attention by default. I think this is especially true from a technical perspective, because technical work plays a totally central role in alignment, and a much more unpredictable and incidental role in affecting the balance of power.
I’m not sure how alignment researchers should engage with this kind of alignment-adjacent topic. My naive guess would be that I (and probably other alignment researchers) should:
Try to have reasonable takes on other problems (and be appropriately respectful/deferential when we don’t know what we’re talking about).
Feel comfortable “staying in my lane” even though doing so inevitably leads to lots of people being unhappy with us.
Be relatively clear about my beliefs and prioritization with EA-types who are considering where to work, even though that will potentially lead to some conflict with people who have different priorities. (Similarly, I think people who work on different approaches to alignment should probably be clear about their positions and disagree openly, even though it will lead to some conflict.)
Generally be respectful, acknowledge legitimate differences in what people care about, acknowledge differing empirical views without being overconfident and condescending about it, and behave like a reasonable person (I find Eliezer is often counterproductive on this front, though I have to admit that he does a better job of clearly expressing his concerns and complaints than I do).
I am somewhat concerned that general blurring of the lines between alignment and other concerns will tend to favor topics with more natural social gravity. That’s not enough to make me think it’s clearly net negative to engage, but is at least enough to make me feel ambivalent. I think it’s very plausible that semi-approvingly citing Eliezer’s term “the last derail” was unwise, but I don’t know. In my defense, the difficulty of talking about alignment per se, and the amount of social pressure to instead switch to talking about something else, is a pretty central fact about my experience of working on alignment, and leaves me protective of spaces and norms that let people just focus on alignment.
(On the other hand: (i) I would not be surprised if people on the other side of the fence feel the same way, (ii) there are clearly spaces, like LW, where the dynamic is reversed, though they have their own problems, (iii) the situation is much better than it was a few years ago, and I’m optimistic it will continue getting better for a variety of reasons, not least that the technical problems in AI alignment are becoming increasingly well-defined, so conversations about them will naturally become more focused.)
I’m not convinced that the dynamic “we care a lot about who ends up with power, and more important topics are more relevant to the distribution of power” is a major part of how humanity solves hard human vs nature problems. I do agree that it’s an important fact about humans to take into account when trying to solve any problem though.
A not-very-coherent response to #3, roughly:
Caring about visible power is a very human motivation, and I’d expect it to draw many people to care about “who are the AI principals?”, “what are the AIs actually doing?”, and a few other topics that have significant technical components.
Somewhat wild data points in this space: nuclear weapons and the space race. In each case, salient motivations such as “war” led some of the best technical people to work on hard technical problems. In my view, the problems the technical people ended up working on were often “vs. nature” problems, distant from the original social motivations.
Another take on this: some people want to work on technically interesting and important problems, while others want to work on “legibly important” or “legibly high-status” problems.
I do believe there are opportunities to steer some fraction of this attention toward some of the core technical problems (though not toward all of them at this moment).
This can often depend on framing; while my guess is that you, for example, probably shouldn’t work on this, I do think some people who understand the technical alignment problems should.
This can also depend on social dynamics; your “naive guess” seems like a good starting point.
Also: there seems to be a lot of low-hanging fruit among lower-difficulty problems that someone should work on. E.g., at this moment, many humans should be spending a lot of time building an empirical understanding of what types of generalization LLMs are capable of (a toy sketch of what such a probe might look like is at the end of this comment).
On prioritization: I think it would be good if someone made some sort of curated list of “who is working on which problems, and why”. My concern with part of the “EAs figuring out what to do” process is that many people are doing some sort of expert aggregation at the wrong level. (For example, if someone basically averages your conclusions and Eliezer Yudkowsky’s, giving each 50% weight, I don’t think the result is a useful or coherent model.)
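To make the “empirical understanding of generalization” point slightly more concrete, here is a minimal, purely illustrative sketch in Python. Everything in it (the toy arithmetic task, the probe prompts, and the stub model) is a hypothetical placeholder introduced for illustration, not something from the discussion above; the idea is just to compare a model’s performance on a familiar surface form of a task against a systematically transformed version of the same task.

```python
"""Toy harness for probing one narrow kind of LLM generalization:
does a model that handles a task in a familiar surface form still
handle it after a systematic transformation (operands spelled out)?"""
from typing import Callable, List, Tuple

# (prompt shown to the model, expected answer) pairs.
IN_DISTRIBUTION: List[Tuple[str, str]] = [
    ("What is 2 + 3? Answer with a number only.", "5"),
    ("What is 7 + 6? Answer with a number only.", "13"),
]
# Same task after a systematic surface transformation.
TRANSFORMED: List[Tuple[str, str]] = [
    ("What is two plus three? Answer with a number only.", "5"),
    ("What is seven plus six? Answer with a number only.", "13"),
]

def accuracy(model: Callable[[str], str], probes: List[Tuple[str, str]]) -> float:
    """Fraction of probes whose expected answer appears in the model's output."""
    hits = sum(expected in model(prompt) for prompt, expected in probes)
    return hits / len(probes)

def stub_model(prompt: str) -> str:
    """Placeholder standing in for a real LLM call (API or local model)."""
    return "5" if ("2 + 3" in prompt or "two plus three" in prompt) else "?"

if __name__ == "__main__":
    # A gap between these two numbers is (weak) evidence about this type of generalization.
    print("in-distribution accuracy:", accuracy(stub_model, IN_DISTRIBUTION))
    print("transformed accuracy:    ", accuracy(stub_model, TRANSFORMED))
```

In practice the stub would be replaced by an actual model call, and the probe sets by whatever distribution shift one actually cares about.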