The question feels leading enough that I don’t really know how to respond. Many of these sentences sound pretty crazy to me, so I feel like I primarily want to express frustration and confusion that you assign those sentences to me or “most of the LessWrong community”.
However, for some reason, the idea that people outside the LessWrong community might recognize the existence of AI x-risk — and therefore be worth coordinating with on the issue — has felt not only poorly received on LessWrong, but also fraught to even suggest. For instance, I tried to point it out in this previous post:
I think John Wentworth’s question is indeed the obvious question to ask. It does really seem like our prior should be that the world will not react particularly sanely here.
I also think it’s really not true that coordination has been “fraught to even suggest”. I think it’s been suggested all the time, and certain coordination plans seem more promising than others. Like, even Eliezer was for a long time apparently thinking that Deepmind having a monopoly on AGI development was great and something to be protected, which very much involves coordinating with people outside of the LessWrong community.
The same is true for whether “outsiders might recognize the existence of AI x-risk”. Of course outsiders might recognize the existence of AI x-risk. I don’t think this is controversial or disputed. The question is what happens next. Many people seem to start working on AI capabilities research as the next step, which really doesn’t improve things.
That particular statement was very poorly received, with a 139-karma retort from John Wentworth arguing,
I don’t think your summary of how your statement was received is accurate. Your overall post has ~100 karma, so it was received quite positively, and while John responded to this particular statement and was upvoted, I don’t think this really reflects much of a negative judgement of that specific statement.
John’s question is indeed the most important question to ask about this kind of plan, and it seems correct for it to be upvoted, even if people agree with the literal sentence it is responding to (your original sentence itself was weak enough that I would be surprised if almost anyone on LW disagreed with its literal meaning; if there is disagreement, it is with the broader implied claim that the help would be useful enough to be worth adopting as a main-line plan, forsaking other plans that are more pivotal-act shaped).
Thankfully, it now also seems to me that perhaps the core LessWrong team has started to think that ideas from outsiders matter more to the LessWrong community’s epistemics and/or ability to get things done than previously represented, such as by including, for the first time, material written outside LessWrong in the 2021 LessWrong review posted just a few weeks ago:
I don’t know, man, I have always put a huge emphasis on reading external writing, learning from others, and doing practical concrete things in the external world. Including external material has been more of a UI question, and I’ve been interested in it for a long while; it just didn’t reach the correct priority level (and also, I think it isn’t really working this year for UI reasons, given that basically no nominated posts are externally linked posts, and it was correct for us not to try it without putting in substantially more effort to make it work).
I think if anything I updated over the last few years that rederiving stuff for yourself is more important and trying to coordinate with the external world has less hope, given the extreme way the world was actively hostile to cooperation, as well as epistemically highly corrupting, during the pandemic. I also think the FTX situation made me think it’s substantially more likely that we will get fucked over again in the future when trying to coordinate with parties that have different norms and don’t seem to care about honesty and integrity very much. It was also an update in the opposite direction for me that RLHF turned out to be the key to ChatGPT, and via that to OpenAI getting something like product-market fit and probably $10+ billion in expectation, showing that the “alignment” team at OpenAI had among the worst consequences of any team in the org. I think these events also made me less hopeful about the existing large LessWrong/EA/Longtermist/Rationality community as a coherent entity that can internally coordinate, but I think that overall results in a narrowing of my circle of coordination, not a widening.
I have models here, but I guess I feel like your comment is in some ways putting words in my mouth in a way that feels bad to me. I am interested in explaining my models, but I don’t think this comment thread is the right context.
I think there is a real question about whether both I and others in the community have a healthy relationship to the rest of the world. I think the answer is pretty messy. I really don’t think it’s ideal, and indeed think it’s probably quite bad, and I have a ton of different ways I would criticize what is currently happening. But I also really don’t think that the current primary problem with the way the community relates to the rest of the world is underestimating the sanity of the rest of the world. I think mostly I expect us to continue to overestimate the sanity and integrity of most of the world, and then get fucked over like we got fucked over by OpenAI or FTX. I think there are ways of relating to the rest of the world that would be much better, but a naive update in the direction of “just trust other people more” would likely make things worse.
Again, I think the question you are raising is crucial, and I have giant warning flags about a bunch of the things that are going on (the foremost one is that it sure is a time to reflect on your relationship to the world when a very prominent member of your community just stole 8 billion dollars of innocent people’s money and committed the largest fraud since Enron), so I do think there are good and important conversations to be had here.
I think mostly I expect us to continue to overestimate the sanity and integrity of most of the world, and then get fucked over like we got fucked over by OpenAI or FTX. I think there are ways of relating to the rest of the world that would be much better, but a naive update in the direction of “just trust other people more” would likely make things worse.
[...] Again, I think the question you are raising is crucial, and I have giant warning flags about a bunch of the things that are going on (the foremost one is that it sure is a time to reflect on your relationship to the world when a very prominent member of your community just stole 8 billion dollars of innocent people’s money and committed the largest fraud since Enron), [...]
I very much agree with the sentiment of the second paragraph.
Regarding the first paragraph, my own take is that (many) EAs and rationalists might be wise to trust themselves and their allies less.[1]
The main update I’d make from the FTX fiasco (and other events I’ll describe later) is that perhaps many/most EAs and rationalists aren’t very good at character judgment. They probably trust other EAs and rationalists too readily because they are part of the same tribe and automatically assume that agreeing with noble ideas in the abstract translates to noble behavior in practice.
(To clarify, you personally seem to be good at character judgment, so this message is not directed at you. I base that mostly on the comments of yours I read about the SBF situation; big kudos for those, btw!)
It seems like a non-trivial fraction of people who joined the EA and rationalist community very early turned out to be of questionable character, and this wasn’t noticed for years by large parts of the community. I have in mind people like Anissimov, Helm, Dill, SBF, Geoff Anders, arguably Vassar—and these are just the known ones. Most of them were not just part of the movement; they were allowed to occupy highly influential positions. I don’t know what the base rate for such people is in other movements—it’s plausibly even higher—but as a whole our movements don’t seem to be fantastic at spotting sketchy people quickly. (FWIW, my personal experiences with a sketchy, early EA (not on the above list) inspired this post.)
My own takeaway is that perhaps EAs and rationalists aren’t that much better in terms of integrity than the outside world and—given that we probably have to coordinate with some people to get anything done—I’m now more willing to coordinate with “outsiders” than I was, say, eight years ago.
Though I would be hesitant to spread this message; the kinds of people who should trust themselves and their character judgment less are more likely the ones who will not take this message to heart, and vice versa.
Thanks, Oliver. The biggest update for me here — which made your entire comment worth reading — was that you said this:
I also think it’s really not true that coordination has been “fraught to even suggest”.
I’m surprised that you think that, but have taken your statement at face value and updated that you in fact do. By contrast, my experience around a bunch of common acquaintances of ours has been much the same as Katja’s, like this:
Some people: AI might kill everyone. We should design a godlike super-AI of perfect goodness to prevent that.
Others: wow that sounds extremely ambitious
Some people: yeah but it’s very important and also we are extremely smart so idk it could work
[Work on it for a decade and a half]
Some people: ok that’s pretty hard, we give up
Others: oh huh shouldn’t we maybe try to stop the building of this dangerous AI?
Some people: hmm, that would involve coordinating numerous people—we may be arrogant enough to think that we might build a god-machine that can take over the world and remake it as a paradise, but we aren’t delusional
In fact I think I may have even heard the word “delusional” specifically applied to people working on AI governance (though not by you) for thinking that coordination on AI regulation is possible / valuable / worth pursuing in service of existential safety.
As for the rest of your narrative of what’s been happening in the world, to me it seems like a random mix of statements that are clearly correct (e.g., trying to coordinate with people who don’t care about honesty or integrity will get you screwed) and other statements that seem, as you say,
pretty crazy to me,
and I agree that for the purpose of syncing world models,
I don’t think this comment thread is the right context.
Anyway, cheers for giving me some insight into your thinking here.
Oliver, see also this comment; I tried to @ you on it, but I don’t think LessWrong has that functionality?