The following recent Twitter thread by Eliezer is interesting in the context of the discussion of whether “upsetting but plausible ideas” are coming from central or non-central community actors, and Eliezer’s description of Michael Vassar as “causing psychotic breaks”:
if you actually knew how deep neural networks were solving your important mission-critical problems, you’d never stop screaming
(no, I don’t know how they’re doing it either, I just know that you’d update in a predictable net direction if you found out)
(in reply to “My model of Eliezer is not so different from his constantly screaming, silently to himself, at all times, pausing only to scream non-silently to others, so he doesn’t have to predictably update in the future.”:)
This state of affairs sounds indistinguishable from coherent Bayesian thought inside a world like this one, so I suppose that’s confirmation, yes.
A few takeaways from this:
Obviously, Eliezer is saying that there is a plausible but extremely upsetting idea that could be learned by studying neural networks sufficiently competently. [EDIT: Maybe I’m wrong that this is about neural nets being powerful, and it’s just about them being unreliable for mission-critical applications? Both interpretations seem plausible...]
This statement, itself, is plausible and upsetting, though presumably less upsetting than if one actually knew the thing that could be learned about neural networks.
Someone who was “constantly screaming” would be considered, by those around them, to be having a psychotic break (or an even worse mental health problem), and would almost certainly be psychiatrically incarcerated.
Eliezer is, to all appearances, trying to convey these upsetting ideas on Twitter.
It follows that, to the extent that Eliezer is not “causing psychotic breaks”, it’s only because he’s insufficiently capable of causing people to believe “upsetting but plausible ideas” that he thinks are true, i.e. because he’s failing (or perhaps not-really-trying, only pretending to try) to actually convey them.
This does not seem like the obvious reading of the thread to me.
Obviously, Eliezer is saying that there is a plausible but extremely upsetting idea that could be learned by studying neural networks sufficiently competently.
I think Eliezer is saying that if you understood on a gut level how messy deep networks are, you’d realize how doomed prosaic alignment is. And that would be horrible news. And that might make you scream, although perhaps not constantly.
After all, Eliezer is known to use… dashes… of colorful imagery. Do you really think he is literally constantly screaming silently to himself? No? Then he was probably also being hyperbolic about how he truly thinks a person would respond to understanding a deep network in great detail.
That’s why I feel that your interpretation is grasping really hard at straws. This is a standard “we’re doomed by inadequate AI alignment” thread from Eliezer.
Even though it’s an exaggeration, Eliezer is using that exaggeration to indicate an extremely high level of fear, off the charts compared with what people are normally used to, as a result of really taking in the information. Such a level of fear is not clearly lower than the fear experienced by the psychotic people in question, who e.g. suffered serious sleep loss as a result.
I strong-upvoted both of Jessica’s comments in this thread despite disagreeing with her interpretation in the strongest possible terms. I did so because I think it is important to note that, for every “common-sense” interpretation of a community leader’s words, there will be some small minority who interpret it in some other (possibly more damaging) way. While I think (importantly) this does not imply it is the community leader’s responsibility to manage their words so that no misinterpretation is possible (which I think is simply unfeasible), I am nonetheless in favor of people sharing their non-standard interpretations, given the variation in potential responses.
As Eliezer once said (I’m paraphrasing from memory here, so the following may not be word-for-word accurate, but I am >95% confident I’m not misremembering the thrust of what he said), “The question I have to ask myself is, will this drive more than 5% of my readers insane?”
EDIT: I have located the text of the original comment. I note (with some vindication) that once again, it seems that Eliezer was sensitive to this concern way ahead of when it actually became a thing.
Hm, I thought that the upsetting thing is how neural networks work in general. Like the ones that can correctly classify pictures with 99% accuracy… and then you slightly adjust a few pixels in such a way that a human sees no difference, but the neural network suddenly makes a completely absurd claim with high certainty.
And, if you are using neural networks to solve important problems, and become aware of this, then you realize that despite them doing a great job in 99% of situations and a random stupid thing in the remaining 1%, there is actually no limit to how insanely wrong they can get, and that it can happen in circumstances that would seem perfectly harmless to you. That the underlying logic is just… inhuman.
(To make an analogy, imagine that you hire a human to translate from French to English. The human is pretty good but not perfect, which means that he gets 99% right. In the remaining 1% he either translates the word incorrectly or says that he doesn’t know. These two options are the only results you expect. -- Now instead of a human, you hire a robot. He also translates 99% correctly and 1% incorrectly or with no output. But in addition to this, if you give him a specifically designed input, he will say a complete absurdity. Like, he would translate “UN CHAT” as “A CAT”, but when you strategically add a few dots and make it “ỤN ĊHAṬ”, he will suddenly insist that it means “CENTRUM FOR APPLIED RATIONALITY” and will assign a 99.9999999% certainty to this translation. Note that this is not its usual reaction to dots; the input papers usually contain some impurities or random dots, and the algorithm has always successfully ignored them… until now. -- The answer is not just wrong, but absurdly wrong, it happened in a situation where you felt quite sure nothing could go wrong, and the robot didn’t even feel uncertain.)
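(The pixel-level version of this is the standard adversarial-example construction. Below is a minimal sketch in the fast-gradient-sign style; the stand-in linear “classifier”, the random image, and the eps value are all invented for illustration, not taken from any real system.)

```python
# A minimal sketch of the "slightly adjusted pixels" failure mode, in the style of
# the fast gradient sign method. The single Linear layer and the random "image" are
# made-up stand-ins; with an actual trained classifier, the same few lines routinely
# flip a confident prediction while changing each pixel by an invisible amount.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in "classifier": one linear layer over a flattened 32x32 RGB image, 10 classes.
model = torch.nn.Linear(3 * 32 * 32, 10)

x = torch.rand(1, 3 * 32 * 32, requires_grad=True)   # the original picture
logits = model(x)
label = logits.argmax(dim=1)                          # whatever the model currently asserts

# Nudge every pixel a tiny step in the direction that most increases the loss
# on the model's own answer.
loss = F.cross_entropy(logits, label)
loss.backward()
eps = 0.01                                            # perturbation budget, tiny on a 0..1 pixel scale
x_adv = (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

with torch.no_grad():
    print("original prediction: ", model(x).argmax(dim=1).item())
    print("perturbed prediction:", model(x_adv).argmax(dim=1).item())
    print("max pixel change:    ", (x_adv - x).abs().max().item())
    # (The untrained toy model here may or may not flip; the point is the construction.)
```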
Obviously, Eliezer is saying that there is a plausible but extremely upsetting idea that could be learned by studying neural networks sufficiently competently.
So, I think that you got this part wrong (and that putting “obviously” in front of it makes this weirdly ironic in the given context), and the following conclusions are therefore also wrong.
Eliezer is simply saying (not “constantly screaming”) “do not trust neural networks, they randomly make big errors”. That message, even if perceived 100% correctly, should not cause a psychotic break in the average listener.
I think it’s important that the errors are not random; I think you mean something more like “they make large opaque errors.”
Given what else Eliezer has said, it’s reasonable to infer that the screaming is due to the possibility of everyone dying because neural-network-based AIs are powerful but unalignable, not merely that your AI application might fail unexpectedly.
It’s really strange to think the idea isn’t upsetting when Eliezer says understanding it would cause “constant screaming”. Even if that’s an exaggeration, really? Maybe ask someone who doesn’t read LW regularly whether Eliezer is saying that the idea you could get by knowing how neural nets work is upsetting; I think they would agree with me.
He specified “mission-critical”. An AI’s ability to take over other machines in the network, take over the internet, manufacture grey goo, etc. (choose your favorite doomsday scenario), is not really related to how mission-critical its original task was. (In fact, someone’s AI to choose the best photo filters to match the current mood on Instagram to maximize “likes” seems both more likely to have arbitrary network access and less likely to have careful oversight than a self-driving car AI.) Therefore I do think his comment was about the likelihood of failure in the critical task, and not about alignment.
I think he meant something like this: The neural net, used e.g. to recognize cars on the road, makes most of its deductions based on accidental correlations and shortcuts in the training data—things like “it was sunny in all the pictures of trucks”, or “if it recognizes the exact shape and orientation of the car’s mirror, then it knows which model of car it is, and deduces the rest of the car’s shape and position from that, rather than by observing the rest of the car”. (Actually they’d be lower-level and less human-legible than this. It’s like someone parsing tables out of Wikipedia pages’ HTML, but instead of matching th/tr/td elements, it just counts “<” characters, and God help us if one of the elements has an extra < due to holding a link or something.) If you understood just how fragile and divorced from reality the shortcuts were, while you were sitting in such a car rushing down the highway, you would scream.
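(If it helps, here is that HTML-parsing analogy made literal: a toy script written for this comment, not taken from any real pipeline, that extracts a table cell by counting “<” characters. It works on every row it has seen so far and silently returns nonsense the moment a cell happens to contain a link.)

```python
# The "count '<' characters" shortcut versus actually parsing the markup.
# Both the rows and the helpers are invented for illustration.
from html.parser import HTMLParser

row_plain  = "<tr><td>Paris</td><td>France</td></tr>"
row_linked = '<tr><td><a href="/wiki/Paris">Paris</a></td><td>France</td></tr>'

def second_cell_shortcut(row: str) -> str:
    # "The second cell is whatever follows the 4th '<'."
    # True of every row seen so far -- an accident of formatting, not of meaning.
    return row.split("<")[4].split(">", 1)[1]

class CellCollector(HTMLParser):
    # The non-shortcut version: track <td> elements properly.
    def __init__(self):
        super().__init__()
        self.cells, self._buf, self._in_td = [], [], False
    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td, self._buf = True, []
    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
            self.cells.append("".join(self._buf))
    def handle_data(self, data):
        if self._in_td:
            self._buf.append(data)

def second_cell_parsed(row: str) -> str:
    parser = CellCollector()
    parser.feed(row)
    return parser.cells[1]

print(repr(second_cell_shortcut(row_plain)))   # 'France'  -- the shortcut looks fine
print(repr(second_cell_shortcut(row_linked)))  # ''        -- one extra '<' and it breaks
print(repr(second_cell_parsed(row_linked)))    # 'France'  -- the real structure still holds
```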
(The counterargument to screaming, it seems to me, is that it’s relying on 100 different fragile accidental correlations, any 70 of which are sufficient—and it’s unlikely that more than 10 of them will break at once, especially if the neural net gets updated every few months, so the ensemble is robust even though the parts are not. I expect one could develop confidence in this by measuring just how overdetermined the “this is a car” deductions are, and how much they vary. But that requires careful measurement and calculation, and many people might not get past the intuitive “JFC my life depends on the equivalent of 100 of those reckless HTML-parsing shortcuts, I’m going to die”. And I expect there are plenty of applications where the ensemble really is fragile and has a >10% chance of serious failure within a few months.)
(NB. I’ve never worked on neural nets.)
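(A back-of-the-envelope version of that robustness arithmetic, with every number made up for illustration: if the cue failures really were independent, overdetermination would buy enormous safety margins, and a single shared failure mode erases them.)

```python
# How much safety does "100 cues, any 70 sufficient" buy? All numbers invented.
from math import comb

def prob_more_than_k_fail(n: int, k: int, p: float) -> float:
    """P(more than k of n independent cues fail), each failing with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1, n + 1))

n_cues, needed, p_fail = 100, 70, 0.10
tolerance = n_cues - needed          # up to 30 cues may fail before the ensemble does

p_independent = prob_more_than_k_fail(n_cues, tolerance, p_fail)
print(p_independent)                 # a few parts in a billion: essentially never, *if* failures are independent

# Now suppose some shared condition (lighting, season, a sensor swap) occurs 10% of
# the time and knocks out 40 cues at once. The ensemble inherits that 10% failure
# rate no matter how overdetermined it looked cue-by-cue.
p_shared = 0.10
p_correlated = p_shared + (1 - p_shared) * p_independent
print(p_correlated)                  # ~0.1: the correlated failure mode dominates
```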
Ok, I see how this is plausible. I do think that the reply to Zvi adds some context where Zvi is basically saying “Eliezer is always screaming, taking pauses to scream at others”, and the thing Eliezer is usually expressing fear about is AI killing everyone. I see how it could go either way though.