Hm, I thought that the upsetting thing is how neural networks work in general. Like the ones that can correctly classify pictures with 99% accuracy… and then you slightly adjust a few pixels in such a way that a human sees no difference, but the neural network suddenly makes a completely absurd claim with high certainty.
And, if you are using neural networks to solve important problems, and become aware of this, then you realize that despite them doing a great job in 99% of situations and a random stupid thing in the remaining 1%, there is actually no limit to how insanely wrong they can get, and that it can happen in circumstances that would seem perfectly harmless to you. That the underlying logic is just… inhuman.
(To make an analogy, imagine that you hire a human to translate from French to English. The human is pretty good but not perfect, which means he gets 99% right. In the remaining 1% he either translates the word incorrectly or says that he doesn’t know. These two options are the only results you expect. -- Now instead of a human, you hire a robot. It also translates 99% correctly and 1% incorrectly or with no output. But in addition to this, if you give it a specifically designed input, it will say a complete absurdity. For example, it would translate “UN CHAT” as “A CAT”, but when you strategically add a few dots and make it “ỤN ĊHAṬ”, it will suddenly insist that it means “CENTRUM FOR APPLIED RATIONALITY” and assign a 99.9999999% certainty to this translation. Note that this is not its usual reaction to dots; the input papers usually contain some impurities or random dots, and the algorithm has always successfully ignored them… until now. -- The answer is not just wrong but absurdly wrong, it happened in a situation where you felt quite sure nothing could go wrong, and the robot didn’t even feel uncertain.)
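(To make the “strategically added dots” concrete: here is a minimal sketch of the standard fast-gradient-sign trick (FGSM) used to construct such inputs, assuming a PyTorch-style image classifier. The names model, image, true_label and epsilon are hypothetical placeholders, not anything from this thread.)

    # Sketch of FGSM: a tiny perturbation chosen to maximally increase the loss.
    # `model` is a hypothetical classifier; `image` is a (1, C, H, W) tensor in [0, 1].
    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, image, true_label, epsilon=0.01):
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(image), true_label)
        loss.backward()
        # Each pixel shifts by at most epsilon -- visually negligible, yet the shift
        # points exactly in the direction the model is most sensitive to.
        adversarial = image + epsilon * image.grad.sign()
        return adversarial.clamp(0.0, 1.0).detach()

(The point is that the “dots” are not random smudges: they are computed from the model’s own gradients, which is why the usual robustness to impurities on the paper does not carry over.)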
Obviously, Eliezer is saying that there is a plausible but extremely upsetting idea that could be learned by studying neural networks sufficiently competently.
So, I think that you got this part wrong (and that putting “obviously” in front of it makes this weirdly ironic in the given context), and the following conclusions are therefore also wrong.
Eliezer is simply saying (not “constantly screaming”) “do not trust neural networks, they randomly make big errors”. That message, even if perceived 100% correctly, should not cause a psychotic break in the average listener.
I think it’s important that the errors are not random; I think you mean something more like “they make large opaque errors.”
Given what else Eliezer has said, it’s reasonable to infer that the screaming is about the possibility of everyone dying because neural-network-based AIs are powerful but unalignable, not merely that your AI application might fail unexpectedly.
It’s really strange to think the idea isn’t upsetting when Eliezer says understanding it would cause “constant screaming”. Even if that’s an exaggeration, really??????? Maybe ask someone who doesn’t read LW regularly whether Eliezer is saying that the idea you could get by knowing how neural nets work is upsetting; I think they would agree with me.
He specified “mission-critical”. An AI’s ability to take over other machines in the network, take over the internet, manufacture grey goo, etc. (choose your favorite doomsday scenario), is not really related to how mission-critical its original task was. (In fact, someone’s AI to choose the best photo filters to match the current mood on Instagram to maximize “likes” seems both more likely to have arbitrary network access and less likely to have careful oversight than a self-driving car AI.) Therefore I do think his comment was about the likelihood of failure in the critical task, and not about alignment.
I think he meant something like this: The neural net, used e.g. to recognize cars on the road, makes most of its deductions based on accidental correlations and shortcuts in the training data—things like “it was sunny in all the pictures of trucks”, or “if it recognizes the exact shape and orientation of the car’s mirror, then it knows which model of car it is, and deduces the rest of the car’s shape and position from that, rather than by observing the rest of the car”. (Actually they’d be lower-level and less human-legible than this. It’s like someone parsing tables out of Wikipedia pages’ HTML, but instead of matching th/tr/td elements, it just counts “<” characters, and God help us if one of the elements has an extra < due to holding a link or something.) If you understood just how fragile and divorced from reality the shortcuts were, while you were sitting in such a car rushing down the highway, you would scream.
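(The “<”-counting analogy in runnable form, purely my own illustration of the kind of shortcut meant here; the table rows are made up.)

    # A brittle shortcut parser: it splits on "<" instead of actually parsing tags.
    row = "<tr><td>Model S</td><td>2020</td></tr>"
    row_with_link = '<tr><td><a href="/wiki/Model_S">Model S</a></td><td>2020</td></tr>'

    def shortcut_cells(html_row):
        chunks = html_row.split("<")
        return [c.split(">")[1] for c in chunks if c.startswith("td>")]

    print(shortcut_cells(row))            # ['Model S', '2020']  -- looks perfectly reliable
    print(shortcut_cells(row_with_link))  # ['', '2020']         -- the cell with a link silently vanishes

(A proper parser, even Python’s built-in html.parser, handles the extra “<” without blinking; the shortcut fails exactly on the input that looks most harmless.)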
(The counterargument to screaming, it seems to me, is that it’s relying on 100 different fragile accidental correlations, any 70 of which are sufficient—and it’s unlikely that more than 10 of them will break at once, especially if the neural net gets updated every few months, so the ensemble is robust even though the parts are not. I expect one could develop confidence in this by measuring just how overdetermined the “this is a car” deductions are, and how much they vary. But that requires careful measurement and calculation, and many people might not get past the intuitive “JFC my life depends on the equivalent of 100 of those reckless HTML-parsing shortcuts, I’m going to die”. And I expect there are plenty of applications where the ensemble really is fragile and has a >10% chance of serious failure within a few months.)
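(To put rough numbers on that intuition, a toy calculation with entirely invented parameters and an independence assumption that real shortcuts almost certainly violate:)

    # Toy version of "100 shortcuts, any 70 sufficient": probability that more than
    # 30 of them break at once, if each breaks independently with probability p_break.
    from math import comb

    def p_catastrophe(n=100, needed=70, p_break=0.1):
        max_breakable = n - needed
        return sum(comb(n, k) * p_break ** k * (1 - p_break) ** (n - k)
                   for k in range(max_breakable + 1, n + 1))

    print(p_catastrophe(p_break=0.10))  # vanishingly small: the "robust ensemble" regime
    print(p_catastrophe(p_break=0.25))  # on the order of 10%: the "fragile ensemble" regime

(Whether you get to stop screaming depends entirely on p_break and on how correlated the failures are, which is exactly the part that needs careful measurement.)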
(NB. I’ve never worked on neural nets.)
Ok, I see how this is plausible. I do think the reply to Zvi adds some context, where Zvi is basically saying “Eliezer is always screaming, taking pauses to scream at others”, and the thing Eliezer is usually expressing fear about is AI killing everyone. I can see how it could go either way, though.