He specified “mission-critical”. An AI’s ability to take over other machines in the network, take over the internet, manufacture grey goo, etc. (choose your favorite doomsday scenario), is not really related to how mission-critical its original task was. (In fact, someone’s AI to choose the best photo filters to match the current mood on Instagram to maximize “likes” seems both more likely to have arbitrary network access and less likely to have careful oversight than a self-driving car AI.) Therefore I do think his comment was about the likelihood of failure in the critical task, and not about alignment.
I think he meant something like this: The neural net, used e.g. to recognize cars on the road, makes most of its deductions based on accidental correlations and shortcuts in the training data—things like “it was sunny in all the pictures of trucks”, or “if it recognizes the exact shape and orientation of the car’s mirror, then it knows which model of car it is, and deduces the rest of the car’s shape and position from that, rather than by observing the rest of the car”. (Actually they’d be lower-level and less human-legible than this. It’s like someone parsing tables out of Wikipedia pages’ HTML, but instead of matching th/tr/td elements, it just counts “<” characters, and God help us if one of the elements has an extra < due to holding a link or something.) If you understood just how fragile and divorced from reality the shortcuts were, while you were sitting in such a car rushing down the highway, you would scream.
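To make the HTML analogy concrete, here is a minimal Python sketch of what such a shortcut might look like next to a parser that actually matches td/th elements. The function names and the two sample rows are made up for illustration; nothing here is taken from a real scraper.

```python
from html.parser import HTMLParser


def count_cells_shortcut(row_html: str) -> int:
    """Fragile shortcut: infer the cell count from the number of '<' characters.

    Assumes a row looks like <tr><td>..</td><td>..</td></tr>, i.e. two '<'
    for the <tr> pair plus two per cell. Anything nested breaks the assumption.
    """
    return (row_html.count("<") - 2) // 2


class CellCounter(HTMLParser):
    """Counts td/th start tags and ignores whatever is nested inside them."""

    def __init__(self) -> None:
        super().__init__()
        self.cells = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.cells += 1


def count_cells_parsed(row_html: str) -> int:
    """Count cells by actually matching td/th elements."""
    parser = CellCounter()
    parser.feed(row_html)
    parser.close()
    return parser.cells


# Hypothetical rows: the second differs only by a link inside the first cell.
plain_row = "<tr><td>Paris</td><td>France</td></tr>"
linked_row = '<tr><td><a href="/wiki/Paris">Paris</a></td><td>France</td></tr>'
```

On the plain row both functions agree. On the row with a link, the shortcut counts an extra cell while the parser-based version is unaffected, which is the kind of silent breakage the analogy is pointing at.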
(The counterargument to screaming, it seems to me, is that it’s relying on 100 different fragile accidental correlations, any 70 of which are sufficient—and it’s unlikely that more than 10 of them will break at once, especially if the neural net gets updated every few months, so the ensemble is robust even though the parts are not. I expect one could develop confidence in this by measuring just how overdetermined the “this is a car” deductions are, and how much they vary. But that requires careful measurement and calculation, and many people might not get past the intuitive “JFC my life depends on the equivalent of 100 of those reckless HTML-parsing shortcuts, I’m going to die”. And I expect there are plenty of applications where the ensemble really is fragile and has a >10% chance of serious failure within a few months.)
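The "any 70 of 100" arithmetic can be sketched as a binomial tail, under the strong and probably unrealistic assumption that each correlation breaks independently with the same probability. The function name and the break probabilities below are illustrative assumptions, not measurements of any real network.

```python
from math import comb


def failure_probability(n_cues: int, n_needed: int, p_break: float) -> float:
    """Probability that fewer than n_needed of n_cues survive.

    Assumes every cue breaks independently with probability p_break.
    Correlated breakage (say, many cues all depending on sunny weather)
    would make the real risk higher than this estimate.
    """
    max_broken_ok = n_cues - n_needed
    # Sum the tail directly: more cues broken than the ensemble can tolerate.
    return sum(
        comb(n_cues, k) * p_break**k * (1 - p_break) ** (n_cues - k)
        for k in range(max_broken_ok + 1, n_cues + 1)
    )


# Illustrative inputs only: 100 cues, 70 needed, two guesses at fragility.
robust_case = failure_probability(100, 70, 0.10)
fragile_case = failure_probability(100, 70, 0.30)
```

With a 10% chance of any one cue breaking, the ensemble failure probability under this model is tiny; at 30% it is a substantial fraction. The independence assumption does most of the work here, so measuring how correlated the breakages are matters at least as much as measuring how often each one breaks.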
(NB. I’ve never worked on neural nets.)
Ok, I see how this is plausible. I do think the reply to Zvi adds some context: Zvi is basically saying “Eliezer is always screaming, taking pauses to scream at others”, and the thing Eliezer usually expresses fear about is AI killing everyone. I see how it could go either way though.