As far as I can tell, it took a while for individuals to notice, longer for it to become common knowledge, and even more time for anyone to do anything about it.
Tangential, but I wouldn’t be surprised if researchers were fairly quickly aware of the issue (e.g. within two years of the original GAN paper), and that it took a while to become common knowledge because it isn’t particularly flashy. (There’s a surprising-to-me amount of know-how that is stored in researchers’ brains and never put down on paper.)
Even now, the “solutions” are hacks that don’t completely resolve the issue.
I mean, the solution is to use a VAE. If you care about covering modes but not image quality, you choose a VAE; if you care about image quality but not covering modes, you choose a GAN. (Also, while I know very little about VAEs / GANs, Implicit Maximum Likelihood Estimation sounded like a principled fix to me.)
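For readers less familiar with the trade-off being referenced here, below is a minimal, illustrative sketch (PyTorch, with toy linear networks and random data standing in for a real pipeline; none of it comes from the thread itself) of why the two objectives behave differently: the VAE loss is charged on every training example, so dropping a mode is directly penalized, while the GAN generator loss only rewards fooling the discriminator, so nothing in it forces mode coverage.

```python
# Illustrative sketch only: toy networks and random data, not a real model.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, data_dim = 8, 32

# --- VAE pieces ---
encoder = nn.Linear(data_dim, 2 * latent_dim)   # outputs mean and log-variance
decoder = nn.Linear(latent_dim, data_dim)

def vae_loss(x):
    mu, logvar = encoder(x).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
    recon = decoder(z)
    # Every x must be reconstructed, so a model that ignores rare modes pays a
    # penalty on exactly those examples (mode coverage), at the cost of the
    # blurrier samples that come from averaging.
    recon_term = F.mse_loss(recon, x, reduction="mean")
    kl_term = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + kl_term

# --- GAN pieces ---
generator = nn.Linear(latent_dim, data_dim)
discriminator = nn.Linear(data_dim, 1)

def gan_generator_loss(batch_size):
    z = torch.randn(batch_size, latent_dim)
    fake = generator(z)
    # The generator is rewarded only for producing samples the discriminator
    # labels "real"; nothing in this loss requires covering all modes of the
    # data, which is the mode-collapse failure discussed above.
    return F.binary_cross_entropy_with_logits(
        discriminator(fake), torch.ones(batch_size, 1))

x = torch.randn(16, data_dim)   # stand-in for a batch of real data
print(vae_loss(x).item(), gan_generator_loss(16).item())
```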
This strikes me as a “random fix” where the core issue was that the system did not have sufficient discriminatory power to tell apart a safe situation from an unsafe situation. Instead of properly solving this problem, the researchers put in a hack.
Agreed, I would guess that the researchers / engineers knew this was risky and thought it was worth it anyway. Or perhaps the managers did. But I do agree this is evidence against my position.
I agree that we shouldn’t be worried about situations where there is a clear threat. But that’s not quite the class of failures that I’m worried about. [...] Later, the problems are discovered and the reaction is to hack together a solution.
Why isn’t the threat clear once the problems are discovered?
unless we get an AI equivalent of Chernobyl before we get UFAI.
Part of my claim is that we probably will get that (assuming AI really is risky), though perhaps not a Chernobyl-level disaster, but still something with real negative consequences that “could be worse”.
Why isn’t the threat clear once the problems are discovered?
I think I should be more specific. When you say:
Suppose that we had extremely compelling evidence that any AI system run with > X amount of compute would definitely kill us all. Do you expect that problem to get swept under the rug?
I mean that no one sane who knows that will run that AI system with > X amount of computing power. When I wrote that comment I also thought that no sane person would fail to blow the whistle in that event. See my note at the end of the comment.*
However, when presented with that evidence, I don’t expect the AI community to react appropriately. The correct response to that evidence is to stop what you’re doing and revisit the entire process and culture that led to the creation of an algorithm that will kill us all if run with >X amount of compute. What I expect will happen is that the AI community will try to solve the problem the same way it has solved every other problem it has encountered: it will try an inordinate number of unprincipled hacks to get around the issue.
Part of my claim is that we probably will get that (assuming AI really is risky), though perhaps not a Chernobyl-level disaster, but still something with real negative consequences that “could be worse”.
Conditional on no FOOM, I can definitely see plenty of events with real negative consequences that “could be worse”. However, I claim that anything short of a Chernobyl-level event won’t shock the community and the world into changing its culture or trying to coordinate. I also claim that the capabilities gap between a Chernobyl-level event and a global catastrophic event is small, such that even in a non-FOOM scenario the former might not happen before the latter. Together, I think there is a high probability that we will not get a disaster scary enough to get the AI community to change its culture and coordinate before it’s too late.
*Now that I think about it more though, I’m less sure. Undergraduate engineers get entire lectures dedicated to how and when to blow the whistle when faced with unethical corporate practices and dangerous projects or designs. When working, they also have insurance and some degree of legal protection from vengeful employers. Even then, you still see cover-ups of shortcomings that lead to major industrial disasters. For instance, long before the disaster, someone had determined that the Fukushima plant was indeed vulnerable to large tsunami impacts. The pattern where someone knows that something will go wrong, but nothing is done to prevent it for one reason or another, is not that uncommon in engineering disasters. Regardless of whether this is due to hindsight bias or an inadequate process for addressing safety issues, these disasters still happen regularly in fields with far more conservative, cautious, and safety-oriented cultures.
I find it unlikely that the field of AI will change its culture from one of moving fast and hacking to something even more conservative and cautious than the cultures of consumer aerospace and nuclear engineering.
Idk, I don’t know what to say here. I meet lots of AI researchers, and the best ones seem to me to be quite thoughtful. I can say what would change my mind:
I take the exploration of unprincipled hacks as very weak evidence against my position, if it’s just in an academic paper. My guess is the researchers themselves would not advocate deploying their solution, or would say that it’s worth deploying but it’s an incremental improvement that doesn’t solve the full problem. And even if the researchers don’t say that, I suspect the companies actually deploying the systems would worry about it.
I would take the deployment of unprincipled hacks more seriously as evidence, but even there I would want to be convinced that shutting down the AI system was a better decision than deploying an unprincipled hack. (Because then I would have made the same decision in their shoes.)
Unprincipled hacks are in fact quite useful for the vast majority of problems; as a result it seems wrong to attribute irrationality to people because they use unprincipled hacks.