I think that AI capable of being nerd-sniped by these landmines will probably be nerd-sniped by them (or other ones we haven’t thought of) on its own without our help. The kind of AI that I find more worrying (and more plausible) is the kind that isn’t significantly impeded by these landmines.
Yes, landmines are the last level of defence, with a very low probability of working (something like 0.1 per cent). However, if an AI is robust to all possible philosophical landmines, it is a very stable agent and has a better chance of keeping its alignment and not failing catastrophically.