I think that AI capable of being nerd-sniped by these landmines will probably be nerd-sniped by them (or other ones we haven’t thought of) on its own without our help. The kind of AI that I find more worrying (and more plausible) is the kind that isn’t significantly impeded by these landmines.
Yes, landmines are the last level of defence, with a very low probability of working (something like 0.1 per cent). However, if an AI is robust to all possible philosophical landmines, it is a very stable agent and has a better chance of keeping its alignment and not failing catastrophically.