I don’t believe that Paul’s approach to indirect normativity is on the right track. I also have no idea which of the possible problems you might be talking about. PM me. I’d put a very high probability on its being either not a problem, or something I thought of years ago.
Yes, blackmail can enable a remote attacker to root your AI if your AI was not designed with this in mind and does not have a no-blackmail equilibrium (which nobody yet knows how to describe). This is true for any AI that ends up with a logical decision theory, indirectly normative or otherwise, and also for CDT AIs that encounter other agents able to make credible precommitments. I figured that out longer ago than I can even recall (remember: I’m the guy who first wrote down an equation for that issue; also, I wouldn’t be bothering with TDT at all if it weren’t relevant to some sort of existential risk). I didn’t talk about it at the time for obvious reasons. The existence of N fiddly little issues like this, any one of which can instantly kill you with zero warning if you haven’t reasoned through something moderately complex in advance, without any advance observations to hit you over the head, is why I engage in sardonic laughter whenever someone suggests that the likes of current AGI developers would be able to handle the problem at their current level of caring. Anyway, MIRI workshops are actively working on advancing our understanding of blackmail to the point where we can eventually derive a robust no-blackmail equilibrium, which is all that anyone can or should be doing, AFAICT.
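For anyone who wants the incentive structure spelled out, here is a toy one-shot blackmail game in Python. The payoffs and function names are my own illustrative inventions, not anything from the actual MIRI work, and the sketch deliberately ignores the hard parts (logical counterfactuals, how a refusal becomes credible); it only shows why an agent whose policy is to concede once threatened makes the threat worth issuing, while a credible refuser removes the incentive entirely.

```python
# Toy one-shot blackmail game (illustrative numbers only, not MIRI's formalism).
# Shows why a target that concedes once threatened makes blackmail profitable,
# while a target that can credibly refuse makes the threat a pure loss.

CARRY_OUT_COST = 1    # blackmailer's cost of actually executing the threat
PAYMENT = 5           # what the target hands over if it concedes
THREAT_DAMAGE = 10    # damage the target suffers if the threat is executed

def blackmailer_value(target_concedes: bool) -> float:
    """Blackmailer's payoff from issuing a threat, given the target's policy."""
    return PAYMENT if target_concedes else -CARRY_OUT_COST

def blackmailer_threatens(target_concedes: bool) -> bool:
    """A rational blackmailer only threatens if that beats staying quiet (payoff 0)."""
    return blackmailer_value(target_concedes) > 0

for policy, concedes in [("concede if threatened", True),
                         ("credibly refuse", False)]:
    threatened = blackmailer_threatens(concedes)
    if not threatened:
        target_payoff = 0
    elif concedes:
        target_payoff = -PAYMENT
    else:
        target_payoff = -THREAT_DAMAGE
    print(f"{policy}: threatened={threatened}, target payoff={target_payoff}")

# concede if threatened: threatened=True, target payoff=-5
# credibly refuse: threatened=False, target payoff=0
#
# A CDT-style agent deciding *after* the threat exists prefers paying 5 over
# eating 10 of damage, so it concedes -- which is exactly what makes the
# threat worth issuing in the first place.
```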
PM sent.
I agree that solving blackmail in general would make things easier, and it’s good that MIRI is working on this.