I’d argue that the blackmail letter has, in a sense, already been received, analogously to a normal blackmail letter, if you consider that an FAI might do this in order to make its own creation happen sooner.
You are simply mistaken. The analogy to blackmail may be misleading you—maybe try thinking about it without the analogy. You might also read up on the subject, for example by reading Eliezer’s TDT paper.
I’d like to see other opinions on this, because I don’t see the two of us making any further progress.
I have now read the important parts of the TDT paper (more than just the abstract) and would say I understood at least those parts, but I don’t see anything in them that contradicts my considerations. I’m sorry, but I’m still not convinced. The analogies only serve to make the problem easier to grasp intuitively; I initially thought about this without any such analogies. I still don’t see where my reasoning is flawed. Could you try different approaches?
Hm. Actually, if you think about the following game, where A is the AI and B is the human:
         A1          A2
Bx    +9, -1     +10, -1
By    -1, -10     +0, +0

(In each cell, A’s payoff is listed first and B’s second.)
The Nash equilibrium of the game is (A2, By)—that is, the AI is not horrible and the human doesn’t give in.
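Just to double-check that, here’s a quick Python sketch (my own, nothing canonical) that brute-forces the pure-strategy Nash equilibria of the table above, reading each cell as A’s payoff first and B’s second; it indeed finds only (A2, By):

```python
# Pure-strategy Nash equilibrium check for the 2x2 game above.
# Each cell is read as (A's payoff, B's payoff) -- my assumption.

payoffs = {
    ("A1", "Bx"): (9, -1),    # AI punishes,       human gives in
    ("A2", "Bx"): (10, -1),   # AI doesn't punish, human gives in
    ("A1", "By"): (-1, -10),  # AI punishes,       human doesn't give in
    ("A2", "By"): (0, 0),     # AI doesn't punish, human doesn't give in
}

A_moves, B_moves = ("A1", "A2"), ("Bx", "By")

def is_nash(a, b):
    u_a, u_b = payoffs[(a, b)]
    # A must have no profitable unilateral deviation...
    a_ok = all(payoffs[(a2, b)][0] <= u_a for a2 in A_moves)
    # ...and neither must B.
    b_ok = all(payoffs[(a, b2)][1] <= u_b for b2 in B_moves)
    return a_ok and b_ok

print([cell for cell in payoffs if is_nash(*cell)])   # -> [('A2', 'By')]
```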
But if we have two agents facing off that don’t make their moves independently, but instead choose whole strategies, there are multiple equilibria. I should really read Strategy of Conflict. The initiative to choose a particular equilibrium, however, is ours for the taking, for obvious temporal reasons: if we commit to one of the equilibrium strategies, we dictate the corresponding equilibrium strategy to the AI.
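To make the “strategies rather than moves” point a bit more concrete, here’s a rough sketch of the same game in which A commits to a whole policy, i.e. a function from B’s move to A’s move; this formalisation is my own illustration, not something from the TDT paper. Both “A threatens and B gives in” and “B never gives in and A never punishes” come out as equilibria, which is exactly why whoever gets to commit first picks which one is played:

```python
from itertools import product

# Same payoff table as before, read as (A's payoff, B's payoff).
payoffs = {("A1", "Bx"): (9, -1), ("A2", "Bx"): (10, -1),
           ("A1", "By"): (-1, -10), ("A2", "By"): (0, 0)}
A_moves, B_moves = ("A1", "A2"), ("Bx", "By")

# A's strategies are now policies: one of A's moves for each of B's moves.
policies = [dict(zip(B_moves, choice)) for choice in product(A_moves, repeat=2)]

def outcome(policy, b):
    return payoffs[(policy[b], b)]

def is_equilibrium(policy, b):
    u_a, u_b = outcome(policy, b)
    a_ok = all(outcome(p2, b)[0] <= u_a for p2 in policies)     # A can't do better
    b_ok = all(outcome(policy, b2)[1] <= u_b for b2 in B_moves)  # B can't do better
    return a_ok and b_ok

for policy, b in product(policies, B_moves):
    if is_equilibrium(policy, b):
        print(policy, b, outcome(policy, b))
# Prints the "A threatens, B gives in" equilibrium as well as the equilibria
# where B never gives in and A never actually punishes on the path of play.
```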
You are probably correct—if it’s possible to plausibly precommit oneself to being influenced by no type of blackmail whatsoever, then, and only then, does it make no sense for the AI to threaten to punish people; that is, an AI which punishes non-helping persons who have precommitted themselves to never helping is unlikely. The problem is that precommitting oneself to never helping might be very hard: an AI will still assign a probability greater than zero to the possibility that I can be influenced by the blackmail, and the closer this probability gets to zero, the closer the expected utility for the AI in the case where it does manage to convince me comes to Y, which means that the punishment I have to expect if I don’t help will also converge to Y.
But wait! As the probability that I’m influenced by the AI shrinks, the probability that it imposes a punishment converging to Y without gaining anything by it grows; and since we are considering a friendly AI, actually carrying out that punishment also imposes a negative expected utility converging to Y on the AI itself. This should mean that the expected punishment shrinks much faster as the probability of my actions being influenced, as estimated by the AI, goes down. Thus, the more convincingly I state that I won’t be influenced by any kind of blackmail, the more rapidly the expected punishment shrinks, effectively becoming a minor inconvenience or less, since the AI will also assume that I’ll try to avoid punishment and will therefore revise the probability of me being influenced even further downward.
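If it helps, here’s a toy model of that feedback argument. Every number in it (Y, the AI’s own disutility D from actually punishing, and the listed probabilities) is made up purely for illustration; the only point is the shape of the argument, namely that below some break-even credence that its threat moves me, threatening, and therefore punishing, is a pure loss for a friendly AI:

```python
# Toy model of the argument above; all numbers are made up for illustration.
# p: the AI's credence that its threat actually influences the human.
# Y: what the AI gains if the threat works and the human helps.
# D: the friendly AI's own disutility from actually carrying out the
#    punishment (it disvalues human suffering), which it must be prepared
#    to do to keep the threat credible.

def eu_of_threatening(p, Y=10.0, D=50.0):
    # With probability p the human is influenced: the AI gains Y, no punishment.
    # With probability 1 - p the human isn't: the AI punishes "for nothing"
    # and eats the cost D itself.
    return p * Y - (1 - p) * D

break_even = 50.0 / (10.0 + 50.0)   # solve p*Y = (1-p)*D for p, i.e. D/(Y+D)

for p in (0.95, 0.8, 0.5, 0.1, 0.01):
    eu = eu_of_threatening(p)
    verdict = "worth threatening" if eu > 0 else "threat is a pure loss"
    print(f"p = {p:.2f}: expected utility {eu:+6.1f} -> {verdict}")
print(f"break-even credence: p = {break_even:.2f}")
```

On this toy model the AI only “wants” to threaten people it is already fairly sure it can move, which is the sense in which a credible “I won’t be moved” drives the expected punishment toward zero.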
I hope I’m right about this one. If so, thank you, Manfred.
I hereby state that I will under no circumstances be influenced by any kind of acausal threat from any kind of future Artificial Intelligence, which means that imposing any kind of punishment on me is rendered completely useless.
The line of reasoning I followed before coming to this insight was flawed, and I sincerely apologize for wasting your precious computing power, probable future AI, and your time and effort, Manfred, Mitchell Porter, and Multiheaded.
Do I win? I hope I do.
Attempt to summarize: a FAI would likely be too nice to bother with not-perfectly-effective torture, and I suspect that it’s a big ol’ softie, so I’m not feeling threatened much, making it even less effective, and so on in a positive feedback loop.