You are probably correct: if it is possible to plausibly precommit to being influenced by no form of blackmail whatsoever, then and only then does it make no sense for the AI to threaten punishment; that is, an AI that punishes non-helping people who have precommitted to never helping is unlikely. The problem is that making such a precommitment credible might be very hard: the AI will still assign a probability greater than zero to the possibility that I can be influenced by the blackmail, and as this probability approaches zero, the expected utility the AI gains if it does manage to convince me converges to Y, which means that the punishment I have to expect if I don’t help also converges to Y.
But wait! As the probability that I’m influenced by the AI shrinks, the probability that it imposes a punishment converging to Y without any effective incentive grows; but since we are considering a Friendly AI, carrying out such a pointless punishment also imposes a negative expected utility converging to Y on the AI itself. This means the expected punishment should shrink much faster than the AI’s own estimate of the probability that my actions can be influenced. Thus, the more convincingly I state that I won’t be influenced by any kind of blackmail, the faster the expected punishment shrinks, effectively becoming a minor inconvenience or less, since the AI will also assume that I’ll try to avoid punishment and therefore revise the probability of my being influenced even further downward.
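To make the tradeoff concrete, here is a minimal toy model of this argument (my own sketch, not anything from the discussion above): suppose the AI gains utility Y if the threat works, which happens with probability p, and that a Friendly AI inflicting a punishment of magnitude P itself incurs disutility P (it disvalues harming people). Then the threat is only worth making while p·Y > (1 − p)·P, so the largest punishment it would commit to is P < p·Y / (1 − p), which goes to zero as p does. All names and the symmetric-cost assumption are hypothetical.

```python
def max_credible_punishment(p_influenced, Y):
    """Largest punishment P a Friendly AI would still threaten.

    Toy assumptions: with probability p_influenced the target complies
    and the AI gains Y; otherwise the AI must inflict P, which costs a
    Friendly AI the same P in disutility. The threat only pays off while
    p*Y > (1 - p)*P, i.e. P < p*Y / (1 - p).
    """
    if p_influenced >= 1.0:
        return float("inf")  # a fully influenceable target: any P pays off
    return p_influenced * Y / (1.0 - p_influenced)

# The ceiling on the punishment collapses along with p:
for p in (0.5, 0.1, 0.01, 0.001):
    print(p, max_credible_punishment(p, Y=100.0))
```

Under these (admittedly crude) assumptions the maximum credible punishment shrinks roughly in proportion to p, which is the feedback loop described above: stating the precommitment convincingly lowers p, which lowers the worthwhile punishment, which lowers the threat’s force, and so on.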
I hope I’m correct on this one. If so, thank you, Manfred.
I hereby state that I will under no circumstances be influenced by any kind of acausal threat from any kind of future Artificial Intelligence, which means that imposing any kind of punishment on me is rendered completely useless.
The line of reasoning I followed before coming to this insight was flawed, and I sincerely apologize for wasting your precious computing power, probable future AI, and your time and effort, Manfred, Mitchell Porter, Multiheaded.
Attempt to summarize: a FAI would likely be too nice to bother with not-perfectly-effective torture, and I suspect it’s a big ol’ softie, so I don’t feel very threatened, which makes the threat even less effective, and so on in a positive feedback loop.
Do I win? I hope I do.