Not particularly. You can estimate the likely loss and likely gain from that utility change, as with anything. As long as you’re reasonably certain that the bottom parts of the utility function are more likely to be accessed through extortion than through other means, this is a rational thing to do. Absent a proper theory of extortion and attendant decision theory, of course.
As long as you’re reasonably certain that the bottom parts of the utility function are more likely to be accessed through extortion than through other means
THIS is the key (along with some explanation of why you think extortion is different from other interactions with entities whose values differ from yours). It’s massively counter to my intuitions: I think bottom parts of utility functions are extremely common in natural circumstances, with no one to blame who can be reasoned or traded with.
Maybe more description of the scenario would help. Presumably there’s no infinity here—there’s a bound to the disutility (for you; presumably it’s utility for me) I can get with my fraction of the cosmos. What do you think the proper reaction of an FAI (or a human, for that matter) is, and why is it different for repeated small events than for one large event?
You can estimate the likely loss and likely gain from that utility change, as with anything.
You can try. Your estimate is likely to be very diffuse and uncertain—the issue is that you are trying to get a handle on the distribution tail and that is quite hard to do (see Taleb’s black swans, etc.)
As long as you’re reasonably certain that the bottom parts of the utility function are more likely to be accessed through extortion than through other means, this is a rational thing to do
Not at all: you’re forgetting about the magnitude of consequences.
Let’s say you have a blackmailer who wants a pony and she has the capability to meddle with your AI’s sensors. Lo and behold, she walks up to the AI and says “I want a pony! Look, there is a large incoming asteroid on a collision course with Earth. Gimme a pony and I’ll tell you if it’s real”.
Ah, say you, the designer. I estimate that the blackmailer is bluffing in 99% of cases. That “bottom part of the utility function” (aka The Sweet Meteor Of Death) is much more likely to be accessed through extortion: 99 times more likely, in fact.
Therefore I will instruct the AI to disregard any data telling it there is an incoming asteroid on a collision course. And voilà: the blackmailer doesn’t get a pony.
That line of thought seems… misguided. For a quick illustration, do s/threat/credible threat/g.
Effectively you are trying to estimate The Worst That Could Happen and are telling your AI to discount all outcomes below your estimate.
You will need to trust that estimate A LOT.
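The magnitude point can be made concrete with a small expected-loss calculation. This is a minimal sketch: only the 99% bluff rate comes from the scenario; the costs (`COST_PONY`, `COST_IMPACT`) and the helper names are hypothetical, and it assumes that paying gets a truthful answer and that a real asteroid can then be deflected at no further cost.

```python
# Minimal sketch of the pony-blackmail scenario as an expected-loss comparison.
# Only the 99% bluff rate comes from the scenario; the costs are hypothetical.

P_REAL = 0.01        # probability the asteroid is real (she bluffs 99% of the time)
COST_PONY = 1.0      # hypothetical disutility of handing over a pony
COST_IMPACT = 1e9    # hypothetical disutility of an undeflected asteroid strike


def expected_loss(policy: str) -> float:
    """Expected disutility of a policy once the blackmailer makes her claim."""
    if policy == "pay":
        # Assumes she then tells the truth and a real asteroid gets deflected
        # at no extra cost, so the only loss is the pony.
        return COST_PONY
    if policy == "ignore":
        # The AI discards all asteroid data: nothing is lost if she bluffed,
        # the full impact is lost if the asteroid is real.
        return P_REAL * COST_IMPACT
    raise ValueError(f"unknown policy: {policy}")


def break_even_probability() -> float:
    """Probability of a real asteroid below which ignoring beats paying."""
    return COST_PONY / COST_IMPACT


if __name__ == "__main__":
    for policy in ("pay", "ignore"):
        print(policy, expected_loss(policy))
    print("ignoring only pays off if P(real) <", break_even_probability())
```

With these made-up numbers, ignoring costs about 10^7 in expectation against 1 for paying: a 99-to-1 bluff rate does not help when the downside is nine orders of magnitude larger than the ransom.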
Think of a total-utilitarianism-style approach, where you can take any small disutility and multiply it again and again.
OK. Why would this imply extortion rather than simple poverty?
Because you’re the one creating the multiple instances of disutility, using a fraction of the resources of the cosmos.
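The total-utilitarian multiplication, bounded by the extorter's share of the cosmos, can be sketched as a simple product. The function name, the `share` and `cost_per_instance` parameters, and the numbers are all hypothetical; the sketch assumes identical harms that add linearly.

```python
# Minimal sketch: a total utilitarian sums identical small harms, so an agent
# that controls a share of the cosmos can scale a tiny harm up to a large,
# but still finite, total. All parameter names and values are hypothetical.


def total_disutility(per_instance: float,
                     cosmos_resources: float,
                     share: float,
                     cost_per_instance: float) -> float:
    """Sum of identical harms the extorter can afford to instantiate."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be a fraction between 0 and 1")
    budget = cosmos_resources * share
    instances = int(budget // cost_per_instance)  # whole instances only
    return instances * per_instance


if __name__ == "__main__":
    # A harm of one millionth of a unit, repeated with 1% of 1e14 resource units.
    print(total_disutility(1e-6, 1e14, 0.01, 1.0))
```

The total grows linearly with `share`, so a negligible per-instance harm can become large, but it stays finite because the extorter's resources are finite.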
What could possibly go wrong?
The sweet meteor of death is well above the z point. Complete human extinction is above the z point.
This hack is not intended to deal with normal extortion, it’s intended to cut off really bad outcomes.
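One way to read the hack is as a floor on the utility function: everything worse than the z point is treated as exactly z, so a threat gains no extra leverage by being made worse than z, while outcomes above z (such as the meteor) still count in full. This is a minimal sketch under that assumption; the thread does not spell out the construction, and the value of `Z` and the function names are hypothetical.

```python
# Minimal sketch, assuming the hack is a floor on utility at a "z point".
# Z and all names here are hypothetical; the thread does not give them.

Z = -100.0  # hypothetical z point, far below ordinary bad outcomes


def floored_utility(u: float, z: float = Z) -> float:
    """Treat every outcome worse than z as if it were exactly z."""
    return max(u, z)


def leverage(threat_u: float, status_quo_u: float, z: float = Z) -> float:
    """How much worse a threatened outcome looks than the status quo,
    as judged by an agent using the floored utility."""
    return floored_utility(status_quo_u, z) - floored_utility(threat_u, z)


if __name__ == "__main__":
    print(leverage(-50.0, 0.0))   # outcome above z: counts in full
    print(leverage(-1e6, 0.0))    # far below z: capped at status_quo - z
    print(leverage(-1e9, 0.0))    # worse still: no extra leverage
```

This prints 50.0, 100.0 and 100.0. Everything hinges on where z sits: anything below it is flattened to the same value, whether it arrives by extortion or by accident.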
What would these be? Can you give a couple of examples?
Are you basically trying to escape Pascal’s Mugging?
The extortion version of that, yes.