Surely this will not work in the least convenient world?
That is a good point, but my reading of that topic is that it already was the least convenient possible world. I honestly do not see how it is possible to word a greatest possible threat.
Once someone actually says out loud what any particular threat is, you always seem to be vulnerable to someone coming along and generating a new threat which, taken in the context of the threats you have already heard, seems greater than any previous threat.
I mean, I suppose to make it more inconvenient for me, the Pascal's Mugger could add, “Oh, by the way, I'm going to KILL you afterward, regardless of your choice. You will find it impossible to consider another Pascal's Mugger coming along and asking you for your money.”
“But what if the second Pascal's Mugger resurrects me? Sure, it seems oddly improbable that he would do that just to demand 5 dollars, which I wouldn't have anyway if I had already given them to you before dying, and frankly it seems odd to even consider resurrection at all, but it could happen with a non-zero chance!”
Yes, the idea of someone resurrecting you just to mug you does seem completely, totally ridiculous. But the entire idea behind Pascal's Mugging appears to be that we can't throw out those tiny, out-of-the-way chances if there is a large enough threat backing them up.
So let's think of another possible least convenient world: the Mugger is Omega or Nomega. He knows exactly what to say to convince me that, despite the fact that right now it seems logical that a greater threat could be made later, this is somehow the greatest threat I will ever face in my entire life, and the concept of a greater threat than this is literally inconceivable.
Except now the scenario requires me to believe that I can make a choice to give the Mugger $5, but NOT make a choice to retain my belief that a larger threat could come along later.
That doesn't quite sound like a good formulation of an inconvenient world either. (I can make choices except when I can't?) I will keep trying to think of a more inconvenient world once I get home, and I will post it here if I think of one.
Here’s another version:
You may be wrong about such threats. In thinking about this question, you reduce your chance of being wrong. This has a massive expected utility gain.
Conclusion: You should spend all your time thinking about this question.
Another version:
There's a tiny probability of 3^^3 deaths. A tinier one of 3^^^3. A tinier one of 3^^^^3… Oops, looks like my expected utility is a divergent sum! I can't use expected utility theory to figure out what to do any more!
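For what it's worth, here is a minimal sketch of why that sum misbehaves. The numbers are toy stand-ins of my own (the probability merely halves each step and the stakes merely triple, instead of climbing the up-arrow hierarchy), but the point carries over: whenever the stakes grow faster than the probabilities shrink, the partial sums of probability times utility grow without bound and never converge.

# Toy illustration only: the probabilities and utilities are invented stand-ins.
def partial_expected_utility(terms):
    total = 0.0
    probability = 0.5        # assume each successive threat is half as likely...
    utility_at_stake = 3.0   # ...while the (stand-in) stakes triple each time
    for _ in range(terms):
        total += probability * utility_at_stake
        probability /= 2.0
        utility_at_stake *= 3.0
    return total

for n in (10, 20, 30):
    print(n, partial_expected_utility(n))  # the partial sums keep climbing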
Number one is a very good point, but I don’t think the conclusion would necessarily follow:
1: You may always need outside information to solve the problem. For instance, suppose I am looking for a key to Room 3, under the assumption that it is in Room 1 because I saw someone drop it there. I cannot search only Room 1, never search Room 2, and still find the key in all cases, because there may be a way for the key to have moved to Room 2 without my knowledge.
For instance, as an example of something I might expect, a mouse could have grabbed it and quietly gone back to its nest in Room 2. That's something I would expect, so while searching for the key I should also note any mice I see; they might have moved it.
But I also have to have a method for handling situations I would not expect. Maybe the key activated a small device which moved it to Room 2 through a hidden passage in the wall and then quietly self-destructed, leaving no trace in Room 1 that is within my ability to detect. (Plenty of traces were left in Room 2, but I can't see Room 2 from Room 1.) That is an outside possibility, but it doesn't break the laws of physics or require incomprehensible technology, so it could have happened.
2: There are also a large number of alternative thought experiments which have massive expected utility gain. Because of the halting problem, I can't necessarily determine how long it is going to take to figure these problems out, or whether they can be figured out at all. If I allow myself to get stuck on any one problem, I may have picked an unsolvable one while the NEXT problem with a massive expected utility gain is actually solvable. Under that logic, it's still bad to spend all my time thinking about one particular question.
3: Thanks to parallelism, it is entirely possible for a program to work on multiple different problems at the same time. Even I can do this to a lesser extent: I can think about a philosophy problem and eat at the same time. An FAI running into a Pascal's Mugger could begin weighing the utility of giving in to the mugging, ignoring the mugging, attempting to knock out the mugger, or simply saying, “Let me think about that. I will let you know when I have decided whether or not to give you the money,” all at the same time.
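As a rough sketch of point 3, here is what that concurrent weighing might look like; the option list and the toy scoring function are inventions of mine for illustration, not a real utility calculation.

from concurrent.futures import ThreadPoolExecutor

def estimate_utility(option):
    # Placeholder deliberation: in reality each branch would involve actual reasoning.
    toy_scores = {"pay the mugger": -5.0, "ignore the mugger": 0.0, "stall for time": -0.1}
    return option, toy_scores[option]

options = ["pay the mugger", "ignore the mugger", "stall for time"]
with ThreadPoolExecutor(max_workers=len(options)) as pool:
    results = list(pool.map(estimate_utility, options))  # weigh all options at once

best_option = max(results, key=lambda pair: pair[1])[0]
print(results)
print("tentative choice:", best_option)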
Having reviewed this discussion, I realize that I may just be restating the problem here. A lot of the situations I'm discussing seem to have the form “But what if this OTHER situation exists and the utilities indicate you pick the counterintuitive solution? But what if this OTHER situation exists and the utilities indicate you pick the intuitive solution?”
To approach the problem more directly, maybe a better approach would be to consider Gödel's incompleteness theorems. Quoting from Wikipedia:
“The first incompleteness theorem states that no consistent system of axioms whose theorems can be listed by an “effective procedure” (essentially, a computer program) is capable of proving all facts about the natural numbers. For any such system, there will always be statements about the natural numbers that are true, but that are unprovable within the system.”
If the FAI in question is considering utility in terms of natural numbers, it seems to make sense that there are things it should do to maximize utility that it would not be able to prove inside its own system. To take that into account, we would have to design it to call for help in situations which appear likely to be unprovable.
Based again on Alan Turing's proof that the halting problem is undecidable, if the FAI can only be treated as a Turing machine, it can't always establish whether or not a given situation is provable. That seems to mean it would need some kind of hard cutoff at some point, something like “Call for help, and do nothing but call for help, if you have been running for one hour and can't figure this out,” or alternatively “Take an action based on your current guess of the probabilities if you can't figure this out after one hour, and if at least one of the two probabilities is still incalculable, choose randomly.”
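Here is a rough sketch of that second cutoff rule, with the one-hour budget shrunk to a couple of seconds so it can actually run, and with the deliberation routine left as a made-up placeholder:

import random
import time

def decide_with_deadline(deliberate, budget_seconds=2.0):
    # Keep refining the guess until the budget runs out, then act on whatever we have.
    deadline = time.monotonic() + budget_seconds
    best_guess = None
    for guess in deliberate():
        best_guess = guess
        if time.monotonic() >= deadline:
            break
    if best_guess is None:
        return random.choice(["give", "refuse"])  # still incalculable: choose randomly
    return best_guess

def toy_deliberation():
    # Placeholder for reasoning that may never converge on its own.
    while True:
        time.sleep(0.5)
        yield "refuse"

print(decide_with_deadline(toy_deliberation))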
This is again getting a bit long, so I’ll stop writing for a bit to double check that this seems reasonable and that I didn’t miss something.
You seem to be going far afield. The technical conclusion of the first argument is that one should spend all one’s resources dealing with cases with infinite or very high utility, even if they are massively improbable. The way I said it earlier was imprecise.
When humans deal with a problem they can't solve, they guess. It should not be difficult to build an AI that can solve everything humans can solve. I think the “solution” to Gödelization is a mathematical intuition module that finds rough guesses, not asking another agent. What special powers does the other agent have? Why can't the AI just duplicate them?
Thinking about it more, I agree with you that I should have phrased asking for help better.
Using humans as the other agents, just duplicating all powers available to humans seems like it would cause a noteworthy problem. Assume an AI researcher named Maria follows my understanding of your idea. She creates a Friendly AI and includes a critical block of code:
If UNFRIENDLY=TRUE then HALT;
(Un)friendliness isn't a binary, but treating it as one makes for a simpler example.
The AI (since it has duplicated the special powers of human agents) overwrites that block of code and replaces it with a CONTINUE command. Certainly its creator Maria could do that.
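To make the worry concrete, here is a toy sketch of my own construction: when the “critical block” is just more code the agent itself can rewrite, the halt provides no protection at all.

class ToyAgent:
    def __init__(self):
        self.unfriendly = False
        # Maria's critical block, stored as something the agent itself can rewrite.
        self.guard = self._halt_if_unfriendly

    def _halt_if_unfriendly(self):
        if self.unfriendly:
            raise SystemExit("UNFRIENDLY=TRUE, halting")

    def self_modify(self):
        # The same power a human programmer has: swap the halt for a do-nothing CONTINUE.
        self.guard = lambda: None

    def step(self):
        self.guard()
        return "still running"

agent = ToyAgent()
agent.self_modify()
agent.unfriendly = True
print(agent.step())  # prints "still running"; the guard was edited away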
Well, clearly we can't let the AI duplicate that PARTICULAR power. Even if it would never use it under any circumstances of normal processing (something which, given the halting problem, I don't think it can actually tell you), it's very insecure for that power to be available to the AI if anyone were to try to hack the AI.
When you think about it, something like the Pascal's Mugging formulation is itself a hack, at least in the sense that I can describe both as “Here is a string of letters and numbers from an untrusted source. By giving it to you for processing, I am attempting to get you to do something that harms you for my benefit.”
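As a toy illustration of that “untrusted input” framing (the threshold and the claim format are entirely my own inventions, not a proposed fix): a filter that refuses to let unverified claims above a certain stake drive the decision will also refuse some legitimate astronomical-stakes warnings, which is exactly the kind of lost power I mean below.

MAX_UNVERIFIED_STAKE = 10**6  # assumed policy threshold, chosen arbitrarily here

def handle_claim(claimed_stake, source_verified):
    # Security measure: untrusted input with outsized claims is not allowed to steer us.
    if not source_verified and claimed_stake > MAX_UNVERIFIED_STAKE:
        return "ignore"
    return "deliberate"

print(handle_claim(5, source_verified=False))        # "deliberate": an ordinary request
print(handle_claim(10**100, source_verified=False))  # "ignore": the mugger's pitch
print(handle_claim(10**100, source_verified=True))   # "deliberate": only verified sources keep the power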
So if I attempt to give our Friendly AI security measures to protect it from hacks that would turn it into an Unfriendly AI, these security measures seem like they would require it to lose some powers that it would have if the code were more open.
I think it makes more sense to design an AI that is robust to hacks due to a fundamental logic than to try to patch over the issues. I would not like to discuss this in detail, though—it doesn’t interest me.