I tend to write stuff that gets voted down to oblivion. If you are voting this down, could the first few of you comment why? I’d appreciate learning how I don’t fit in in case this is another of those posts that doesn’t. Thanks in advance.
I extend Martin Gardner’s idea. From a causal perspective, it would make no difference if BOTH boxes were transparent. At 7 AM the alien puts down two transparent boxes in front of you, one with $1,000 in it and one with either $1,000,000 or $0 in it. You can look in and see which he chose for you.
You have personally witnessed at least a thousand trials, and have heard from thousands of other people you can question, many of whom you know well, that each witnessed at least 1,000 trials. What you have seen, and heard these others have seen, is that the alien has NEVER been wrong: EVERY time he put $1,000,000 in box B, the human took only box B, and EVERY time he put nothing in box B, the human took box A. Further, you have seen at least 20 cases where the Alien put $1,000,000 in box B and $1,000 in box A, and even though the boxes were fixed and transparent, the human chose only box B in all those cases. Further, you have been told by the other 1,000 observers, some of whom you know quite well, and all of whom you have been able to question as you wish, that each of them has witnessed at least 20 cases where both boxes had money in them and the human chose only the box B with $1,000,000.
You are sitting there with two boxes in front of you and you can see $1,000,000 in box B and $1,000 in box A. What do you do?
“Rationalists always win.” Clearly an important part of this problem is BELIEF in the existence of, even the possibility of, an intelligence with such predictive powers. Faced with two boxes with money in them, you have the opportunity to stick a fork in this theory once and for all, or you can join the religion of the Omniscient Alien and pick just the $1,000,000.
Amazingly, even if there is no $1,000,000 in front of you, only $1,000, you have the same choice. As stated, the Alien has correctly predicted, every time in the past, that when the human chose box A, box B was empty. For a mere $1,000 you can provide the incredibly valuable information that Aliens are not what they seem to be. You can choose only the empty box B and help free humanity of the superstitious belief in Omniscient Aliens.
Can it be rational to pick only box B? In this case, I maintain, you are not a rationalist but a convert to the religion of the omniscient Alien. A rationalist can see with no doubt that he would be $1,000 richer taking both boxes, AND he could put a hole in this deification process of the alien.
In my opinion, this version of the problem exposes the likelihood that there is trickery going on. For hundreds or thousands of shows in a row, Siegfried and Roy made a tiger appear from thin air in a thin, isolated cage suspended in empty space above an empty stage. What are the chances that this was really a case of teleportation or spontaneous generation, and not just a trick where a regular tiger was moved in some physically understandable way? And yet if you were privileged to see this trick performed 100 times, most of you would be amazed every time.
What are the chances that there is not some trickery with this alien? Perhaps this alien created AIs which would choose as he knew they would, clothed them in human flesh, and pre-planted them over decades before coming to earth, to create the appearance of being able to predict what humans would do. Is this explanation of what we are seeing really LESS likely than that an alien really has this kind of predictive and computational ability? Even humans with a few thousand dollars can create illusions; I give you the close-up room at the Magic Castle in L.A. as a place where you can go to see the laws of physics violated with as little as $10 invested in props. Is it more likely that an alien could set up an elaborate “sting” on humanity over a hundred years, or that all humans really are predictable down to how they pick a random number, flip a coin, choose someone from the phone book to flip a coin, or read the least significant digit on a voltmeter attached to a battery they picked at random from an object in their house?
Occam’s razor says the Alien is tricking you. Everything else is a belief in magic, a return to religion with the Omniscient Alien as God.
This misses the point of Newcomb’s problem entirely. The stuff about boxes and Omega is just an intuition pump; Newcomb’s problem itself is more properly written as a computer program, which contains none of that other stuff. It is common to complain that no real-world scenario will ever correspond to that program, but that is true only in the same sense that the world can never contain the frictionless pulleys, perfect vacuums and rigid objects that come up in physics problems. It’s not that complications like friction and the possibility of being deceived about the rules don’t matter, but rather that you have to solve the simplified problem first before you add those complications back in. In decision theory, “Omega” is short for “without any complications not explicitly mentioned in the problem statement”, so if you start adding in possibilities like illusionists then it isn’t Newcomb’s problem anymore.
My intuition has been pumped hard by this problem. My intuition is that it violates what we know about physics to be able to predict what each of 6 billion human beings will do when confronted with the two boxes after one hour’s elapsed time.
The particular physics I think is violated is quantum mechanical uncertainty. What we believe we know from quantum mechanical uncertainty is that there are a myriad of microscopic processes whose outcome in our world cannot be predicted. We encase this result from quantum mechanics in at least two possible interpretations, labeled Copenhagen and Many Worlds. But both of these interpretations have in common that for a myriad of common events starting at t1, there are multiple mutually exclusive outcomes possible at time t2 > t1 that are, as far as either the Copenhagen or MWI interpretation allows, intrinsically unpredictable at time t1. That is, at least two possible universes at time t2 are completely consistent with the single example universe at time t1: one in which one of these quantum events has turned out one way, and one in which it has turned out another.
So now the question comes: does this have ANYTHING to do with Newcomb’s problem? And it is trivial to make sure it does. During the hour I have between when the alien sets the boxes in front of me and when I must choose, I acquire a geiger counter, and I open up the stopwatch application on my iPhone. I tune the geiger counter using lead foil and possibly some medical isotopes so that it triggers on average about once every 60 seconds. I start the stopwatch, wait until it has run at least 15 seconds, and then stop it the next time I hear a click from the geiger counter. I look at the least significant digit on the stopwatch, which is tenths-of-a-second on my iPhone. If that digit is even, I will pick both boxes; if it is odd, I will pick just box B.
As far as we know from Schrödinger’s-cat gedanken experiments, the exact time of emission of radioactive decay particles is quantum-mechanically “random.” In Copenhagen, the collapse is at a random time; in many worlds, there is a different version of the universe for each possible decay time. Either way, for the Alien to have filled that box correctly, he must either
1) be able to predict the outcome of quantum phenomena in a way that our physics currently believes is impossible, or
2) have flipped a coin and gotten lucky.
Now, with thousands of humans chosen to play this game, what are the chances that I am the only one chosen who includes a quantum coin toss in his choosing mechanism? Either the chances are low, in which case the alien’s chances of pulling off this scam fall as (1/2)^N, where N is the number of quantum coin tosses among his choosers, OR the Alien is cheating.
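The arithmetic behind that falling probability is easy to check. A minimal sketch, under my own simplifying assumption that each quantum coin toss is a fair, independent 50/50 event:

```python
# Chance the alien "predicts" N independent fair quantum coin flips
# purely by luck: each additional flip halves the probability.
def p_alien_lucky(n_flips: int) -> float:
    return 0.5 ** n_flips

print(p_alien_lucky(1))    # 0.5
print(p_alien_lucky(10))   # ~0.001
print(p_alien_lucky(20))   # ~1e-6
```

So after even a couple dozen quantum choosers, sustained luck stops being a credible explanation, which is the fork in the argument: either physics as we know it is wrong, or the Alien is cheating.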
The Alien’s form of cheating might be one of many things. Perhaps he can correctly predict what SOME humans will do, and he only offers the game to those humans, in which case he will not have offered the game to me or any humans of my ilk.
My intuition has been pumped. I have been shown a gedanken problem which I think has some components equivalent to “assume a circle with four corners,” or “assume 2+2=5,” or some other counterfactual that is just so counter to the factuals in OUR world that pointing out this counterfactuality is the resolution to the paradox.
The thing that rules out God as a good hypothesis is not his name; it is his properties. Perhaps the limited omniscience of being able to predict reliably what any human will do in an hour when confronted with Newcomb’s boxes is god-like enough to be tossed out, along with God, from the list of good hypotheses. It looks that way to me.
If I am right, we don’t need to develop a decision theory that lets a Friendly AI self-modify to pick one box and still call the whole endeavour rational.
If you allow randomization, you have an underspecified problem again. But you can fix it easily enough by saying that Omega fills the box with the same probability that you one-box.
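That fix can be checked with a few lines of arithmetic. Under one natural reading (my assumption, not spelled out above), if you one-box with probability p, Omega fills box B with probability p independently of your actual draw, and the expected payout is strictly increasing in p, so randomizing never beats pure one-boxing:

```python
def expected_value(p_one_box: float) -> float:
    """Expected payout if Omega fills box B with the same probability p
    that you one-box, with the filling independent of your actual draw."""
    e_box_b = p_one_box * 1_000_000            # expected contents of box B
    one_box = e_box_b                          # you take only box B
    two_box = 1_000 + e_box_b                  # you take both boxes
    return p_one_box * one_box + (1 - p_one_box) * two_box

for p in (0.0, 0.5, 1.0):
    print(p, expected_value(p))   # 1000.0, 500500.0, 1000000.0
```

Algebraically this is EV(p) = 1000 + 999000·p, so the expected value climbs all the way from $1,000 at p = 0 to $1,000,000 at p = 1.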
Here’s a variant that may help your intuition. Suppose that rather than letting you pick directly, Omega asks you to write a computer program that implements whatever strategy you would have used, and that program chooses one or two boxes. In that case, the prediction would be trivial, and you would certainly want to provide a program that one-boxed.
Now suppose that instead of writing a computer program, you are one. Because you’ve been uploaded, say. In that case, you would want to be a program that one-boxes.
The thing is, due to the physics underlying your brain, you are a computer program. A very complicated, randomized computer program which can’t always be predicted by any means other than simulating it and can’t necessarily be simulated without using resources that aren’t available in the universe. But that’s Omega’s problem. Yours is just choosing a number of boxes.
The original specification of Newcomb’s problem had the alien empty box B if he predicted I would use a random number generator. I’m not sure why Eliezer removed that restriction, but he did, and that is a big part of what I am writing about.
If you already believe that a PHYSICAL random number generator can be built based on quantum processes, and that such a generator can be interfaced with a computer, so that it can be called, controlled, and read by a program, then you don’t need to bother with the details in the next paragraph. The purpose of the next paragraph is to outline the design of such a quantum random number generator.
Get a beta radiation detector with a computer interface. The computer must have an appropriate two-way interface and library to control and read the radiation detector, and must be set up with the detector and a beta radiation source (commercially available).
The first part of the computer program runs and reads out the average rate at which beta particles are being detected. The beta source starts far from the detector, and it is verified that the detector triggers less than once per 10 seconds, on average. The source is then moved slowly toward the detector until the average detection rate is once per 2 seconds or higher. The source can be moved under computer control, making this all a pre-specified program. I would test this program before hardcoding numbers like 10 s and 2 s and the distance ranges the sample was moved; the point is to get something where the pulses are slow compared to the computer’s time resolution, but fast compared to any “background” detection rate from this detector.
Now my program freezes the source in place and runs a 20 s counter. When the 20 s counter is up, the program records the time of the very next beta particle it sees, to whatever resolution the computer offers, but at least 1 ms. The computer looks at the tenths-of-a-second digit in a decimal representation of the time, using any onboard clock you care about; perhaps time since the program started, to keep it simple to specify. If that tenths-of-a-second digit is even, the computer chooses both boxes. If it is odd, the computer chooses only box B.
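Since no real hardware interface is specified, here is a simulated sketch of that decision procedure (my own illustration: decay is modeled as a Poisson process, so the gaps between detections are exponentially distributed):

```python
import random

def choose_boxes(seed=None, mean_interval_s=2.0, wait_s=20.0):
    """Simulated beta-decay chooser: run out the 20 s counter, take the
    arrival time of the next simulated decay event, and decide by the
    parity of its tenths-of-a-second digit."""
    rng = random.Random(seed)
    t = 0.0
    # step through decay events until one lands after the wait period
    while t <= wait_s:
        t += rng.expovariate(1.0 / mean_interval_s)
    tenths_digit = int(t * 10) % 10
    return "both boxes" if tenths_digit % 2 == 0 else "box B only"

print(choose_boxes(seed=42))
```

Over many runs the tenths digit is effectively uniform, so the choice is a fair coin flip; in the real apparatus, on the stated physics, that flip is settled by a quantum event no prediction made an hour earlier can see.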
I believe that for this Alien to predict “my” choice (my computer program’s choice), it must be able to predict details of the beta decay of my beta-emitting sample. Beta decay is a fairly simple nuclear decay process which is well characterized by relatively simple quantum mechanics, but which has, as best physicists know, an unpredictable actual time at which each decay will occur.
Now, I don’t know why Eliezer eliminated the “Alien empties the box if you choose randomly” rule, but my point here is that I can, with asymptotically certain probability, break the alien’s “winning” streak at predicting what humans will do, so long as I am able to get other humans to employ this technique. Either that, or 1) QM as we know it is wrong, or 2) the Alien is cheating, i.e., not doing what EY says he is doing.
Assuming EY got rid of “you lose if you go random” from the Alien’s response for a reason, I think he is doing the equivalent of assuming pi = 22⁄7 exactly, or that a square has only 3 sides, or SOME such thing where we are no longer in our universe when considering the problem.
That EY might be coming up with a decision theory that applies only to universes other than our own is not what I think he intends.
Seconding jimrandomh: you seem to be talking about issues that don’t matter to decision theory very much. Let me reframe.
My own interest in the topic was sparked by Eliezer’s remark about “AIs that know each other’s source code”. As far as I understand, his interest in decision theory isn’t purely academic, it’s supposed to be applied to building an AI. So the simplest possible approach is to try “solving decision theory” for deterministic programs that are dropped into various weird setups. It’s not even necessary to explicitly disallow randomization: the predictor can give you a pony if it can prove you cooperate, and no pony otherwise. This way it’s in your interest in some situations to be provably cooperative.
Now, if you’re an AI that can modify your own source code, you will self-modify to become “provably cooperative” in precisely those situations where the payoff structure makes it beneficial. (And correspondingly “credibly threatening” in those situations that call for credible threats, I guess.) Classifying such situations, and mechanical ways of reasoning about them, is the whole point of our decision theory studies. Of course no one can prohibit you from randomizing in adversarial situations, e.g. if you assign a higher utility to proving Omega wrong than to getting a pony.
I definitely appreciate your and jimrandomh’s comments. I am rereading Eliezer’s paper again in light of these comments and clearly getting more on the “decision theory” page as I go.
Provably cooperative seems problematic, but maybe not. As a concept it is certainly useful. But is there any way to PROVE that the AI is actually running the code she shows you? I suspect probably not.
Also, where I was coming from with my comments may be a misunderstanding of what Eliezer was doing with Newcomb, but it may not. At least in other posts, if not in this paper, he has said “rational means winning” and that a self-modifying AI would modify itself to be provably precommitted to box B in Newcomb’s problem. I think there are two problems with that, one of which Eliezer touches on, the other of which he doesn’t.
First, the one he touches on: if the Alien is simply rewarding people for being irrational, then it’s not clear we want an AI to self-modify to win Newcomb’s problem. Granted, if an all-powerful alien threatens humanity’s existence unless it is worshipped, maybe we do want an AI to abandon its rationality for that, but I’m not sure, and what you have here is “assuming God comes along and tells us all to toe the line or go to hell, what does decision theory tell us to do?” The main issue there might be being actually sure that it is God that has come along and not just the man behind the curtain, i.e., a trickster who has your dopey AI thinking it is God and abandoning its rationality, i.e., being hijacked by trickery.
The second issue is: there must be some very high level of reliability required when you are contemplating action predicated on very unlikely hypotheses. If our friendly self-modifying AI sees 1,000 instances of an Alien providing Newcomb’s boxes (and 1,000 is the number in Eliezer’s paper), I don’t want it concluding that 1,000 = certainty, because it doesn’t. Especially in a complex world where even finite humans using last century’s technologies can trick the crap out of other humans. If a self-modifying friendly AI sees something come along which appears to violate physics in order to provide a seemingly causal paradox, laden with the emotion of a million dollars or a cure for your daughter’s cancer, then the last thing I want that AI to do is to modify itself BEFORE it properly estimates the probability that the Alien is actually no smarter than Siegfried and Roy.
It’s not conceivable to me that resistance to getting tricked, and properly understanding the influence of evidence, especially when that evidence may be provided by an Alien even smarter and with more resources than Siegfried and Roy, is NOT part of decision theory. Maybe it is not the part Eliezer wants to discuss here.
In any case, I am rereading Eliezer’s paper and will know more about decision theory before my next comment. Thank you for your comments in that regard; I am finding I flow through the paper more fluidly now after reading them.
is there any way to PROVE that the AI is actually running the code she shows you?
Nope; certainty is impossible to come by in worlds that contain a sufficiently powerful deceiver. That said, compiling the code she shows you on a different machine and having her shut herself down would be relatively compelling evidence in similar cases that don’t posit an arbitrarily powerful deceiver.
If both boxes are transparent, then the problem is underspecified for agents whose action depends on what they see unless you add a rule to cover them. That doesn’t mean that the parts of the problem which you have specified (namely, what happens to unconditional one-boxers and unconditional two-boxers) are invalid, just that you missed a case.
Actually, if a real-world analog to Newcomb’s Problem ever came up in my real life, there’s a not-insignificant chance that I would turn down the $1,000 in the transparent box as well and just walk away—that is, that I would zero-box—under the general principle that if I don’t trust the motives of the person setting up the game, I do better not to take any of the choices they are encouraging me to take, no matter how obvious the choices may seem. Maybe I’ve wandered into the next Batman movie and the box is poisoned or something.
Of course, if you insist on rejecting the setup to Newcomb’s Problem rather than cooperating with it, you’ll never get to see whether there’s anything valuable being set up.
I tend to write stuff that gets voted down to oblivion. If you are voting this down, could the first few of you comment why? I’d appreciate learning how I don’t fit in in case this is another of those posts that doesn’t. Thanks in advance.
I extend Martin Gardner’s idea. From a causal perspective, it would make no difference if the BOTH boxes were transparent. At 7 AM the alien puts down two transparent boxes in front of you, one with $1000 in it and one with either $1,000,000 or $0 in it. You can look in and see which he chose for you.
You have personally witnessed at least a thousand trials, and have heard from thousands of other people that you can question, many whom you know well, that have each witnessed at least 1000 trials. What you have seen, and heard these others have seen, is that the alien has NEVER been wrong, that EVERY time he put $1,000,000 in box B, the human took only box B, and EVERY time he put nothing in box B, the human took box A. Further, you have seen at least 20 cases where the Alien put $,1000,000 in box B and $1,000 in box A, and even though the boxes were fixed and transparent, you have seen the human choose only box B in all those cases. Further, you have been told by the other 1000 observers, some of whom you know quite well, and all of whom you have been able to question as you wish, that each of them have witnessed at least 20 cases where both boxes had money in them and the human chose only box B with $1,000,000.
You are sitting there with two boxes in front of you and you can see $1,000,000 in box B and $1,000 in box A. What do you do?
“Rationalists always win.” Clearly an important part of this problem is BELIEF in the existence of, even the possibility of, an intelligence which can have such predictive powers. Faced with two boxes with money in them, you have the opportunity to stick a fork in this theory once and for all, or you can join the religion of the Omnisicient Alien and pick just the $1,000,000.
Amazingly, even if there is no $1,000,000 in front of you, only $1,000, you have the same choice. As stated, the Alien has correctly predicted every time in the past that when the human chose box A, box B was always empty. For a mere $1,000 you can provide the incredibly valuable information that Aliens are not what they seem to be. You can choose only the empty box B and help free humanity free of the superstitious belief in Omnisicient Aliens.
Can it be rational to pick only box B? In this case I maintain, you are not a rationalist, but a convert to the religion of the omniscient Alien. A rationalist can see with no doubt that he would be $1000 richer taking both boxes, AND he could put a hole in this deification process of the alien.
In my opiinion, this version of the problem exposes the likelihood that there is trickery going on. For 100s or 1000s of shows in a row, Siegfried and Roy made a tiger appear from thin air in a thin and isolated cage suspended in empty space above an empty stage. What are the chances that this was really a case of teleportation or spontaneous generation, and not just a trick where a regular tiger was moved some physically understandable way? And yet if you were priveleged to see this trick performed 100 times, most of you would be amazed every time.
What are the chances that there is not some trickery with this alien? Perhaps this alien created AIs which would choose as he knew they would, clothed them in human flesh, and pre-planted them over decades before coming to earth to create the appearance of being able to predict what humans would do. Is this explanation of what we are seeing really LESS likely than that an alien really has this kind of predictive and computational ability? Even humans with a few $1000 can create illusions, I give you the close-up room at the Magic Castle in L.A. as a place where you can go to see the laws of physics violated with as little as $10 investment in props. Is it more likely that an alien could set up an elaborate “sting” on humanity over a hundred years, or that all humans really are predictable down to how they pick a random number, flip a coin, choose someone from the phone book to flip a coin, or read the least significant digit on a voltmeter attachd to a battery they picked at random from an object in their house?
Occams razor says the Alien is tricking you. Everything else is a believe in magic, a return to religion with God as Omniscient Aliens.
This misses the point of Newcomb’s problem entirely. The stuff about boxes and Omega is just an intuition pump; Newcomb’s problem itself is more properly written as a computer program, which contains none of that other stuff. It is common to complain that no real-world scenario will ever correspond to that program, but that is true only in the same sense that the world can never contain the frictionless pulleys, perfect vacuums and rigid objects that come up in physics problems. It’s not that complications like friction and the possibility of being deceived about the rules don’t matter, but rather that you have to solve the simplified problem first before you add those complications back in. In decision theory, “Omega” is short for “without any complications not explicitly mentioned in the problem statement”, so if you start adding in possibilities like illusionists then it isn’t Newcomb’s problem anymore.
My intuition has been pumped hard by this problem. My intuition is that it violates what we know about physics to be able to predict what each of 6 billion human beings will do confronted with the two boxes after one hour’s time elapsed.
The particular physics I think is violated is quantum mechanical uncertainty. What we believe we know from quantum mechanical uncertainty is that there are a myriad of microscopic processes of which the outcome in our world cannot be predicted. We encase this result from quantum mechanics in at least two possible interpretations labeled Copenhagen and Many Worlds. But both of these interpretations have in common that for a myriad of common events starting at t1, there are multiple mutually exclusive possible outcomes possible at time t2>t1 that are, as far as either Copenhagen or MWI interpretations allow, intrinsically unpredictable at time t1. That is, at least two possible universes at time t2 are completely consistent with the single example universe at time t1: one in which one of these quantum events has turned out one way, and one in which it has turned out another way.
So now the question comes: does this have ANYTHING to do with Newcomb’s problem? And it is trivial to make sure it does. During the hour I have between when the alien sets the boxes in front of me and when I must choose, I acquire a geiger counter, and I open up the stopwatch application on my iPhone. I tune the geiger counter using lead foil and possibly some medical isotopes so that it is triggering on average about once ever 60 seconds. I start the stopwatch, wait until it has run at least 15 seconds, and then stop it next time I hear a click from the geiger counter. I look at the least significant bit on the stopwatch, which is tenths-of-a-second on my iPhone. If that number is even I will pick two boxes if that number is odd, I will pick just box B.
As far as we know from Schrödinger’s cat gedankedonks, the exact time of emissions of radioactive decay particles is quantumly “random.” In Copenhagen, the collapse is at a random time, in many worlds, there is a different version of the universe for each possible decay time. Either way, for the Alien to have filled that box correctly he must be either 1) Able to predict the outcome of quantum phenomenon in a way that our physics currently believes is impossible 2) have flipped a coin and gotten lucky.
Now, with thousands of humans chosen to play this game, what are the chances that I am the only one chosen who includes a quantum coin toss in his choosing mechanism? Either the chances are low, in which case chances of the alien pulling off this scam are falling as 1/2^N where N is the number of quantum coin tosses among his choosers, OR the Alien is cheating.
The Alien’s form of cheating might be one of many things. Perhaps he can correctly predict what SOME humans will do, and he only offers the game to those humans, in which case he will not have offered the game to me or any humans of my ilk.
My intuition has been pumped. I have been shown a gedanken problem which I think has some components equivalent to “assume a circle with four corners,” or “assume 2+2=5″ or some other counterfactual that is just so counter to the factuals in OUR world that pointing out this counterfactuality is the resolution to the paradox.
The things that rule out God as a good hypothesis is not his name, it is his properties. Perhaps the limited Omniscience of being able to predict reliably what any human will do in an hour when confronted with Newcomb’s boxes is god-line enough to be tossed out with God from the list of good hypotheticals. It looks that way to me.
If I am right, we don’t need to develop a decision theory that lets a Friendly AI self-modify to pick one box and still call the whole endeavour rational.
If you allow randomization, you have an underspecified problem again. But you can fix it easily enough by saying that Omega fills the box with the same probability that you one-box.
Here’s a variant that may help your intuition. Supppose that rather than let you pick directly, Omega asks you to write a computer program that implements whatever strategy you would have used, and that program chooses one or two boxes. In that case, the prediction would be trivial, and you would certainly want to provide a program that one-boxed.
Now suppose that instead of writing a computer program, you are one. Because you’e been uploaded, say. In that case, you would want to be a program that one-boxes.
The thing is, due to the physics underlying your brain, you are a computer program. A very complicated, randomized computer program which can’t always be predicted by any means other than simulating it and can’t necessarily be simulated without using resources that aren’t available in the universe. But that’s Omega’s problem. Yours is just choosing a number of boxes.
The original specification of Newcomb’s problem had the alien empty box B if he predicted I would use a random number generator. I’m not sure why Eliezer removed that restriction, but he did and that is a big part of what I writing about.
If you already believe that a PHYSICAL random number generator can be built based no quantum processes, and that such a generator can be interfaced with a computer and therefore called by, controlled by, with results read by a computer, then you don’t need to bother with the details in the next paragraph. The purpose of the next paragraph is to outline the design of such a quantum random number generator.
Get a beta radiation detector with computer interface. Computer must have appropriate two way interface and appropriate library to control and read the radiation detector. Computer must be set up with radiation detector and a beta radiation source (commercially available.)
First part of computer program runs and reads out average rate at which beta particles are being detected. Beta source is moved far away from detector, and it is verified that detector detects at less than once per 10 seconds, on average. Source is moved slowly towards detector until average detector rate is once per 2 seconds or higher. Source can be moved under computer control to make this all a pre=specified program. I would test this program before hardcoding numbers like 10 s and 2 s and the distance ranges the sample was moved, the point would be to get something where the pulses are slow compared to the computer time resolution, but fast compared to any “background” detection rate from this detector.
Now my program freezes the source in place, and runs a 20 s counter. When the 20 s counter is up, the program records the time of the very next beta particle it sees to whatever resolution the computer offers, but at least 1 ms resolution. The computer looks at the tenths-of-second digit in a decimal representation of the time using any onboard clock you care about. Perhaps it is time since the computer program was turned on in order to make it spedifiably simple. If that tenths of a second digit is even, computer chooses two boxes. If that tenths of a second digit is odd, computer chooses only box B.
I believe for this Alien to predict “my” choice, (my computer programs choice) it must be able to predict details of beta decay of my beta emitting sample. Beta decay is a fairly simple atomic decay process which is well characterized by relatively simple quantum mechanics, but which has as best physicists know, an unpredictable actual time that each beta decay will occur.
Now, I don’t know why Eliezer eliminated the rule “the Alien empties the box if you choose randomly,” but my point here is that I can, with probability approaching certainty as I get more humans to employ this technique, break the Alien’s “winning” streak at predicting what humans will do. Either that, or (1) QM as we know it is wrong, or (2) the Alien is cheating, i.e., not doing what EY says he is doing.
Assuming EY got rid of “you lose if you go random” from the Alien’s response for a reason, I think he is doing the equivalent of assuming pi = 22⁄7 exactly or that a square has only 3 sides, or SOME such thing where we are no longer in our universe when considering the problem.
That EY might be coming up with a decision theory that applies only to universes other than our own is not what I think he intends.
Seconding jimrandomh: you seem to be talking about issues that don’t matter to decision theory very much. Let me reframe.
My own interest in the topic was sparked by Eliezer’s remark about “AIs that know each other’s source code”. As far as I understand, his interest in decision theory isn’t purely academic, it’s supposed to be applied to building an AI. So the simplest possible approach is to try “solving decision theory” for deterministic programs that are dropped into various weird setups. It’s not even necessary to explicitly disallow randomization: the predictor can give you a pony if it can prove you cooperate, and no pony otherwise. This way it’s in your interest in some situations to be provably cooperative.
Now, if you’re an AI that can modify your own source code, you will self-modify to become “provably cooperative” in precisely those situations where the payoff structure makes it beneficial. (And correspondingly “credibly threatening” in those situations that call for credible threats, I guess.) Classifying such situations, and mechanical ways of reasoning about them, is the whole point of our decision theory studies. Of course no one can prohibit you from randomizing in adversarial situations, e.g. if you assign a higher utility to proving Omega wrong than to getting a pony.
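cousin_it’s “pony if the predictor can prove you cooperate” setup can be made concrete with a toy sketch. This is not a real proof system; the names and the string-equality “proof” are invented for illustration. The predictor inspects the agent’s source text and rewards only what it can syntactically verify as unconditional cooperation, so a randomizing agent forfeits the pony by construction:

```python
# Agents are represented by their source code, as in "AIs that know
# each other's source code".
COOPERATE_SRC = 'def act():\n    return "cooperate"\n'
RANDOM_SRC = (
    'import random\n'
    'def act():\n'
    '    return "cooperate" if random.random() < 0.5 else "defect"\n'
)

def provably_cooperative(source):
    # The crudest possible "proof": the program is literally the
    # unconditional cooperator, token for token. A real system would
    # use an actual proof search over program behavior.
    return source == COOPERATE_SRC

def predictor(source):
    return "pony" if provably_cooperative(source) else "no pony"
```

Here `predictor(COOPERATE_SRC)` yields the pony and `predictor(RANDOM_SRC)` does not, which is the sense in which it can be in your interest to self-modify into something provably cooperative.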
I definitely appreciate your and jimrandomh’s comments. I am rereading Eliezer’s paper again in light of these comments and clearly getting more on the “decision theory” page as I go.
Provably cooperative seems problematic, but maybe not; as a concept it is certainly useful. But is there any way to PROVE that the AI is actually running the code she shows you? I suspect probably not.
Also, where I was coming from with my comments may or may not be a misunderstanding of what Eliezer was doing with Newcomb. At least in other posts, if not in this paper, he has said “rational means winning” and that a self-modifying AI would modify itself to be provably precommitted to taking box B in Newcomb’s problem. I think there are two problems here, one of which Eliezer touches on and one which he doesn’t.
First, the one he touches on: if the Alien is simply rewarding people for being irrational, then it’s not clear we want an AI to self-modify to win Newcomb’s problem. If an all-powerful alien threatens humanity’s existence unless it is worshipped, maybe we do want an AI to abandon its rationality for that, but I’m not sure. What you have there is “assuming God comes along and tells us all to toe the line or go to hell, what does decision theory tell us to do?” The main issue might be being actually sure that it is God who has come along and not just the man behind the curtain, i.e., a trickster who has your dopey AI thinking he is God and abandoning its rationality — being hijacked by trickery.
The second issue: there must be some very high level of reliability required when you are contemplating action predicated on very unlikely hypotheses. If our friendly self-modifying AI sees 1000 instances of an Alien providing Newcomb’s boxes (and 1000 is the number in Eliezer’s paper), I don’t want it concluding that 1000 = certainty, because it doesn’t, especially in a complex world where even finite humans using last century’s technologies can trick the crap out of other humans. If a self-modifying friendly AI sees something come along that appears to violate physics in order to present a seemingly causal paradox laden with the emotion of a million dollars or a cure for your daughter’s cancer, then the last thing I want that AI to do is modify itself BEFORE it properly estimates the probability that the Alien is actually no smarter than Siegfried and Roy.
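The point that 1000 observations need not overcome a strong prior can be put in simple Bayesian arithmetic. All the numbers below are made up for illustration: H1 is “the Alien really predicts humans,” H2 is “a skilled trickster who appears to succeed 99% of the time per trial,” and the prior odds for H1 are taken to be one in a million:

```python
# Illustrative odds calculation; every number here is an assumption.
prior_odds = 1e-6        # P(H1) / P(H2), assumed one-in-a-million
p_success_h1 = 1.0       # a real predictor never visibly misses
p_success_h2 = 0.99      # the trickster almost never visibly misses
n_trials = 1000          # the 1000 observed successes

likelihood_ratio = (p_success_h1 / p_success_h2) ** n_trials
posterior_odds = prior_odds * likelihood_ratio
print(posterior_odds)    # ~0.023: the trickster remains ~43x more likely
```

Even after a thousand flawless trials, under these assumptions the trickster hypothesis still dominates; and if the trickster’s apparent success rate were 1.0 too, the observations would provide no evidence at all.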
It’s not conceivable to me that resistance to getting tricked, and properly understanding the weight of evidence — especially when that evidence may be provided by an Alien even smarter and better resourced than Siegfried and Roy — is NOT part of decision theory. Maybe it is just not the part Eliezer wants to discuss here.
In any case, I am rereading Eliezer’s paper and will know more about decision theory before my next comment. Thank you for your comments in that regard; I find I flow through the paper more fluidly now, having read them.
Nope; certainty is impossible to come by in worlds that contain a sufficiently powerful deceiver. That said, compiling the code she shows you on a different machine and having her shut herself down would be relatively compelling evidence in similar cases that don’t posit an arbitrarily powerful deceiver.
None of that seems relevant to decision theory.
If both boxes are transparent, then the problem is underspecified for agents whose action depends on what they see unless you add a rule to cover them. That doesn’t mean that the parts of the problem which you have specified (namely, what happens to unconditional one-boxers and unconditional two-boxers) are invalid, just that you missed a case.
Actually, if a real-world analog to Newcomb’s Problem ever came up in my real life, there’s a not-insignificant chance that I would turn down the $1000 in the transparent box as well and just walk away — that is, that I would zero-box — under the general principle that if I don’t trust the motives of the person setting up the game, I do better not to take any of the choices they are encouraging me to take, no matter how obvious those choices may seem. Maybe I’ve wandered into the next Batman movie and the box is poisoned or something.
Of course, if you insist on rejecting the setup to Newcomb’s Problem rather than cooperating with it, you’ll never get to see whether there’s anything valuable being set up.
I think inherent in the problem is the condition that you fully understand what is going on and you know you aren’t part of some weird trick.
It’s not realistic, but being realistic isn’t the point of the problem.