This misses the point of Newcomb’s problem entirely. The stuff about boxes and Omega is just an intuition pump; Newcomb’s problem itself is more properly written as a computer program, which contains none of that other stuff. It is common to complain that no real-world scenario will ever correspond to that program, but that is true only in the same sense that the world can never contain the frictionless pulleys, perfect vacuums and rigid objects that come up in physics problems. It’s not that complications like friction and the possibility of being deceived about the rules don’t matter, but rather that you have to solve the simplified problem first before you add those complications back in. In decision theory, “Omega” is short for “without any complications not explicitly mentioned in the problem statement”, so if you start adding in possibilities like illusionists then it isn’t Newcomb’s problem anymore.
My intuition has been pumped hard by this problem. My intuition is that it violates what we know about physics to be able to predict what each of 6 billion human beings will do when confronted with the two boxes after an hour has elapsed.
The particular physics I think is violated is quantum mechanical uncertainty. What we believe we know from quantum mechanical uncertainty is that there are a myriad of microscopic processes whose outcomes in our world cannot be predicted. We frame this result from quantum mechanics in at least two interpretations, labeled Copenhagen and Many Worlds. But both interpretations have in common that for a myriad of common events starting at time t1, there are multiple mutually exclusive outcomes possible at time t2 > t1 that are, as far as either the Copenhagen or MWI interpretation allows, intrinsically unpredictable at time t1. That is, at least two possible universes at time t2 are completely consistent with the single example universe at time t1: one in which one of these quantum events has turned out one way, and one in which it has turned out another.
So now the question comes: does this have ANYTHING to do with Newcomb’s problem? And it is trivial to make sure it does. During the hour I have between when the alien sets the boxes in front of me and when I must choose, I acquire a geiger counter, and I open up the stopwatch application on my iPhone. I tune the geiger counter using lead foil and possibly some medical isotopes so that it is triggering on average about once every 60 seconds. I start the stopwatch, wait until it has run at least 15 seconds, and then stop it the next time I hear a click from the geiger counter. I look at the least significant digit on the stopwatch, which is tenths-of-a-second on my iPhone. If that number is even, I will pick two boxes; if that number is odd, I will pick just box B.
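As a sanity check, here is a minimal Python simulation of that decision rule. The exponential inter-click gaps stand in for genuinely quantum decay timing, and the function name and parameters are my own inventions for illustration:

```python
import random

def geiger_choice(mean_gap_s=60.0, min_run_s=15.0):
    """Simulate the stopwatch-plus-geiger-counter rule: run at least
    15 s, stop on the next click, and read the tenths digit."""
    t = 0.0
    while True:
        # Geiger clicks form a Poisson process, so the gaps between
        # clicks are exponentially distributed.
        t += random.expovariate(1.0 / mean_gap_s)
        if t >= min_run_s:
            break
    tenths = int(t * 10) % 10  # tenths-of-a-second digit of the stop time
    return "two boxes" if tenths % 2 == 0 else "box B only"

print(geiger_choice())
```

Over many runs the two outcomes come up about equally often, which is the whole point: the choice hinges on the timing of a single decay.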
As far as we know from Schrödinger’s-cat gedankenexperiments, the exact timing of radioactive decay emissions is quantum-mechanically “random.” In Copenhagen, the collapse happens at a random time; in Many Worlds, there is a different version of the universe for each possible decay time. Either way, for the Alien to have filled that box correctly he must have either
1) predicted the outcome of a quantum phenomenon in a way that our physics currently believes is impossible, or
2) flipped a coin and gotten lucky.
Now, with thousands of humans chosen to play this game, what are the chances that I am the only one chosen who includes a quantum coin toss in his choosing mechanism? Either that chance is low, in which case the probability of the alien pulling off this scam falls as 1/2^N, where N is the number of quantum coin tosses among his chosen players, OR the Alien is cheating.
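The arithmetic behind that 1/2^N falloff is quick to spot-check (the N values are just illustrative):

```python
# If each quantum coin toss is a fair 50/50 the Alien cannot foresee,
# the chance of guessing all N of them by luck alone is (1/2)^N.
for n in (1, 10, 20, 100):
    print(f"N = {n:>3}: lucky-streak probability = {0.5 ** n:.3e}")
```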
The Alien’s form of cheating might be one of many things. Perhaps he can correctly predict what SOME humans will do, and he only offers the game to those humans, in which case he will not have offered the game to me or any humans of my ilk.
My intuition has been pumped. I have been shown a gedanken problem which I think has some components equivalent to “assume a circle with four corners,” or “assume 2+2=5,” or some other counterfactual that is just so counter to the factuals in OUR world that pointing out this counterfactuality is the resolution to the paradox.
The thing that rules out God as a good hypothesis is not his name, it is his properties. Perhaps the limited omniscience of being able to predict reliably what any human will do in an hour when confronted with Newcomb’s boxes is god-like enough to be tossed out with God from the list of good hypotheticals. It looks that way to me.
If I am right, we don’t need to develop a decision theory that lets a Friendly AI self-modify to pick one box and still call the whole endeavour rational.
If you allow randomization, you have an underspecified problem again. But you can fix it easily enough by saying that Omega fills the box with the same probability that you one-box.
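Under that fix the expected value is easy to work out. A short sketch (the $1,000,000 and $1,000 amounts are the standard ones for this problem; the function itself is mine):

```python
def expected_payoff(p, big=1_000_000, small=1_000):
    """Expected winnings if you one-box with probability p and Omega
    fills box B with that same probability p, independently of your
    realized choice (Omega cannot foresee the quantum coin)."""
    expected_b = p * big          # expected contents of box B
    one_box = expected_b          # take box B alone
    two_box = expected_b + small  # take both boxes
    return p * one_box + (1 - p) * two_box

# The expression simplifies to p*big + (1-p)*small, so the optimum
# is still to one-box with probability 1.
for p in (0.0, 0.5, 1.0):
    print(f"p = {p:.1f}: E = ${expected_payoff(p):,.0f}")
```

Randomizing buys you nothing here: every bit of two-boxing probability trades an expected $1,000,000 for an expected $1,000.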
Here’s a variant that may help your intuition. Suppose that rather than let you pick directly, Omega asks you to write a computer program that implements whatever strategy you would have used, and that program chooses one or two boxes. In that case, the prediction would be trivial, and you would certainly want to provide a program that one-boxed.
Now suppose that instead of writing a computer program, you are one. Because you’ve been uploaded, say. In that case, you would want to be a program that one-boxes.
The thing is, due to the physics underlying your brain, you are a computer program. A very complicated, randomized computer program which can’t always be predicted by any means other than simulating it and can’t necessarily be simulated without using resources that aren’t available in the universe. But that’s Omega’s problem. Yours is just choosing a number of boxes.
The original specification of Newcomb’s problem had the alien empty box B if he predicted I would use a random number generator. I’m not sure why Eliezer removed that restriction, but he did, and that is a big part of what I am writing about.
If you already believe that a PHYSICAL random number generator can be built based on quantum processes, and that such a generator can be interfaced with a computer and therefore called, controlled, and read by a computer, then you don’t need to bother with the details in the next paragraph. The purpose of the next paragraph is to outline the design of such a quantum random number generator.
Get a beta radiation detector with a computer interface. The computer must have an appropriate two-way interface and an appropriate library to control and read the radiation detector, and must be set up with the detector and a beta radiation source (commercially available).
The first part of the computer program runs and reads out the average rate at which beta particles are being detected. The beta source is moved far away from the detector, and it is verified that the detector triggers less than once per 10 seconds, on average. The source is then moved slowly toward the detector until the average detection rate is once per 2 seconds or higher. The source can be moved under computer control to make this all a pre-specified program. I would test this program before hardcoding numbers like 10 s and 2 s and the distance range the sample is moved; the point is to get something where the pulses are slow compared to the computer’s time resolution, but fast compared to any “background” detection rate from this detector.
Now my program freezes the source in place and runs a 20 s counter. When the 20 s counter is up, the program records the time of the very next beta particle it sees, to whatever resolution the computer offers but at least 1 ms. The computer looks at the tenths-of-a-second digit in a decimal representation of the time, using any onboard clock you care about; perhaps time since the program started, to keep the specification simple. If that tenths-of-a-second digit is even, the computer chooses two boxes. If it is odd, the computer chooses only box B.
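A software sketch of that program, with the detector hardware replaced by a Poisson-process simulation. All names and numbers here are stand-ins; a real version would call the detector's interface library where next_gap() appears, and would step a motorized stage where calibrate() appears:

```python
import random

def next_gap(rate_hz):
    """Simulated wait for the next beta detection; a real program
    would block on the detector interface instead."""
    return random.expovariate(rate_hz)

def calibrate(initial_rate_hz=0.05, target_hz=0.5):
    """Stand-in for stepping the source toward the detector until the
    average rate reaches once per 2 s (0.5 Hz)."""
    rate = initial_rate_hz
    while rate < target_hz:
        rate *= 1.25  # move the source one step closer, re-measure
    return rate

def beta_decay_choice(settle_s=20.0, clock_res_s=0.001):
    rate_hz = calibrate()
    # Run out the 20 s counter, then take the time of the very next
    # detection, quantized to the clock's (>= 1 ms) resolution.
    t = 0.0
    while t < settle_s:
        t += next_gap(rate_hz)
    t = round(t / clock_res_s) * clock_res_s
    tenths_digit = int(t * 10) % 10
    return "two boxes" if tenths_digit % 2 == 0 else "box B only"

print(beta_decay_choice())
```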
I believe that for this Alien to predict “my” choice (my computer program’s choice), it must be able to predict details of the beta decay of my beta-emitting sample. Beta decay is a fairly simple nuclear decay process which is well characterized by relatively simple quantum mechanics, but which has, as best as physicists know, an unpredictable actual time at which each decay occurs.
Now, I don’t know why Eliezer eliminated the “Alien empties the box if you choose randomly” condition, but my point here is that I can, with asymptotic certainty, break the Alien’s “winning” streak at predicting what humans will do, to the extent that I can get other humans to employ this technique. Either that, or 1) QM as we know it is wrong, or 2) the Alien is cheating, i.e., not doing what EY says he is doing.
Assuming EY got rid of “you lose if you go random” from the Alien’s response for a reason, I think he is doing the equivalent of assuming pi = 22/7 exactly or that a square has only 3 sides, or SOME such thing where we are no longer in our universe when considering the problem.
That EY might be coming up with a decision theory that applies only to universes other than our own is not what I think he intends.
Seconding jimrandomh: you seem to be talking about issues that don’t matter to decision theory very much. Let me reframe.
My own interest in the topic was sparked by Eliezer’s remark about “AIs that know each other’s source code”. As far as I understand, his interest in decision theory isn’t purely academic, it’s supposed to be applied to building an AI. So the simplest possible approach is to try “solving decision theory” for deterministic programs that are dropped into various weird setups. It’s not even necessary to explicitly disallow randomization: the predictor can give you a pony if it can prove you cooperate, and no pony otherwise. This way it’s in your interest in some situations to be provably cooperative.
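A toy version of that setup, with agents represented by their source strings and a "proof search" that only succeeds in the degenerate case of a syntactically constant cooperator. The whole scheme is my illustration of the idea, not anyone's actual proposal:

```python
# Hypothetical agent sources: a constant cooperator and a randomizer.
COOPERATE_SRC = 'lambda: "cooperate"'
RANDOM_SRC = 'lambda: __import__("random").choice(["cooperate", "defect"])'

def predictor(agent_src):
    """Give a pony only if cooperation can be *proved* from the source.
    Here the proof search is the degenerate check that the program is
    the literal constant cooperator; a randomizing program is
    unprovable even though it might happen to cooperate."""
    provably_cooperative = agent_src == COOPERATE_SRC
    return "pony" if provably_cooperative else "no pony"

print(predictor(COOPERATE_SRC))  # pony
print(predictor(RANDOM_SRC))     # no pony
```

The point survives the toy: randomizing doesn't beat the predictor, it just forfeits the pony.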
Now, if you’re an AI that can modify your own source code, you will self-modify to become “provably cooperative” in precisely those situations where the payoff structure makes it beneficial. (And correspondingly “credibly threatening” in those situations that call for credible threats, I guess.) Classifying such situations, and mechanical ways of reasoning about them, is the whole point of our decision theory studies. Of course no one can prohibit you from randomizing in adversarial situations, e.g. if you assign a higher utility to proving Omega wrong than to getting a pony.
I definitely appreciate your and jimrandomh’s comments. I am rereading Eliezer’s paper again in light of these comments and clearly getting more on the “decision theory” page as I go.
“Provably cooperative” seems problematic, though perhaps not; as a concept it is certainly useful. But is there any way to PROVE that the AI is actually running the code she shows you? I suspect probably not.
Also, where I was coming from with my comments may be a misunderstanding of what Eliezer was doing with Newcomb, but it may not. At least in other posts, if not in this paper, he has said “rational means winning” and that a self-modifying AI would modify itself to be provably precommitted to box B in Newcomb’s problem. I see two problems there, one of which Eliezer touches on and one which he doesn’t.
First, the one he touches on: if the Alien is simply rewarding people for being irrational, then it’s not clear we want an AI to self-modify to win Newcomb’s problem. Given an all-powerful alien who threatens humanity’s existence if it doesn’t worship him, maybe we do want an AI to abandon its rationality, but I’m not sure, and what you have here is “assuming God comes along and tells us all to toe the line or go to hell, what does decision theory tell us to do?” Well, the main issue there might be being actually sure that it is God that has come along and not just the man behind the curtain, i.e., a trickster who has your dopey AI thinking it is God and abandoning its rationality, i.e., being hijacked by trickery.
The second issue is: there must be some very high level of reliability required when you are contemplating action predicated on very unlikely hypotheses. If our friendly self-modifying AI sees 1000 instances of an Alien providing Newcomb’s boxes (and 1000 is the number in Eliezer’s paper), I don’t want it concluding that 1000 = certainty, because it doesn’t. Especially in a complex world where even finite humans using last century’s technologies can trick the crap out of other humans. If a self-modifying friendly AI sees something come along which appears to violate physics in order to provide a seemingly causal paradox laden with the emotion of a million dollars or a cure for your daughter’s cancer, then the last thing I want that AI to do is to modify itself BEFORE it properly estimates the probability that the Alien is actually no smarter than Siegfried and Roy.
It is inconceivable to me that resistance to getting tricked, and proper weighing of evidence that may be provided by an Alien even smarter and better-resourced than Siegfried and Roy, is NOT part of decision theory. Maybe it is not the part Eliezer wants to discuss here.
In any case, I am rereading Eliezer’s paper and will know more about decision theory before my next comment. Thank you for your comments in that regard; I find I move through Eliezer’s paper more fluidly now after reading them.
is there any way to PROVE that the AI is actually running the code she shows you?
Nope; certainty is impossible to come by in worlds that contain a sufficiently powerful deceiver. That said, compiling the code she shows you on a different machine and having her shut herself down would be relatively compelling evidence in similar cases that don’t posit an arbitrarily powerful deceiver.
None of that seems relevant to decision theory.