If you see a full box, then you must be going to one-box if the predictor really is perfect.
Huh? If I’m a two-boxer, the predictor can still make a simulation of me, show it a simulated full box, and see what happens. It’s easy to formalize, with computer programs for the agent and the predictor.
I’ve already addressed this in the article above, but my understanding is as follows: This is one of those circumstances where it is important to differentiate between you being in a situation and a simulation of you being in a situation. I really should write a post about this, but in order for a simulation to be accurate it only has to make the same decisions in decision theory problems. Nothing else about it has to be the same; in fact, it could be an anti-rational agent with the opposite utility function.
Note that I’m not claiming that an agent can ever tell whether it is in the real world or in a simulation; that isn’t the point. I’m adopting the viewpoint of an external observer who can tell the difference.
I think the key here is to think about what is happening in terms of both philosophy and mathematics, but you only seem interested in the former?
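A minimal sketch (not part of the original exchange, and using names I made up) of the “anti-rational agent with the opposite utility function” point above: maximizing a utility function and minimizing its negation select the same option, so matching decisions really is all that accuracy requires.

```haskell
import Data.List (maximumBy, minimumBy)
import Data.Ord (comparing)

-- A rational agent picks the option with the highest utility.
rationalChoice :: (a -> Double) -> [a] -> a
rationalChoice u = maximumBy (comparing u)

-- An "anti-rational" agent picks the option with the lowest utility.
antiRationalChoice :: (a -> Double) -> [a] -> a
antiRationalChoice u = minimumBy (comparing u)

main :: IO ()
main = do
  let u x = fromIntegral (x * (10 - x)) :: Double  -- an arbitrary utility function
      options = [1 .. 9] :: [Int]
  -- With no ties in utility, flipping the sign of the utility function and
  -- flipping maximization to minimization selects the same option.
  print (rationalChoice u options)                 -- 5
  print (antiRationalChoice (negate . u) options)  -- 5
```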
I (somewhat) agree that there are cases where you need to keep identity separate between levels of simulation (levels which “you” may or may not be at the outermost of). But I don’t think it matters for this problem. When you add “perfect” to the descriptor, it’s pretty much just you: it makes every relevant decision identically.
When you are trying to really break down a problem, I think it is good practice to assume that you and the simulation are separate at the start. You can then immediately justify talking about the simulation as being you in a certain sense, but starting with them separate is key.
I may not have gotten to the part where it matters that they’re separate (in perfect simulation/alignment cases). But no harm in it. Just please don’t obscure the fundamental implication that in such a universe, free will is purely an illusion.
I haven’t been defending free will in my posts at all.
I couldn’t understand your comment, so I wrote a small Haskell program to show that two-boxing in the transparent Newcomb problem is a consistent outcome. What parts of it do you disagree with?
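The program itself isn’t reproduced in the thread, but here is a minimal sketch of how such a formalization might look, assuming the predictor simply runs the agent’s policy on a hypothetical full box (all names below are illustrative, not the commenter’s actual code):

```haskell
-- A sketch of the transparent (open-box) Newcomb setup.
data BoxState = Full | Empty deriving (Show, Eq)
data Action   = OneBox | TwoBox deriving (Show, Eq)

-- An agent is a policy: what it does given what it sees in the big box.
type Agent = BoxState -> Action

-- A perfect predictor runs the agent's policy on a hypothetical full box
-- and fills the real box only if that run one-boxes.
predictor :: Agent -> BoxState
predictor agent = if agent Full == OneBox then Full else Empty

-- $1,000,000 in the big box if it is full, plus $1,000 if the small box is taken.
payoff :: BoxState -> Action -> Int
payoff box action = bigBox + smallBox
  where
    bigBox   = if box == Full      then 1000000 else 0
    smallBox = if action == TwoBox then 1000    else 0

-- The real agent is shown whatever box the predictor actually left behind.
run :: Agent -> (BoxState, Action, Int)
run agent = (box, action, payoff box action)
  where
    box    = predictor agent
    action = agent box

oneBoxer, twoBoxer :: Agent
oneBoxer _ = OneBox
twoBoxer _ = TwoBox

main :: IO ()
main = do
  print (run oneBoxer)  -- (Full, OneBox, 1000000)
  print (run twoBoxer)  -- (Empty, TwoBox, 1000)
```

In this sketch the two-boxer is a consistent agent that never actually faces a full box; the full box is only ever shown to the run of its policy inside the predictor function, which is exactly the “you versus a simulation of you” distinction the thread keeps returning to.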
Okay, I have to admit that that’s kind of cool, but on the other hand it also completely misses the point.
I think we need to backtrack. A maths proof can be valid but its conclusion false if at least one premise is false, right? So unless a problem has already been formally defined, it’s not enough to just throw down a maths proof; you also have to justify that you’ve formalised it correctly.
Well, the program is my formalization. All the premises are right there. You should be able to point out where you disagree.
In other words, the claim isn’t that your program is incorrect; it’s that more justification is needed than you might think in order to persuasively show that it correctly represents Newcomb’s problem. Maybe you think understanding this isn’t particularly important, but I think knowing exactly what is going on is key to understanding how to construct logical counterfactuals in general.
I actually don’t know Haskell, but I’ll take a stab at decoding it tonight or tomorrow. Open-box Newcomb’s is normally stated as “you see a full box”, not “you or a simulation of you sees a full box”. I agree with this reinterpretation, but I disagree with glossing it over.
My point was that if we take the problem description super-literally, as you (and not a simulation of you) seeing the full box, then you must one-box. Of course, since this yields a trivial decision problem, we’ll want to reinterpret it in some way, and that’s the reinterpretation I’m providing a justification for.
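To illustrate that point using the earlier sketch (this fragment reuses its Agent, BoxState, Action, and predictor definitions, so it is not standalone): of the four possible policies, the only ones for which the real agent is ever shown a full box are the ones that one-box on a full box, so the super-literal reading leaves no real decision to make.

```haskell
-- All four policies an Agent (BoxState -> Action) can implement.
policies :: [(String, Agent)]
policies =
  [ ("always one-box",          const OneBox)
  , ("always two-box",          const TwoBox)
  , ("one-box iff box is full", \b -> if b == Full then OneBox else TwoBox)
  , ("two-box iff box is full", \b -> if b == Full then TwoBox else OneBox)
  ]

-- Keep only the policies for which the *real* agent is shown a full box,
-- and record what each of them does with it.
literalReading :: [(String, Action)]
literalReading =
  [ (name, p Full) | (name, p) <- policies, predictor p == Full ]
-- literalReading == [("always one-box", OneBox), ("one-box iff box is full", OneBox)]
```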
I see, thanks, that makes it clearer. There’s no disagreement; you’re trying to justify the approach that people are already using. Sorry about the noise.
Not at all. Your comments helped me realise that I needed to make some edits to my post.
“in fact, it could be an anti-rational agent with the opposite utility function.”

These two people might look the same, they might be identical on a quantum level, but one of them is a largely rational agent, and the other is an anti-rational agent with the opposite utility function.
I think that calling something an anti-rational agent with the opposite utility function is a weird description that doesn’t cut reality at its joints. There is a simple notion of a perfect sphere. There is also a simple notion of a perfect optimizer. Real-world objects aren’t perfect spheres, but some are pretty close, so “sphere” is a useful approximation and “sphere + error term” is a useful description. Real agents aren’t perfect optimizers (ignoring contrived goals like “1 for doing whatever you were going to do anyway, 0 otherwise”), but some are pretty close, hence “utility function + biases” is a useful description. This makes the notion of an anti-rational agent with the opposite utility function like an inside-out sphere with its surface offset inwards by twice the radius: a cack-handed description of a simple object in terms of a totally different simple object and a huge error term.
“This is one of those circumstances where it is important to differentiate between you being in a situation and a simulation of you being in a situation.”

I actually don’t think that there is a general procedure to tell what is you and what is a simulation of you. The standard argument applies: slowly replacing neurons with nanomachines, slowly porting the result to software, slowly abstracting it and proving theorems about it rather than running it directly.
It is an entirely meaningful utility function to only care about copies of your algorithm that are running on certain kinds of hardware. That makes you a “biochemical brains running this algorithm” maximizer. The paperclip maximizer doesn’t care about any copy of its algorithm. Humans worrying about whether the predictor’s simulation is detailed enough to really suffer is due to specific features of human morality. From the perspective of the paperclip maximizer doing decision theory, what we care about is logical correlation.
“I actually don’t think that there is a general procedure to tell what is you, and what is a simulation of you”—Let’s suppose I promise to sell you an autographed Michael Jackson CD. But then it turns out that the CD wasn’t signed by Michael, but by me. Now I’m really good at forgeries, so good in fact that my signature matches his atom to atom. Haven’t I still lied?
Imagine sitting outside the universe, and being given an exact description of everything that happened within the universe. From this perspective you can see who signed what.
You can also see whether your thoughts are happening in biology or silicon or whatever.
My point isn’t “you can’t tell whether or not you’re in a simulation, so there is no difference”; my point is that there is no sharp cut-off point between simulation and non-simulation. We have a “know it when you see it” definition with ambiguous edge cases. Decision theory can’t have different rules for dealing with dogs and non-dogs, because some things are on the ambiguous edge of dogginess. Likewise, decision theory can’t have different rules for you, copies of you, and simulations of you, as there is no sharp cut-off. If you want to propose a continuous “simulatedness” parameter, and explain where it gets added to decision theory, go ahead. (Or propose some sharp cut-off.)
Some people want to act as though a simulation of you is automatically you, and my argument is that it is bad practice to assume this. I’m much more open to the idea that some simulations might be you in some sense than to the claim that all are. This seems compatible with a fuzzy cut-off.