However, I hypothesise that whatever method Omega uses, if the CDT agent knows the method, it will one-box. It is only a “magical Omega” that throws CDT off.
I suspect that if you try to formalize the version of “CDT” that makes the above statement true, it will look suspiciously similar to our formulations of UDT.
This is an interesting question. I believe, the “knowledgeable CDT” would be equivalent to TDT, but not UDT. Now that I understand better what UDT is, I think it is strictly stronger than CDT/TDT, since it is allowed to arrive at decisions by proving things indirectly, whereas CDT/TDT must always resort to an equivalent of either simulation (simulating the other agent) or reasoning by symmetry (which is equivalent to reasoning from being in a simulation, as in the OP).
For example, if I formalize the “knowledgeable version of CDT” as you suggested, then: knowing Omega’s method is equivalent to having its source code. Then CDT using direct causality to arrive at decision would be equivalent to simulating Omega. If Omega itself is a simulator, then CDT could never stop. However, in this case CDT can arrive at decision using anthropic simulation reasoning = reasoning by symmetry, as in the OP. But if Omega is not a simulator, but some kind of a theorem prover, then CDT would be able to simulate it, in which case it will two-box and fail… Hmm, that was actually a problem for UDT as well...
If the agent has more computing power than Omega and is stupid enough to go ahead and simulate Omega, that’s called the ASP problem, and you’re right that it’s a problem for UDT too.
After more thinking, it seems the problem is not in simulation as such, but in (1) having free will + (2) knowing the prediction beforehand. The only self-consistent solution is “two-box”...
Augh, gRR, at this rate you’ll soon be making actual new progress, but only if you force yourself to be more thorough. As Eliezer’s Quirrell said, “You must continue thinking”. A good habit is to always try to push a little bit past the point where you think you have everything figured out.
Vladimir Nesov has just suggested that the agent might choose not to simulate the predictor, but instead make a decision quickly (using only a small fraction of the available resources) to give the predictor a chance at figuring out things about the agent. I don’t know how to formalize this idea in general, but it looks like it might yield a nice solution to the ASP problem someday.
It’s interesting not being my past self and being able to understand that problem.
Because strategies based on simulation of the predictor are opaque to the predictor, while strategies based on high-level reasoning are transparent to the predictor, the problem is no longer just determined by the agent’s final decisions—it’s not in the same class as Newcomb’s problem anymore. It’s a computation-dependent problem, but it’s not quite in the same class as a two box problem that rewards you for picking options alphabetically (the AlphaBeta problem :D).
I agree with Vladimir’s idea that the UDT agent formalized in your original post might still be able to handle it without any extensions, if it finds a short proof that includes some gnarly self-reference (See note). The AlphaBeta problem, on the other hand, is unwinnable for any utility-maximizer without the ability to suspend its own utility-maximizing. This is interesting, because it seems like the ASP problem is also more “reasonable” than the AlphaBeta problem.
(note): As a sketch: The existence of a proof that one-boxing means maximum utility that is less than N is equivalent to both boxes being filled, and if no such proof exists, only one box is filled. If the proven-maximum-utility-meaning action is always taken, then the maximum available utility is when one box is taken and both boxes are full. The optimal action is always This proof is less than N. By the power vested in me by Loeb’s theorem...
the problem is no longer just determined by the agent’s final decisions
Right.
It’s interesting not being my past self and being able to understand that problem.
Congratulations :-) Now I’ll do the thing that Wei usually does, and ask you if something specific in the problem description was tripping you up? How would you rephrase it to make your past self understand it faster?
How would you rephrase it to make your past self understand it faster?
Include a link to Wei Dai’s analysis of the absentminded driver problem, with a short blurb explaining why your theorem-proving agent is like that and not like CDT, maybe. But that would have had only a faint hope of success :P
Thanks you for the kind words! I do constantly make stupid mistakes because of not thinking enough...
Vladimir Nesov’s idea is to remove (2), so that the agent won’t know the prediction beforehand. But I wonder if it might still be possible to remove (1) - to allow the agent sometimes to know its decision beforehand without blowing up. I feel like a perpetuum mobile inventor here… but, I didn’t see an actual proof that it’s impossible… My latest attempt is here.
Taboo “free will”. An action is constructed depending on agent’s knowledge, and therefore on circumstances that formed that knowledge. If this dependence can be characterized by action being consequentialist relative to agent’s stated preference, then the action was chosen correctly. A consequentialist action is one that optimizes the values of its effect, and so the action must depend (via argmax) on the dependence of its effect on the action. If the action is chosen based on different considerations, it’s no longer a consequentialist action.
So, for example, knowing what the action is going to be is not a problem, a problem would be if the action is chosen to fulfill the prediction, because that wouldn’t be a consequentialist reason for an action.
I meant “free will” in the sense that the agent is not allowed to know (prove) its decision before it is made (as in “playing chicken with the universe” rule), as otherwise it becomes inconsistent. As far as I understand, “knowing what the action is going to be” is a problem. Isn’t it?
Having knowledge of the decision lying around is not a problem, the problem is if it’s used in construction of the decision itself in such a way that the resulting decision is not consequentialist. The diagonal rule allows breaking the dependence of your knowledge on your decision, if it hypothetically existed, so that it becomes easier to prove that the decision procedure produces a consequentialist decision.
Also, the decision can’t be “inconsistent”, as it’s part of the territory. Agent’s knowledge may be inconsistent, that is useless, but even then there is fact of the matter of what its gibbering self decides.
I meant agent (its proof system) becoming inconsistent, of course, not its decision. Bad wording on my part.
The problem, as I see it, is that the standard UDT agent (its proof system) is not allowed to prove that it will do a certain action (or that it will not do some action). Because then it will prove stupid counterfactuals, which will make it change its decision, which will make its proof wrong, which will make its proof system inconsistent.
I think this is a serious limitation. Maybe it is impossible to define well-behaved consequentialist agents without this limitation, but I didn’t see an actual proof...
Because then it will prove stupid counterfactuals, which will make it change its decision, which will make its proof wrong, which will make its proof system inconsistent.
This is not how it works. The inference system is consistent, and nothing can be changed, only determined in a stupid way. It’s not false that one-boxing implies that you get $31, if you in fact two-box; your inference system doesn’t need to be inconsistent to produce that argument.
But what if the system proves it will one-box, then forms a counterfactual that two-boxing will get it 10^10$ and so two-boxes. This makes the thing that it proved false, which makes the system inconsistent.
If we know the inference system to be consistent, this proves that the line of reasoning you describe can’t happen. Indeed this is essentially the way we prove that the diagonal step guarantees that the agent doesn’t infer its decision: if it did, that would make its inference system unsound, and we assume it’s not. So what happens is that if the system proves that it will one-box, it doesn’t prove that two-boxing leads to $10^10, instead it proves something that would make it one-box, such as that two-boxing leads to minus $300.
The system is sound. Therefore, it doesn’t prove (before the agent decides) that the agent will one-box. Presence of the diagonal step guarantees that a proof of the agent one-boxing is not encountered (before the agent decides).
Well, exactly, that’s what I said: the agent is not allowed to prove that it will do a certain action before its decision is made. This is a limitation. My hypothesis: it is not a necessary limitation for a well-behaved consequentialist agent. Here is an attempt at writing an agent without this limitation.
def A(): Forall i try to prove A()==A_i() Let UX(X) be the function that results from the source of U() if all calls “A_i()”, for which the system was able to prove A()==A_i() were changed to “eval(X)”. For some action p, prove that for any possible action q, UX(“return q”) ⇐ UX(“return p”). Return p.
Assume A_2() is a weak predictor, and A() is able to simulate it and know its prediction. Nevertheless, A() is still able to prove A()==A_2(), without getting a contradiction (which a regular UDT agent would get from the “playing chicken” rule). So the agent will prove that it will “one-box”, and the weak predictor will prove the same.
1) I can’t really parse it because the first step mentions A_i. What if the problem statement is slightly obfuscated and doesn’t contain any easily recognizable A_i? Can you rephrase your proposal in terms of trying to prove things about U and A?
2) Does your implementation of A use some proof length limit on the first step? On the second step? Or does it somehow stop early when the first proof is found?
1) Yes, this is a problem. I don’t have a good general solution, but as a working first step:
Step 0: enumerate all parameterless function calls within the code of U. Call them A_i, for i=1..N.
[ETA: I assume the code is in a pure functional language here. Haskell rules! :)]
2) Computation resource limits can be added. Let’s say L1 for step 1 (proving A_i() == A()), and L2 for step 3 (finding the action). Then, the system would use, say, L1/N for each of the A_i-s (if a proof is found sooner, other proofs can use more resources, or the system may just loop aimlessly if for some reason all resources must be spent).
For the step 3, proofs are not actually needed. If L2 is sufficiently large, UX(“return p”) can be simply evaluated for all p.
Your proposal seems to be equivalent to something I proposed a while ago: have A try to find a function U’ such that U() is provably equal to U’(A()), then return argmax(U’). By itself this doesn’t even solve Newcomb’s problem (never mind ASP), because there are many possible U’ satisfying the above condition, and no easy way to guarantee that the right one will be found first. There could be nasty Loebian traps. For example, if your order of proof enumeration is unlucky enough, you could find U’=”return 1000″ with no dependence on A at all, then two-box and make that equation true.
You mean, the U’, for which U’(A()) is provably equal to U() depends on the order in which you enumerate U’-s? Hmm, counterintuitive, but it makes sense… A() with a different order would be a different A(), so it would produce a different U’...
Just as a wild shot, what would happen if you try every possible ordering of U’-s and then choose the ordering that produces the best result? Let’s say there is a maximal length for U’ that you are willing to consider, and you have a loop iterating over all possible orderings of U’-s of up to that length. In each iteration of the loop you try to prove things in the specific order, find the argmax, store it, restore the proof system back to its initial state, and go to the next iteration. When everything finishes, use the best found ordering.
I expect, the results will now depend on the order in which the orderings are considered...
For example, if your order of proof enumeration is unlucky enough, you could find U’=”return 1000″ with no dependence on A at all, then two-box and make that equation true.
I wonder if reflection on your own inference system could make things like this go away...
You need to find a simple U’, such that you can construct its argmax. The wrong/useless U’ are just those you can’t compute argmax of (except for the cases like one you point out, which is the same problem for normal ADT, losing control over U). U’ is what I called “utility function” (inferred from definition of utility value U) in this post. Finding moral arguments [A=A1 ⇒ U=U1] is specifically a method for inferring a utility function, but the goal is to find argmax-able U’ (dependence of utility value on your decision).
Is free will inconsistent with knowing the answer after? For example, if you choose to two box, and I watch you, I will know you chose to two box. Does this violate your free will? If not, why should it matter when I’m standing while I know this?
Your knowledge about my decision, before or after, is not relevant to “free will” as I mean it here. Free will exists in my mind. It is my lack of knowledge, before the decision, of what my decision will be.
If I can prove that I will decide X before actually deciding X, then I don’t have free will in this sense.
I suspect that if you try to formalize the version of “CDT” that makes the above statement true, it will look suspiciously similar to our formulations of UDT.
This is an interesting question. I believe, the “knowledgeable CDT” would be equivalent to TDT, but not UDT. Now that I understand better what UDT is, I think it is strictly stronger than CDT/TDT, since it is allowed to arrive at decisions by proving things indirectly, whereas CDT/TDT must always resort to an equivalent of either simulation (simulating the other agent) or reasoning by symmetry (which is equivalent to reasoning from being in a simulation, as in the OP).
For example, if I formalize the “knowledgeable version of CDT” as you suggested, then: knowing Omega’s method is equivalent to having its source code. Then CDT using direct causality to arrive at decision would be equivalent to simulating Omega. If Omega itself is a simulator, then CDT could never stop. However, in this case CDT can arrive at decision using anthropic simulation reasoning = reasoning by symmetry, as in the OP. But if Omega is not a simulator, but some kind of a theorem prover, then CDT would be able to simulate it, in which case it will two-box and fail… Hmm, that was actually a problem for UDT as well...
If the agent has more computing power than Omega and is stupid enough to go ahead and simulate Omega, that’s called the ASP problem, and you’re right that it’s a problem for UDT too.
After more thinking, it seems the problem is not in simulation as such, but in (1) having free will + (2) knowing the prediction beforehand. The only self-consistent solution is “two-box”...
Augh, gRR, at this rate you’ll soon be making actual new progress, but only if you force yourself to be more thorough. As Eliezer’s Quirrell said, “You must continue thinking”. A good habit is to always try to push a little bit past the point where you think you have everything figured out.
Vladimir Nesov has just suggested that the agent might choose not to simulate the predictor, but instead make a decision quickly (using only a small fraction of the available resources) to give the predictor a chance at figuring out things about the agent. I don’t know how to formalize this idea in general, but it looks like it might yield a nice solution to the ASP problem someday.
It’s interesting not being my past self and being able to understand that problem.
Because strategies based on simulation of the predictor are opaque to the predictor, while strategies based on high-level reasoning are transparent to the predictor, the problem is no longer just determined by the agent’s final decisions—it’s not in the same class as Newcomb’s problem anymore. It’s a computation-dependent problem, but it’s not quite in the same class as a two box problem that rewards you for picking options alphabetically (the AlphaBeta problem :D).
I agree with Vladimir’s idea that the UDT agent formalized in your original post might still be able to handle it without any extensions, if it finds a short proof that includes some gnarly self-reference (See note). The AlphaBeta problem, on the other hand, is unwinnable for any utility-maximizer without the ability to suspend its own utility-maximizing. This is interesting, because it seems like the ASP problem is also more “reasonable” than the AlphaBeta problem.
(note): As a sketch: The existence of a proof that one-boxing means maximum utility that is less than N is equivalent to both boxes being filled, and if no such proof exists, only one box is filled. If the proven-maximum-utility-meaning action is always taken, then the maximum available utility is when one box is taken and both boxes are full. The optimal action is always This proof is less than N. By the power vested in me by Loeb’s theorem...
Right.
Congratulations :-) Now I’ll do the thing that Wei usually does, and ask you if something specific in the problem description was tripping you up? How would you rephrase it to make your past self understand it faster?
Include a link to Wei Dai’s analysis of the absentminded driver problem, with a short blurb explaining why your theorem-proving agent is like that and not like CDT, maybe. But that would have had only a faint hope of success :P
Thanks you for the kind words! I do constantly make stupid mistakes because of not thinking enough...
Vladimir Nesov’s idea is to remove (2), so that the agent won’t know the prediction beforehand. But I wonder if it might still be possible to remove (1) - to allow the agent sometimes to know its decision beforehand without blowing up. I feel like a perpetuum mobile inventor here… but, I didn’t see an actual proof that it’s impossible… My latest attempt is here.
Taboo “free will”. An action is constructed depending on agent’s knowledge, and therefore on circumstances that formed that knowledge. If this dependence can be characterized by action being consequentialist relative to agent’s stated preference, then the action was chosen correctly. A consequentialist action is one that optimizes the values of its effect, and so the action must depend (via argmax) on the dependence of its effect on the action. If the action is chosen based on different considerations, it’s no longer a consequentialist action.
So, for example, knowing what the action is going to be is not a problem, a problem would be if the action is chosen to fulfill the prediction, because that wouldn’t be a consequentialist reason for an action.
I meant “free will” in the sense that the agent is not allowed to know (prove) its decision before it is made (as in “playing chicken with the universe” rule), as otherwise it becomes inconsistent. As far as I understand, “knowing what the action is going to be” is a problem. Isn’t it?
Having knowledge of the decision lying around is not a problem, the problem is if it’s used in construction of the decision itself in such a way that the resulting decision is not consequentialist. The diagonal rule allows breaking the dependence of your knowledge on your decision, if it hypothetically existed, so that it becomes easier to prove that the decision procedure produces a consequentialist decision.
Also, the decision can’t be “inconsistent”, as it’s part of the territory. Agent’s knowledge may be inconsistent, that is useless, but even then there is fact of the matter of what its gibbering self decides.
I meant agent (its proof system) becoming inconsistent, of course, not its decision. Bad wording on my part.
The problem, as I see it, is that the standard UDT agent (its proof system) is not allowed to prove that it will do a certain action (or that it will not do some action). Because then it will prove stupid counterfactuals, which will make it change its decision, which will make its proof wrong, which will make its proof system inconsistent.
I think this is a serious limitation. Maybe it is impossible to define well-behaved consequentialist agents without this limitation, but I didn’t see an actual proof...
This is not how it works. The inference system is consistent, and nothing can be changed, only determined in a stupid way. It’s not false that one-boxing implies that you get $31, if you in fact two-box; your inference system doesn’t need to be inconsistent to produce that argument.
But what if the system proves it will one-box, then forms a counterfactual that two-boxing will get it 10^10$ and so two-boxes. This makes the thing that it proved false, which makes the system inconsistent.
If we know the inference system to be consistent, this proves that the line of reasoning you describe can’t happen. Indeed this is essentially the way we prove that the diagonal step guarantees that the agent doesn’t infer its decision: if it did, that would make its inference system unsound, and we assume it’s not. So what happens is that if the system proves that it will one-box, it doesn’t prove that two-boxing leads to $10^10, instead it proves something that would make it one-box, such as that two-boxing leads to minus $300.
Hmmm. Wait, doesn’t diagonal step immediately make the system inconsistent as soon as the system proves the agent will one-box?
The system is sound. Therefore, it doesn’t prove (before the agent decides) that the agent will one-box. Presence of the diagonal step guarantees that a proof of the agent one-boxing is not encountered (before the agent decides).
Well, exactly, that’s what I said: the agent is not allowed to prove that it will do a certain action before its decision is made. This is a limitation. My hypothesis: it is not a necessary limitation for a well-behaved consequentialist agent. Here is an attempt at writing an agent without this limitation.
I don’t understand yet… Could you explain in more detail how your proposal would work in the ASP problem?
I wrote that comment before I saw ASP, but let’s try:
def U():
box1 = 1000
box2 = (A() == 2) ? 0 : 1000000
return box2 + ((A_2() == 2) ? box1 : 0)
def A():
Forall i try to prove A()==A_i()
Let UX(X) be the function that results from the source of U() if all calls “A_i()”, for which the system was able to prove A()==A_i() were changed to “eval(X)”.
For some action p, prove that for any possible action q, UX(“return q”) ⇐ UX(“return p”).
Return p.
Assume A_2() is a weak predictor, and A() is able to simulate it and know its prediction. Nevertheless, A() is still able to prove A()==A_2(), without getting a contradiction (which a regular UDT agent would get from the “playing chicken” rule). So the agent will prove that it will “one-box”, and the weak predictor will prove the same.
Interesting.
1) I can’t really parse it because the first step mentions A_i. What if the problem statement is slightly obfuscated and doesn’t contain any easily recognizable A_i? Can you rephrase your proposal in terms of trying to prove things about U and A?
2) Does your implementation of A use some proof length limit on the first step? On the second step? Or does it somehow stop early when the first proof is found?
1) Yes, this is a problem. I don’t have a good general solution, but as a working first step:
Step 0: enumerate all parameterless function calls within the code of U. Call them A_i, for i=1..N.
[ETA: I assume the code is in a pure functional language here. Haskell rules! :)]
2) Computation resource limits can be added. Let’s say L1 for step 1 (proving A_i() == A()), and L2 for step 3 (finding the action). Then, the system would use, say, L1/N for each of the A_i-s (if a proof is found sooner, other proofs can use more resources, or the system may just loop aimlessly if for some reason all resources must be spent).
For the step 3, proofs are not actually needed. If L2 is sufficiently large, UX(“return p”) can be simply evaluated for all p.
Your proposal seems to be equivalent to something I proposed a while ago: have A try to find a function U’ such that U() is provably equal to U’(A()), then return argmax(U’). By itself this doesn’t even solve Newcomb’s problem (never mind ASP), because there are many possible U’ satisfying the above condition, and no easy way to guarantee that the right one will be found first. There could be nasty Loebian traps. For example, if your order of proof enumeration is unlucky enough, you could find U’=”return 1000″ with no dependence on A at all, then two-box and make that equation true.
You mean, the U’, for which U’(A()) is provably equal to U() depends on the order in which you enumerate U’-s? Hmm, counterintuitive, but it makes sense… A() with a different order would be a different A(), so it would produce a different U’...
Just as a wild shot, what would happen if you try every possible ordering of U’-s and then choose the ordering that produces the best result? Let’s say there is a maximal length for U’ that you are willing to consider, and you have a loop iterating over all possible orderings of U’-s of up to that length. In each iteration of the loop you try to prove things in the specific order, find the argmax, store it, restore the proof system back to its initial state, and go to the next iteration. When everything finishes, use the best found ordering.
I expect, the results will now depend on the order in which the orderings are considered...
[EDIT: Yes, this is stupid. Please disregard]
I wonder if reflection on your own inference system could make things like this go away...
That would be cool. How?
You need to find a simple U’, such that you can construct its argmax. The wrong/useless U’ are just those you can’t compute argmax of (except for the cases like one you point out, which is the same problem for normal ADT, losing control over U). U’ is what I called “utility function” (inferred from definition of utility value U) in this post. Finding moral arguments [A=A1 ⇒ U=U1] is specifically a method for inferring a utility function, but the goal is to find argmax-able U’ (dependence of utility value on your decision).
Is free will inconsistent with knowing the answer after? For example, if you choose to two box, and I watch you, I will know you chose to two box. Does this violate your free will? If not, why should it matter when I’m standing while I know this?
Your knowledge about my decision, before or after, is not relevant to “free will” as I mean it here. Free will exists in my mind. It is my lack of knowledge, before the decision, of what my decision will be.
If I can prove that I will decide X before actually deciding X, then I don’t have free will in this sense.