It’s still a misinterpretation of Wei Dai’s discussion of probability. What you described is not UDT, and not even a decision theory.
I have added a link (pdf) to a complete description of what a UDT algorithm is. I am confident that there are no “misinterpretations” there, but I would be grateful if you pointed out any that you perceive.
I believe it is an accurate description of UDT as presented in the original post, although incomplete knowledge about P_i can be accommodated without changing the formalism, by including all alternatives (completely described this time) enabled by available knowledge about the corresponding world programs, in the list {P_i} (which is the usual reading of “possible world”). Also note that in this post Wei Dai corrected the format of the decisions from individual input/output instances to global strategy-selection.
incomplete knowledge about P_i can be accommodated without changing the formalism, by including all alternatives (completely described this time) enabled by available knowledge about the corresponding world programs, in the list {P_i}
How important is it that the list {P_i} be finite? If P_i is one of the programs in our initial list that we’re uncertain about, couldn’t there be infinitely many alternative programs P_i1, P_i2, . . . behind whatever we know about P_i?
I was thinking that incomplete knowledge about the P_i could be captured (within the formalism) with the mathematical intuition function. (Though it would then make less sense to call it a specifically mathematical intuition.)
Also note that in this post Wei Dai corrected the format of the decisions from individual input/output instances to global strategy-selection.
In principle, it doesn’t matter, because you can represent a countable list of programs as a single program that takes an extra parameter (but then you’ll need to be more careful about the notion of “execution histories”). More generally, you can just include all possible programs in the list and express how much you care about the specific programs through the way mathematical intuition ranks their probability and the way the utility function ranks their possible semantics.
On execution histories: note that a program is a nice finite inductive definition of how that program behaves, while it’s unclear what an “execution history” is, since it’s an infinite object and so it needs to be somehow finitely described. Also, if, as in the example above, you have the world program taking parameters (e.g. a universal machine that takes a Goedel number of a world program as a parameter), you’ll have different executions depending on the parameter. But if you see a program as a set of axioms for a logical theory defining the program’s behavior, then execution histories can just be different sets of axioms defining the program’s behavior in a different way. These different sets of axioms could describe the same theories, or different theories, and can include specific facts about what happens during program execution on such-and-such parameters. Equivalence of such theories will depend on what you assume about the agent (i.e. if you add different assumptions about the agent to the theories, you get different theories, and so different equivalences), which is what mathematical intuition is trying to estimate.
It’s not accurate to describe strategies as mappings f: X->Y. A strategy can be interactive: it takes an input, produces an output, and then the environment can prepare another input depending on this output, and so on. Think normalization in lambda calculus. So, the agent’s strategy is specified by a program, but generally speaking this program is untyped.
Let’s assume that there is a single world program, as described here. Then, if A is the agent’s program known to the agent, B is one possible strategy for that program, given in the form of a program, X is the world program known to the agent, and Y is one of the possible world execution histories of X given that A behaves like B, again given in the form of a program, then mathematical intuition M(B,Y) returns the probability that the statement (A~B ⇒ X~Y) is true, where A~B stands for “A behaves like B”, and similarly for X and Y. (This taps into the ambient control analysis of decision theory.)
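For concreteness, here is a minimal sketch (in Python, with all names invented for illustration) of the decision rule this reading suggests, assuming a finite list of candidate strategy programs and candidate history programs is given: treat M(B,Y) as a black-box estimate of the probability that (A~B ⇒ X~Y), and pick the strategy program B with the highest M-weighted utility over the candidate histories Y.

    def choose_strategy(candidate_strategies, candidate_histories,
                        math_intuition, utility):
        """Pick the strategy B maximizing the sum over Y of M(B, Y) * U(Y).

        math_intuition(B, Y) stands for the black-box estimate of the
        probability that (A~B => X~Y); utility(Y) is the agent's utility
        for the world execution history Y. Both are left unspecified,
        as in the discussion above.
        """
        def expected_utility(B):
            return sum(math_intuition(B, Y) * utility(Y)
                       for Y in candidate_histories)
        return max(candidate_strategies, key=expected_utility)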
It’s not accurate to describe strategies as mappings f: X->Y.
I’m following this paragraph from Wei Dai’s post on UDT1.1:
[U]pon receiving input X, [the agent] would put that input aside and first iterate through all possible input/output mappings that it could implement and determine the logical consequence of choosing each one upon the executions of the world programs that it cares about. After determining the optimal S that best satisfies its preferences, it then outputs S(X).
So, “input/output mappings” is Wei Dai’s language. Does he not mean mappings between the set of possible inputs and the set of possible outputs?
A strategy can be interactive: it takes an input, produces an output, and then the environment can prepare another input depending on this output, and so on.
It seems to me that this could be captured by the right function f: X → Y. The set I of input-output mappings could be a big collection of GLUTs. Why wouldn’t that suffice for Wei Dai’s purposes?
ETA: And it feels weird typing out “Wei Dai” in full all the time. But the name looks like it might be Asian to me, so I don’t know which part is the surname and which is the given name.
And it feels weird typing out “Wei Dai” in full all the time. But the name looks like it might be Asian to me, so I don’t know which part is the surname and which is the given name.
I’ve been wondering why people keep using my full name around here. Yes, the name is Chinese, but since I live in the US I follow the given-name-first convention. Feel free to call me “Wei”.
No, you can’t represent an interactive strategy by a single input to output mapping. That post made a step in the right direction, but stopped short of victory :-). But I must admit, I forgot about that detail in the second post, so you’ve correctly rendered Wei’s algorithm, although using untyped strategies would further improve on that.
No, you can’t represent an interactive strategy by a single input to output mapping.
Why not?
BTW, in UDT1.1 (as well as UDT1), “input” consists of the agent’s entire memory of the past as well as its current perceptions. Thought I’d mention that in case there’s a misunderstanding there.
… okay, this question allowed me to make a bit of progress. Taking as a starting point the setting of this comment (that we are estimating the probability of (A~B ⇒ X~Y) being true, where A and X are respectively the agent’s and the environment’s programs, and B and Y are programs representing the agent’s strategy and an outcome for the environment), and the observations made here and here, we get a scheme for local decision-making.
Instead of trying to decide the whole strategy, we can just decide the local action. Then the agent program, and the “input” consisting of observations and memories, together make up the description of where the agent is in the environment, and thus where its control will be applied. The action that the agent considers can then be local, just something the agent does at this very moment, and the alternatives for this action are alternative statements about the agent: thus, instead of considering a statement A~B for the agent’s program A and various whole strategies B, we consider just predicates like action1(A) and action2(A), which assert that A chooses action 1 or action 2 in this particular situation, and which don’t assert anything else about its behavior in other situations or on other counterfactuals. Taking into account other actions that the agent might have to take in the past or in the future happens automatically, because the agent works with a complete description of the environment, even if under severe logical uncertainty. Thus, decision-making happens “one bit at a time”, and the agent’s strategy mostly exists in the environment, not under direct control by the agent in any way, but still controlled in the same sense that everything in the environment is.
Thus, in the simplest case of a binary local decision, mathematical intuition would only take as explicit argument a single bit, which indicates what assertion is being made about [agent’s program together with memory and observations], and that is all. No maps, no untyped strategies.
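A minimal sketch of this “one bit at a time” version (Python, names invented; math_intuition and utility remain unspecified black boxes, and the predicates action1(A) and action2(A) are encoded simply as the bits 0 and 1):

    def choose_local_action(candidate_histories, math_intuition, utility):
        """Binary local decision. The strategy slot of the mathematical
        intuition shrinks to a single bit saying which assertion is being
        made about [agent's program + memory + observations]:
        math_intuition(bit, Y) estimates the probability that asserting
        that bit about the agent makes the world equivalent to history Y.
        """
        def expected_utility(bit):
            return sum(math_intuition(bit, Y) * utility(Y)
                       for Y in candidate_histories)
        return max((0, 1), key=expected_utility)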
This solution was unavailable to me when I thought about explicit control, because the agent has to coordinate with itself, relying on what it can in fact decide in other situations and not on what it should optimally decide; but it’s a natural step in the setting of ambient control, because the incorrect counterfactuals are completely banished from consideration, and the environment describes what the agent will actually do on other occasions.
Going back to the post explicit optimization of global strategy, the agent doesn’t need to figure out the global strategy! Each of the agent copies is allowed to make the decision locally, while observing the other copy as part of the environment (in fact, it’s the same problem as the “general coordination problem” I described on the DT list, back when I was clueless about this approach).
Each of the agent copies is allowed to make the decision locally, while observing the other copy as part of the environment
Well, that was my approach in UDT1, but then I found a problem that UDT1 apparently can’t solve, so I switched to optimizing over the global strategy (and named that UDT1.1).
Can you re-read explicit optimization of global strategy and let me know what you think about it now? What I called “logical correlation” (using Eliezer’s terminology) seems to be what you call “ambient control”. The point of that post was that it seems an insufficiently powerful tool for even two agents with the same preferences to solve the general coordination problem amongst themselves, if they only explicitly optimize the local decision and depend on “logical correlation”/”ambient control” to implicitly optimize the global strategy.
If you think there is some way to get around that problem, I’m eager to hear it.
So far as I can see, your mistake was assuming “symmetry” and dropping probabilities. There is no symmetry: only one of the possibilities is what will actually happen, and the other (as I’m back to believing since the last post on the DT list) is inconsistent, though you are unlikely to be able to actually prove any such inconsistency. You can’t say that since (S(1)=A ⇒ S(2)=B), therefore (S(1)=B ⇒ S(2)=A). One of the counterfactuals is inconsistent, so if S(1) is in fact A, then S(1)=B implies anything. But what you are dealing with are probabilities of these statements (which possibly means proof-search schemes that try to prove these statements while making a certain number of elementary assumptions, a number that plays the role of program length in the universal probability distribution). These probabilities will paint a picture of what you expect the other copy to do, depending on what you do, and this doesn’t at all have to be symmetric.
If there is to be no symmetry between “S(1)=A ⇒ S(2)=B” and “S(1)=B ⇒ S(2)=A”, then something in the algorithm has to treat the two cases differently. In UDT1 there is no such thing to break the symmetry, as far as I can tell, so it would treat them symmetrically and fail on the problem one way or another. Probabilities don’t seem to help since I don’t see why UDT1 would assign them different probabilities.
If you have an idea how the symmetry might be broken, can you explain it in more detail?
I think that Vladimir is right if he is saying that UDT1 can handle the problem in your Explicit Optimization of Global Strategy post.

With your forbearance, I’ll set up the problem in the notation of my write-up of UDT1.
There is only one world-program P in this problem. The world-program runs the UDT1 algorithm twice, feeding it input “1” on one run, and feeding it input “2” on the other run. I’ll call these respective runs “Run1” and “Run2”.
The set of inputs for the UDT1 algorithm is X = {1, 2}.
The set of outputs for the UDT1 algorithm is Y = {A, B}.
There are four possible execution histories for P:
E, in which Run1 outputs A, Run2 outputs A, and each gets $0.
F, in which Run1 outputs A, Run2 outputs B, and each gets $10.
G, in which Run1 outputs B, Run2 outputs A, and each gets $10.
H, in which Run1 outputs B, Run2 outputs B, and each gets $0.
The utility function U for the UDT1 algorithm is defined as follows:
U(E) = 0.
U(F) = 20.
U(G) = 20.
U(H) = 0.
Now we want to choose a mathematical intuition function M so that Run1 and Run2 don’t give the same output. This mathematical intuition function does have to satisfy a couple of constraints:
For each choice of input X and output Y, the function M(X, Y, –) must be a normalized probability distribution on {E, F, G, H}.
The mathematical intuition needs to meet certain minimal standards to deserve its name. For example, we need to have M(1, B, E) = 0. The algorithm should know that P isn’t going to execute according to E if the algorithm returns B on input 1.
But these constraints still leave us with enough freedom in how we set up the mathematical intuition. In particular, we can set
M(1, A, F) = 1, and all other values of M(1, A, –) equal to zero;
M(1, B, H) = 1, and all other values of M(1, B, –) equal to zero;
M(2, A, E) = 1, and all other values of M(2, A, –) equal to zero;
M(2, B, F) = 1, and all other values of M(2, B, –) equal to zero.
Thus, in Run1, the algorithm computes that, if it outputs A, then execution history F would transpire, so the agent would get utility U(F) = 20. But if Run1 were to output B, then H would transpire, yielding utility U(H) = 0. Therefore, Run1 outputs A.
Similarly, Run2 computes that its outputting A would result in E, with utility 0, while outputting B would result in F, with utility 20. Therefore, Run2 outputs B.
Hence, execution history F transpires, and the algorithm reaps $20.
ETA: And, as a bonus, this mathematical intuition really makes sense. For, suppose that we held everything equal, except that we do some surgery so that Run1 outputs B. Since everything else is equal, Run2 is still going to output B. And that really would put us in history H, just as Run1 predicted when it evaluated M(1, B, H) = 1.
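(For what it’s worth, here is a quick mechanical check of the above as a Python sketch. The histories, payoffs, and the M table are exactly the ones defined above; no tie-breaking is needed because both runs’ comparisons come out strict.)

    U = {'E': 0, 'F': 20, 'G': 20, 'H': 0}
    # M[(input, output)] = the history to which M assigns probability 1
    M = {(1, 'A'): 'F', (1, 'B'): 'H',
         (2, 'A'): 'E', (2, 'B'): 'F'}
    # history actually produced by P, given the outputs of Run1 and Run2
    history = {('A', 'A'): 'E', ('A', 'B'): 'F',
               ('B', 'A'): 'G', ('B', 'B'): 'H'}

    def run(i):
        # each run picks the output whose predicted history has higher utility
        return max(('A', 'B'), key=lambda y: U[M[(i, y)]])

    out1, out2 = run(1), run(2)
    assert (out1, out2) == ('A', 'B')      # Run1 outputs A, Run2 outputs B
    assert history[(out1, out2)] == 'F'    # history F transpires, worth $20
    # counterfactual surgery: force Run1 to output B, leave Run2 unchanged
    assert history[('B', out2)] == 'H'     # matches M(1, B, H) = 1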
That’s cheating; you haven’t explained anything, you’ve just chosen the strategies and baptized them with a mathematical intuition that magically knows them from the start.
I’m not sure what you mean by “cheating”. Wei Dai doesn’t claim to have explained where the mathematical intuition comes from, and I don’t either. The point is, I could build a UDT1 agent with that mathematical intuition, and the agent would behave correctly if it were to encounter the scenario that Wei describes. How I came up with that mathematical intuition is an open problem. But the agent that I build with it falls under the scope of UDT1. It is not necessary to pass to UDT1.1 to find such an agent.
I’m giving an existence proof: There exist UDT1 agents that perform correctly in Wei’s scenario. Furthermore, the mathematical intuition used by the agent that I exhibit evaluates counterfactuals in a reasonable way (see my edit to the comment).
Wei Dai doesn’t claim to have explained where the mathematical intuition comes from, and I don’t either.
There is a difference between not specifying the structure of an unknown phenomenon for which we still have no explanation, and assigning the phenomenon an arbitrary structure without giving an explanation. Even though you haven’t violated the formalism, mathematical intuition is not supposed to magically rationalize your (or my) conclusions.
How I came up with that mathematical intuition is an open problem.
No it’s not, you’ve chosen it so that it “proves” what we believe to be a correct conclusion.
I’m giving an existence proof: There exist UDT1 agents that perform correctly in Wei’s scenario.
Since you can force the agent to pick any of the available actions by appropriately manipulating its mathematical intuition, you can “prove” that there is an agent that performs correctly in any given situation, so long as you can forge its mathematical intuition for every such situation. You can also “prove” that there is an agent that makes the worst possible choice, in exactly the same way.
How I came up with that mathematical intuition is an open problem.
No it’s not, you’ve chosen it so that it “proves” what we believe to be a correct conclusion.
This is kind of interesting. In Wei’s problem, I believe that I can force a winning mathematical intuition with just a few additional conditions, none of which assume that we know the correct conclusion. They seem like reasonable conditions to me, though maybe further reflection will reveal counterexamples.
Using my notation from this comment, we have to find right-hand values for the following 16 equations.
M(1, A, E) = . M(1, A, F) = . M(1, A, G) = . M(1, A, H) = .
M(1, B, E) = . M(1, B, F) = . M(1, B, G) = . M(1, B, H) = .
M(2, A, E) = . M(2, A, F) = . M(2, A, G) = . M(2, A, H) = .
M(2, B, E) = . M(2, B, F) = . M(2, B, G) = . M(2, B, H) = .
In addition to the conditions that I mentioned in that comment, I add the following,
Binary: Each probability distribution M(X, Y, –) is binary. That is, the mathematical intuition is certain about which execution history would follow from a given output on a given input.
Accuracy: The mathematical intuition, being certain, should be accurate. That is, if the agent expects a certain amount of utility when it produces its output, then it should really get that utility.
(Those both seem sorta plausible in such a simple problem.)
Counterfactual Accuracy: The mathematical intuition should behave well under counterfactual surgery, in the sense that I used in the edit to the comment linked above. More precisely, suppose that the algorithm outputs Y_i on input X_i for all i. Suppose that, for a single fixed value of j, we surgically interfered with the algorithm’s execution to make it output Y'_j instead of Y_j on input X_j. Let E' be the execution history that would result from this. Then we ought to have that M(X_j, Y'_j, E') = 1.
I suspect that the counterfactual accuracy condition needs to be replaced with something far more subtle to deal with other problems, even in the binary case.
Nonetheless, it seems interesting that, in this case, we don’t need to use any prior knowledge about which mathematical intuitions win.
I’ll proceed by filling in the array above entry-by-entry. We can fill in half the entries right away from the definitions of the execution histories:
M(1, A, E) = . M(1, A, F) = . M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = . M(1, B, H) = .
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .
Now we have to consider cases. Starting with the upper-left corner, the value of M(1, A, E) will be either 0 or 1.
Case I: Suppose that M(1, A, E) = 0. Normalization forces M(1, A, F) = 1:
M(1, A, E) = 0 M(1, A, F) = 1 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = . M(1, B, H) = .
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .
Now, in the second row, the value of M(1, B, G) will be either 0 or 1.
Case I A: Suppose that M(1, B, G) = 0. Normalization forces M(1, B, H) = 1:
M(1, A, E) = 0 M(1, A, F) = 1 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 0 M(1, B, H) = 1
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .
We have filled in enough entries to see that Run1 will output A. (Recall that U(F) = 20 and U(H) = 0.) Thus, if Run2 outputs A, then E will happen, not G. Similarly, if Run2 outputs B, then F will happen, not H. This allows us to complete the mathematical intuition function:
M(1, A, E) = 0 M(1, A, F) = 1 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 0 M(1, B, H) = 1
M(2, A, E) = 1 M(2, A, F) = 0 M(2, A, G) = 0 M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = 1 M(2, B, G) = 0 M(2, B, H) = 0
Under this mathematical intuition function, Run1 outputs A and Run2 outputs B. Moreover, this function meets the counterfactual accuracy condition. Note that this function wins.
Case I B: Suppose that M(1, B, G) = 1 in the second row. Normalization forces M(1, B, H) = 0:
M(1, A, E) = 0 M(1, A, F) = 1 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 1 M(1, B, H) = 0
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .
In this case, Run1 will need to use a tie-breaker, because it predicts utility 20 from both outputs. There are two cases, one for each possible tie-breaker.
Case I B i: Suppose that the tie-breaker leads Run1 to output A. If Run2 outputs A, then E will happen, not G. And if Run2 outputs B, then F will happen, not H. This gives us a complete mathematical intuition function:
M(1, A, E) = 0 M(1, A, F) = 1 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 1 M(1, B, H) = 0
M(2, A, E) = 1 M(2, A, F) = 0 M(2, A, G) = 0 M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = 1 M(2, B, G) = 0 M(2, B, H) = 0
Hence, Run2 will output B. But this function fails the counterfactual accuracy condition. It predicts execution history G if Run1 were to output B, when in fact the execution history would be H. Thus we throw out this function.
Case I B ii: Suppose that the tie-breaker leads Run1 to output B. Then, similar to Case I B i, the resulting function fails the counterfactual accuracy test. (Run2 will output A. The resulting function predicts history F if Run1 were to output A, when in fact the history would be E.) Thus we throw out this function.
Therefore, in Case I, all functions either win or are ineligible.
Case II: Suppose that M(1, A, E) = 1. Normalization forces M(1, A, F) = 0, getting us to
M(1, A, E) = 1 M(1, A, F) = 0 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = . M(1, B, H) = .
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .
Now, in the second row, the value of M(1, B, G) will be either 0 or 1.
Case II A: Suppose that M(1, B, G) = 0. Normalization forces M(1, B, H) = 1:
M(1, A, E) = 1 M(1, A, F) = 0 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 0 M(1, B, H) = 1
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .
In this case, Run1 will need to use a tie-breaker, because it predicts utility 0 from both outputs. There are two cases, one for each possible tie-breaker.
Case II A i: Suppose that the tie-breaker leads Run1 to output A. If Run2 outputs A, then E will happen, not G. And if Run2 outputs B, then F will happen, not H. This gives us a complete mathematical intuition function:
M(1, A, E) = 1 M(1, A, F) = 0 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 0 M(1, B, H) = 1
M(2, A, E) = 1 M(2, A, F) = 0 M(2, A, G) = 0 M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = 1 M(2, B, G) = 0 M(2, B, H) = 0
Hence, Run2 will output B. But this function fails the accuracy condition. Run1 expects utility 0 for its output, when in fact it will get utility 20. Thus we throw out this function.
Case II A ii: Suppose that the tie-breaker leads Run1 to output B. If Run2 outputs A, then G will happen, not E. And if Run2 outputs B, then H will happen, not F. This gives us a complete mathematical intuition:
M(1, A, E) = 1 M(1, A, F) = 0 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 0 M(1, B, H) = 1
M(2, A, E) = 0 M(2, A, F) = 0 M(2, A, G) = 1 M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = 0 M(2, B, G) = 0 M(2, B, H) = 1
Hence, Run2 will output A. But this function fails the accuracy condition. Run1 expects utility 0 for its output, when in fact it will get utility 20. Thus we throw out this function.
Case II B: Suppose that M(1, B, G) = 1. Normalization forces M(1, B, H) = 0:
M(1, A, E) = 1 M(1, A, F) = 0 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 1 M(1, B, H) = 0
M(2, A, E) = . M(2, A, F) = 0 M(2, A, G) = . M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = . M(2, B, G) = 0 M(2, B, H) = .
We have filled in enough entries to see that Run1 will output B. (Recall that U(E) = 0 and U(G) = 20.) Thus, if Run2 outputs A, then G will happen, not E. Similarly, if Run2 outputs B, then H will happen, not F. This allows us to complete the mathematical intuition function:
M(1, A, E) = 1 M(1, A, F) = 0 M(1, A, G) = 0 M(1, A, H) = 0
M(1, B, E) = 0 M(1, B, F) = 0 M(1, B, G) = 1 M(1, B, H) = 0
M(2, A, E) = 0 M(2, A, F) = 0 M(2, A, G) = 1 M(2, A, H) = 0
M(2, B, E) = 0 M(2, B, F) = 0 M(2, B, G) = 0 M(2, B, H) = 1
Under this mathematical intuition function, Run1 outputs B and Run2 outputs A. Moreover, this function meets the counterfactual accuracy condition. Note that this function wins.
Therefore, all cases lead to mathematical intuitions that either win or are ineligible.
ETA: And I just discovered that there’s a length-limit on comments.
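If it helps, the case analysis can also be checked by brute force. The following Python sketch (my own check, which assumes a deterministic tie-break preference shared by both runs, since both runs are the same algorithm) enumerates every binary M consistent with the definitions of E, F, G, H, applies the accuracy and counterfactual accuracy conditions, and confirms that every M that survives yields the $20 outcome:

    import itertools

    U = {'E': 0, 'F': 20, 'G': 20, 'H': 0}
    history = {('A', 'A'): 'E', ('A', 'B'): 'F',
               ('B', 'A'): 'G', ('B', 'B'): 'H'}
    # binary M consistent with the definitions: on input 1 (resp. 2), output y
    # can only lead to a history whose first (resp. second) output is y
    options = {(1, 'A'): ('E', 'F'), (1, 'B'): ('G', 'H'),
               (2, 'A'): ('E', 'G'), (2, 'B'): ('F', 'H')}
    keys = list(options)

    for tie_break in ('A', 'B'):
        for choice in itertools.product(*(options[k] for k in keys)):
            M = dict(zip(keys, choice))

            def run(i):
                best = max(U[M[(i, y)]] for y in ('A', 'B'))
                winners = [y for y in ('A', 'B') if U[M[(i, y)]] == best]
                return winners[0] if len(winners) == 1 else tie_break

            out = {1: run(1), 2: run(2)}
            actual = history[(out[1], out[2])]
            # accuracy: each run's expected utility equals the actual utility
            accurate = all(U[M[(i, out[i])]] == U[actual] for i in (1, 2))
            # counterfactual accuracy: M must predict the history that surgery
            # on one run (the other run left unchanged) would actually produce
            alt = {'A': 'B', 'B': 'A'}
            cf1 = M[(1, alt[out[1]])] == history[(alt[out[1]], out[2])]
            cf2 = M[(2, alt[out[2]])] == history[(out[1], alt[out[2]])]
            if accurate and cf1 and cf2:
                assert U[actual] == 20   # every eligible M wins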
Do you think your idea is applicable to multi-player games, which is ultimately what we’re after? (I don’t see how to do it myself.) Take a look at this post, which I originally wrote for another mailing list:
In http://lesswrong.com/lw/1s5/explicit_optimization_of_global_strategy_fixing_a/ I gave an example of a coordination game for two identical agents with the same (non-indexical) preferences and different inputs. The two agents had to choose different outputs in order to maximize their preferences, and I tried to explain why it seemed to me that they couldn’t do this by a logical correlation type reasoning alone.

A harder version of this problem involves two agents with different preferences, but who are otherwise identical. For simplicity let’s assume they both care only about what happens in one particular world program (and therefore have no uncertainty about each other’s source code). This may not be the right way to frame the question, which is part of my confusion. But anyway, let the choices be C and D, and consider this payoff matrix (and suppose randomized strategies are not possible):

0,0 4,5
5,4 0,0

Here’s the standard PD matrix for comparison:

3,3 0,5
5,0 1,1

Nesov’s intuitions at http://lesswrong.com/lw/1vv/the_blackmail_equation/1qk9 make sense to me in this context. It seems that if these two agents are to achieve the 4,5 or 5,4 outcome, it has to be through some sort of “jumbles of wires” consideration, since there is no “principled” way to decide between the two, as far as I can tell. But what is that reasoning exactly? Does anyone understand acausal game theory (is this a good name?) well enough to walk me through how these two agents might arrive at one of the intuitively correct answers (and also show that the same type of reasoning gives an intuitively correct answer for PD)?

If my way of framing the question is not a good one, I’d like to see any kind of worked-out example in this vein.
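(Just to have the two games in machine-readable form, here is a small Python sketch, with rows as player 1’s choice and columns as player 2’s. In the asymmetric coordination game the two cells that are not Pareto-dominated, (4,5) and (5,4), favor different players, so unlike in PD there is no symmetric “cooperative” cell for identical reasoning to single out.)

    # payoff[(row, col)] = (player 1's payoff, player 2's payoff)
    coordination = {('C', 'C'): (0, 0), ('C', 'D'): (4, 5),
                    ('D', 'C'): (5, 4), ('D', 'D'): (0, 0)}
    pd = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

    def pareto_optimal(game):
        # cells that no other cell makes strictly better for both players
        return [cell for cell, (a, b) in game.items()
                if not any(a2 > a and b2 > b for a2, b2 in game.values())]

    print(pareto_optimal(coordination))  # [('C', 'D'), ('D', 'C')]
    print(pareto_optimal(pd))            # every cell except ('D', 'D')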
It’s tempting to take a step back and consider the coordination game from the point of view of the agent before-observation, as it gives a nice equivalence between the copies, control over the consequences for both copies from a common source. This comes with a simple algorithm, an actual explanation. But as I suspect you intended to communicate in this comment, this is not very interesting, because it’s not the general case: in two-player games the other player is not your copy, and wasn’t one at any previous time. But if we try to consider the actions of the agent after-observation, after the two copies have diverged, there seems to be no nice solution anymore.
It’s clear how the agent before-observation controls the copies after it, and so how its decisions about the strategy of reacting to future observations control both copies, coordinate them. It’s far from clear how a copy that received one observation can control a copy that received the other observation. Parts control the whole, but not conversely. Yet the coordination problem could be posed about two agents that have nothing in common, and we’d expect there to be a solution to that as well. Thus I expect the coordination problem with two copies to have a local solution, apart from the solution of deciding in advance, as you describe in the post.
My comment to which you linked is clearly flawed in at least one respect: it assumes that to control a structure B with agent A, B has to be defined in terms of A. This is still an explicit control mindset, what I call acausal control, but it’s wrong, not as general as ambient control, where you are allowed to discover new dependencies, or UDT, where the discovery of new dependencies is implicit in mathematical intuition.
It’ll take a much better understanding of theories of consequences, of the process of their exploration, and of preference defined over them, to give specific examples, and I don’t expect these examples to be transparent (but maybe there is a simple proof that the decisions will be correct, one that doesn’t point out the specific details of the decision-making process).
Do you think your idea is applicable to multi-player games, which is ultimately what we’re after? (I don’t see how to do it myself.) Take a look at this post, which I originally wrote for another mailing list:
In http://lesswrong.com/lw/1s5/explicit_optimization_of_global_strategy_fixing_a/ I gave an example of a coordination game for two identical agents with the same (non-indexical) preferences and different inputs. The two agents had to choose different outputs in order to maximize their preferences, and I tried to explain why it seemed to me that they couldn’t do this by a logical correlation type reasoning alone.
I think that there may have been a communication failure here. The comment that you’re replying to is specifically about that exact game, the one in your post Explicit Optimization of Global Strategy (Fixing a Bug in UDT1). The communication failure is my fault, because I had assumed that you had been following along with the conversation.
Here is the relevant context:
In this comment, I re-posed your game from the “explicit optimization” post in the notation of my write-up of UDT. In that comment, I gave an example of a mathematical intuition such that a UDT1 agent with that mathematical intuition would win the game.
In reply, Vladimir pointed out that the real problem is not to show that there exists a winning mathematical intuition. Rather, the problem is to give a general formal decision procedure that picks out a winning mathematical intuition. Cooking up a mathematical intuition that “proves” what I already believe to be the correct conclusion is “cheating”.
The purpose of the comment that you’re replying to was to answer Vladimir’s criticism. I show that, for this particular game (the one in your “explicit optimization” post), the winning mathematical intuitions are the only ones that meet certain reasonable criteria. The point is that these “reasonable criteria” do not involve any assumption about what the agent should do in the game.
Actually, I had been following your discussion with Nesov, but I’m not sure if your comment adequately answered his objection. So rather than commenting on that, I wanted to ask whether your approach of using “reasonable criteria” to narrow down mathematical intuitions can be generalized to deal with the harder problem of multi-player games. (If it can’t, then perhaps the discussion is moot.)
I see. I misunderstood the grandparent to be saying that your “explicit optimization” LW post had originally appeared on another mailing list, and I thought that you were directing me to it to see what I had to say about the game there. I was confused because this whole conversation already centered around that very game :).
I show that, for this particular game (the one in your “explicit optimization” post), the winning mathematical intuitions are the only ones that meet certain reasonable criteria.
(1) Which one of them will actually be given? (2) If there is no sense in which some of these “reasonable” conclusions are better than each other, why do you single them out, rather than mathematical intuitions expressing uncertainty about the outcomes that would express the lack of priority of some of these outcomes over others?
I don’t find the certainty of conclusions a reasonable assumption, in particular because, as you can see, you can’t unambiguously decide which of the conclusions is the right one, and so can’t the agent.
I claim to be giving, at best, a subset of “reasonable criteria” for mathematical intuition functions. Any UDT1-builder who uses a superset of these criteria, and who has enough decision criteria to decide which UDT1 agent to write, will write an agent who wins Wei’s game. In this case, it would suffice to have the criteria I mentioned plus a lexicographic tie-breaker (as in UDT1.1). I’m not optimistic that that will hold in general.
(I also wouldn’t be surprised to see an example showing that my “counterfactual accuracy” condition, as stated, rules out all winning UDT1 algorithms in some other game. I find it pretty unlikely that it suffices to deal with mathematical counterfactuals in such a simple way, even given the binary certainty and accuracy conditions.)
My point was only that the criteria above already suffice to narrow the field of options for the builder down to winning options. Hence, whatever superset of these criteria the builder uses, this superset doesn’t need to include any knowledge about which possible UDT1 agent would win.
(2) If there is no sense in which some of these “reasonable” conclusions are better than each other, why do you single them out, rather than mathematical intuitions expressing uncertainty about the outcomes that would express the lack of priority of some of these outcomes over others?
I don’t follow. Are you suggesting that I could just as reasonably have made it a condition of any acceptable mathematical intuition function that M(1, A, E) = 0.5 ?
I don’t find the certainty of conclusions a reasonable assumption, in particular because, as you can see, you can’t unambiguously decide which of the conclusions is the right one, and so can’t the agent.
If I (the builder/writer) really couldn’t decide which mathematical intuition function to use, then the agent won’t come to exist in the first place. If I can’t choose among the two options that remain after I apply the described criteria, then I will be frozen in indecision, and no agent will get built or written. I take it that this is your point.
But if I do have enough additional criteria to decide (which in this case could be just a lexicographic tie-breaker), then I don’t see what is unreasonable about the “certainty of conclusions” assumption for this game.
If I (the builder/writer) really couldn’t decide which mathematical intuition function to use, then the agent won’t come to exist in the first place.
You don’t pick the output of mathematical intuition in a particular case; mathematical intuition is a general algorithm that works based on world programs, outcomes, and your proposed decisions. It’s computationally intensive, and its results are not specified in advance based on intuition; on the contrary, the algorithm is what stands for intuition. With more resources, this algorithm will produce different probabilities, as it comes to understand the problem better. And you just pick the algorithm. What you can say about its outcome is a matter of understanding the requirements for such a general algorithm, and predicting what it must therefore compute. Absolute certainty of the algorithm, for example, would imply that the algorithm managed to logically infer that the outcome would be so and so, and I don’t see how it’s possible to do that, given the problem statement. If it’s unclear how to infer what will happen, then mathematical intuition should be uncertain (but it can know something that tilts the balance one way a little bit, perhaps enough to decide the coordination problem!)
There is a single ideal mathematical intuition, which, given a particular amount of resources, and a particular game, determines a unique function M mapping {inputs} x {outputs} x {execution histories} --> [0,1] for a UDT1 agent in that game. This ideal mathematical intuition (IMI) is defined by the very nature of logical or mathematical inference under computational limitation. So, in particular, it’s not something that you can talk about choosing using some arbitrary tie-breaker like lexicographic order.
Now, maybe the IMI requires that the function M be binary in some particular game with some particular amount of resources. Or maybe the IMI requires a non-binary function M for all amounts of computational resources in that game. Unless you can explain exactly why the IMI requires a binary function M for this particular game, you haven’t really made progress on the kinds of questions that we’re interested in.
More or less. Of course there is no point in going for a “single” mathematical intuition, but the criteria for choosing one shouldn’t be specific to a particular game. Mathematical intuition primarily works with the world program, trying to estimate how plausible it is that this world program will be equivalent to a given history definition, under the condition that the agent produces a given output.
Let me see if I understand your point. Are you saying the following?
Some UDT1 agents perform correctly in the scenario, but some don’t. To not be “cheating”, you need to provide a formal decision theory (or at least make some substantial progress towards providing one) that explains why the agent’s builder would choose to build one of the UDT1 agents that do perform correctly.
Not quite. UDT is not an engineering problem, it’s a science problem. There is a mystery in what mathematical intuition is supposed to be, not just a question of tacking it on. The current understanding allows one to instantiate incorrect UDT agents, but that’s a failure of understanding, not a problem with UDT agents. By studying the setting more, we’ll learn more about what mathematical intuition is, which will show some of the old designs to be incorrect.
You say “Not quite”, but this is still looking like what I tried to capture with my paraphrase. I was asking if you were saying the following:
A full solution that was a pure extension (not revision) of UDT1 [since I was trying to work within UDT1] would have to take the form of a formal DT such that a builder with that DT would have to choose to build a correct UDT1 agent.
Yeah, that works; though of course the revised decision theories will most certainly not be formal extensions of UDT1, they might give guidelines on designing good UDT1-compliant agents.
...and this leads to another bit of progress, on the structure of mathematical intuition. The key exercise is to try to explain explicit optimization of strategies, as described in your post, in terms of local ambient control that determines only the action. The strategies then become the assumptions of the proof search that tries to guess what the global outcome will be. The solution comes from the two agents being equivalent without the observation, thus “input” is automatically extracted from the agents’ code (if we assume the input to be just part of the code by the time the decision problem gets stated). I’ll describe it in more detail later, if this pans out.
The symmetry is broken by “1” being different from “2”. The probabilities express logical uncertainty, and so essentially depend on what happens to be provable given finite resources and the epistemic state of the agent, for which implementation detail matters. The asymmetry is thus hidden in the mathematical intuition, and is not visible in the parts of UDT explicitly described.
...but on the other hand, you don’t need the “input” at all, if decision-making is about figuring out the strategy. You can just have a strategy that produces the output, with no explicit input. The history of input can remain implicit in the agent’s program, which is available anyway.
BTW, in UDT1.1 (as well as UDT1), “input” consists of the agent’s entire memory of the past as well as its current perceptions. Thought I’d mention that in case there’s a misunderstanding there.
Yes, that works too. On second thought, extracting output in this exact manner, while pushing everything else to the “input” allows to pose a problem specifically about the output in this particular situation, so as to optimize the activity for figuring out this output, rather than the whole strategy, of which right now you only need this aspect and no more.
I was having trouble understanding what strategy couldn’t be captured by a function X → Y. After all, what could possibly determine the output of an algorithm other than its source code and whatever input it remembers getting on that particular run? Just to be clear, do you now agree that every strategy is captured by some function f: X → Y mapping inputs to outputs?
One potential problem is that there are infinitely many input-output mappings. The agent can’t assume a bound on the memory it will have, so it can’t assume a bound on the lengths of inputs X that it will someday need to plug into an input-output mapping f.
Unlike the case where there are potentially infinitely many programs P1, P2, . . ., it’s not clear to me that it’s enough to wrap up an infinite set I of input-output mappings into some finite program that generates them. This is because the UDT1.1 agent needs to compute a sum for every element of I. So, if the set I is infinite, the number of sums to be computed will be infinite. Having a finite description of I won’t help here, at least not with a brute-force UDT1.1 algorithm.
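To make the worry concrete, here is a sketch of the brute-force loop in question (Python, names invented; the mappings are assumed to be GLUTs represented as dicts, and the mathematical intuition and utility are again black boxes, with the world programs collapsed into a single history argument for brevity). As quoted earlier, UDT1.1 iterates through the whole set I of input/output mappings, so this loop only terminates if I is finite:

    def udt1_1(observed_input, mappings, histories, math_intuition, utility):
        """Brute-force UDT1.1 sketch: score every input-output mapping S in
        `mappings` by its M-weighted utility over the possible execution
        histories, pick the best S, then output S(observed_input).
        If `mappings` is infinite, this loop never finishes.
        """
        def expected_utility(S):
            return sum(math_intuition(S, E) * utility(E) for E in histories)
        best = max(mappings, key=expected_utility)
        return best[observed_input]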
Any infinite thing in any given problem statement is already presented to you with a finite description. All you have to do is transform that finite description of an infinite object so as to get a finite description of a solution of your problem posed about the infinite object.
Any infinite thing in any given problem statement is already presented to you with a finite description. All you have to do is transform that finite description of an infinite object so as to get a finite description of a solution of your problem posed about the infinite object.
Right. I agree.
But, to make Wei’s formal description of UDT1.1 work, there is a difference between
dealing with a finite description of an infinite execution history Ei and
dealing with a finite description of an infinite set I of input-output maps.
The difference is this: The execution histories only get fed into the utility function U and the mathematical intuition function (which I denote by M). These two functions are taken to be black boxes in Wei’s description of UDT1.1. His purpose is not to explain how these functions work, so he isn’t responsible for explaining how they deal with finite descriptions of infinite things. Therefore, the potential infinitude of the execution histories is not a problem for what he was trying to do.
In contrast, the part of the algorithm that he describes explicitly does require computing an expected utility for every input-output map and then selecting the input-output map that yielded the largest expected utility. Thus, if I is infinite, the brute-force version of UDT1.1 requires the agent to find a maximum from among infinitely many expected utilities. That means that the brute-force version just doesn’t work in this case. Merely saying that you have a finite description of I is not enough to say in general how you are finding the maximum from among infinitely many expected utilities. In fact, it seems possible that there may be no maximum.
Actually, in both UDT1 and UDT1.1, there is a similar issue with the possibility of having infinitely many possible execution-history sequences. In both versions of UDT, you have to perform a sum over all such sequences. Even if you have a finite description of the set E of such sequences, a complete description of UDT still needs to explain how you are performing the sum over the infinitely many elements of the set. In particular, it’s not obvious that this sum is always well-defined.
...but the action could be a natural number, no? It’s entirely OK if there is no maximum—the available computational resources then limit how good a strategy the agent manages to implement (“Define as big a natural number as you can!”). The “algorithm” is descriptive, it’s really a definition of optimality of a decision, not specification of how this decision is to be computed. You can sometimes optimize infinities away, and can almost always find a finite approximation that gets better with more resources and ingenuity.
The “algorithm” is descriptive, it’s really a definition of optimality of a decision, not specification of how this decision is to be computed. You can sometimes optimize infinities away, and can almost always find a finite approximation that gets better with more resources and ingenuity.
Okay. I didn’t know that the specification of how to compute was explicitly understood to be incomplete in this way. Of course, the description could only be improved by being more specific about just when you can “sometimes optimize infinities away, and can almost always find a finite approximation that gets better with more resources and ingenuity.”
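For instance, here is one minimal sketch of such a finite approximation (Python, all names invented): search the unbounded action space only as far as a resource budget allows and keep the best action found so far, so that more resources give a weakly better answer even when no true maximum exists.

    def bounded_best_action(expected_utility, budget):
        """Anytime approximation. Actions are the natural numbers 0, 1, 2, ...,
        and expected_utility may have no maximum over all of them; we simply
        return the best action examined within the given budget.
        """
        best_action, best_value = 0, expected_utility(0)
        for action in range(1, budget):
            value = expected_utility(action)
            if value > best_value:
                best_action, best_value = action, value
        return best_action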
BTW, in UDT1.1 (as well as UDT1), “input” consists of the agent’s entire memory of the past as well as its current perceptions. Thought I’d mention that in case there’s a misunderstanding there.
Yes, that works too. If you can make “inputs and outputs” into anything, sure you can represent any strategy with them. I’m not sure it makes sense to separate them in that case though—just make the whole strategy as the sole product of decision-making; interpretation of this product is a separate question, and doesn’t necessarily need to be addressed at all.
I’ve added a description of UDT1.1 to my pdf.
In principle, it doesn’t matter, because you can represent a countable list of programs as a single program that takes an extra parameter (but then you’ll need to be more careful about the notion of “execution histories”), and more generally you can just include all possible programs in the list and express the level to which you care about the specific programs in the way mathematical intuition ranks their probability and the way utility function ranks their possible semantics.
On execution histories: note that a program is a nice finite inductive definition of how that program behaves, while it’s unclear what an “execution history” is, since it’s an infinite object and so it needs to be somehow finitely described. Also, if, as in the example above you have the world program taking parameters (e.g. a universal machine that takes a Goedel number of a world program as parameter), you’ll have different executions depending on parameter. But if you see a program as a set of axioms for a logical theory defining the program’s behavior, then execution histories can just be different sets of axioms defining program’s behavior in a different way. These different sets of axioms could describe the same theories, or different theories, and can include specific facts about what happens during program execution on so and so parameters. Equivalence of such theories will depend on what you assume about the agent (i.e. if you add different assumptions about the agent to the theories, you get different theories, and so different equivalences), which is what mathematical intuition is trying to estimate.
It’s not accurate to describe strategies as mappings f: X->Y. A strategy can be interactive: it takes input, produces an output, and then environment can prepare another input depending on this output, and so on. Think normalization in lambda calculus. So, the agent’s strategy is specified by a program, but generally speaking this program is untyped.
Let’s assume that there is a single world program, as described here. Then, if A is the agent’s program known to the agent, B is one possible strategy for that program, given in form of a program, X is the world program known to the agent, and Y is one of the possible world execution histories of X given that A behaves like B, again given in form of a program, then mathematical intuition M(B,Y) returns the probability that the statement (A~B ⇒ X~Y) is true, where A~B stands for “A behaves like B”, and similarly for X and Y. (This taps into the ambient control analysis of decision theory.)
I’m following this paragraph from Wei Dai’s post on UDT1.1:
So, “input/output mappings” is Wei Dai’s language. Does he not mean mappings between the set of possible inputs and the set of possible outputs?
It seems to me that this could be captured by the right function f: X → Y. The set I of input-output mappings could be a big collections of GLUTs. Why wouldn’t that suffice for Wei Dai’s purposes?
ETA: And it feels weird typing out “Wei Dai” in full all the time. But the name looks like it might be Asian to me, so I don’t know which part is the surname and which is the given name.
I’ve been wondering why people keep using my full name around here. Yes, the name is Chinese, but since I live in the US I follow the given-name-first convention. Feel free to call me “Wei”.
No, you can’t represent an interactive strategy by a single input to output mapping. That post made a step in the right direction, but stopped short of victory :-). But I must admit, I forgot about that detail in the second post, so you’ve correctly rendered Wei’s algorithm, although using untyped strategies would further improve on that.
Why not?
BTW, in UDT1.1 (as well as UDT1), “input” consists of the agent’s entire memory of the past as well as its current perceptions. Thought I’d mention that in case there’s a misunderstanding there.
… okay, this question allowed me to make a bit of progress. Taking as a starting point the setting of this comment (that we are estimating the probability of (A~B ⇒ X~Y) being true, where A and X are respectively agent’s and environment’s programs, B and Y programs representing agent’s strategy and outcome for environment), and the observations made here and here, we get a scheme for local decision-making.
Instead of trying to decide the whole strategy, we can just decide the local action. Then, the agent program, and “input” consisting of observations and memories, together make up the description of where the agent is in the environment, and thus where its control will be applied. The action that the agent considers can then be local, just something the agent does at this very moment, and the alternatives for this action are alternative statements about the agent: thus, instead of considering a statement A~B for agent’s program A and various whole strategies B, we consider just predicates like action1(A) and action2(A) which assert A to choose action 1 or action 2 in this particular situation, and which don’t assert anything else about its behavior in other situations or on other counterfactuals. Taking into account other actions that the agent might have to make in the past or in the future happens automatically, because the agent works with complete description of environment, even if under severe logical uncertainty. Thus, decision-making happens “one bit at a time”, and the agent’s strategy mostly exists in the environment, not under in any way direct control by the agent, but still controlled in the same sense everything in the environment is.
Thus, in the simplest case of a binary local decision, mathematical intuition would only take as explicit argument a single bit, which indicates what assertion is being made about [agent’s program together with memory and observations], and that is all. No maps, no untyped strategies.
This solution was unavailable to me when I thought about explicit control, because the agent has to coordinate with itself, rely on what it can in fact decide in other situations and not what it should optimally decide, but it’s a natural step in the setting of ambient control, because the incorrect counterfactuals are completely banished out of consideration, and environment describes what the agent will actually do on other occasions.
Going back to the post explicit optimization of global strategy, the agent doesn’t need to figure out the global strategy! Each of the agent copies is allowed to make the decision locally, while observing the other copy as part of the environment (in fact, it’s the same problem as “general coordination problem” I described on the DT list, back when I was clueless about this approach).
Well, that was my approach in UDT1, but then I found a problem that UDT1 apparently can’t solve, so I switched to optimizing over the global strategy (and named that UDT1.1).
Can you re-read explicit optimization of global strategy and let me know what you think about it now? What I called “logical correlation” (using Eliezer’s terminology) seems to be what you call “ambient control”. The point of that post was that it seems an insufficiently powerful tool for even two agents with the same preferences to solve the general coordination problem amongst themselves, if they only explicitly optimize the local decision and depend on “logical correlation”/”ambient control” to implicitly optimize the global strategy.
If you think there is some way to get around that problem, I’m eager to hear it.
So far as I can see, your mistake was assuming “symmetry” and dropping probabilities. There is no symmetry: only one of the possibilities is what will actually happen, and the other (which I’m back to believing since the last post on the DT list) is inconsistent, though you are unlikely to be able to actually prove any such inconsistency. You can’t say that since (S(1)=A ⇒ S(2)=B), therefore (S(1)=B ⇒ S(2)=A). One of the counterfactuals is inconsistent, so if S(1) is in fact A, then S(1)=B implies anything. But what you are dealing with are probabilities of these statements (which possibly means proof search schemes that try to prove these statements while making a certain number of elementary assumptions, a number that plays the same role as program length does in the universal probability distribution). These probabilities will paint a picture of what you expect the other copy to do, depending on what you do, and this doesn’t at all have to be symmetric.
If there is to be no symmetry between “S(1)=A ⇒ S(2)=B” and “S(1)=B ⇒ S(2)=A”, then something in the algorithm has to treat the two cases differently. In UDT1 there is no such thing to break the symmetry, as far as I can tell, so it would treat them symmetrically and fail on the problem one way or another. Probabilities don’t seem to help since I don’t see why UDT1 would assign them different probabilities.
If you have an idea how the symmetry might be broken, can you explain it in more detail?
I think that Vladimir is right if he is saying that UDT1 can handle the problem in your Explicit Optimization of Global Strategy post.
With your forbearance, I’ll set up the problem in the notation of my write-up of UDT1.
There is only one world-program P in this problem. The world-program runs the UDT1 algorithm twice, feeding it input “1” on one run, and feeding it input “2” on the other run. I’ll call these respective runs “Run1” and “Run2”.
The set of inputs for the UDT1 algorithm is X = {1, 2}.
The set of outputs for the UDT1 algorithm is Y = {A, B}.
There are four possible execution histories for P:
E, in which Run1 outputs A, Run2 outputs A, and each gets $0.
F, in which Run1 outputs A, Run2 outputs B, and each gets $10.
G, in which Run1 outputs B, Run2 outputs A, and each gets $10.
H, in which Run1 outputs B, Run2 outputs B, and each gets $0.
The utility function U for the UDT1 algorithm is defined as follows:
U(E) = 0.
U(F) = 20.
U(G) = 20.
U(H) = 0.
Now we want to choose a mathematical intuition function M so that Run1 and Run2 don’t give the same output. This mathematical intuition function does have to satisfy a couple of constraints:
For each choice of input X and output Y, the function M(X, Y, –) must be a normalized probability distribution on {E, F, G, H}.
The mathematical intuition needs to meet certain minimal standards to deserve its name. For example, we need to have M(1, B, E) = 0. The algorithm should know that P isn’t going to execute according to E if the algorithm returns B on input 1.
But these constraints still leave us with enough freedom in how we set up the mathematical intuition. In particular, we can set
M(1, A, F) = 1, and all other values of M(1, A, –) equal to zero;
M(1, B, H) = 1, and all other values of M(1, B, –) equal to zero;
M(2, A, E) = 1, and all other values of M(2, A, –) equal to zero;
M(2, B, F) = 1, and all other values of M(2, B, –) equal to zero.
Thus, in Run1, the algorithm computes that, if it outputs A, then execution history F would transpire, so the agent would get utility U(F) = 20. But if Run1 were to output B, then H would transpire, yielding utility U(H) = 0. Therefore, Run1 outputs A.
Similarly, Run2 computes that its outputting A would result in E, with utility 0, while outputting B would result in F, with utility 20. Therefore, Run2 outputs B.
Hence, execution history F transpires, and the algorithm reaps $20.
ETA: And, as a bonus, this mathematical intuition really makes sense. For, suppose that we held everything equal, except that we do some surgery so that Run1 outputs B. Since everything else is equal, Run2 is still going to output B. And that really would put us in history H, just as Run1 predicted when it evaluated M(1, B, H) = 1.
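For concreteness, here is a minimal Python sketch (my own, with hypothetical names, not Wei Dai’s specification) of the brute-force expected-utility comparison that a UDT1 agent equipped with this particular mathematical intuition would perform:

```python
# The coordination game above: histories E, F, G, H, their utilities, and the
# hand-picked mathematical intuition M(input, output) -> distribution over histories.

HISTORIES = ("E", "F", "G", "H")
U = {"E": 0, "F": 20, "G": 20, "H": 0}

M = {
    (1, "A"): {"E": 0, "F": 1, "G": 0, "H": 0},
    (1, "B"): {"E": 0, "F": 0, "G": 0, "H": 1},
    (2, "A"): {"E": 1, "F": 0, "G": 0, "H": 0},
    (2, "B"): {"E": 0, "F": 1, "G": 0, "H": 0},
}

def udt1_output(x):
    """Return the output maximizing expected utility under M for input x."""
    def expected_utility(y):
        return sum(M[(x, y)][e] * U[e] for e in HISTORIES)
    return max(("A", "B"), key=expected_utility)

print(udt1_output(1))  # A
print(udt1_output(2))  # B
```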
That’s cheating: you haven’t explained anything, you’ve just chosen the strategies and baptized them with a mathematical intuition that magically knows them from the start.
I’m not sure what you mean by “cheating”. Wei Dai doesn’t claim to have explained where the mathematical intuition comes from, and I don’t either. The point is, I could build a UDT1 agent with that mathematical intuition, and the agent would behave correctly if it were to encounter the scenario that Wei describes. How I came up with that mathematical intuition is an open problem. But the agent that I build with it falls under the scope of UDT1. It is not necessary to pass to UDT1.1 to find such an agent.
I’m giving an existence proof: There exist UDT1 agents that perform correctly in Wei’s scenario. Furthermore, the mathematical intuition used by the agent that I exhibit evaluates counterfactuals in a reasonable way (see my edit to the comment).
There is a difference between not specifying the structure of an unknown phenomenon for which we still have no explanation, and assigning the phenomenon an arbitrary structure without giving an explanation. Even though you haven’t violated the formalism, mathematical intuition is not supposed to magically rationalize your (or my) conclusions.
No, it’s not: you’ve chosen it so that it “proves” what we believe to be a correct conclusion.
Since you can force the agent to pick any of the available actions by appropriately manipulating its mathematical intuition, you can “prove” that there is an agent that performs correctly in any given situation, so long as you can forge its mathematical intuition for every such situation. You can also “prove” that there is an agent that makes the worst possible choice, in exactly the same way.
This is kind of interesting. In Wei’s problem, I believe that I can force a winning mathematical intuition with just a few additional conditions, none of which assume that we know the correct conclusion. They seem like reasonable conditions to me, though maybe further reflection will reveal counterexamples.
Using my notation from this comment, we have to find right-hand values for the following 16 equations.
In addition to the conditions that I mentioned in that comment, I add the following:
Binary: Each probability distribution M(X, Y, –) is binary. That is, the mathematical intuition is certain about which execution history would follow from a given output on a given input.
Accuracy: The mathematical intuition, being certain, should be accurate. That is, if the agent expects a certain amount of utility when it produces its output, then it should really get that utility.
(Those both seem sorta plausible in such a simple problem.)
Counterfactual Accuracy: The mathematical intuition should behave well under counterfactual surgery, in the sense that I used in the edit to the comment linked above. More precisely, suppose that the algorithm outputs Yi on input Xi for all i. Suppose that, for a single fixed value of j, we surgically interfered with the algorithm’s execution to make it output Y’j instead of Yj on input Xj. Let E’ be the execution history that would result from this. Then we ought to have that M(Xj, Y’j, E’) = 1.
I suspect that the counterfactual accuracy condition needs to be replaced with something far more subtle to deal with other problems, even in the binary case.
Nonetheless, it seems interesting that, in this case, we don’t need to use any prior knowledge about which mathematical intuitions win.
I’ll proceed by filling in the array above entry-by-entry. We can fill in half the entries right away from the definitions of the execution histories:
Now we have to consider cases. Starting with the upper-left corner, the value of M(1, A, E) will be either 0 or 1.
Case I: Suppose that M(1, A, E) = 0. Normalization forces M(1, A, F) = 1:
Now, in the second row, the value of M(1, B, G) will be either 0 or 1.
Case I A: Suppose that M(1, B, G) = 0. Normalization forces M(1, B, H) = 1:
We have filled in enough entries to see that Run1 will output A. (Recall that U(F) = 20 and U(H) = 0.) Thus, if Run2 outputs A, then E will happen, not G. Similarly, if Run2 outputs B, then F will happen, not H. This allows us to complete the mathematical intuition function:
Under this mathematical intuition function, Run1 outputs A and Run2 outputs B. Moreover, this function meets the counterfactual accuracy condition. Note that this function wins.
Case I B: Suppose that M(1, B, G) = 1 in the second row. Normalization forces M(1, B, H) = 0:
In this case, Run1 will need to use a tie-breaker, because it predicts utility 20 from both outputs. There are two cases, one for each possible tie-breaker.
Case I B i: Suppose that the tie-breaker leads Run1 to output A. If Run2 outputs A, then E will happen, not G. And if Run2 outputs B, then F will happen, not H. This gives us a complete mathematical intuition function:
Hence, Run2 will output B. But this function fails the counterfactual accuracy condition. It predicts execution history G if Run1 were to output B, when in fact the execution history would be H. Thus we throw out this function.
Case I B ii: Suppose that the tie-breaker leads Run1 to output B. Then, similar to Case I B i, the resulting function fails the counterfactual accuracy test. (Run2 will output A. The resulting function predicts history F if Run1 were to output A, when in fact the history would be E.) Thus we throw out this function.
Therefore, in Case I, all functions either win or are ineligible.
Case II: Suppose that M(1, A, E) = 1. Normalization forces M(1, A, F) = 0, getting us to
Now, in the second row, the value of M(1, B, G) will be either 0 or 1.
Case II A: Suppose that M(1, B, G) = 0. Normalization forces M(1, B, H) = 1:
In this case, Run1 will need to use a tie-breaker, because it predicts utility 0 from both outputs. There are two cases, one for each possible tie-breaker.
Case II A i: Suppose that the tie-breaker leads Run1 to output A. If Run2 outputs A, then E will happen, not G. And if Run2 outputs B, then F will happen, not H. This gives us a complete mathematical intuition function:
Hence, Run2 will output B. But this function fails the accuracy condition. Run1 expects utility 0 for its output, when in fact it will get utility 20. Thus we throw out this function.
Case II A ii: Suppose that the tie-breaker leads Run1 to output B. If Run2 outputs A, then G will happen, not E. And if Run2 outputs B, then H will happen, not F. This gives us a complete mathematical intuition:
Hence, Run2 will output A. But this function fails the accuracy condition. Run1 expects utility 0 for its output, when in fact it will get utility 20. Thus we throw out this function.
Case II B: Suppose that M(1, B, G) = 1. Normalization forces M(1, B, H) = 0:
We have filled in enough entries to see that Run1 will output B. (Recall that U(E) = 0 and U(G) = 20.) Thus, if Run2 outputs A, then G will happen, not E. Similarly, if Run2 outputs B, then H will happen, not F. This allows us to complete the mathematical intuition function:
Under this mathematical intuition function, Run1 outputs B and Run2 outputs A. Moreover, this function meets the counterfactual accuracy condition. Note that this function wins.
Therefore, all cases lead to mathematical intuitions that either win or are ineligible.
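For what it’s worth, here is a small exhaustive check of the case analysis, written in Python with names of my own choosing (it is just a sketch of the argument above, not part of the UDT write-up): it enumerates every binary mathematical intuition consistent with the history definitions, applies the Accuracy and Counterfactual Accuracy conditions under both possible tie-breakers, and confirms that every surviving intuition wins.

```python
from itertools import product

HISTORIES = {  # history: (Run1 output, Run2 output)
    "E": ("A", "A"),
    "F": ("A", "B"),
    "G": ("B", "A"),
    "H": ("B", "B"),
}
U = {"E": 0, "F": 20, "G": 20, "H": 0}
RUNS = (1, 2)
OUTPUTS = ("A", "B")

def consistent_histories(x, y):
    """Histories in which run x outputs y (the 'minimal standards' constraint)."""
    return [e for e, outs in HISTORIES.items() if outs[x - 1] == y]

def all_binary_intuitions():
    """Every binary M: a point mass on some consistent history for each (input, output)."""
    keys = [(x, y) for x in RUNS for y in OUTPUTS]
    for choice in product(*(consistent_histories(x, y) for x, y in keys)):
        yield dict(zip(keys, choice))

def outputs_under(M, tie_break):
    """Each run's output under M; max() resolves ties by the order in tie_break."""
    return {x: max(tie_break, key=lambda y: U[M[(x, y)]]) for x in RUNS}

def actual_history(outs):
    return next(e for e, o in HISTORIES.items() if o == (outs[1], outs[2]))

def accurate(M, outs):
    """Accuracy: the utility each run expects equals the utility actually received."""
    actual = U[actual_history(outs)]
    return all(U[M[(x, outs[x])]] == actual for x in RUNS)

def counterfactually_accurate(M, outs):
    """Counterfactual Accuracy: flipping one run's output leads to the history M predicted."""
    for x in RUNS:
        other = 2 if x == 1 else 1
        for y_alt in OUTPUTS:
            if y_alt != outs[x]:
                flipped = {x: y_alt, other: outs[other]}
                if M[(x, y_alt)] != actual_history(flipped):
                    return False
    return True

eligible = 0
for M in all_binary_intuitions():
    for tie_break in (("A", "B"), ("B", "A")):
        outs = outputs_under(M, tie_break)
        if accurate(M, outs) and counterfactually_accurate(M, outs):
            eligible += 1
            assert U[actual_history(outs)] == 20  # every eligible intuition wins
print(eligible, "eligible (intuition, tie-breaker) pairs; all of them win")
```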
ETA: And I just discovered that there’s a length-limit on comments.
Do you think your idea is applicable to multi-player games, which is ultimately what we’re after? (I don’t see how to do it myself.) Take a look at this post, which I originally wrote for another mailing list:
In http://lesswrong.com/lw/1s5/explicit_optimization_of_global_strategy_fixing_a/ I gave an example of a coordination game for two identical agents with the same (non-indexical) preferences and different inputs. The two agents had to choose different outputs in order to maximize their preferences, and I tried to explain why it seemed to me that they couldn’t do this by logical-correlation-type reasoning alone.
A harder version of this problem involves two agents who have different preferences but are otherwise identical. For simplicity, let’s assume they both care only about what happens in one particular world program (and therefore have no uncertainty about each other’s source code). This may not be the right way to frame the question, which is part of my confusion. But anyway, let the choices be C and D, and consider this payoff matrix (and suppose randomized strategies are not possible):
Here’s the standard PD matrix for comparison:
Nesov’s intuitions at http://lesswrong.com/lw/1vv/the_blackmail_equation/1qk9 make sense to me in this context. It seems that if these two agents are to achieve the 4,5 or 5,4 outcome, it has to be through some sort of “jumbles of wires” consideration, since there is no “principled” way to decide between the two, as far as I can tell. But what is that reasoning exactly? Does anyone understand acausal game theory (is this a good name?) well enough to walk me through how these two agents might arrive at one of the intuitively correct answers (and also show that the same type of reasoning gives an intuitively correct answer for PD)?
If my way of framing the question is not a good one, I’d like to see any kind of worked-out example in this vein.
It’s tempting to take a step back and consider the coordination game from the point of view of the agent before observation, as that gives a nice equivalence between the copies: control over the consequences for both copies from a common source. This comes with a simple algorithm, an actual explanation. But as I suspect you intended to communicate in this comment, this is not very interesting, because it’s not the general case: in two-player games the other player is not your copy, and wasn’t one at any previous time. And if we try to consider the actions of the agent after observation, once the two copies have diverged, there seems to be no nice solution anymore.
It’s clear how the agent before observation controls the copies afterwards, and so how its decisions about the strategy of reacting to future observations control both copies and coordinate them. It’s far from clear how a copy that received one observation can control a copy that received the other observation. Parts control the whole, but not conversely. Yet the coordination problem could be posed about two agents that have nothing in common, and we’d expect there to be a solution to that as well. Thus I expect the coordination problem with two copies to have a local solution, apart from the solution of deciding in advance, as you describe in the post.
My comment to which you linked is clearly flawed in at least one respect: it assumes that to control a structure B with agent A, B has to be defined in terms of A. This is still an explicit control mindset, what I call acausal control, but it’s wrong: it’s not as general as ambient control, where you are allowed to discover new dependencies, or UDT, where the discovery of new dependencies is implicit in mathematical intuition.
It’ll take a much better understanding of theories of consequences, of the process of exploring them, and of preference defined over them, to give specific examples, and I don’t expect these examples to be transparent (but maybe there is a simple proof that the decisions will be correct, one that doesn’t point out the specific details of the decision-making process).
I think that there may have been a communication failure here. The comment that you’re replying to is specifically about that exact game, the one in your post Explicit Optimization of Global Strategy (Fixing a Bug in UDT1). The communication failure is my fault, because I had assumed that you had been following along with the conversation.
Here is the relevant context:
In this comment, I re-posed your game from the “explicit optimization” post in the notation of my write-up of UDT. In that comment, I gave an example of a mathematical intuition such that a UDT1 agent with that mathematical intuition would win the game.
In reply, Vladimir pointed out that the real problem is not to show that there exists a winning mathematical intuition. Rather, the problem is to give a general formal decision procedure that picks out a winning mathematical intuition. Cooking up a mathematical intuition that “proves” what I already believe to be the correct conclusion is “cheating”.
The purpose of the comment that you’re replying to was to answer Vladimir’s criticism. I show that, for this particular game (the one in your “explicit optimization” post), the winning mathematical intuitions are the only ones that meet certain reasonable criteria. The point is that these “reasonable criteria” do not involve any assumption about what the agent should do in the game.
Actually, I had been following your discussion with Nesov, but I’m not sure if your comment adequately answered his objection. So rather than commenting on that, I wanted to ask whether your approach of using “reasonable criteria” to narrow down mathematical intuitions can be generalized to deal with the harder problem of multi-player games. (If it can’t, then perhaps the discussion is moot.)
I see. I misunderstood the grandparent to be saying that your “explicit optimization” LW post had originally appeared on another mailing list, and I thought that you were directing me to it to see what I had to say about the game there. I was confused because this whole conversation already centered around that very game :).
(1) Which one of them will actually be given? (2) If there is no sense in which some of these “reasonable” conclusions are better than the others, why do you single them out, rather than use mathematical intuitions that express uncertainty about the outcomes, reflecting the lack of priority of some of these outcomes over others?
I don’t find the certainty of conclusions a reasonable assumption, in particular because, as you can see, you can’t unambiguously decide which of the conclusions is the right one, and neither can the agent.
I claim to be giving, at best, a subset of “reasonable criteria” for mathematical intuition functions. Any UDT1-builder who uses a superset of these criteria, and who has enough decision criteria to decide which UDT1 agent to write, will write an agent who wins Wei’s game. In this case, it would suffice to have the criteria I mentioned plus a lexicographic tie-breaker (as in UDT1.1). I’m not optimistic that that will hold in general.
(I also wouldn’t be surprised to see an example showing that my “counterfactual accuracy” condition, as stated, rules out all winning UDT1 algorithms in some other game. I find it pretty unlikely that it suffices to deal with mathematical counterfactuals in such a simple way, even given the binary certainty and accuracy conditions.)
My point was only that the criteria above already suffice to narrow the field of options for the builder down to winning options. Hence, whatever superset of these criteria the builder uses, this superset doesn’t need to include any knowledge about which possible UDT1 agent would win.
I don’t follow. Are you suggesting that I could just as reasonably have made it a condition of any acceptable mathematical intuition function that M(1, A, E) = 0.5 ?
If I (the builder/writer) really couldn’t decide which mathematical intuition function to use, then the agent won’t come to exist in the first place. If I can’t choose among the two options that remain after I apply the described criteria, then I will be frozen in indecision, and no agent will get built or written. I take it that this is your point.
But if I do have enough additional criteria to decide (which in this case could be just a lexicographic tie-breaker), then I don’t see what is unreasonable about the “certainty of conclusions” assumption for this game.
You don’t pick the output of mathematical intuition in a particular case; mathematical intuition is a general algorithm that works on world programs, outcomes, and your proposed decisions. It’s computationally intensive, and its results are not specified in advance based on intuition; on the contrary, the algorithm is what stands for intuition. With more resources, this algorithm will produce different probabilities, as it comes to understand the problem better. You just pick the algorithm. What you can say about its output is a matter of understanding the requirements for such a general algorithm and predicting what it must therefore compute. Absolute certainty from the algorithm, for example, would imply that the algorithm managed to logically infer that the outcome would be so and so, and I don’t see how it’s possible to do that, given the problem statement. If it’s unclear how to infer what will happen, then mathematical intuition should be uncertain (but it can know something that tilts the balance one way a little bit, perhaps enough to decide the coordination problem!).
Okay, I understand you to be saying this:
There is a single ideal mathematical intuition, which, given a particular amount of resources, and a particular game, determines a unique function M mapping {inputs} x {outputs} x {execution histories} --> [0,1] for a UDT1 agent in that game. This ideal mathematical intuition (IMI) is defined by the very nature of logical or mathematical inference under computational limitation. So, in particular, it’s not something that you can talk about choosing using some arbitrary tie-breaker like lexicographic order.
Now, maybe the IMI requires that the function M be binary in some particular game with some particular amount of resources. Or maybe the IMI requires a non-binary function M for all amounts of computational resources in that game. Unless you can explain exactly why the IMI requires a binary function M for this particular game, you haven’t really made progress on the kinds of questions that we’re interested in.
Is that right?
More or less. Of course there is no point in going for a “single” mathematical intuition, but the criteria for choosing one shouldn’t be specific to a particular game. Mathematical intuition primarily works with the world program, trying to estimate how plausible it is that this world program will be equivalent to a given history definition, under the condition that the agent produces given output.
Let me see if I understand your point. Are you saying the following?
Some UDT1 agents perform correctly in the scenario, but some don’t. To not be “cheating”, you need to provide a formal decision theory (or at least make some substantial progress towards providing one) that explains why the agent’s builder would choose to build one of the UDT1 agents that do perform correctly.
Not quite. UDT is not an engineering problem, it’s a science problem. There is a mystery in what mathematical intuition is supposed to be, not just a question of tacking it on. The current understanding allows incorrect UDT agents to be instantiated, but that’s a failure of understanding, not a problem with UDT agents. By studying the setting more, we’ll learn more about what mathematical intuition is, which will show some of the old designs to be incorrect.
You say “Not quite”, but this is still looking like what I tried to capture with my paraphrase. I was asking if you were saying the following:
A full solution that was a pure extension (not revision) of UDT1 [since I was trying to work within UDT1] would have to take the form of a formal DT such that a builder with that DT would have to choose to build a correct UDT1 agent.
Yeah, that works; though of course the revised decision theories will most certainly not be formal extensions of UDT1. They might still give guidelines on designing good UDT1-compliant agents.
...and this leads to another bit of progress, on the structure of mathematical intuition. The key exercise is to try to explain explicit optimization of strategies, as described in your post, in terms of local ambient control that determines only the action. The strategies then become the assumptions of the proof search that tries to guess what the global outcome will be. The solution comes from the two agents being equivalent without the observation; thus the “input” is automatically extracted from the agents’ code (if we assume the input to be just part of the code by the time the decision problem gets stated). I’ll describe it in more detail later, if this pans out.
I think that Vladimir is right if he is saying that UDT1 can handle the problem in your Explicit Optimization of Global Strategy post.
With your forbearance, I’ll set up the problem in the notation of my write-up of UDT1.
There is only one world-program P in this problem. The world-program runs the UDT1 algorithm twice, feeding it input “1” on one run, and feeding it input “2” on the other run. I’ll call these respective runs “Run1” and “Run2”.
The set of inputs for the UDT1 algorithm is X = {1, 2}.
The set of outputs for the UDT1 algorithm is Y = {A, B}.
There are four possible execution histories for P:
E, in which Run1 outputs A, Run2 outputs A, and each gets $0.
F, in which Run1 outputs A, Run2 outputs B, and each gets $10.
G, in which Run1 outputs B, Run2 outputs A, and each gets $10.
H, in which Run1 outputs B, Run2 outputs B, and each gets $0.
The utility function U for the UDT1 algorithm is defined as follows:
U(E) = 0.
U(F) = 20.
U(G) = 20.
U(H) = 0.
Now we want to choose a mathematical intuition function M so that Run1 and Run2 don’t give the same output. This mathematical intuition function does have to satisfy a couple of constraints:
For each choice of input X and output Y, the function M(X, Y, –) must be a normalized probability distribution on {E, F, G, H}.
The mathematical intuition needs to meet certain minimal standards to deserve its name. For example, we need to have M(1, B, E) = 0. The algorithm should know that P isn’t going to execute according to E if the algorithm returns B on input 1.
But these constraints still leave us with enough freedom in how we set up the mathematical intuition. In particular, we can set
M(1, A, F) = 1, and all other values of M(1, A, –) equal to zero;
M(1, B, H) = 1, and all other values of M(1, B, –) equal to zero;
M(2, A, E) = 1, and all other values of M(2, A, –) equal to zero;
M(2, B, F) = 1, and all other values of M(2, B, –) equal to zero.
Thus, in Run1, the algorithm computes that, if it outputs A, then execution history F would transpire, so the agent would get utility U(F) = 20. But if Run1 outputs B, then H would transpire, yielding utility U(H) = 0. Therefore, Run1 outputs A.
Similarly, Run2 computes that its outputting A would result in E, with utility 0, while outputting B would result in F, with utility 20. Therefore, Run2 outputs B.
Hence, execution history F transpires, and the algorithm reaps $20.
The symmetry is broken by “1” being different from “2”. The probabilities express logical uncertainty, and so essentially depend on what happens to be provable given finite resources and the epistemic state of the agent, for which implementation details matter. The asymmetry is thus hidden in the mathematical intuition, and is not visible in the parts of UDT that are explicitly described.
...but on the other hand, you don’t need the “input” at all, if decision-making is about figuring out the strategy. You can just have a strategy that produces the output, with no explicit input. The history of input can remain implicit in the agent’s program, which is available anyway.
Good; that was my understanding.
Yes, that works too. On second thought, extracting the output in this exact manner, while pushing everything else into the “input”, makes it possible to pose a problem specifically about the output in this particular situation, so as to optimize the activity of figuring out this output, rather than the whole strategy, of which right now you only need this aspect and no more.
Edit: Though, you don’t need “input” to hold the rest of the strategy.
I was having trouble understanding what strategy couldn’t be captured by a function X → Y. After all, what could possibly determine the output of an algorithm other than its source code and whatever input it remembers getting on that particular run? Just to be clear, do you now agree that every strategy is captured by some function f: X → Y mapping inputs to outputs?
One potential problem is that there are infinitely many input-output mappings. The agent can’t assume a bound on the memory it will have, so it can’t assume a bound on the lengths of the inputs in X that it will someday need to plug into an input-output mapping f.
Unlike the case where there are potentially infinitely many programs P1, P2, . . ., it’s not clear to me that it’s enough to wrap up an infinite set I of input-output mappings into some finite program that generates them. This is because the UDT1.1 agent needs to compute a sum for every element of I. So, if the set I is infinite, the number of sums to be computed will be infinite. Having a finite description of I won’t help here, at least not with a brute-force UDT1.1 algorithm.
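To make the brute-force step in question concrete, here is a rough Python sketch of it (my own rendering, with a hypothetical signature for the mathematical intuition, not Wei Dai’s formalism); the loop over candidate maps is exactly the part that stops making sense when X is infinite:

```python
from itertools import product

def udt1_1_choose_strategy(X, Y, histories, M, U):
    """Brute-force UDT1.1 step: pick the input-output map with maximal expected utility.

    M(f, e) stands for the mathematical intuition's probability that the world
    program(s) execute according to history e, given that the agent implements
    the map f.  With |X| inputs and |Y| outputs there are |Y|**|X| candidate
    maps, so this enumeration only terminates when X (and Y) are finite.
    """
    best_f, best_eu = None, float("-inf")
    for outputs in product(Y, repeat=len(X)):
        f = dict(zip(X, outputs))
        eu = sum(M(f, e) * U(e) for e in histories)
        if eu > best_eu:
            best_f, best_eu = f, eu
    return best_f
```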
Any infinite thing in any given problem statement is already presented to you with a finite description. All you have to do is transform that finite description of an infinite object so as to get a finite description of a solution of your problem posed about the infinite object.
Right. I agree.
But, to make Wei’s formal description of UDT1.1 work, there is a difference between
dealing with a finite description of an infinite execution history Ei and
dealing with a finite description of an infinite set I of input-output maps.
The difference is this: The execution histories only get fed into the utility function U and the mathematical intuition function (which I denote by M). These two functions are taken to be black boxes in Wei’s description of UDT1.1. His purpose is not to explain how these functions work, so he isn’t responsible for explaining how they deal with finite descriptions of infinite things. Therefore, the potential infinitude of the execution histories is not a problem for what he was trying to do.
In contrast, the part of the algorithm that he describes explicitly does require computing an expected utility for every input-output map and then selecting the input-output map that yielded the largest expected utility. Thus, if I is infinite, the brute-force version of UDT1.1 requires the agent to find a maximum from among infinitely many expected utilities. That means that the brute-force version just doesn’t work in this case. Merely saying that you have a finite description of I is not enough to say in general how you are finding the maximum from among infinitely many expected utilities. In fact, it seems possible that there may be no maximum.
Actually, in both UDT1 and UDT1.1, there is a similar issue with the possibility of having infinitely many possible execution-history sequences. In both versions of UDT, you have to perform a sum over all such sequences. Even if you have a finite description of the set E of such sequences, a complete description of UDT still needs to explain how you are performing the sum over the infinitely many elements of the set. In particular, it’s not obvious that this sum is always well-defined.
...but the action could be a natural number, no? It’s entirely OK if there is no maximum: the available computational resources then limit how good a strategy the agent manages to implement (“Define as big a natural number as you can!”). The “algorithm” is descriptive; it’s really a definition of the optimality of a decision, not a specification of how this decision is to be computed. You can sometimes optimize infinities away, and can almost always find a finite approximation that gets better with more resources and ingenuity.
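As a toy illustration of that last point (mine, not from the discussion): when the action is a natural number and expected utility has no maximum, an anytime search can still return the best action found within whatever resource budget is available, and a larger budget only improves the result.

```python
def best_action_within_budget(expected_utility, budget):
    """Examine actions 0..budget-1 and return the best one found so far."""
    best_a, best_eu = None, float("-inf")
    for a in range(budget):
        eu = expected_utility(a)
        if eu > best_eu:
            best_a, best_eu = a, eu
    return best_a

# Utility strictly increasing in the action, so there is no maximum; more budget
# simply yields a better (larger) action.
print(best_action_within_budget(lambda a: 1 - 1 / (a + 1), budget=10))    # 9
print(best_action_within_budget(lambda a: 1 - 1 / (a + 1), budget=1000))  # 999
```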
Okay. I didn’t know that the specification of how to compute was explicitly understood to be incomplete in this way. Of course, the description could only be improved by being more specific about just when you can “sometimes optimize infinities away, and can almost always find a finite approximation that gets better with more resources and ingenuity.”
Yes, that works too. If you can make “inputs and outputs” into anything, then sure, you can represent any strategy with them. I’m not sure it makes sense to separate them in that case, though: just make the whole strategy the sole product of decision-making; how to interpret this product is a separate question, and doesn’t necessarily need to be addressed at all.