That is what I meant. My problem is that I don’t see how to express that strategy correctly with the vocabulary in this post, and I don’t understand the notation in those posts well enough to tell if they express that idea correctly.
For example, I’m not comfortable with how Eliezer framed it. If my strategy is “cooperate iff they play cooperate iff I play cooperate,” then won’t I never cooperate with myself, because I don’t have the strategy “cooperate iff they play cooperate”?
(I do think that general idea might be formalizable, but you need models of outcome dependencies that can fit inside themselves. Which maybe the notation I don’t understand is powerful enough to do.)
“Cooperate iff they play cooperate iff I play cooperate” is imprecise. Your complaint seems to stem from interpreting it to pay too much attention to the details of the algorithms.* One reason to use counterfactuals is to avoid this. One interpretation of the slogan is “A()=C iff A()=B().” This has symmetry, so the agent cooperates with itself. However, this is vulnerable to a self-fulfilling prophecy against the cooperation rock (i.e., it can choose either option), and its actions against the defection rock depend on implementation details. It is better to interpret it as: cooperate if both counterfactuals “A()=C ⇒ B()=C” and “A()=D ⇒ B()=D” are true. Again, this has symmetry to allow it to cooperate with itself, but it defects against rocks. In fact, an agent like the one from this post pretty much derives this strategy and should cooperate with an agent that comes pre-programmed with this strategy. The problem is whether the counterfactuals are provable. Here is a thread on the topic.
* Another example of agents paying too much attention to implementation details is that we usually want to exclude agents that punish other agents for their choice of decision theory, rather than for their decisions.
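To make the rock cases above concrete, here is a toy consistency check (an illustration only: it treats the slogan as a plain truth-functional condition over candidate outputs, and says nothing about provability, which is the real issue for the two-implication reading):

```python
# Toy check of the reading "A()=C iff A()=B()" against fixed opponents (rocks).
# An output a is "self-consistent" if choosing it makes the biconditional true.
C, D = "C", "D"

def consistent_outputs(rock_move):
    return [a for a in (C, D) if (a == C) == (a == rock_move)]

print(consistent_outputs(C))  # ['C', 'D']: both work -- self-fulfilling prophecy
print(consistent_outputs(D))  # []: neither works -- behaviour falls to implementation details
```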
Aren’t the implementation details sort of the whole point? I mean, suppose you code a version of cooperate iff “A()=C ⇒ B()=C and A()=D ⇒ B()=D” and feed it a copy of itself. Can it actually recognize that it should cooperate? Or does it think that the premise “A()=C” implies a cooperation rock, which it defects against?
That is, if the algorithm is “create two copies of your opponent, feed them different inputs, and combine their outputs by this formula,” then when you play that against itself it looks like you get 2^t copies, where t is how long you ran it before it melted down.
I see how one could code a cliquebot that only cooperated with someone with its source code, and no one else. I don’t see how one could formalize this general idea (of trying D,C, then C,C, then D,D, and never playing C,D) in a way that halts and recognizes different implementations of the same idea. The problem is that the propositions don’t seem to cleanly differentiate between code outcomes, like “A()=C”, and code, like “A()=C ⇒ B()=C”.
No, the algorithm is definitely not to feed inputs in simulation. The algorithm is to prove things about the interaction of the two algorithms. For more detail, see the post that is the parent of the comment I linked.
“A()=C ⇒ B()=C” is a statement of propositional logic, not code. It is material implication, not conditional execution.
The algorithm is to prove things about the interaction of the two algorithms.
This is possibly just ignorance on my part, but as far as I can tell what that post does is take the difficult part and put it in a black box, which leaves me no wiser about how this works.
“A()=C ⇒ B()=C” is a statement of propositional logic, not code. It is material implication, not conditional execution.
Code was the wrong phrase for me to use- I should have gone with my earlier “outcome dependency,” which sounds like your “material implication.”
I think I’ve found my largest stumbling block: when I see “B()=C”, it matters to me what’s in the (), i.e. the inputs to B. But if B is a copy of this algorithm, it needs more information than “A()=C” in order to generate an output. Or, perhaps more precisely, it doesn’t care about the “=C” part and makes its decision based solely on the “A()” part. But then I can’t really feed it “A()=C” and “A()=D” and expect to get different results.
That is, counterfactuals about my algorithm outputs are meaningless if the opponent’s algorithm judges me based on my algorithm, not my outputs. Right? Or am I missing something?
The input to A and B is always the same, namely the source code of A and B and the universe. In fact, often we consider all that to be hard coded and there to be no input, hence the notation A().
Could I rephrase your last paragraph as “counterfactuals about the output of my algorithm are meaningless since my algorithm only has one output”? Yes, this is a serious problem in making sense of strategies like “cooperate iff the opponent cooperates iff I do.” Either my opponent cooperates or not. That is a fact about math. How could it be different?
And yet, in the post I linked to, Cousin It managed to set up a situation in which there are provable theorems about what happens in the counterfactuals. Still, that does not answer the question of the meaning of “logical counterfactuals.” They seem to match the ordinary-language use of counterfactuals, as in the above strategy.
If the “black box” was the provability oracle, then consider searching for proofs of bounded lengths, which is computable. If the black box was formal logic (Peano arithmetic), well, it doesn’t look very black to me, but it is doing a lot of work.
I don’t think you should be judging material implication by its name. Perhaps by “outcome dependency” you mean “logical counterfactual.” Then, yes, counterfactuals are related to such implications, by definition. The arrow is just logical implication. There are certain statements of logic that we interpret as counterfactuals.
Could I rephrase your last paragraph as “counterfactuals about the output of my algorithm are meaningless since my algorithm only has one output”?
That looks right to me.
Either my opponent cooperates or not. That is a fact about math. How could it be different?
What do you mean by the first sentence? That the opponent’s move is one of {C, D}? Or that we are either in the world where C is played with probability 1, or the world where D is played with probability 1?
It seems to me that pernicious algorithms could cause cycling if you aren’t careful, and it’s not specified what happens if the program fails to output an order. If you tack on some more assumptions, I’m comfortable with saying that the opponent must play either C or D, but the second seems outlawed by mixed strategies being possible.
If the “black box” was the provability oracle, then consider searching for proofs of bounded lengths, which is computable. If the black box was formal logic (Peano arithmetic), well, it doesn’t look very black to me, but it is doing a lot of work.
Peano arithmetic is beyond my formal education. I think I understand most of what it’s doing but the motivation of why one would turn to it is not yet clear.
I don’t think you should be judging material implication by its name.
So, “A()=C ⇒ B()=C” means “it cannot be the case that I cooperate and they defect” and “A()=D ⇒ B()=D” means “it cannot be the case that I defect and they cooperate,” and if both of those are true, then the algorithm cooperates. Right? (The first is to not be taken advantage of, and the second is to take advantage of those who will let you.)
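A quick truth-table check of that reading (plain material implication over the four joint outcomes, nothing about provability):

```python
# "A()=C => B()=C" is the same claim as "not (A()=C and B()=D)":
# it rules out the (C, D) outcome and says nothing else.
for a in ("C", "D"):
    for b in ("C", "D"):
        implication = (a != "C") or (b == "C")       # material implication
        not_exploited = not (a == "C" and b == "D")  # "they don't defect while I cooperate"
        assert implication == not_exploited
        print(a, b, implication)
```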
What I’m not seeing is how you take two copies of that, evaluate them, and get “C,C” without jumping to the end by wishful thinking. It looks to me like the counterfactuals multiply without end, because every proof requires four subproofs that are just as complicated. (This is where it looks like the provability oracle is being called in, but I would rather avoid that if possible.)
Yes, there are complications like programs failing to terminate or using random numbers. For simplicity, we usually assume no random numbers. We could force termination by putting a time bound and some default action, but it’s usually better to distinguish such program failure from other actions. In any event, at least one of the possibilities A()=C and A()=D is false, and thus claims like “A()=C ⇒ B()=C” are confusing.
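As an aside, here is one way such a time bound might be wired up (a sketch under assumptions: the strategy is a picklable module-level function, and non-termination or a crash is mapped to a distinguished failure value rather than silently to a default move):

```python
from multiprocessing import Process, Queue
from queue import Empty

def _run(strategy, args, q):
    q.put(strategy(*args))

def bounded_move(strategy, args, seconds, on_failure="FAIL"):
    # Run the strategy in its own process; if it has not answered by the
    # deadline, kill it and report the distinguished failure value.
    q = Queue()
    p = Process(target=_run, args=(strategy, args, q))
    p.start()
    p.join(seconds)
    if p.is_alive():
        p.terminate()
        p.join()
    try:
        return q.get_nowait()
    except Empty:
        return on_failure
```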
If A and B have the same source code, then it is provable that they have the same source code, and thus it is provable that A()=B(), and thus provable that A()=C ⇒ B()=C. That is what A and B do: search for that proof. They do not simulate each other, so there is no multiplication of counterfactuals. Identical source code is a rather limited case, but it is a start. Löb’s theorem shows how to prove things in some other cases.
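For concreteness, the identical-source-code special case can be caricatured like this (a sketch with made-up names; the agents in the linked post search for a proof of A()=B() inside the formal system rather than comparing strings, but string equality is the degenerate case where that proof is one line):

```python
def mirror_bot(own_source, opponent_source):
    # The one fact this agent establishes is syntactic identity of the two programs.
    # If the sources are literally equal, A()=B() follows, so cooperating cannot
    # end in (C, D); against anything else it proves nothing and defects.
    return "C" if own_source == opponent_source else "D"

SRC = "return 'C' if own_source == opponent_source else 'D'"
print(mirror_bot(SRC, SRC))           # C -- cooperates with an exact copy
print(mirror_bot(SRC, "return 'D'"))  # D -- a cliquebot, in the sense discussed above
```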
That is what A and B do: search for that proof.
Ok. Where do they look for it, and how will they know if they’ve found it?
I don’t like, but will accept for now, the “evaluate every possible proof less than X characters” method of finding proofs.
But I don’t see how you determine those proofs are true or false without simulating A() and B(), especially if B() isn’t a copy of A(), but some complicated algorithm that might or might not cash out as equivalent to A().
(Where I’m going with this: if this idea requires magic to do its basic operation, then I am uncomfortable with using this idea for anything.)
Very often in this conversation, I think we’re using words to mean wildly different things, such as “proof.” Do you need to simulate bubble sort to prove that it takes quadratic time?
Very often in this conversation, I think we’re using words to mean wildly different things, such as “proof.”
Very possibly! “Simulate” is another word that we may not be using the same way.
Do you need to simulate bubble sort to prove that it takes quadratic time?
I would say ‘yes, if’, conditioned on what you mean by simulate. It doesn’t look like you can prove bubble sort takes quadratic time by running it a bunch of times, though you could certainly suggest that it does. But I also don’t see how to prove bubble sort takes quadratic time without intense understanding of the algorithm (and it may be that intense understanding requires simulation).
If you’re given the algorithm to begin with, this doesn’t seem like a big issue. But if I give you some code, which is an implementation of bubble sort but not labeled as such, then proving it takes quadratic time is more involved, and may (will?) require running the code. Right?
It is certainly true that some algorithms are easier to prove things about than others. Let’s set aside running time and just talk about correctness of sorting. There are algorithms that sort correctly but whose correctness is undecidable, or for which the proof is very long. But the proof that bubble sort correctly sorts is short, even for a bubble sort written in INTERCAL. Finding a short proof quickly may be hard, but exhaustively searching for it is simple to implement and simple to prove things about. And once you’ve found it, you just verify it. You don’t have to simulate the code to verify the proof. You certainly don’t have an infinite regress.
If you like, you can say that by invoking exhaustive searching, we are assuming P=NP and assuming away the problem of AI. That is intentional. The question of decision theory is an independent problem and it is right to separate them this way. But exhaustive search is a concrete computable function, not “magic.” Also, sometimes you really are constrained by space, not time, and so exhaustive search is a realistic option.
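A sketch of the “enumerate and verify” shape (with a stand-in checker just to make it runnable; in the intended use the checker would be a proof verifier for the formal system):

```python
from itertools import product

def bounded_search(statement, is_valid_proof, alphabet, max_len):
    # Enumerate every candidate up to max_len symbols and hand each to a fast,
    # trusted checker.  Astronomically slow, but a concrete computable function.
    for n in range(1, max_len + 1):
        for candidate in product(alphabet, repeat=n):
            if is_valid_proof(candidate, statement):
                return candidate
    return None

def toy_checker(candidate, target):
    # Stand-in "formal system": a statement is a target number, a "proof" is an
    # arithmetic expression over the allowed tokens that evaluates to it.
    try:
        return eval("".join(candidate), {"__builtins__": {}}) == target
    except SyntaxError:
        return False

print(bounded_search(6, toy_checker, "123+*", 3))  # ('2', '*', '3')
```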
Putting in a proof generator makes it easy to make an algorithm undecidable. If we make these deterministic algorithms play Rock-Paper-Scissors, they will fail to terminate or otherwise break down. Many similar algorithms with resource constraints will be undecidable to other algorithms with the same resource constraints. But proofs are not particularly hard to reason about and do not force undecidability / infinite regress. In particular, when the game is PD and the algorithms are trying to cooperate, to be understood, and to reach (C,C) rather than (D,D), there are (sometimes) short proofs of the relevant things and they do manage to cooperate.
And once you’ve found it, you just verify it. You don’t have to simulate the code to verify the proof. You certainly don’t have an infinite regress.
I still don’t see why this is the case, but I think I understand everything else about it, which suggests I should let this sit for a while and it’ll become clear when it’s relevant. (The proof verification is where it looked like magic was happening, not finding the interesting proofs.)
If you like, you can say that by invoking exhaustive searching, we are assuming P=NP and assuming away the problem of AI. That is intentional. The question of decision theory is an independent problem and it is right to separate them this way.
I think I disagree that it’s right to separate them this way, but this may just be a matter of taste / goals. I am far more interested in prescriptive decision theory than normative decision theory (bounded rationality vs. unbounded rationality).
I considered saying that proofs are verifiable by definition. I mean a proof in formal logic. The point of formal logic is that verifying a proof is computable (even fast); the point, which matches the common usage, is that it is convincing. You can’t prove anything about arbitrary code; these algorithms are not guaranteed to understand the opponent. But it’s interesting that sometimes they do succeed in proving things about their opponents.
If you’re given the algorithm to begin with, this doesn’t seem like a big issue. But if I give you some code, which is an implementation of bubble sort but not labeled as such, then proving it takes quadratic time is more involved, and may (will?) require running the code. Right?
I’d say it’s easier to prove that a naïve bubble sort runs in quadratic time than it is to prove that, when it terminates, the list is actually sorted. It’s obvious by inspection that it contains a nested loop over the same data structure.
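For what it’s worth, here is the shape of that inspection argument in code (a sketch; counting comparisons just makes the nested-loop bound visible, it is not itself a proof):

```python
def bubble_sort(xs):
    # Two nested passes over the same list: at most n*(n-1)/2 comparisons,
    # which is the quadratic bound "obvious by inspection".  That the result
    # is actually sorted needs a loop-invariant argument, not just this shape.
    xs = list(xs)
    n = len(xs)
    comparisons = 0
    for i in range(n):                # outer pass
        for j in range(n - 1 - i):    # inner pass over the not-yet-settled prefix
            comparisons += 1
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs, comparisons

print(bubble_sort([5, 1, 4, 2, 3]))   # ([1, 2, 3, 4, 5], 10) -- 10 = 5*4/2
```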