But the bigger problem is that we can’t say exactly what makes a “silly” counterfactual different from a “serious” one.
Would it be naive to hope for a criterion that roughly says: “A conditional P ⇒ Q is silly iff the ‘most economical’ way of proving it is to deduce it from ¬P or else from Q”? Something like: “there exists a proof of ¬P or of Q which is strictly shorter than the shortest proof of P ⇒ Q”?
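Spelled out, the criterion would be something like this (a rough sketch only, writing |π| for the length of a proof π in the underlying formal system S):

```latex
% Rough sketch of the length-based silliness criterion, relative to a fixed
% formal system S, writing |\pi| for the length of a proof \pi.
\[
  \mathrm{Silly}(P \Rightarrow Q)
  \;:\Longleftrightarrow\;
  \exists \pi \;\bigl(\text{$\pi$ is an $S$-proof of $\neg P$ or of $Q$}\bigr)
  \;\text{with}\;
  |\pi| \;<\; \min\bigl\{\, |\sigma| : \text{$\sigma$ is an $S$-proof of $P \Rightarrow Q$} \,\bigr\}.
\]
```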
A totally different approach starts with the fact that your ‘lemma 1’ could be proved without knowing anything about A. Perhaps this could be deemed a sufficient condition for a counterfactual to be serious. But I guess it’s not a necessary condition?
Both these approaches have been proposed on the workshop list. Good job figuring them out so quickly!
Would it be naive to hope for a criterion that roughly says: “A conditional P ⇒ Q is silly iff the ‘most economical’ way of proving it is to deduce it from ¬P or else from Q”? Something like: “there exists a proof of ¬P or of Q which is strictly shorter than the shortest proof of P ⇒ Q”?
I can make such a criterion fall into nasty Loebian traps by maliciously tweaking the formal system to make some proofs longer than others. That means any proof of correct behavior (like one-boxing) must rely on the intimate details of the proof enumeration order, but we have no idea how to talk formally about such things.
A totally different approach starts with the fact that your ‘lemma 1’ could be proved without knowing anything about A. Perhaps this could be deemed a sufficient condition for a counterfactual to be serious. But I guess it’s not a necessary condition?
A doesn’t necessarily get the code of U neatly factored into A and everything else. The agent has to find copies of itself in the universe; it doesn’t get told the positions of all copies explicitly. Note that if we replace the U in the post with some other U’ that can be proven equivalent to U by S, then A can notice that equivalence, unscramble the code of U’ into U, and win.
Something related to the ‘most economical proof’ thought is the following: suppose both P and Q depend on some variable x, and the conditional P(x) ⇒ Q(x) is true for all values of x. Then it is silly if either ¬P(x) holds for all x or Q(x) holds for all x.
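In symbols, the proposed condition is roughly:

```latex
% Parametrized silliness: the conditional holds for every x, but only for a
% "silly" reason -- the antecedent never holds, or the consequent always does.
\[
  \forall x \,\bigl(P(x) \Rightarrow Q(x)\bigr)
  \quad\text{is silly if}\quad
  \forall x\, \neg P(x) \;\;\text{or}\;\; \forall x\, Q(x).
\]
```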
The tricky thing would be to introduce x in a meaningful way. In the case where we want to prove conditionals of the form “agent does X ⇒ world does Y”, we want to avoid ending up with a conditional that’s true because we prove “agent doesn’t do X” (I think we’re actually alright with an unconditional proof of “world does Y”). So we want to make the agent’s actions somehow depend on whatever x is. For instance, if we gave the agent a very very low probability of picking a random action (and let x be the source of randomness), would that do anything?
That solution has also been discussed on the list. Unfortunately, for it to work as intended, the different copies of the agent must use correlated random variables, which is not very realistic in problems like the Prisoner’s Dilemma.
Right, but I’m not necessarily suggesting that the agent actually make decisions randomly, merely that we use this as an aid in proofs. For instance, suppose we assume that every agent (including A itself) independently has probability p of being replaced by a different algorithm, one which just acts randomly. Then we can try to prove that Pr[world does Y | agent does X] in this probability space approaches 1 as p approaches 0 (colloquially, “agent does X ⇒ world does Y with high probability”).
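Spelled out, the criterion would be something like this (a sketch, writing Pr_p for probability in the space where each agent copy is independently replaced with probability p by a random actor):

```latex
% The proposed probabilistic surrogate for "agent does X => world does Y":
\[
  \lim_{p \to 0^{+}} \Pr_p\bigl[\,\text{world does } Y \,\big|\, \text{agent does } X\,\bigr] \;=\; 1 .
\]
```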
If we end up accidentally proving silly things, we merely end up proving that Pr[agent does X] goes to 0 (because X doesn’t happen except when actions are chosen at random). However, the conditional probability shouldn’t be affected.
For instance, take Newcomb’s problem. The terrible thing in the usual case is that we stumble on a proof that A()=2, from which A()=1 ⇒ U() = 0 follows vacuously, which scares us into outputting A()=2.
Here that can’t happen. Suppose we stumbled on a proof that A()=2 w.h.p. When we calculate the conditional probability Pr[U() = 0 | A() = 1], the numerator Pr[U() = 0 and A() = 1] comes out to p^2/4 -- if even one of U() and A() ends up doing what it’s supposed to, this event doesn’t happen. Dividing by Pr[A() = 1] = p/2, the conditional probability is p/2, which still goes to 0. On the other hand, when we calculate Pr[U() = 1000000 | A() = 1] = Pr[U() = 1M and A() = 1] / Pr[A() = 1], the numerator is p^2/4 + (p/2)(1-p) -- this is what happens in the case where A went haywire but U didn’t. So this conditional probability is 1 - p/2, which goes to 1. We believe this over the other probability, so we actually output A()=1 -- contradicting the supposed proof, so that situation shouldn’t arise in the first place.
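A quick way to check these figures is to enumerate the joint distribution directly. The sketch below fills in one reading of the setup, and the modelling choices are assumptions rather than anything stated above: the agent’s code outputs 2 unless its copy is replaced, a replaced copy picks 1 or 2 uniformly at random, and the predictor’s copy (replaced independently with probability p) predicts the agent’s realized action perfectly whenever it is not replaced.

```python
from fractions import Fraction

def newcomb_probs(p):
    """Exact joint distribution over (agent's action, utility).

    Assumptions (illustrative, not stated in the comment above):
    - the agent's code outputs 2 unless its copy is replaced, in which case
      it picks 1 or 2 with probability 1/2 each;
    - the predictor's copy is replaced independently with probability p;
      if not replaced it predicts the agent's realized action perfectly,
      if replaced it predicts 1 or 2 with probability 1/2 each.
    """
    joint = {}  # (action, utility) -> probability
    for agent_replaced, pr_agent in ((False, 1 - p), (True, p)):
        actions = ((2, Fraction(1)),) if not agent_replaced \
            else ((1, Fraction(1, 2)), (2, Fraction(1, 2)))
        for action, pr_action in actions:
            for pred_replaced, pr_pred in ((False, 1 - p), (True, p)):
                predictions = ((action, Fraction(1)),) if not pred_replaced \
                    else ((1, Fraction(1, 2)), (2, Fraction(1, 2)))
                for prediction, pr_prediction in predictions:
                    opaque_box = 1_000_000 if prediction == 1 else 0
                    utility = opaque_box if action == 1 else opaque_box + 1_000
                    weight = pr_agent * pr_action * pr_pred * pr_prediction
                    joint[(action, utility)] = joint.get((action, utility), Fraction(0)) + weight
    pr_a1 = sum(v for (a, _), v in joint.items() if a == 1)
    return {
        "Pr[A()=1]": pr_a1,                                                     # = p/2
        "Pr[U()=0 | A()=1]": joint.get((1, 0), Fraction(0)) / pr_a1,            # = p/2
        "Pr[U()=1M | A()=1]": joint.get((1, 1_000_000), Fraction(0)) / pr_a1,   # = 1 - p/2
    }

for p in (Fraction(1, 10), Fraction(1, 1000)):
    print(p, newcomb_probs(p))
```

Under these assumptions the enumeration reproduces the figures above: Pr[U()=0 | A()=1] = p/2 and Pr[U()=1M | A()=1] = 1 - p/2. Under a different reading -- say, a non-replaced predictor copy running A’s code rather than tracking A’s realized action -- the numbers come out differently, which is essentially the ambiguity the next comment raises.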
I tried to parse your comment but couldn’t. What are expressions like Pr[U()=0 | A()=1] supposed to mean? What event is “A()=1”? Newcomb’s problem makes two calls to A() which affect the resulting utility differently. If the two instances of A always get “randomly replaced” together or not at all, then I agree that it solves Newcomb’s problem, but I think the assumption is too strong. On the other hand, if they get “randomly replaced” independently, I think you need to give a more careful argument, and also I think it won’t work :-(
This is why I defined the probability space so that, instead of A sometimes doing something random, there’s a low probability that A is replaced with a different agent that always acts randomly. I don’t see why the assumption is too strong. We can define the probability space any way we like, since we don’t actually have to implement it; all we need is to be able to prove things about it.
Now that I say it carefully, it’s somewhat reminiscent of the problem you’re always objecting to: that we can’t separate A from the rest of the universe. But if we can pick out the things that are “agents”—basically, if we pick out anything that’s not immediately predictable, and I think that can be made rigorous—then we can make this definition.
Oh, but in the actual Newcomb’s problem, the two calls to A are actually calls to different but identical routines, aren’t they? Are they? One of them is A’s actual thought process, the other is Omega’s absolutely perfect prediction of A’s thought process. But on the other hand, none of the proofs go through if you can’t verify that the two copies are the same, which is equivalent to making them the same subroutine.
Yeah, the problem in the non-oracle setting is about separating A from the rest of the universe. I feel that any good solution to this problem should be “crisp” rather than delegated to A’s fuzzy reasoning abilities, because at this point in our research we’re not yet trying to make a good optimizer, but trying to define mathematically what optimization means in the first place.