That solution has also been discussed on the list. Unfortunately, for it to work as intended, the different copies of the agent must use correlated random variables, which is not very realistic in problems like the Prisoner’s Dilemma.
Right, but I’m not suggesting necessarily that the agent actually make decisions randomly, merely that we use this as an aid in proofs. For instance, suppose we assume that all agents (including A itself) have a probability of p (independently) of being replaced by a different algorithm, which just acts randomly. Then, we can try to prove that Pr[world does Y|agent does X] in this probability space approaches 1 as p approaches 0 (colloquially, “agent does X ⇒ world does Y with high probability”).
Then if we end up accidentally proving silly things, then we merely end up proving that Pr[agent does X] goes to 0 (because it doesn’t happen except when actions are chosen at random). However, the conditional probability shouldn’t be affected.
For instance, take Newcomb’s problem. The terrible thing in the usual case is that we stumble on a proof that A()=2, so then it follows that A()=1 ⇒ U() = 0, which scares us so we output A()=2.
Here that can’t happen. Suppose we stumbled on a proof that A()=2 w.h.p. Then when we calculate the conditional probability Pr[U() = 0 | A() = 1], we calculate the probability Pr[U() = 0 and A() = 1] and end up with p^2/4 -- if even one of U() and A() ends up what it’s supposed to be, this doesn’t happen. So the conditional probability is p/2, which still goes to 0. On the other hand, when we calculate Pr[U() = 1000000 | A() = 1] = Pr[U() = 1M and A() = 1] / Pr[A() = 1], the numerator is p^2/4 + p/2(1-p) -- in the case where A went haywire but U didn’t, this is what happens. So this conditional probability is 1 - p/2, which goes to 1. We believe this over the other probability, so we actually output A()=1 -- and this shouldn’t happen.
I tried to parse your comment but couldn’t. What are expressions like Pr[U()=0 | A()=1] supposed to mean? What event is “A()=1”? Newcomb’s problem makes two calls to A() which affect the resulting utility differently. If the two instances of A always get “randomly replaced” together or not at all, then I agree that it solves Newcomb’s problem, but I think the assumption is too strong. On the other hand, if they get “randomly replaced” independently, I think you need to give a more careful argument, and also I think it won’t work :-(
This is why I defined the probability space to be that, instead of A sometimes doing something random, there’s a low probability that A is replaced with a different agent that always does something random. I don’t see why the assumption is too strong. We can define the probability space any way we like, since we don’t actually have to implement it, all we need is to be able to prove things about the probability space.
Now that I say it carefully, it’s somewhat reminiscent of the problem you’re always objecting to: that we can’t separate A from the rest of the universe. But if we can pick out the things that are “agents”—basically, if we pick out anything that’s not immediately predictable, and I think that can be made rigorous—then we can make this definition.
Oh, but in the actual Newcomb’s problem, the two calls to A are actually calls to different but identical routines, aren’t they? Are they? One of them is A’s actual thought process, the other is Omega’s absolutely perfect prediction of A’s thought process. But on the other hand, none of the proofs go through if you can’t verify that the two copies are the same, which is equivalent to making them the same subroutine.
Yeah, the problem in the non-oracle setting is about separating A from the rest of the universe. I feel that any good solution to this problem should be “crisp” rather than delegated to A’s fuzzy reasoning abilities, because at this point in our research we’re not yet trying to make a good optimizer, but trying to define mathematically what optimization means in the first place.
That solution has also been discussed on the list. Unfortunately, for it to work as intended, the different copies of the agent must use correlated random variables, which is not very realistic in problems like the Prisoner’s Dilemma.
Right, but I’m not suggesting necessarily that the agent actually make decisions randomly, merely that we use this as an aid in proofs. For instance, suppose we assume that all agents (including A itself) have a probability of p (independently) of being replaced by a different algorithm, which just acts randomly. Then, we can try to prove that Pr[world does Y|agent does X] in this probability space approaches 1 as p approaches 0 (colloquially, “agent does X ⇒ world does Y with high probability”).
Then if we end up accidentally proving silly things, then we merely end up proving that Pr[agent does X] goes to 0 (because it doesn’t happen except when actions are chosen at random). However, the conditional probability shouldn’t be affected.
For instance, take Newcomb’s problem. The terrible thing in the usual case is that we stumble on a proof that A()=2, so then it follows that A()=1 ⇒ U() = 0, which scares us so we output A()=2.
Here that can’t happen. Suppose we stumbled on a proof that A()=2 w.h.p. Then when we calculate the conditional probability Pr[U() = 0 | A() = 1], we calculate the probability Pr[U() = 0 and A() = 1] and end up with p^2/4 -- if even one of U() and A() ends up what it’s supposed to be, this doesn’t happen. So the conditional probability is p/2, which still goes to 0. On the other hand, when we calculate Pr[U() = 1000000 | A() = 1] = Pr[U() = 1M and A() = 1] / Pr[A() = 1], the numerator is p^2/4 + p/2(1-p) -- in the case where A went haywire but U didn’t, this is what happens. So this conditional probability is 1 - p/2, which goes to 1. We believe this over the other probability, so we actually output A()=1 -- and this shouldn’t happen.
I tried to parse your comment but couldn’t. What are expressions like Pr[U()=0 | A()=1] supposed to mean? What event is “A()=1”? Newcomb’s problem makes two calls to A() which affect the resulting utility differently. If the two instances of A always get “randomly replaced” together or not at all, then I agree that it solves Newcomb’s problem, but I think the assumption is too strong. On the other hand, if they get “randomly replaced” independently, I think you need to give a more careful argument, and also I think it won’t work :-(
This is why I defined the probability space to be that, instead of A sometimes doing something random, there’s a low probability that A is replaced with a different agent that always does something random. I don’t see why the assumption is too strong. We can define the probability space any way we like, since we don’t actually have to implement it, all we need is to be able to prove things about the probability space.
Now that I say it carefully, it’s somewhat reminiscent of the problem you’re always objecting to: that we can’t separate A from the rest of the universe. But if we can pick out the things that are “agents”—basically, if we pick out anything that’s not immediately predictable, and I think that can be made rigorous—then we can make this definition.
Oh, but in the actual Newcomb’s problem, the two calls to A are actually calls to different but identical routines, aren’t they? Are they? One of them is A’s actual thought process, the other is Omega’s absolutely perfect prediction of A’s thought process. But on the other hand, none of the proofs go through if you can’t verify that the two copies are the same, which is equivalent to making them the same subroutine.
Yeah, the problem in the non-oracle setting is about separating A from the rest of the universe. I feel that any good solution to this problem should be “crisp” rather than delegated to A’s fuzzy reasoning abilities, because at this point in our research we’re not yet trying to make a good optimizer, but trying to define mathematically what optimization means in the first place.