If you want to build an AI that maximizes utility, and that AI can create copies of itself, and each copy’s existence and state of knowledge can also depend on events happening in the world, then you need a general theory of how to make decisions in such situations. In the limiting case where there’s no copying at all, the solution is standard Bayesian rationality and expected utility maximization, but that falls apart once you introduce copying. Basically, we need a theory that looks as nice as Bayesian rationality, is reflectively consistent (i.e. the AI won’t immediately self-modify away from it), and leads to reasonable decisions in the presence of copying. Coming up with such a theory turns out to be surprisingly hard. Many of us feel that UDT (updateless decision theory) is the right approach, but many gaps still have to be filled in.
Note that many problems that involve copying can be converted to problems that create identical mind states by erasing memories. My favorite motivating example is the Absent-Minded Driver problem. The Sleeping Beauty problem is similar to that, but formulated in terms of probabilities instead of decisions, so people get confused.
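To make the Absent-Minded Driver example concrete, here’s a minimal sketch, assuming the payoffs usually attached to that problem (exiting at the first intersection pays 0, exiting at the second pays 4, driving straight through both pays 1). Since the driver can’t tell the intersections apart, his whole policy is a single probability p of going straight at whatever intersection he finds himself at:

```python
# Absent-Minded Driver sketch. Assumed payoffs: exit at first intersection = 0,
# exit at second = 4, drive straight through both = 1. The driver can't tell the
# intersections apart, so the policy is one number p = P(go straight).

def expected_utility(p):
    # Reach the second intersection with probability p, then exit there with probability 1 - p.
    return 4 * p * (1 - p) + 1 * p * p

best_p = max((i / 1000 for i in range(1001)), key=expected_utility)
print(best_p, expected_utility(best_p))  # ~0.667, ~1.33 (planning-stage optimum is p = 2/3)
```

The puzzle is whether the driver, standing at an intersection and reasoning with his subjective probabilities, can recover that same policy; that’s where the trouble described later in this thread shows up.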
An even simpler way to emulate copying is to put multiple people in the same situation. That leads to various “anthropic problems”, which are well covered in Bostrom’s book Anthropic Bias. My favorite example of these is Psy-Kosh’s problem.
Another idea that’s equivalent to copying is having powerful agents that can predict your actions, like in Newcomb’s problem, Counterfactual Mugging and some more complicated scenarios that we came up with.
Can you explain this equivalence?
When a problem involves a predictor that’s predicting your actions, it can often be transformed into another problem that has an indistinguishable copy of you inside the predictor. In some cases, like Counterfactual Mugging, the copy and the original can even receive different evidence, though they are still unable to tell which is which.
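As an illustration (my own toy numbers, not anything from the original problems), here’s what that transformation looks like when the predictor works by simulation: a Newcomb-style setup where the predictor literally runs the agent’s decision procedure. The agent’s code then executes twice, once inside the predictor and once “for real”, and neither run can tell which one it is:

```python
# Toy Newcomb's problem with a predictor that simulates the agent.
# The agent's decision procedure runs twice: once inside the predictor (the "copy")
# and once when the agent actually chooses (the "original"). Neither call can tell
# which of the two it is.

def one_boxer():
    return "one-box"

def two_boxer():
    return "two-box"

def newcomb_payoff(agent):
    prediction = agent()                                # the copy, run by the predictor
    box_b = 1_000_000 if prediction == "one-box" else 0
    choice = agent()                                    # the original, deciding for real
    return box_b if choice == "one-box" else box_b + 1_000

print(newcomb_payoff(one_boxer))   # 1000000
print(newcomb_payoff(two_boxer))   # 1000
```

That’s exactly the copying structure from before: the same decision procedure gets instantiated twice and can’t locate itself.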
There are more complicated scenarios where the predictor does high-level logical reasoning about you instead of running a simulation of you. In simple cases like Newcomb’s Problem, that distinction doesn’t matter, but there is an important family of problems where it does. The earliest known example is Gary Drescher’s Agent Simulates Predictor. Other examples are Wei Dai’s problem about bargaining and logical uncertainty, and my own problem about logical priors. Right now this is the branch of decision theory that interests me most.
Can you formalize the idea of “copying” and show why expected utility maximization fails once I have “copied” myself? I think I understand why Newcomb’s problem is interesting and significant, but in terms of an AI rewriting its source code… well, my brain is changing all the time and I don’t think I have any problems with expected utility maximization.
We can formalize “copying” by using information sets that include more than one node, as I tried to do in this post. Expected utility maximization fails on such problems because your subjective probability of being at a certain node might depend on the action you’re about to take, as mentioned in this thread.
The Absent-Minded Driver problem is an example of such dependence: because the two intersections are indistinguishable to you, your subjective probability of being at the second one depends on how likely you are to choose to go straight at the first one.
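Here’s a minimal sketch of that dependence (the code is mine; the formula is one natural way to assign the credence, by counting how often each intersection gets visited): with a policy of going straight with probability p, you reach the first intersection for sure and the second with probability p, so your credence of being at the second one is p / (1 + p), which moves with the very action you’re trying to decide on:

```python
# Subjective probability of being at the second intersection, as a function of the
# driver's own policy p = P(go straight at an intersection). You visit the first
# intersection with certainty and the second with probability p, and can't
# distinguish them, so one natural credence assignment is:

def prob_at_second(p):
    return p / (1 + p)

for p in (0.0, 0.5, 2 / 3, 1.0):
    print(f"p = {p:.2f}  ->  P(second intersection) = {prob_at_second(p):.2f}")
```

So the “prior” you’d want to feed into an expected utility calculation is itself a function of the policy you’re choosing, which is what breaks the usual machinery.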