Can someone explain to me the significance of problems like Sleeping Beauty? I see a lot of digital ink being spilled over them, and I can kind of see how they call into question what we mean by “probability” and “expected utility”, but I can’t quite pin down the thread that connects them all. Someone will pose a solution to a paradox X, then someone else will reply with a modified version X′ that the previous solution fails on, and I have trouble seeing exactly what it is people are trying to solve.
If you want to build an AI that maximizes utility, and that AI can create copies of itself, and each copy’s existence and state of knowledge can also depend on events happening in the world, then you need a general theory of how to make decisions in such situations. In the limiting case when there’s no copying at all, the solution is standard Bayesian rationality and expected utility maximization, but that falls apart when you introduce copying. Basically we need a theory that looks as nice as Bayesian rationality, is reflectively consistent (i.e. the AI won’t immediately self-modify away from it), and leads to reasonable decisions in the presence of copying. Coming up with such a theory turns out to be surprisingly hard. Many of us feel that UDT is the right approach, but many gaps still have to be filled in.
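To make the no-copying baseline concrete, here is a minimal sketch of Bayesian expected utility maximization; the states, beliefs and payoffs are invented purely for illustration:

```python
# Minimal sketch of the no-copying baseline: one agent, one decision,
# ordinary expected utility maximization. States, beliefs and payoffs
# here are made up purely for illustration.

def expected_utility(action, beliefs, utility):
    """Probability-weighted sum of utilities over the possible states."""
    return sum(p * utility(action, state) for state, p in beliefs.items())

def best_action(actions, beliefs, utility):
    """Pick the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, beliefs, utility))

# Toy example: carry an umbrella or not, under uncertainty about rain.
beliefs = {"rain": 0.3, "dry": 0.7}
payoffs = {
    ("umbrella", "rain"): 5, ("umbrella", "dry"): 3,
    ("no umbrella", "rain"): 0, ("no umbrella", "dry"): 10,
}

print(best_action(["umbrella", "no umbrella"], beliefs,
                  lambda a, s: payoffs[(a, s)]))   # -> no umbrella
```

The question is what replaces this simple picture once there can be more than one instance of you.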
Note that many problems that involve copying can be converted to problems that create identical mind states by erasing memories. My favorite motivating example is the Absent-Minded Driver problem. The Sleeping Beauty problem is similar to that, but formulated in terms of probabilities instead of decisions, so people get confused.
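To see why memory erasure is enough to cause trouble, here is a sketch of the planning stage of the Absent-Minded Driver problem, using the standard payoffs from Piccione and Rubinstein: exiting at the first intersection pays 0, exiting at the second pays 4, continuing past both pays 1.

```python
# Planning stage of the Absent-Minded Driver problem with the standard
# payoffs: exit at the first intersection = 0, exit at the second = 4,
# continue past both = 1. The driver can't tell the intersections apart,
# so his only strategy is "continue with probability p at any intersection".

def planning_value(p):
    exit_first = (1 - p) * 0        # exits immediately
    exit_second = p * (1 - p) * 4   # continues once, then exits
    drives_past = p * p * 1         # continues at both intersections
    return exit_first + exit_second + drives_past

# Brute-force the best continuation probability on a fine grid.
best_p = max((i / 1000 for i in range(1001)), key=planning_value)
print(best_p, planning_value(best_p))   # roughly p = 2/3, value = 4/3
```

The planning-optimal strategy is to continue with probability 2/3; the puzzle is whether the driver, standing at an intersection he can’t identify, can reproduce that answer by ordinary expected utility reasoning.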
An even simpler way to emulate copying is by putting multiple people in the same situation. That leads to various “anthropic problems”, which are well covered in Bostrom’s book. My favorite example of these is Psy-Kosh’s problem.
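For the flavor of it, here is a rough sketch of the tension in Psy-Kosh’s problem. I’m quoting the payoffs from memory, so treat the exact numbers as illustrative rather than canonical:

```python
# Rough sketch of Psy-Kosh's problem. Ten people, a fair coin: on heads one
# random person is a decider, on tails nine are. The payoffs below (to
# charity) are from memory and may not match the original post exactly:
# all deciders say "yea": $1000 if tails, $100 if heads; "nay": $700 either way.

p_heads = p_tails = 0.5

# Before the coin flip, as a group policy:
ev_yea_before = p_heads * 100 + p_tails * 1000   # 550
ev_nay_before = 700                              # nay looks better ex ante

# After waking up as a decider and updating naively on that fact
# (9 of 10 people are deciders on tails, 1 of 10 on heads):
p_tails_given_decider = (p_tails * 0.9) / (p_tails * 0.9 + p_heads * 0.1)  # 0.9
ev_yea_after = p_tails_given_decider * 1000 + (1 - p_tails_given_decider) * 100  # 910

print(ev_yea_before, ev_nay_before, ev_yea_after)
```

Ex ante everyone agrees that “nay” is the better group policy, but each decider, after updating on the fact of being a decider, seems to prefer “yea”.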
Another idea that’s equivalent to copying is having powerful agents that can predict your actions, like in Newcomb’s problem, Counterfactual Mugging and some more complicated scenarios that we came up with.
Can you explain this equivalence?
When a problem involves a predictor that’s predicting your actions, it can often be transformed into another problem that has an indistinguishable copy of you inside the predictor. In some cases, like Counterfactual Mugging, the copy and the original can even receive different evidence, though they are still unable to tell which is which.
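To make Counterfactual Mugging concrete with the usual numbers (you’re asked for $100 on tails, and on heads you’re paid $10,000 only if the predictor expects you to pay on tails), here is a sketch comparing the two policies as evaluated before the coin flip:

```python
# Counterfactual Mugging with the usual numbers: a fair coin, a $100 request
# on tails, a $10,000 reward on heads that is paid only if the predictor
# expects you to pay on tails. Compare the two policies before the flip.

def policy_value(pays_on_tails):
    heads_payoff = 10_000 if pays_on_tails else 0   # reward tracks the prediction
    tails_payoff = -100 if pays_on_tails else 0     # the cost is only paid on tails
    return 0.5 * heads_payoff + 0.5 * tails_payoff

print(policy_value(True))    # 4950.0
print(policy_value(False))   # 0.0
```

The catch is that an agent who updates on seeing tails no longer sees any benefit in paying, even though committing to pay is worth $4950 in expectation up front.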
There are more complicated scenarios, where the predictor is doing high-level logical reasoning about you instead of running a simulation of you. In simple cases like Newcomb’s Problem, that distinction doesn’t matter, but there is an important family of problems where it matters. The earliest known example is Gary Drescher’s Agent Simulates Predictor. Other examples are Wei Dai’s problem about bargaining and logical uncertainty and my own problem about logical priors. Right now this is the branch of decision theory that interests me most.
Can you formalize the idea of “copying” and show why expected utility maximization fails once I have “copied” myself? I think I understand why Newcomb’s problem is interesting and significant, but in terms of an AI rewriting its source code… well, my brain is changing all the time and I don’t think I have any problems with expected utility maximization.
We can formalize “copying” by using information sets that include more than one node, as I tried to do in this post. Expected utility maximization fails on such problems because your subjective probability of being at a certain node might depend on the action you’re about to take, as mentioned in this thread.
The Absent-Minded Driver problem is an example of such dependence, because your subjective probability of being at the second intersection depends on your choosing to go straight at the first intersection, and the two intersections are indistinguishable to you.
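Here’s a sketch of that dependence, using the natural way of weighting the two indistinguishable situations by how often the strategy actually reaches them:

```python
# How the absent-minded driver's beliefs depend on his own strategy.
# If he continues with probability p, the first intersection is always
# reached and the second is reached with probability p, so weighting the
# two indistinguishable situations by how often they occur gives:

def belief_at_intersection(p):
    visits_first = 1.0    # the first intersection is reached on every trip
    visits_second = p     # the second is reached only if he continued
    total = visits_first + visits_second
    return visits_first / total, visits_second / total

for p in (0.0, 0.5, 2 / 3, 1.0):
    print(p, belief_at_intersection(p))
# The answer to "which intersection am I probably at?" shifts with the very
# action being chosen, which is what trips up naive expected utility maximization.
```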
I don’t know about academic philosophy, but on Less Wrong there is the hope of one day coming up with an algorithm that calculates the “best”, “most rational” way to act.
That’s a bit of a simplification, though. The hope is that we can separate the questions of how to learn (epistemology) and what is right (moral philosophy) from the question of how, given one’s knowledge and values, one “should” behave (decision theory).
The von Neumann–Morgenstern theorem is the paradigmatic result here. It suggests (but does not prove) that given one’s beliefs and values, one “should” act so as to maximize a weighted sum of utilities, with the weights given by one’s probabilities. But as the various paradoxes show, this is far from the last word on the matter.
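For reference, here is the representation the theorem delivers: if your preferences over lotteries satisfy the VNM axioms, there is a utility function u, unique up to positive affine transformation, such that

```latex
% Lottery L gives outcome x_i with probability p_i; lottery M gives y_j with probability q_j.
L \succeq M
\quad\Longleftrightarrow\quad
\sum_i p_i \, u(x_i) \;\ge\; \sum_j q_j \, u(y_j)
```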