I’ve been thinking about this topic, off and on, at least since September 1997, when I joined the Extropians mailing list, and sent off a “copying related probability question” (which is still in my “sent” folder but apparently no longer archived anywhere that Google can find). Both Eliezer and Nick were also participants in that discussion. What are the chances that we’re still trying to figure this out 12 years later?
My current position, for what it’s worth, is that anticipation and continuity of experience are both evolutionary adaptations that will turn maladaptive when mind copying/merging becomes possible. Theoretically, evolution could have programmed us to use UDT, in which case this dilemma wouldn’t exist now, because anticipation and continuity of experience are not part of UDT.
So why don’t we just switch over to UDT, and consider the problem solved (assuming this kind of self-modification is feasible)? The problem with that is that much of our preferences are specified in terms of anticipation of experience, and there is no obvious way to map those onto UDT preferences. For example, suppose you’re about to be tortured in an hour. Should you make as many copies as you can of yourself (who won’t be tortured) before the hour is up, in order to reduce your anticipation of the torture experience? You have to come up with a way to answer that question before you can switch to UDT.
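As a toy illustration of why this question is hard, here is a hypothetical “uniform anticipation” model (a labeled assumption for illustration, not anything UDT endorses): if anticipation were split evenly across all instances of you, copying would let you dilute the anticipated torture almost arbitrarily, even though the tortured instance still exists.

```python
# Toy model (an assumption for illustration, not a settled theory):
# anticipation is split uniformly across all current instances of "you".
def anticipated_torture_prob(n_copies: int) -> float:
    """One original will be tortured; n_copies copies will not.

    Under the naive uniform rule, each of the 1 + n_copies instances
    carries equal anticipatory weight, so the anticipated probability
    of torture is 1 / (1 + n_copies).
    """
    return 1.0 / (1 + n_copies)

for n in (0, 1, 9, 99):
    print(n, anticipated_torture_prob(n))
```

On this naive rule, making 99 copies drives the anticipated probability of torture down to 1%, which is exactly the kind of result that has to be either endorsed or ruled out before the mapping to UDT preferences is well-defined.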
One approach that I think is promising, which Johnicholas already suggested, is to ask “what would evolution do?” The way I interpret that is, whenever there’s an ambiguity in how to map our preferences onto UDT, or where our preferences are incoherent, pick the UDT preference that maximizes evolutionary success.
But a problem with that is that what evolution does depends on where you look. For example, suppose you sample Reality using some weird distribution. (Let’s say you heavily favor worlds where lottery numbers always come out to be the digits of pi.) Then you might find a bunch of Bayesians who use that weird distribution as their prior (or the UDT equivalent of that), since they would be the ones having the most evolutionary success in that part of Reality.
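A toy numerical version of this point (all the weights and fitness values below are made up for illustration): which prior “wins” evolutionarily is determined entirely by the measure you use to sample worlds.

```python
# Two sampling measures over the same two kinds of world (made-up weights):
natural = {"ordinary": 0.999, "pi_lottery": 0.001}
weird   = {"ordinary": 0.001, "pi_lottery": 0.999}

def fitness(prior: str, world: str) -> float:
    # Agents whose prior matches the world they find themselves in do well.
    return 1.0 if prior == world else 0.1

def expected_fitness(prior: str, measure: dict) -> float:
    return sum(p * fitness(prior, w) for w, p in measure.items())

for name, measure in (("natural", natural), ("weird", weird)):
    winner = max(measure, key=lambda prior: expected_fitness(prior, measure))
    print(name, "->", winner)
```

Under the natural measure the “ordinary” prior dominates; under the weird measure the pi-lottery prior does. Evolution, sampled this way, endorses whichever prior matches the sampling distribution.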
The next thought is that perhaps algorithmic complexity and related concepts can help here. Maybe there is a natural way to define a measure over Reality, to say that most of Reality is here, and not there. And then say we want to maximize evolutionary success under this measure.
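One way to cash this out is a Solomonoff-style measure that weights a world-description w by 2^−K(w). Since Kolmogorov complexity is uncomputable, the sketch below uses compressed length as a crude stand-in (that substitution is an assumption of the illustration, not part of the proposal itself):

```python
import random
import zlib

def toy_measure(description: bytes) -> float:
    # Compressed length as a crude, computable stand-in for K(description).
    k = len(zlib.compress(description))
    return 2.0 ** (-k)

random.seed(0)  # deterministic "irregular" world for reproducibility
regular   = b"01" * 500                                   # highly patterned
irregular = bytes(random.randrange(256) for _ in range(1000))

# The regular world gets (astronomically) more measure than the irregular one.
print(toy_measure(regular) > toy_measure(irregular))
```

Under such a measure, “most of Reality is here, not there” becomes the claim that simple, regular worlds dominate the total weight.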
How to define “evolutionary success” is another issue that needs to be resolved in this approach. I think some notion of “amount of Reality under one’s control/influence” (and not “number of copies/descendants”) would make the most sense.
I just happened to see this whilst it happens to be 12 years later. I wonder what your sense of this puzzle is now (object-level as well as meta-level).
I’m not really aware of any significant progress since 12 years ago. I’ve mostly given up working on this problem, or most object-level philosophical problems, due to the slow pace of progress and perceived opportunity costs. (Spending time on ensuring a future where progress on such problems can continue to be made, e.g., fighting against x-risk and value/philosophical lock-in or drift, seems a better bet even for the part of me that really wants to solve philosophical problems.) It seems like there’s a decline in other LWers’ interest in the problem, maybe for similar reasons?
My thread of subjective experience is a fundamental part of how I feel from the inside. Exchanging it for something else would be pretty much equivalent to death—death in the human, subjective sense. I would not wish to exchange it unless the alternative was torture for a googol years or something of that ilk.
Why would you wish to switch to UDT?
That’s a good point. I probably wouldn’t want to give up my thread of subjective experience either. But unless I switch (or someone comes up with a better solution than UDT), when mind copying/merging becomes possible I’m probably going to start making some crazy decisions.
I’m not sure what the solution is, but here’s one idea. A UDT agent doesn’t use anticipation or continuity of experience to make decisions, but perhaps it can run some computations on the side to generate the qualia of anticipation and continuity.
Another idea, which may be more intuitively acceptable: don’t make the switch yourself. Create a copy, and have the copy switch to UDT (before it starts running). Then give most of your resources to the copy and live a single-threaded life under its protection. (I guess the copy in this case isn’t so much a copy but more of a personal FAI.)
That’s what I was thinking, too. You make tools the best way you can. The distinction between tools that are or aren’t part of you will ultimately become meaningless anyway. We’re going to populate the galaxy with huge Jupiter brains that are incredibly smart and powerful but whose only supergoal is to protect a tiny human-nugget inside.
Seeing that others here are trying to figure out how to make probabilities of anticipated subjective experiences work, I should perhaps mention that I spent quite a bit of time near the beginning of those 12 years trying to do the same thing. As you can see, I eventually gave up and decided that such probabilities shouldn’t play a role in a decision theory for agents who can copy and merge themselves.
This isn’t to discourage others from exploring this approach. There could easily be something that I overlooked, that a fresh pair of eyes can find. Or maybe someone can give a conclusive argument that explains why it can’t work.
BTW, notice that UDT not only doesn’t involve anticipatory probabilities, it doesn’t even involve indexical probabilities (i.e. answers to “where am I likely to be, given my memories and observations?” as opposed to “what should I expect to see later?”). It seems fairly obvious that if you don’t have indexical probabilities, then you can’t have anticipatory probabilities. (See ETA below.) I tried to give an argument against indexical probabilities, which apparently nobody (except maybe Nesov) liked. Can anyone do better?
ETA: In the Absent-Minded Driver problem, suppose after you make the decision to EXIT or CONTINUE, you get to see which intersection you’re actually at (and this is also forgotten by the time you get to the next intersection). Then clearly your anticipatory probability for seeing ‘X’, if it exists, ought to be the same as your indexical probability of being at X.
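For concreteness, the standard Absent-Minded Driver payoffs (0 for exiting at X, 4 for exiting at Y, 1 for continuing past both) pin down both the planning-optimal policy and the indexical probability of being at X, which is what the anticipatory probability above would have to equal:

```python
# Absent-Minded Driver: continue with probability p at every intersection.
# Payoffs: EXIT at X -> 0, EXIT at Y -> 4, CONTINUE past both -> 1.
def expected_payoff(p: float) -> float:
    return (1 - p) * 0 + p * (1 - p) * 4 + p * p * 1

# Brute-force the planning-optimal continue-probability (analytically 2/3).
best_p = max((i / 10000 for i in range(10001)), key=expected_payoff)

def prob_at_X(p: float) -> float:
    # You always visit X, but visit Y only with probability p, so among
    # intersection-visits the indexical probability of "this is X" is:
    return 1 / (1 + p)

print(best_p, prob_at_X(best_p))  # ~0.667 and ~0.6
```

So if the anticipatory probability of seeing ‘X’ exists at all, in this setup it should come out to 3/5 under the optimal policy.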
So why don’t we just switch over to UDT, and consider the problem solved
Because we can’t interpret UDT’s decision algorithm as providing epistemic advice. It says to never update our priors and even to go on putting weight on logical impossibilities after they’re known to be impossible. UDT tells us what to do—but not what to anticipate seeing happen next.
This presumably places anticipation together with excitement and fear—an aspect of human experience, but not a useful concept for decision theory.
I’m not convinced that “It turns out that pi is in fact greater than three” is a mere aspect of human experience.
If you appeal to intuitions about rigor, it’s not so much an outlier, since fear and excitement must be aspects of rigorously reconstructed preference as well.
I find myself simultaneously convinced and unconvinced by this! Anticipation (dependent, of course, on your definition) is surely a vital tool in any agent that wants to steer the future? Or do you mean ‘human anticipation’ as differentiated from other kinds? In which case, what demarcates that from whatever an AI would do in thinking about the future?
However, Dai, your top level comment sums up my eventual thoughts on this problem very well. I’ve been trying for a long time to resign myself to the idea that a notion of discrete personal experience is incompatible with what we know about the world. Doesn’t make it any easier though.
My two cents—the answer to this trilemma will come from thinking about the system as a whole rather than personal experience. Can we taboo ‘personal experience’ and find a less anthropocentric way to think about this?
UDT tells us what to do—but not what to anticipate seeing happen next.
Ok, we can count that as a disadvantage when comparing UDT with alternative solutions, but why is it a deal-killer for you, especially since you’re mainly interested in decision theory as a tool for programming FAI? As long as the FAI knows what to do, why do you care so much that it doesn’t anticipate what it will see next?
Because I care about what I see next.
Therefore the FAI has to care about what I see next—or whatever it is that I should be caring about.
Ok, but that appears to be the same reason that I gave (right after I asked the question) for why we can’t switch over to UDT yet. So why did you give another answer without reference to mine? That seems needlessly confusing. Here’s how I put it:
The problem with that is that much of our preferences are specified in terms of anticipation of experience, and there is no obvious way to map those onto UDT preferences.
There’s more in that comment where I explored one possible approach to this problem. Do you have any thoughts on that?
Also, do you agree (or think it’s a possibility) that specifying preferences in terms of anticipation (instead of, say, world histories) was an evolutionary “mistake”, because evolution couldn’t anticipate that one day there would be mind copying/merging technology? If so, that doesn’t necessarily mean we should discard such preferences, but I think it does mean that there is no need to treat them as somehow more fundamental than other kinds of preferences, such as, for example, the fear of stepping into a teleporter that uses destructive scanning, or the desire not to be consigned to a tiny portion of Reality due to “mistaken” preferences.
I can’t switch over to UDT because it doesn’t tell me what I’ll see next, except to the extent it tells me to expect to see pi < 3 with some measure. It’s not that it doesn’t map. It’s that UDT goes on assigning measure to 2 + 2 = 5, but I’ll never see that happen. UDT is not what I want to map my preferences onto; it’s not a difficulty of mapping.
That’s not what happens in my conception of UDT. Maybe in Nesov’s, but he hasn’t gotten it worked out, and I’m not sure it’s really going to work. My current position on this is still that you should update on your own internal computations, but not on input from the outside.
ETA:
UDT is not what I want to map my preferences onto; it’s not a difficulty of mapping.
Is that the same point that Dan Armak made, which I responded to, or a different one?
I can’t switch over to UDT because it doesn’t tell me what I’ll see next, except to the extent it tells me to expect to see pi < 3 with some measure.
It’s not you who should use UDT, it’s the world. This is a salient point of departure between FAI and humanity. FAI is not in the business of saying in words what you should expect. People are stuff of the world, not rules of the world or strategies to play by those rules. Rules and strategies don’t depend on particular moves, they specify how to handle them, but plays consist of moves, of evidence. This very distinction between plays and strategies is the true origin of updatelessness. It is the failure to make this distinction that causes the confusion UDT resolves.
Nesov, your writings are so hard to understand sometimes. Let me take this as an example and give you some detailed feedback. I hope it’s useful to you to determine in the future where you might have to explain in more detail or use more precise language.
It’s not you who should use UDT, it’s the world.
Do you mean “it’s not only you”, or “it’s the world except you”? If it’s the latter, it doesn’t seem to make any sense. If it’s the former, it doesn’t seem to answer Eliezer’s objection.
This is a salient point of departure between FAI and humanity.
Do you mean FAI should use UDT, and humanity shouldn’t?
FAI is not in the business of saying in words what you should expect.
Ok, this seems clear. (Although why not, if that would make me feel better?)
People are stuff of the world, not rules of the world or strategies to play by those rules.
By “stuff”, do you mean “part of the state of the world”? And people do in some sense embody strategies (what they would do in different situations), so what do you mean by “people are not strategies”?
Rules and strategies don’t depend on particular moves, they specify how to handle them, but plays consist of moves, of evidence. This very distinction between plays and strategies is the true origin of updatelessness. It is the failure to make this distinction that causes the confusion UDT resolves.
This part makes sense, but I don’t see the connection to what Eliezer wrote.
Do you mean “it’s not only you”, or “it’s the world except you”? If it’s the latter, it doesn’t seem to make any sense. If it’s the former, it doesn’t seem to answer Eliezer’s objection.
I mean the world as substrate, with “you” being implemented on the substrate of FAI. FAI runs UDT, you consist of FAI’s decisions (even if in the sense of “influenced by”, there seems to be no formal difference). The decisions are the output of the strategy optimized for by UDT, two levels removed from running UDT themselves.
Do you mean FAI should use UDT, and humanity shouldn’t?
Yes, in the sense that humanity runs on the FAI-substrate that uses UDT or something on the level of strategy-optimization anyway, but humanity itself is not about optimization.
By “stuff”, do you mean “part of the state of the world”? And people do in some sense embody strategies (what they would do in different situations), so what do you mean by “people are not strategies”?
I suspect that people should be found in plays (what actually happens given the state of the world), not strategies (plans for every eventuality).
There is no problem with FAI looking at both past and future you—intuition only breaks down when you speak of first-person anticipation. You don’t care what FAI anticipates seeing for itself, or whether it anticipates anything at all. The dynamic of past->future you should be good with respect to anticipation, just as it should be good with respect to excitement.
But part of the question is: must past/future me be causally connected to me?
Part of which question? And whatever you call “causally connected” past/future persons is a property of the stuff-in-general that FAI puts into place in the right way.
Unless I’m misunderstanding UDT, isn’t speed another issue? An FAI must know what’s likely to be happening in the near future in order to prioritize its computational resources so they’re handling the most likely problems. You wouldn’t want it churning through the implications of the Loch Ness monster being real while a mega-asteroid is headed for the earth.
Wei Dai should not be worrying about matters of mere efficiency at this point. First we need to know what to compute; only then does it make sense to worry about computing it via a fast approximation.
(There are all sorts of exceptions to this principle, and they mostly have to do with “efficient” choices of representation that affect the underlying epistemology. You can view a Bayesian network as efficiently compressing a raw probability distribution, but it can also be seen as committing to an ontology that includes primitive causality.)
Wei Dai should not be worrying about matters of mere efficiency at this point. First we need to know what to compute; only then does it make sense to worry about computing it via a fast approximation.
But that path is not viable here. If UDT claims to make decisions independently of any anticipation, then it seems it must be optimal on average over all the impossibilities it’s prepared to compute an output for. That means it must be sacrificing optimality in this world-state (by No Free Lunch), even given infinite computing time, so having a quick approximation doesn’t help.
If an AI running UDT is just as prepared to find Nessie as to find out how to stop the incoming asteroid, it will be inferior to a program designed just to find out how to stop asteroids. Expand the Nessie possibility to improbable world-states, and the asteroid possibility to probable ones, and you see the problem.
Though I freely admit I may be completely lost on this.
I’ve been thinking about this topic, off and on, at least since September 1997, when I joined the Extropians mailing list… What are the chances that we’re still trying to figure this out 12 years later?
Not small. I read that list and similar forums in the early ’90s, before becoming an AGI relinquishmentarian until about 2 years ago. When I came back to the discussions, I was astonished that most of the topics under discussion were essentially the same ones I remembered from 15 years earlier.
Note—there is a difference between investigating “what would evolution do?”, as a jumping-off point for other strategies, and recommending “we should do what evolution does”.
But a problem with that, is that what evolution does depends on where you look.
Why is it that if I set up a little grid-world on my computer and evolve little agents, I seem to get answers to the question “what does evolution do”? Am I encoding “where to look” into the grid-world somehow?
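To a first approximation, yes. A minimal evolutionary sketch (grid abstracted away to a single trait; all parameters are arbitrary choices) makes the worry concrete: the answer the simulation gives is whatever the fitness function you wrote into the world selects for.

```python
import random

def evolve(fitness, generations=200, pop_size=100, seed=0):
    """Truncation selection plus small mutation on a single trait in [0, 1]."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        pop = [min(1.0, max(0.0, rng.choice(parents) + rng.gauss(0, 0.02)))
               for _ in range(pop_size)]
    return sum(pop) / len(pop)

# Two hand-chosen "worlds"; evolution's "answer" is an echo of the setup.
favors_high = evolve(lambda t: t)       # mean trait ends near 1
favors_low  = evolve(lambda t: 1 - t)   # mean trait ends near 0
print(favors_high, favors_low)
```

So the grid-world does give an answer to “what does evolution do?”—but only relative to the environment and fitness landscape you encoded, which is exactly the “where you look” dependence in question.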