I think you should self-modify to be updateless* with respect to the prior you have at the time of the modification. This is consistent with still anthropically updating with respect to information you have before the modification.
Right, I was saying that, assuming this, most agent-moments in this universe will have stopped anthropically updating long ago, i.e., they have a prior fixed at the time of self-modification and are no longer making anthropic updates to it on new information. This feels weird to me: if anthropically updating is philosophically correct, why is only a tiny fraction of agent-moments doing it? But maybe this is actually fine, if we say that anthropically updating is philosophically correct for people who have indexical values and we just happen to be among the few agent-moments who have indexical values.
I guess there is still a further question of whether having indexical values is philosophically correct (for us), with a similar issue of “if it’s philosophically correct, why do so few agent-moments have indexical values”? I can imagine that question being ultimately resolved either way… Overall I think I’m still in essentially the same epistemic position as when I wrote Where do selfish values come from?:
So, should we freeze our selfish values, or rewind our values, or maybe even keep our “irrational” decision theory (which could perhaps be justified by saying that we intrinsically value having a decision theory that isn’t too alien)? I don’t know what conclusions to draw from this line of thought.
(where “freeze our selfish values” could be interpreted as “self-modify to be updateless with respect to the prior you have at the time of the modification” along with corresponding changes to values)
That clarifies things somewhat, thanks!

I personally don’t find this weird. By my lights, the ultimate justification for deciding not to update is how I expect the policy of not-updating to help me in the future. So if I’m in a situation where I just don’t expect to be helped by not-updating, I might as well update. I struggle to see what mystery is left here that isn’t dissolved by this observation.
I guess I’m not sure why “so few agent-moments having indexical values” should matter to what my values are — I simply don’t care about counterfactual worlds, when the real world has its own problems to fix. :)
So if I’m in a situation where I just don’t expect to be helped by not-updating, I might as well update.
What is this in reference to? I’m not sure what part of my comment you’re replying to with this.
I struggle to see what mystery is left here that isn’t dissolved by this observation.
I may be missing your point, but I’ll just describe an interesting mystery that I see. Consider a model of a human mind with two parts, a conscious part and a subconscious part, each with its own values and decision-making process. They interact in some complex way, but sometimes the conscious part can strongly overpower the subconscious, like when a martyr sacrifices themselves for an ideology they believe in. For simplicity, let’s treat the conscious part as an independent agent of its own.
Today we don’t have the technology to arbitrarily modify the subconscious part, but we can self-modify the conscious part just by “adopting” or “believing in” different ideas. Let’s say the conscious part of you could decide at any time to become updateless and not have indexical values. (Again, let’s assume this for simplicity. Reality is perhaps messier than this.) It may still have to defer to the subconscious part much of the time, but in important moments it will be able to take charge and assert itself with its new decision theory and values.
Let’s say the conscious you is reluctant to become updateless now, because you’re not totally sure that’s actually a good idea. So you make a resolution that when you do fully solve all the relevant philosophical problems and end up deciding that updatelessness is correct, you’ll self-modify to be updateless with respect to today’s prior, instead of the future prior (at the time of the modification).
AFAICT, there’s nothing stopping you from carrying this out. But consider that when the day finally comes, you could also think, “If 15-year-old me had known about updatelessness, he would have made the same resolution but with respect to his prior instead of Anthony-2024’s prior. The fact that he didn’t is simply a mistake or historical accident, which I have the power to correct. Why shouldn’t I act as if he did make that resolution?” And I don’t see what would stop you from carrying that out either.
An important point to emphasize here is that your conscious mind currently isn’t running some decision theory with a well-defined algorithm and utility function, so we can’t decide what to do by thinking “what would this decision theory recommend”. Instead it runs on ideas/memes, and for those of us who really like philosophical ideas/memes, it runs on philosophy. And I currently don’t know what philosophy will ultimately say about what I should do when it comes to self-modifying to become updateless, and specifically which prior to become updateless with respect to. “Self-modify to be updateless with respect to the prior you have at the time of the modification” would be obvious if we were running a decision theory, but it’s not obvious because of considerations like the above, and who knows what other considerations/arguments there may be that we haven’t even thought of yet.
I took you to be saying: If the vast majority of agent-moments don’t update, this is some sign that those of us who do still update might be making a mistake.
So I’m saying: I know that 1) the reason the vast majority of agent-moments wouldn’t update (let’s grant this) is that they had predecessors who bound them not to update, and 2) I just am not bound by any such predecessors. Then, due to (2) it’s unsurprising that what’s optimal for me would be different from what the vast majority of agent-moments do.
Re: your explanation of the mystery:
So you make a resolution that when you do fully solve all the relevant philosophical problems and end up deciding that updatelessness is correct, you’ll self-modify to be updateless with respect to today’s prior, instead of the future prior (at the time of the modification).
Not central (I think?), but I’m unsure whether this move works; at least, it depends on the details of the situation. E.g. if the hope is “By self-modifying later on to be updateless w.r.t. my current prior, I’ll still be able to cooperate with lots of other agents in a similar epistemic situation to my current one, even after we end up in different epistemic situations [in which my decision is much less correlated with those agents’ decisions],” I’m skeptical of that, for reasons similar to my argument here.
when the day finally comes, you could also think, “If 15-year-old me had known about updatelessness, he would have made the same resolution but with respect to his prior instead of Anthony-2024’s prior. The fact that he didn’t is simply a mistake or historical accident, which I have the power to correct. Why shouldn’t I act as if he did make that resolution?” And I don’t see what would stop you from carrying that out either.
I think where we disagree is that I’m unconvinced there is any mistake-from-my-current-perspective to correct in the cases of anthropic updating. There would have been a mistake from the perspective of some hypothetical predecessor of mine asked to choose between different plans (before knowing who I am), but that’s just not my perspective. I’d claim that in order to argue I’m making a mistake from my current perspective, you’d want to argue that I don’t actually get information such that anthropic updating follows from Bayesianism.
An important point to emphasize here is that your conscious mind currently isn’t running some decision theory with a well-defined algorithm and utility function, so we can’t decide what to do by thinking “what would this decision theory recommend”.
I absolutely agree with this! And don’t see why it’s in tension with my view.
It seems hard for me to understand you, which may be due to my lack of familiarity with your overall views on decision theory and related philosophy. Do you have something that explains, e.g., what your current favorite decision theory is and how it should be interpreted (what are the type signatures of different variables, what are probabilities, what is the background metaphysics, etc.), what kinds of uncertainties exist and how they relate to each other, what your view is on the semantics of indexicals, and what type of a thing an agent is (do you take more of an algorithmic view, or a physical view)? (I tried looking into your post history and couldn’t find much that is relevant.) Also, what are the “epistemic principles” that you mentioned in the OP?
I interpret a decision theory as an answer to “Given my values and beliefs, what am I trying to do as an agent (i.e., if rationality is ‘winning,’ what is ‘winning’)?” Insofar as I endorse maximizing expected utility, a decision theory is an answer to “How do I define ‘expected utility,’ and what options do I view myself as maximizing over?”
I think it’s important to consider these normative questions, not just “What decision procedure wins, given my definition of ‘winning’?”

(I discuss similar themes here.)
On this interpretation of “decision theory,” EDT is the most appealing option I’m aware of. What I’m trying to do just seems to be: “make decisions such that I expect the best consequences conditional on those decisions.” The EDT criterion satisfies some very appealing principles like the “irrelevance of impossible outcomes.” And the “decisions” in question determine my actions in the given decision node.
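For concreteness, one standard way to write that criterion (a sketch in my own notation, with $A$ the set of options at the decision node, $O$ the possible outcomes, $P$ my credences, and $U$ my utility function) is:

$$a^* \in \arg\max_{a \in A} \sum_{o \in O} P(o \mid a)\, U(o)$$

On this reading, “irrelevance of impossible outcomes” just says that outcomes with zero probability under every option in $A$ can be dropped without changing which option comes out on top.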
I take view #1 in your list in “What are probabilities?”
I don’t think “arbitrariness” in this sense is problematic. There is a genuine mystery here as to why the world is the way it is, but I don’t think we can infer the existence of other worlds purely from our confusion.
And it just doesn’t seem that the thing I’m doing when I’m forming beliefs about the world is answering “how much do I care about different possible worlds?”
Indexicals: I haven’t formed a deliberate view on this. A flat-footed response to cases like your “old puzzle” in the comment you linked: Insofar as I simply don’t experience a superposition of experiences at once, it seems that if I get copied, “I” just will experience one of the copies’ experience-streams and not the others’. (Again I don’t consider it problematic that there’s some arbitrariness in which of the copies ends up being “me” — indeed if Everett is right then this sort of arbitrary direction of the flow of experience-streams happens all the time.) I think “you are just a different person from your future self, so there’s no fact of the matter what you will observe” is a reasonable alternative though.
I take a physicalist* view of agents: “There are particular configurations of stuff that can be well-modeled as ‘decision-makers.’ A configuration of stuff is ‘making a decision’ (relative to their epistemic state) insofar as they’re uncertain what their future behavior will be, and using some process that selects that future behavior in a way that is well-modeled as goal-directed. [Obviously there’s more to say about what counts as ‘well-modeled.’] My processes of deliberation about decisions and behavior resulting from those decisions can tell me what other configurations-of-stuff are probably doing, but I don’t see a motivation for modeling myself as actually being the same agent as those other configurations-of-stuff.”
Epistemic principles: Things like the principle of indifference, i.e., distribute credence equally over indistinguishable possibilities, all else equal.
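In its simplest form (a sketch, assuming finitely many possibilities): if $H_1, \dots, H_n$ are mutually exclusive possibilities that my evidence gives me no grounds to distinguish, then

$$P(H_i) = \frac{1}{n} \quad \text{for each } i = 1, \dots, n.$$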
* [Not to say I endorse physicalism in the broad sense]