It’s not just indexical uncertainty, it’s any kind of uncertainty, as possible worlds can trade with each other. Independence is an approximation, adequate for our low-intelligence times, but it breaks down as it becomes possible to study counterfactuals. It’s more obvious with indexical uncertainty, where the information can be transferred in apparent form by stupid physics, and less obvious with normal uncertainty, where it takes a mind.
This idea that possible worlds can trade with each other seems to have fairly radical implications. Together with Eliezer’s idea that agents who know each other’s source code ought to play Cooperate in a one-shot PD, doesn’t it imply that all sufficiently intelligent and reflective agents across all possible worlds should do a global trade and adopt a single set of preferences that represents a compromise between all of their individual preferences? (Note: the resulting unified preferences are not necessarily characterized by expected utility maximization.)
Let me trace the steps of my logic here. First take 2 agents in the same world who know each other’s source code. Clearly, each adopting a common set of preferences can be viewed as playing Cooperate in a one-shot PD. Now take an agent who has identified a counterfactual agent in another possible world (who has in turn identified it). Each agent should also adopt a common set of preferences, in the expectation that the other will do so as well. Either iterating this process, or by doing a single global trade across all agents in all possible worlds, we should arrive at a common set of preferences between everyone.
Hmm, maybe this is just what you meant by “one global decision”? Since my original interest was to figure out what probabilities mean in the context of indexical uncertainty, let me ask you, do probabilities have any role to play in your decision theory?
Agents don’t need to merge by changing anything in their individual preferences; merging is just a way of looking at the system, like in process algebra. Three agents can be considered as three separate agents cooperating with each other, or as two agents, one of them a merge of the first two of the original ones, or as one merged agent. These are all different perspectives on the same system, revealing its structure.
The crucial relation in this picture is that the global cooperation must be a Pareto improvement over cooperations (merges) among any subset of the agents. This is a possible origin for the structure of a fair cooperative strategy. More than that, if each agent that could otherwise be considered as an individual is divided in this manner into a set of elementary preferences, and all of these elementary preferences are then dumped together in the global cooperation, this may provide all the detail that the precise choice of the fair cooperative strategy might need. The “weights” come from the control that each of the elementary agents has over the world.
Are you familiar with Cooperative Game Theory? I’m just learning it now, but it sounds very similar to what you’re talking about, and maybe you can reuse some of its theory and math. (For some reason I’ve only paid attention to non-cooperative game theory until recently.) Here’s a quote from page 356 of “Handbook of Game Theory with Economic Applications, Vol 1”:
Of all solution concepts of cooperative games, the core is probably the easiest to understand. It is the set of all feasible outcomes (payoffs) that no player (participant) or group of participants (coalition) can improve upon by acting for themselves.
I couldn’t find anything that “clicked” with cooperation in PD. Above, I wasn’t talking about a kind of Nash equilibrium protected from coalition deviations. The correlated strategy needs to be a Pareto improvement over possible coalition strategies run by subsets of the agents, but it doesn’t need to be stable in any sense. It can be strictly dominated, for example, by either individual or coalition deviations.
A core in Cooperative Game Theory doesn’t have to be a Nash equilibrium. Take a PD game with payoffs (2,2) (-1,3) (3,-1) (0,0). In Cooperative Game Theory, (-1,3) and (3,-1) are not considered improvements that a player can make over (2,2) by acting for himself. Maybe one way to think about it is that there is an agreement phase, and an action phase, and the core is the set of agreements that no subset of players can improve upon by publicly going off (and forming their own agreement) during the agreement phase. Once an agreement is reached, there is no deviation allowed in the action phase.
Again, I’m just learning Cooperative Game Theory, but that’s my understanding and it seems to correspond exactly to your concept.
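To make the core concept concrete for the PD payoffs above, here is a rough sketch in Python. The coalitional form, the characteristic-function values, and all names are my own reconstruction for illustration, not something taken from the Handbook.

```python
# Sketch: the PD above, treated as a cooperative (coalitional) game.
# Payoffs: (C,C)=(2,2), (C,D)=(-1,3), (D,C)=(3,-1), (D,D)=(0,0).

payoffs = {
    ("C", "C"): (2, 2),
    ("C", "D"): (-1, 3),
    ("D", "C"): (3, -1),
    ("D", "D"): (0, 0),
}

def security_level(player):
    """Best payoff a lone player can guarantee regardless of the other's action."""
    best = float("-inf")
    for my_act in ("C", "D"):
        worst = min(
            payoffs[(my_act, their_act) if player == 0 else (their_act, my_act)][player]
            for their_act in ("C", "D")
        )
        best = max(best, worst)
    return best

# Characteristic function: what each coalition can guarantee by acting for itself.
v = {
    frozenset([0]): security_level(0),                          # = 0 (Defect guarantees 0)
    frozenset([1]): security_level(1),                          # = 0
    frozenset([0, 1]): max(sum(p) for p in payoffs.values()),   # = 4, reached at (C,C)
}

def in_core(x):
    """In the core: efficient, and no player or coalition can improve on it alone."""
    efficient = sum(x) == v[frozenset([0, 1])]
    unblocked = x[0] >= v[frozenset([0])] and x[1] >= v[frozenset([1])]
    return efficient and unblocked

for outcome, payoff in payoffs.items():
    print(outcome, payoff, "in core:", in_core(payoff))
# (2,2) is in the core; (-1,3) and (3,-1) are not, since a player who would get -1
# can guarantee 0 by acting for himself -- matching the quoted definition.
```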
The following is an honest non-rhetorical question: Is it not misleading to use the word ‘cooperation’ as you seem to be using it here? Don’t you still get ‘cooperation’ in this sense if the subsets of agents are not causally interacting with each other (say) but have still semi-Platonically ‘merged’ via implicit logical interaction as compared to some wider context of decision algorithms that by logical necessity exhibit comparatively less merging? This sets up a situation where an agent can (even accidentally) engineer ‘Pareto improvements’ just by improving its decision algorithm (or more precisely replacing ‘its’ decision algorithm (everywhere ‘it’ is instantiated, of course...) with a new one that has the relevant properties of a new, possibly very different logical reference class). It’s a total bastardization of the concept of trade but it seems to be enough to result in some acausal economy (er, that is, some positive-affect-laden mysterious timeless attractor simultaneously constructed and instantiated by timeful interaction) or ‘global cooperation’ as you put it, and yet despite all that timeless interaction there are many ways it could turn out that would not look to our flawed timeful minds like cooperation. I don’t trust my intuitions about what ‘cooperation’ would look like at levels of organization or intelligence much different from my own, so I’m hesitant to use the word.
(I realize this is ‘debating definitions’ but connotations matter a lot when everything is so fuzzily abstract and yet somewhat affect-laden, I think. And anyway I’m not sure I’m actually debating definitions because I might be missing an important property of Pareto improvements that makes their application to agents that are logical-property-shifting-over-time not only a useless analogy but a confused one.)
This question is partially prompted by your post about the use of the word ‘blackmail’ as if it was technically clear and not just intuitively clear which interactions are blackmail, trade, cooperation, et cetera, outside of human social perception (which is of course probably correlated with more-objectively-correct-than-modern-human meta-ethical truths but definitely not precisely so).
If the above still looks like word salad to you… sigh please let me know so I can avoid pestering you ’til I’ve worked more on making my concepts and sentences clearer. (If it still looks way too much like word salad but you at least get the gist, that’d be good to know too.)
Is it not misleading to use the word ‘cooperation’ as you seem to be using it here?
Yes, it’s better to just say that there is probably some acausal morally relevant interaction, wherein the agents work on their own goals.
(I don’t understand what you were saying about time/causality. I disagree with Nesov_2009′s treatment of preference as magical substance inherent in parts of things.)
Together with Eliezer’s idea that agents who know each other’s source code ought to play Cooperate in a one-shot PD, doesn’t it imply that all sufficiently intelligent and reflective agents across all possible worlds should do a global trade and adopt a single set of preferences that represents a compromise between all of their individual preferences?
It does, and I discussed that here. An interesting implication that I noticed a few weeks back is that an UFAI would want to cooperate with a counterfactual FAI, so we get a slice of the future even if we fail to build FAI, depending on how probable it was that we would be able to do that. A Paperclip maximizer might wipe out humanity, then catch up on its reflective consistency, look back, notice that there was a counterfactual future where a FAI is built, allot some of the collective preference to humanity, and restore it from the info remaining after the initial destruction (effectively constructing a FAI in the process). (I really should make a post on this. Some of the credit is due to Rolf Nelson for the UFAI deterrence idea.)
A Paperclip maximizer might wipe out humanity, then catch up on its reflective consistency, look back, notice that there was a counterfactual future where a FAI is built, allot some of the collective preference to humanity, and restore it from the info remaining after the initial destruction (effectively constructing a FAI in the process).
This seems fishy to me, given the vast space of possible preferences and the narrowness of the target. Assuming your idea of preference compromise as the convergent solution, what weighting might a reflective AI give to all of the other possible preference states, especially given the mutually exclusive nature of some preferences? If there’s any Occam prior involved at all, something horrifically complicated like human moral value just isn’t worth considering for Clippy.
Preferences get considered (loosely) based on probabilities with which AGIs possessing them could’ve been launched. There is supposedly a nontrivial chance of getting to FAI, so it gets a nontrivial portion of the Paperclipper’s cooperation. FAI gets its share because of the (justified) efficacy of our efforts to create FAI, not because it occupies some special metaphysical place, and not even because of the relation of its values to their human origin, as humans in themselves claim no power.
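One naive way to picture this weighting, purely as a toy: weight each preference by the probability that an AGI holding it could have been launched. The linear aggregation rule and all numbers below are my own assumptions for illustration, not a rule proposed anywhere in this thread (which explicitly leaves the compromise rule open).

```python
# Toy sketch: a "compromise" utility weighted by hypothetical launch probabilities.
# The linear sum and every number here are illustrative assumptions only.

launch_prob = {"paperclipper": 0.9, "FAI": 0.1}   # hypothetical launch probabilities

def u_paperclipper(outcome):
    return outcome["paperclips"]

def u_fai(outcome):
    return outcome["human_flourishing"]

utilities = {"paperclipper": u_paperclipper, "FAI": u_fai}

def compromise_utility(outcome):
    """Each preference gets a share proportional to its launch probability."""
    return sum(launch_prob[agent] * utilities[agent](outcome) for agent in launch_prob)

# Even with a small FAI weight, a future that allots some resources to the
# counterfactual FAI's values can beat a 'pure paperclip' future under the compromise:
all_clips = {"paperclips": 100, "human_flourishing": 0}
mixed     = {"paperclips": 85,  "human_flourishing": 200}
print(compromise_utility(all_clips), compromise_utility(mixed))  # 90.0 vs 96.5
```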
Preferences get considered (loosely) based on probabilities with which AGIs possessing them could’ve been launched.
Mind projection fallacy? How are these probabilities calculated, and on what prior information? Even if the AI can look back on the past and properly say in some sense that there was such-and-such a probability of some FAI project succeeding, can’t it just as well look still further back and say there was such-and-such a probability of humanity never evolving in the first place? This just brings us back to the problem orthonormal mentions: our preferences are swamped by the vastness of the space of all possible counterfactual preferences.
You don’t care about counterfactual preferences; you only care about the bearers of these counterfactual preferences being willing to help you, in exchange for you helping them.
It might well be that, prior to the first AGI, the info about the world is too sparse or scrambled to coordinate with counterfactual AGIs, that is, for our AGI to discern what should be done for the others in order to improve the possible outcome for itself. Of those possibilities, most may remain averaged out to nothing specific. Only if the possibility of FAI is clear enough will the trade take form, and sharing a common history until recently helps in getting that clear info.
I’d like to note a connection between Vladimir’s idea, and Robin Hanson’s moral philosophy, which also involves taking into account the wants of counterfactual agents.
I’m also reminded of Eliezer’s Three Worlds Collide story. If Vladimir’s right, many more worlds (in the sense of possible worlds) will be colliding (i.e., compromising/cooperating).
I look forward to seeing the technical details when they’ve been worked out.
Ok, so I see that probability plays a role in determining one’s “bargaining power”, which makes sense. We still need a rule that outputs a compromise set of preferences when given a set of agents, their probabilities, individual preferences, and resources as input, right? Does the rule need to be uniquely fair or obvious, so that everyone can agree to it without discussion? Do you have a suggestion for what this rule should be?
Edit: I see you’ve answered some of my questions already in the other reply. This is really interesting stuff!
I don’t get Counterfactual Mugging at all. Dissolve the problem thus: exactly which observer-moment do we, as problem-solvers, get to optimize mathematically? Best algorithm we can encode before learning the toss result: precommit to be “trustworthy”. Best algorithm we can encode after learning the toss result: keep the $100 and afterwards modify ourselves to be “trustworthy”—iff we expect similar encounters with Omega-like entities in the future with high enough expected utility. It’s pretty obvious that more information about the world allows us to encode a better algorithm. Is there anything more to it?
What’s an observer-moment (more technically, as used here)? What does it mean to be “trustworthy”? (To be a cooperator? To fool Omega into thinking you’re a cooperator?)
For keeping the $100: you are not the only source of info; you can’t really modify yourself like that, being only a human; and it’s specified that you don’t expect other encounters of this sort.
Whatever algorithm you can encode after you learn the toss result, you can encode before learning the toss result as well, by including it under the conditional clause, to be executed if the toss result matches the appropriate possibility.
More than that, whatever you do after you encounter the new info can be considered the execution of that conditional algorithm, already running in your mind, even if no deliberative effort went into choosing it. By establishing an explicit conditional algorithm you are only optimizing the algorithm that is already in place, using that same algorithm, so it could be done after learning the info as well as before (well, not quite, but it’s unclear how significant the effect of the lack of reflective consistency is when reconsidered under reflection).
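A minimal sketch of the conditional-algorithm point, using the $10000/$100 stakes used in this thread. The simulation structure and policy names are my own illustration: anything you could decide after seeing the toss can be written as a policy fixed before it, and the two policies below are compared from the pre-toss perspective.

```python
import random

# Counterfactual Mugging with the $10000 / $100 stakes from this thread.
# A "conditional algorithm" is fixed before the toss and consulted afterwards.

def run(policy, trials=100_000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        heads = rng.random() < 0.5
        if heads:
            # Omega rewards you only if your policy *would* give on tails.
            total += 10_000 if policy("tails") == "give" else 0
        else:
            total += -100 if policy("tails") == "give" else 0
    return total / trials

always_keep = lambda toss: "keep"
give_on_tails = lambda toss: "give" if toss == "tails" else "keep"

print(run(always_keep))    # 0: never pays, never rewarded
print(run(give_on_tails))  # ~4950: better before the toss, even though after
                           # seeing tails the local move costs $100
```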
Here’s a precise definition of “observer-moment”, “trustworthiness” and everything else you might care to want defined. But I will ask you for a favor in return...
Mathematical formulation 1: Please enter a program that prints “0” or “1”. If it prints “1” you lose $100, otherwise nothing happens.
Mathematical formulation 2: Please enter a program that prints “0” or “1”. If it prints “1” you gain $10000 or lose $100 with equal probability, otherwise nothing happens.
Philosophical formulation by Vladimir Nesov, Eliezer Yudkowsky et al: we ought to find some program that optimizes the variables in case 1 and case 2 simultaneously. It must, must, must exist! For grand reasons related to philosophy and AI!
Now the favor request: Vladimir, could you please go out of character just this once? Give me a mathematical formulation in the spirit of 1 and 2 that would show me that your and Eliezer’s theories have any nontrivial application whatsoever.
Vladimir, it’s work in progress; if I could state everything clearly, I would’ve written it up. It also seems that what is already written informally here and there on this subject is sufficient to communicate the idea, at least as a problem statement.
Yes, that seems like an interesting way to think about your puzzle. Thanks for pointing out the connection. Have you considered what kind of decision theory would be needed to handle these violations of Independence?
Whole strategies need to be considered instead of individual actions, so that there is only one global decision, with individual actions selected as components of the overall calculation of the best global strategy. Indexical uncertainty becomes a constraint on strategy that requires actions to be equal in indistinguishable situations. More generally, different actions of the same agent can be regarded as separate actions of separate agents sharing the same preferences, who cooperate, exchanging info through the history of the agent’s development that connects them (it). Even more generally, the same process should take care of cooperation between agents with different preferences. Even in that situation, the best global strategy will take into account (coordinate) all actions performed by all agents, including counterfactual ones (a benefit of reflective consistency, which enables the calculations to be performed on the spot, not necessarily in advance).
So, expected utility (or some other ordering to that effect) is compared across the global strategies involving not just the agent but all cooperating agents, and then the agent just plays its part in the selected global strategy. If the agent has a lot of info about where it is (low indexical uncertainty), then it’ll be able to perform a precisely targeted move within the global strategy, suited best to the place it’s in. The counterfactual and other-time/other-place counterparts of the agent will perform different moves for the different details of their situations. Uncertainty (of any kind) limits the ability of the agent to custom-make its moves, so it must choose a single move targeted at the larger area of the territory over which it’s uncertain, instead of choosing different moves for each of its points, as it could if it had info discriminating among them.
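A toy numerical rendering of “uncertainty as a constraint on strategy”. The world states, observations, and payoffs are invented for illustration only; the point is that a strategy maps observations (not worlds) to actions, so indistinguishable situations are forced to share a move.

```python
from itertools import product

# Choose one global strategy: a map from observation to action.
# Two world states share an observation, so any strategy must play
# the same action in both -- indexical uncertainty as a constraint.

worlds = {          # world: (probability, observation)
    "w1": (0.25, "obs_A"),
    "w2": (0.25, "obs_A"),   # indistinguishable from w1
    "w3": (0.50, "obs_B"),
}
actions = ["x", "y"]
payoff = {            # hypothetical payoffs of each action in each world
    ("w1", "x"): 3, ("w1", "y"): 0,
    ("w2", "x"): 0, ("w2", "y"): 3,
    ("w3", "x"): 1, ("w3", "y"): 2,
}
observations = sorted({obs for _, obs in worlds.values()})

def expected_utility(strategy):
    return sum(p * payoff[(w, strategy[obs])] for w, (p, obs) in worlds.items())

# Enumerate all strategies: one action per observation, not per world.
best = max(
    (dict(zip(observations, choice)) for choice in product(actions, repeat=len(observations))),
    key=expected_utility,
)
print(best, expected_utility(best))
# With full info, w1 and w2 would get different tailored moves (total 2.5);
# under the shared observation the agent must pick one move covering that
# whole region, and the best it can do is 1.75.
```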
I don’t believe that possible worlds can trade with each other, and I don’t see anything in Counterfactual Mugging to persuade me of that.
Expected utility maximization is based on a model in which you inhabit a world state, and you have a set (possibly infinite) of possible future world states, with a probability (or a point on a probability distribution) attached to each one. If you have interactions between your possible future states, you’re just not representing them correctly. The most you can say is that you are using some different model. You can’t say there’s a problem with the model unless you demonstrate a situation your model can handle better than the standard model.
To answer the counterfactual mugging: You keep your $100. Because the game is over. You can’t gain money in another branch by giving up the $100. This is not a Newcomb-like situation.
Please provide a counterargument if you vote this down.
Consider two alternative possible worlds, forking from a common worldline with equal 50% probability. In one world, an agent A develops, and in the other, an agent B. Agent A can either achieve U1 A-utilons or U2 B-utilons, with U2>U1 (if A chooses to get U2 B-utilons, it produces 0 A-utilons). Agent B can either achieve U1 B-utilons or U2 A-utilons. If each of them only thinks about itself, the outcome is U1 for A and U1 for B, which is not very much. If instead each of them optimizes the other’s utility, both get U2. If this causes any trouble, shift the perspective to the point before the fork and calculate the expected utility of these strategies: the first gives U1/2 in both A-utility and B-utility, while the second gives U2/2 of both, which is better.
It’s more efficient for them to produce utility for the other, which maps directly onto the concept of trade. Counterfactual mugging explores exactly the same conceptual problems that you run into when trying to accept the argument above. If you accept counterfactual mugging, you should accept the deal above as well. Of course, both agents must be capable of telling whether the other, counterfactual agent is going to abide by the deal, which is the role Omega’s powers play in CM.
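Spelling out the arithmetic with concrete numbers, where U1 = 1 and U2 = 3 are hypothetical values chosen only to satisfy U2 > U1:

```python
# Worked numbers for the fork example: equal 50% worlds, agent A in one, agent B
# in the other. U1 = 1 and U2 = 3 are hypothetical values with U2 > U1.
U1, U2 = 1, 3
p = 0.5

def expected_utilities(a_choice, b_choice):
    """a_choice / b_choice: 'self' = produce own utilons, 'other' = produce the other's."""
    a_utility = p * (U1 if a_choice == "self" else 0) + p * (U2 if b_choice == "other" else 0)
    b_utility = p * (U1 if b_choice == "self" else 0) + p * (U2 if a_choice == "other" else 0)
    return a_utility, b_utility

print(expected_utilities("self", "self"))    # (0.5, 0.5)  -> U1/2 each
print(expected_utilities("other", "other"))  # (1.5, 1.5)  -> U2/2 each: the trade wins
```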
Strategy one has U1/2 in both A-utility and B-utility with the additional property that the utility is in the correct fork where it can be used (i.e. it truly exists).
Strategy two has U2/2 in both A-utility and B-utility, but with the additional property that the utility produced is not going to be usable in the fork where it is produced (i.e. the actual utility is really 0 unless the utility can be traded for the opposite utility, which is actually usable in the same fork).
Assuming that there is no possibility of trade (since you describe no method by which it is possible):
I don’t see a requirement for trade existing in the counterfactual mugging problem so I accept it.
Since the above deal requires the possibility of trade to actually gain USABLE utility (arguably the only nonzero kind assuming that [PersonalUse OR Trade = Usability]) and I don’t see the possibility for trade, I am justified in rejecting the above deal despite accepting the counterfactual deal.
Utility is not instrumental, not used for something else; utility is the (abstract) thing you try to maximize, caring about nothing else. It’s the measure of success, all consequences taken into account (and is not itself “physical”). As such, it doesn’t matter in what way (or “where”) utility gets “produced”. Knowing that might be useful for the purpose of computing utility, but not for the purpose of interpreting the resulting amount, since utility is the final interpretation of the situation, the only one that matters.
Now, it might be that you consider events in the counterfactual worlds not valuable, but then your objection interrupts my argument a step earlier than where you placed it: it makes incorrect the statement that A’s actions can produce B-utility. It could be that A can’t produce B-utility, but it can’t be that A produces B-utility and yet it doesn’t matter for B.
Hence the second paragraph about counterfactual mugging: if you accept that events in the counterfactual world can confer value, then you should take this deal as well. And no matter whether you accept CM or not, if you consider the problem in advance, you want to precommit to counterfactual trade. And hence, it’s a reflectively consistent thing to do to accept counterfactual trade later as well.
Fair enough. I’m willing to rephrase my argument as A can’t produce B utility because there is no B present in the world.
Yes, I do want to pre-commit to a counter-factual trade in the mugging because that is the cost of obtaining access to an offer of high expected utility (see my real-world rephrasing here for a more intuitive example case).
In the current world-splitting case, I see no utility for me since the opposing fork cannot produce it so there is no point to me pre-committing.
Why do you believe that the counterfactual isn’t valuable? You wrote:
I’m willing to rephrase my argument as A can’t produce B utility because there is no B present in the world.
That B is not present in a given possible world is not in itself a valid reason to morally ignore that possible world (there could be valid reasons, but B’s absence is not one of them, for most preferences that are not specifically designed to make this condition hold, and for human-like morality in particular). For example, people clearly care about the (actual) world where they’ve died (and are not present): you won’t trade a penny a day while you live for eternal torture for everyone after you die (while you should, if you don’t care about the world where you are not present).
My default is to assume that B utility cannot be produced in a different world UNLESS it is of utility in B’s world to produce the utility in another world. One method by which this is possible is trade between the two worlds (which was the source of my initial response).
Your assumption seems to be that B utility will always have value in a different world.
My default assumption is explicitly overridden for the case where I feel good (have utility in the world where I am present) when I care about the world where I am not present.
Your (assumed) blanket assumption has the counterexample that while I feel good when someone has sex with me in the world where I am present (alive), I do not feel good (I feel nothing—and am currently repulsed by the thought = NEGATIVE utility) when someone has sex with me in the world where I am dead (not present).
ACK. Wait a minute. I’m clearly confusing the action that produced B utility with B utility itself. Your problem formulation did explicitly include your assumption (which thereby makes it a premise).
OK. I think I now accept your argument so far. I have a vague feeling that you’ve carried the argument to places where the premise/assumption isn’t valid but that’s obviously the subject for another post.
(Interesting karma question. I’ve made a mistake. How interesting is that mistake to the community? In this case, I think that it was a non-obvious mistake (certainly for me without working it through ;-) that others have a reasonable probability of making on an interesting subject so it should be of interest. We’ll see whether the karma results validate my understanding.)
(Just to be sure, I expect this is exactly the point you’ve changed your mind about, so there is no need for me to argue.)
My default is to assume that B utility cannot be produced in a different world UNLESS it is of utility in B’s world to produce the utility in another world.
Does not compute. Utility can’t be “in a given world” or “useful” or “useful from a given world”. Utility is a measure of stuff, not stuff itself. Measure has no location.
Your assumption seems to be that B utility will always have value in a different world.
Not if we interpret “utility” as meaning “valuable stuff”. It’s not generally correct that the same stuff is equally valuable in all possible worlds. If in the worlds of both agents A and B we can produce stuff X and Y, it might well be that producing X in world A has more B-utility than producing Y in world A, but producing X in world B has less B-utility than producing Y in world B. At the same time, a given amount of B-utility is equally valuable no matter where the stuff so measured got produced.
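A tiny illustration of that point, with invented numbers: B’s utility function can score the same stuff differently depending on which world it is produced in, while the resulting utilons themselves have no location.

```python
# Hypothetical B-utility assigned to producing stuff X or Y in world A or world B.
# The numbers are invented; the point is only that the ranking of X vs Y can flip
# between worlds, while a utilon, once computed, has no location.
b_utility = {
    ("A", "X"): 5, ("A", "Y"): 2,   # in world A, X is the better thing to make for B
    ("B", "X"): 1, ("B", "Y"): 4,   # in world B, Y is better
}
print(b_utility[("A", "X")] > b_utility[("A", "Y")])  # True
print(b_utility[("B", "X")] < b_utility[("B", "Y")])  # True
# 4 B-utilons produced in world A count for exactly as much as 4 produced in world B.
```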
You’re presenting a standard PD, only distributed across possible worlds. Doesn’t seem to be any difference between splitting into 2 possible worlds, and taking 2 prisoners into 2 different cells. So you would need to provide a solution, a mechanism for cooperation, that would also work for the PD. And you haven’t.
Don’t know what you mean by “accept counterfactual mugging”. Especially since I just said I don’t agree with your interpretation of it. I believe the counterfactual mugging is also just a rephrasing of the PD. You should keep the $100 unless you would cooperate in a one-shot PD. We all know that rational agents would do better by cooperating, but that doesn’t make it happen.
That was the answer to the original edition of your question, which asked what counterfactual mugging has to do with the argument for trade between possible worlds. I presented a more or less direct reduction in the comment above.