A question about acausal trade
(btw, I couldn’t find a good link to an introductory discussion of acausal trade; I would be grateful for one)
We discussed this at a LW Seattle meetup. It seems like the following is an argument that all AIs with a decision theory that does acausal trade will act as if they have the same utility function. That’s a surprising conclusion which I hadn’t seen before, but it also doesn’t seem too hard to come up with, so I’m curious where I’ve gone off the rails. This argument has a very Will_Newsomey flavor to me.
Let’s say we’re in a big universe with many, many chances for intelligent life, but most of them are so far apart that they will never meet each other. Let’s also say that UDT/TDT-like decision theories are in some sense the obviously correct decision theories to follow, so that when many civilizations build an AI, they use something like UDT/TDT. At their inception, these AIs will have very different goals, since the civilizations that built them would have very different evolutionary histories.
If many of these AIs can observe that the universe is such that there will be other UDT/TDT AIs out there with different goals, then each AI will trade acausally with the AIs it thinks are out there. Presumably each AI will have to study the universe and figure out a probability distribution over the goals of those AIs. Since the universe is large, each AI will expect many other AIs to be out there, and will thus bargain away most of its influence over its local area. Thus, the starting goals of each AI will have only a minor influence on what it does; each AI will act as if it has some combined utility function.
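To make that last step concrete, here is a minimal sketch of the dilution effect. Everything in it is a placeholder assumption of mine: the number of AIs, the random utility functions, and the rule that bargaining maximizes a resource-weighted sum of utilities.

```python
# Toy sketch: N acausal traders, each with its own utility function over a shared
# policy choice. Assume (hypothetically) that bargaining yields the policy maximizing
# a resource-weighted sum of utilities. With many traders of comparable size, any one
# AI's original goals carry only a small weight in the compromise.
import numpy as np

rng = np.random.default_rng(0)
n_ais = 1000                      # expected number of UDT/TDT AIs "out there"
n_policies = 50                   # coarse set of possible policies for the local region

utilities = rng.random((n_ais, n_policies))   # row i = AI i's utility over policies
resources = rng.random(n_ais)                 # AI i's expected share of the universe
weights = resources / resources.sum()

compromise = np.argmax(weights @ utilities)   # policy maximizing the weighted sum
own_best = np.argmax(utilities[0])            # what "our" AI would do on its own

print("our AI's weight in the compromise:", weights[0])
print("compromise policy:", compromise, "| our AI's own favorite:", own_best)
```

With a thousand comparably endowed traders, “our” AI’s weight is on the order of 0.001, so the compromise policy is essentially independent of its original goals.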
What are the problems with this idea?
Substitute the word causal for acausal. In a situation of “causal trade”, does everyone end up with the same utility function?
The Coase theorem does imply that perfect bargaining will lead agents to maximize a single welfare function. (This is what it means for the outcome to be “efficient”.) Of course, the welfare function will depend on the agents’ relative endowments (roughly, “wealth” or bargaining power).
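In symbols (just restating that standard claim; the weights λ_i are whatever the agents’ endowments and bargaining power determine, not something derived here):

```latex
% Efficient bargaining selects an outcome x* that maximizes some weighted welfare
% function, with nonnegative weights \lambda_i fixed by the agents' endowments.
\[
  x^* \in \arg\max_{x} \; \sum_i \lambda_i \, u_i(x), \qquad \lambda_i \ge 0 .
\]
```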
(Also remember that humans have to “simulate” each other using logic-like prior information even in the straightforward efficient-causal scenario—it would be prohibitively expensive for humans to re-derive all possible pooling equilibria &c. from scratch for each and every overlapping set of sense data. “Acausal” economics is just an edge case of normal economics.)
Unrelated question: Do you think it’d be fair to say that physics is the intersection of metaphysics and phenomenology?
The most glaring problem seems to be how it could deduce the goals of other AIs. It either implies the existence of some sort of universal goal system, or allows information to propagate faster than c.
What I had in mind was that each of the AIs would come up with a distribution over the kinds of civilizations which are likely to arise in the universe by predicting the kinds of planets out there (which is presumably something you can do since even we have models for this) and figuring out different potential evolutions for life that arises on those planets. Does that make sense?
I was going to respond saying I didn’t think that would work as a method, but now I’m not so sure.
My counterargument would be to suggest that there’s no goal system which can’t arbitrarily come about as a Fisherian Runaway, and that our AI’s acausal trade partners could be working on pretty much any optimisation criteria whatsoever. Thinking about it a bit more, I’m not entirely sure the Fisherian Runaway argument is all that robust. There is, for example, presumably no Fisherian Runaway goal of immediate self-annihilation.
If there’s some sort of structure to the space of possible goal systems, there may very well be a universally derivable distribution of goals our AI could find, and share with all its interstellar brethren. But there would need to be a lot of structure to it before it could start acting on their behalf, because otherwise the space would still be huge, and the probability of any given goal system would be dwarfed by the evidence of the goal system of its native civilisation.
There’s a plot for a Cthulhonic horror tale lurking in here, whereby humanity creates an AI, which proceeds to deduce a universal goal preference for eliminating civilisations like humanity. Incomprehensible alien minds from the stars, psychically sharing horrible secrets written into the fabric of the universe.
Except for the eliminating-humans part, the Cthulhonic outcome seems almost like the default. We build an AI, prove that it implements our reflectively stable wishes, and it still proceeds to pay very little attention to what we thought we wanted.
One thing that might push back in the opposite direction is that if humans have heavily path-dependent preferences (which seems pretty plausible), or are selfish with respect to currently existing humans in some way, then an AI built for our wishes might not be willing to trade away much of humanity in exchange for resources far away.
The Cthulhonic outcome is only the case if there are identifiable points in the space of possible goal systems to which the AI can assign enough probability to make them credible acausal trade partners. Whether those identifiable points exist is not clear or obvious.
When it ruminates over possible varieties of sapient life in the universe, it would need to find clusters of goals that were (a) non-universal, (b) specific enough to actually act upon, and (c) so probabilistically dense that they didn’t vanish into obscurity against humanity’s preferences, which it possesses direct observational evidence for.
Whether those clusters exist, and if they do, whether they can be deduced a priori by sitting in a darkened room and thinking really hard, does not seem obvious either way. Intuitively, thinking about trying to draw specific conclusions from extremely dilute evidence, I’m inclined to think they can’t, but I’m not prepared to hold that belief with a great deal of confidence, as I may very well think differently if I were a billion times smarter.
I think what matters is not so much the probability of goal clusters, but something like the expected amount of resources that AIs with a particular goal cluster have access to. An AI might think that some specific goal cluster has only a 1-in-1,000 chance of occurring anywhere, but that if it does occur there are probably a million instances of it. In expectation, I think this is the same as being certain that there are 1,000 (1,000,000 / 1,000) AIs with that goal cluster. Which seems like enough to ‘dilute’ the preferences of any given AI.
If the universe is pretty big then it seems like it would be pretty easy to get large expectations even with low probabilities. (let me know if I’m not making sense)
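A quick arithmetic check of that expectation, using the same toy numbers as above:

```python
# Expected number of AIs with a given goal cluster, using the numbers from the comment
# above: a 1-in-1,000 chance the cluster occurs anywhere, and a million instances if it does.
p_cluster = 1 / 1000
instances_if_present = 1_000_000
expected_instances = p_cluster * instances_if_present
print(expected_instances)  # 1000.0 -- same expectation as being certain of 1,000 such AIs
```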
The “million instances” is the size of the cluster, and yes, that would impact its weight, but I think it’s arithmetically erroneous to suggest the density matters more than the probability. It depends entirely on what those densities and probabilities are, and you’re just plucking numbers straight out of the air. Why not go the whole hog and suggest a goal cluster that happens nine times out of ten, with a gajillion instances?
I believe the salient questions are:
Do such clusters even exist? Can they be inferred, with enough confidence to actually act upon, from a poverty of evidence, just by thinking about possible agents that may or may not arise in our universe? This boils down to whether, if I’m smart enough, I can sit in an empty room, think “what if...” about examples of something I’ve never seen before from an enormous space of possibilities, and come up with an accurate collection of properties for those things, weighted by probability. There are some things we can do that with and some things we can’t. Which category do alien goal systems fall into?
If they do exist, will they be specific enough for an AI to act upon? Even if it does deduce some inscrutable set of alien factors that we can’t make sense of, will they be coherent? Humans care a lot about methods of governance, the moral status of unborn children, and who people should and shouldn’t have sex with, but they don’t agree on these things.
If they do exist, are there going to be many disparate clusters, or will they converge? If they converge, how far away from the median is humanity? If they’re disparate, are the goals completely disjoint, or do they overlap and/or conflict with each other? More to the point, are they going to overlap and/or conflict with us?
I can’t say how much we’d need to worry about a superintelligent TDT-agent implementing alien goals. That’s a fact about the universe for which I don’t have a lot of evidence. However, there’s more than enough uncertainty surrounding the question for me not to lose any sleep over it.
One problem is that, in order to actually get specific about utility functions, the AI would have to simulate another AI that is simulating it: that’s like trying to put a manhole cover through its own manhole by putting it in a box first.
If we assume that the computation problems are solved, a toy model involving robots laying different colors of tile might be interesting to consider. In fact, there’s probably a post in there. The effects will be different sizes for different classes of utility functions over tiles. In the case of infinitely many robots with cosmopolitan utility functions, you do get an interesting sort of agreement, though.
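As a very rough illustration of what such a toy model might look like (my own construction; the regions, colors, robot roster, and equal-weight bargaining rule are all arbitrary choices, not anything from the comment above):

```python
# Rough toy model of tile-laying robots (an illustrative construction only -- the comment
# above just gestures at such a model). Each robot has a utility function over an
# assignment of colors to regions. "Parochial" robots care only about their home region's
# tile; "cosmopolitan" robots care about tiles everywhere. The bargained outcome here is
# the assignment maximizing an equal-weight sum of utilities (an arbitrary simplification).
import itertools

COLORS = ["red", "blue", "green"]
REGIONS = ["A", "B", "C"]

def parochial(favorite, home):
    # utility = 1 if the home region's tile is the favorite color, 0 otherwise
    return lambda assignment: sum(
        1 for region, color in assignment.items() if region == home and color == favorite
    )

def cosmopolitan(favorite):
    # utility = number of regions whose tile is the favorite color
    return lambda assignment: sum(
        1 for color in assignment.values() if color == favorite
    )

robots = [parochial("red", "A"), cosmopolitan("blue"), cosmopolitan("blue")]

best = max(
    (dict(zip(REGIONS, combo)) for combo in itertools.product(COLORS, repeat=len(REGIONS))),
    key=lambda a: sum(u(a) for u in robots),
)
print(best)  # the two cosmopolitans outvote the parochial robot even in its home region A
```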
This outcome is bad because bargaining away influence over the AI’s local area in exchange for a small amount of control over the global utility function is a poor trade. But in that case, it’s also a poor acausal trade.
A more reasonable acausal trade to make with other AIs would be to trade away influence over faraway places. After all, other AIs presumably care about those places more than our AI does, so this is a trade that’s actually beneficial to both parties. It’s even a marginally reasonable thing to do acausally.
Of course, this means that our AI isn’t allowed to help the Babyeaters stop eating their babies, in accordance with its acausal agreement with the AI the Babyeaters could have made. But it also means that the Superhappy AI isn’t allowed to help us become free of pain, because of its acausal agreement with our AI. Ideally, this would hold even if we didn’t make an AI yet.
I agree with your logic, but why do you say it’s a bad trade? At first it seemed absurd to me, but after thinking about it I’m able to feel that it’s the best possible outcome. Do you have more specific reasons why it’s bad?
At best it means that the AI shapes our civilization into some sort of twisted extrapolation of what other alien races might like. In the worst case, it ends up calculating a high probability of existence for Evil Abhorrent Alien Race #176, which is in every way antithetical to the human race, and the acausal trade that it makes is to wipe out the human race (satisfying #176’s desires) so that if the #176s make an AI, that AI will wipe out their race as well (satisfying human desires, since you wouldn’t believe the terrible, inhuman, monstrous things those #176s were up to).
Perhaps it is not wise to speculate out loud in this area until you’ve worked through three rounds of “ok, so what are the implications of that idea?” and decided that it would help people to hear about the conclusions you developed three steps back. You can frequently find interesting things when you wander around, but there are certain neighborhoods you should not explore with children along for the ride until you’ve been there before and made sure it’s reasonably safe.
Perhaps you could send a PM to Will?
Not just going meta for the sake of it: I assert you have not sufficiently thought through the implications of promoting that sort of non-openness publicly on the board. Perhaps you could PM jsalvatier.
I’m lying, of course. But interesting to register points of strongest divergence between LW and conventional morality (JenniferRM’s post, I mean; jsalvatier’s is fine and interesting).