The most glaring problem seems to be how it could deduce the goals of other AIs. It implies either the existence of some sort of universal goal system, or information propagating faster than c.
What I had in mind was that each of the AIs would come up with a distribution over the kinds of civilizations which are likely to arise in the universe by predicting the kinds of planets out there (which is presumably something you can do since even we have models for this) and figuring out different potential evolutions for life that arises on those planets. Does that make sense?
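A minimal sketch of what that procedure might look like, with every planet type, evolutionary trajectory, and probability invented purely for illustration (none of these categories or numbers come from the discussion):

```python
# Toy sketch: marginalise a prior over planet types and evolutionary
# trajectories into a prior over the goal systems of whatever civilisations
# arise. All categories and probabilities are made up for illustration.

# P(planet type)
planet_prior = {"rocky_wet": 0.2, "rocky_dry": 0.5, "ice_moon": 0.3}

# P(evolutionary trajectory | planet type)
evolution_given_planet = {
    "rocky_wet": {"social_omnivores": 0.6, "hive_minds": 0.4},
    "rocky_dry": {"social_omnivores": 0.3, "hive_minds": 0.7},
    "ice_moon":  {"social_omnivores": 0.5, "hive_minds": 0.5},
}

# P(goal system | evolutionary trajectory)
goals_given_evolution = {
    "social_omnivores": {"status_games": 0.7, "resource_hoarding": 0.3},
    "hive_minds":       {"colony_expansion": 0.8, "resource_hoarding": 0.2},
}

# Marginalise: P(goal) = sum over planet types and trajectories
goal_prior = {}
for planet, p_planet in planet_prior.items():
    for evo, p_evo in evolution_given_planet[planet].items():
        for goal, p_goal in goals_given_evolution[evo].items():
            goal_prior[goal] = goal_prior.get(goal, 0.0) + p_planet * p_evo * p_goal

print(goal_prior)  # the AI's prior over alien goal systems
```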
I was going to respond saying I didn’t think that would work as a method, but now I’m not so sure.
My counterargument would be to suggest that there’s no goal system which can’t arbitrarily come about as a Fisherian Runaway, and that our AI’s acausal trade partners could be working on pretty much any optimisation criteria whatsoever. Thinking about it a bit more, I’m not entirely sure the Fisherian Runaway argument is all that robust. There is, for example, presumably no Fisherian Runaway goal of immediate self-annihilation.
If there’s some sort of structure to the space of possible goal systems, there may very well be a universally derivable distribution of goals our AI could find, and share with all its interstellar brethren. But there would need to be a lot of structure to it before it could start acting on their behalf, because otherwise the space would still be huge, and the probability of any given goal system would be dwarfed by the evidence of the goal system of its native civilisation.
There’s a plot for a Cthulhonic horror tale lurking in here, whereby humanity creates an AI, which proceeds to deduce a universal goal preference for eliminating civilisations like humanity. Incomprehensible alien minds from the stars, psychically sharing horrible secrets written into the fabric of the universe.
Except for the eliminating-humans part, the Cthulhonic outcome seems almost like the default. We build an AI, prove that it implements our reflectively stable wishes, and then it still proceeds to pay very little attention to what we thought we wanted.
One thing that might push back in the opposite direction is that if humans have heavily path-dependent preferences (which seems pretty plausible), or are selfish with respect to currently existing humans in some way, then an AI built to implement our wishes might not be willing to trade away much of humanity in exchange for resources far away.
The Cthulhonic outcome is only the case if there are identifiable points in the space of possible goal systems to which the AI can assign enough probability to make them credible acausal trade partners. Whether those identifiable points exist is not clear or obvious.
When it ruminates over possible varieties of sapient life in the universe, it would need to find clusters of goals that were (a) non-universal, (b) specific enough to actually act upon, and (c) so probabilistically dense that they didn’t vanish into obscurity against humanity’s preferences, which it possesses direct observational evidence for.
Whether those clusters exist, and if they do, whether they can be deduced a priori by sitting in a darkened room and thinking really hard, does not seem obvious either way. Intuitively, thinking about trying to draw specific conclusions from extremely dilute evidence, I’m inclined to think they can’t, but I’m not prepared to hold that belief with a great deal of confidence, as I may very well think differently if I were a billion times smarter.
I think what matters is not so much the probability of goal clusters, but something like the expectation of the amount of resources that AIs with a particular goal cluster have access to. An AI might think that some specific goal cluster only has a 1-in-1,000 chance of occurring anywhere, but if it does occur then there are probably a million instances of it. In expectation, I think this is the same as being certain that there are 1,000 (1,000,000 / 1,000) AIs with that goal cluster, which seems like enough to ‘dilute’ the preferences of any given AI.
If the universe is pretty big then it seems like it would be pretty easy to get large expectations even with low probabilities. (let me know if I’m not making sense)
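The arithmetic of that expectation, worked through with the comment’s (admittedly made-up) numbers:

```python
# The expectation argument above, using the numbers from the comment.
p_cluster_exists = 1 / 1000        # chance the goal cluster occurs anywhere at all
instances_if_exists = 1_000_000    # instances of it, conditional on it occurring

expected_instances = p_cluster_exists * instances_if_exists
print(expected_instances)  # 1000.0 -- same expected count as being certain of 1,000 AIs
```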
The “million instances” is the size of the cluster, and yes, that would impact its weight, but I think it’s arithmetically erroneous to suggest the density matters more than the probability. It depends entirely on what those densities and probabilities are, and you’re just plucking numbers straight out of the air. Why not go the whole hog and suggest a goal cluster that happens nine times out of ten, with a gajillion instances?
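To make that concrete, here is a toy sensitivity check (the function and every number in it are illustrative, chosen only to show that the conclusion flips with the inputs):

```python
# Whether a deduced cluster "dilutes" the native civilisation's preferences
# depends entirely on the assumed probability and density. Both parameter
# sets below are as arbitrary as any others.

def expected_weight(p_exists, instances, resources_per_instance=1.0):
    return p_exists * instances * resources_per_instance

observed_weight = 1.0  # the native civilisation, known with certainty

for p, n in [(1 / 1000, 1_000_000),   # the earlier comment's numbers: weight 1000
             (1 / 10**9, 1_000)]:     # equally arbitrary numbers: weight 0.000001
    w = expected_weight(p, n)
    print(f"p={p:g}, n={n}: cluster weight {w:g} vs observed {observed_weight}")
```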
I believe the salient questions are:
Do such clusters even exist? Can they be inferred from a poverty of evidence, just by thinking about possible agents that may or may not arise in our universe, with enough confidence to actually act upon them? This boils down to whether, if I’m smart enough, I can sit in an empty room, think “what if...” about examples of something I’ve never seen before from an enormous space of possibilities, and come up with an accurate collection of properties for those things, weighted by probability. There are some things we can do that with and some things we can’t. Which category do alien goal systems fall into?
If they do exist, will they be specific enough for an AI to act upon? Even if it does deduce some inscrutable set of alien factors that we can’t make sense of, will they be coherent? Humans care a lot about methods of governance, the moral status of unborn children, and who people should and shouldn’t have sex with, but they don’t agree on these things.
If they do exist, are there going to be many disparate clusters, or will they converge? If they do converge, how far from the median is humanity? If they’re disparate, are they completely disjoint goals, or do they overlap and/or conflict with each other? More to the point, are they going to overlap and/or conflict with us?
I can’t say how much we’d need to worry about a superintelligent TDT-agent implementing alien goals; that’s a fact about the universe for which I don’t have a lot of evidence. However, there’s more than enough uncertainty surrounding the question for me not to lose any sleep over it.