Actually, I’d argue the main problem with “Siren Worlds” is the assumption that you can “envision”, or computationally simulate, an entire possible future country/planet/galaxy all at once, in detail, in such time that any features at all would jump out to a human observer.
That kind of computing power would require, well, something like the mass of a whole country/planet/galaxy and then some. Even if we generously assume a very low fidelity of simulation, comparable with mere weather simulations or even mere video games, we’re still talking whole server/compute farms being turned towards nothing but the task of pretending to possess a magical crystal ball for no sensible reason.
tl;dr: human values are already quite fragile and vulnerable to human-generated siren worlds.
Simulation complexity has not stopped humans from implementing totalitarian dictatorships (based on divine right of kings, fundamentalism, communism, fascism, people’s democracy, what-have-you) due to envisioning a siren world that is ultimately unrealistic.
It doesn’t require detailed simulation of a physical world; it only requires sufficient simulation of human desires, biases, blind spots, etc. that can lead people to abandon previously held values because they believe the siren world’s values will be necessary and sufficient to achieve what the siren world shows them. It exploits a flaw in human reasoning, not a flaw in accurate physical simulation.
That’s shifting the definition of “siren world” from “something which looks very nice when simulated in high-resolution but has things horrendously wrong on the inside” to a very standard “Human beings imagine things in low-resolution and don’t always think them out clearly.”
You don’t need to pour extra Lovecraft Sauce on your existing irrationalities just for your enjoyment of Lovecraft Sauce.
It depends a lot on how the world is being shown. If the AI is your “guide”, it can show you the seductive features of the world, or choose the fidelity of the simulation in just the right ways in the right places, etc., without needing a full-fledged simulation. You can have a siren world in text, just through the AI’s (technically accurate) descriptions, given your questions.
You’re missing my point, which is that proposing you’ve got “an AI” (with no dissolved understanding of how the thing actually works underneath what you’d get from a Greg Egan novel) which “simulates” possible “worlds” is already engaging in several layers of magical thinking, and you shouldn’t be surprised to draw silly conclusions from magical thinking.
I think I’m not getting your point either. Isn’t Stuart just assuming standard decision theory, where you choose actions by predicting their consequences and then evaluating your utility function over your predictions? Are you arguing that real AIs won’t be making decisions like this?
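For readers who want the decision procedure named here spelled out, a minimal Python sketch follows. It is one reading of “predict consequences, then evaluate utility over the predictions”, and it assumes a discrete set of actions, a hypothetical world model `predict` that returns a probability for each outcome, and a `utility` function over outcomes. None of these names or numbers come from Stuart’s post; the dam example is invented purely for illustration.

```python
from typing import Callable, Dict, Hashable, Iterable

Outcome = Hashable
Action = Hashable


def choose_action(
    actions: Iterable[Action],
    predict: Callable[[Action], Dict[Outcome, float]],
    utility: Callable[[Outcome], float],
) -> Action:
    """Pick the action whose predicted consequences have the highest expected utility."""

    def expected_utility(action: Action) -> float:
        # predict(action) maps each possible outcome to its probability
        return sum(p * utility(o) for o, p in predict(action).items())

    return max(actions, key=expected_utility)


# Hypothetical world model and utilities, made up for this example
WORLD_MODEL = {
    "build_dam": {"flood_prevented": 0.9, "dam_fails": 0.1},
    "do_nothing": {"flood": 0.5, "no_flood": 0.5},
}
UTILITIES = {"flood_prevented": 10.0, "dam_fails": -50.0, "flood": -20.0, "no_flood": 5.0}

best = choose_action(WORLD_MODEL, WORLD_MODEL.__getitem__, UTILITIES.__getitem__)
print(best)  # build_dam
```

Here `build_dam` scores 0.9 × 10 − 0.1 × 50 = 4 against −7.5 for `do_nothing`. Everything the procedure “knows” about a world passes through `predict`, which is where the disagreement in this thread is located.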
Isn’t Stuart just assuming standard decision theory, where you choose actions by predicting their consequences and then evaluating your utility function over your predictions? Are you arguing that real AIs won’t be making decisions like this?
While I do think that real AIs won’t make decisions in this fashion, that aside, as I had understood Stuart’s article, the point was not to address decision theory, which is a mathematical subject, but instead that he hypothesized a scenario in which “the AI” was used to forecast possible future events, with humans in the loop doing the actual evaluation based on simulations realized in high detail, to the point that the future-world simulation would be as thorough as a film might be today, at which point it could appeal to people on a gut level and bypass their rational faculties, but also have a bunch of other extra-scary features above and beyond other scenarios of people being irrational, just because.
The “But also...” part is the bit I actually object to.
Let’s focus on a simple version, without the metaphors. We’re talking about an AI presenting humans with consequences of a particular decision, with humans then making the final decision to go along with it or not.
So what is happening is that various possible future worlds will be considered by the AI according to its desirability criteria, these worlds will be described to humans according to its description criteria, and humans will choose according to whatever criteria we use. So we have a combination of criteria that result in a final decision. A siren world is a world that ranks very high in these combined criteria but is actually nasty.
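A toy model of these stacked criteria may make the definition concrete. It assumes, purely for illustration, that each world can be summarised by three numbers: the AI’s desirability score, how appealing its (truthful) description sounds, and a true value that nobody in the loop gets to see. The `World` fields, the shortlist size, the “negative true value means nasty” rule, and the example worlds are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class World:
    name: str
    ai_desirability: float   # score under the AI's own desirability criteria
    described_appeal: float  # how good the AI's description makes it sound
    true_value: float        # how good the world really is; hidden from the whole pipeline


def combined_choice(worlds: List[World], shortlist_size: int = 3) -> World:
    # Criterion 1: the AI shortlists the worlds it rates most desirable.
    shortlist = sorted(worlds, key=lambda w: w.ai_desirability, reverse=True)[:shortlist_size]
    # Criteria 2 and 3: humans see only the descriptions and pick the most appealing one.
    # true_value is never consulted at any stage.
    return max(shortlist, key=lambda w: w.described_appeal)


def is_siren(world: World) -> bool:
    # Assumption for this toy: "actually nasty" means a negative true value.
    return world.true_value < 0


# Invented example worlds
candidates = [
    World("modest_improvement", ai_desirability=0.7, described_appeal=0.6, true_value=0.8),
    World("always_smiling", ai_desirability=0.9, described_appeal=0.95, true_value=-0.9),
    World("messy_but_free", ai_desirability=0.8, described_appeal=0.5, true_value=0.9),
    World("stagnation", ai_desirability=0.2, described_appeal=0.3, true_value=0.1),
]

chosen = combined_choice(candidates)
print(chosen.name, is_siren(chosen))  # always_smiling True
```

Nothing in `combined_choice` ever looks at `true_value`, which is the point: a world can win on every visible criterion while being the worst option on the hidden one.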
If we stick to that scenario and assume the AI is truthful, the main siren world generator is the AI’s ability to describe these worlds in ways that sound very attractive to humans. Since human beliefs and preferences are not clearly distinct, this ranges from misleading (inducing incorrect human beliefs) to actively seductive (influencing human preferences to favour these worlds).
The higher the bandwidth the AI has, the more chance it has of “seduction”, or of exploiting known or unknown human irrationalities (again, there’s often no clear distinction between exploiting irrationalities for beliefs or preferences).
One scenario—Paul Christiano’s—is a bit different but has essentially unlimited bandwidth (or, more precisely, has an AI estimating the result of a setup that has essentially unlimited bandwidth).
but also have a bunch of other extra-scary features above and beyond other scenarios of people being irrational, just because.
This category can include irrationalities we don’t yet know about, better exploitation of irrationalities we do know about, and a host of speculative scenarios about hacking the human brain, which I don’t want to rule out completely at this stage.
We’re talking about an AI presenting humans with consequences of a particular decision, with humans then making the final decision to go along with it or not.
No. We’re not. That’s dumb. Like, sorry to be spiteful, but that is already a bad move. You do not treat any scenario involving “an AI”, without dissolving the concept, as desirable or realistic. You have “an AI”, without having either removed its “an AI”-ness (in the LW sense of “an AI”) entirely or guaranteed Friendliness? You’re already dead.
Can we assume that, since I’ve been working all this time on AI safety, I’m not an idiot? When presenting a scenario (“assume AI contained, and truthful”), I’m investigating whether we have safety within the terms of that scenario. Here we don’t, so we can reject attempts aimed at that scenario without looking further. If/when we find a safe way to do it within the scenario, then we can investigate whether that scenario is achievable in the first place.
Ah. Then here’s the difference in assumptions: I don’t believe a contained, truthful UFAI is safe in the first place. I just have an incredibly low prior on that. So low, in fact, that I didn’t think anyone would take it seriously enough to imagine scenarios which prove it’s unsafe, because it’s just so bloody obvious that you do not build UFAI for any reason, because it will go wrong in some way you didn’t plan for.
See the point on Paul Christiano’s design. The problem I discussed applies not only to UFAIs but also to other designs that seek to get round it while still using potentially unrestricted search.
I’m puzzled. Are you sure that’s your main objection? Because,
you make a different objection (I think) in your response to the sibling, and
it seems to me that since any simulation of this kind will be incomplete, and I assume the AI will seek the most efficient way to achieve its programmed goals, the scenario you describe is in fact horribly dangerous; the AI has an incentive to deceive us. (And somewhat like Wei Dai, I thought we were really talking about an AI goal system that talks about extrapolating human responses to various futures.)
It would be completely unfair of me to focus on the line, “as thorough as a film might be today”. But since it’s funny, I give you Cracked.com on Independence Day.
as I had understood Stuart’s article, the point was not to address decision theory, which is a mathematical subject, but instead that he hypothesized a scenario in which “the AI” was used to forecast possible future events, with humans in the loop doing the actual evaluation based on simulations realized in high detail, to the point that the future-world simulation would be as thorough as a film might be today, at which point it could appeal to people on a gut level and bypass their rational faculties
It’s true that Stuart wrote about Oracle AI in his Siren worlds post, but I thought that was mostly just to explain the idea of what a Siren world is. Later on in the post he talks about how Paul Christiano’s take on indirect normativity has a similar problem. Basically the problem can occur if an AI tries to model a human as accurately as possible, then uses the model directly as its utility function and tries to find a feasible future world that maximizes the utility function.
It seems plausible that even if the AI couldn’t produce a high resolution simulation of a Siren world W, it could still infer (using various approximations and heuristics) that with high probability its utility function assigns a high score to W, and choose to realize W on that basis. It also seems plausible that an AI eventually would have enough computing power to produce high resolution simulations of Siren worlds, e.g., after it has colonized the galaxy, so the problem could happen at that point if not before.
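The first possibility, where the AI only infers that its utility function probably scores W highly, can be illustrated with a short simulation. It borrows the statistical idea known as the optimizer’s curse, which the comment does not mention: if the AI’s model of human preferences equals the true value plus noise, then picking the world with the highest modelled score systematically picks worlds whose score is overstated. The Gaussian distributions, world counts, and function name are assumptions chosen for illustration.

```python
import random
import statistics


def optimizer_gap(n_worlds: int, noise_sd: float, trials: int, seed: int = 0) -> float:
    """Average amount by which the chosen world's *estimated* score exceeds its true score.

    Each candidate world has a true value; the AI only sees a noisy estimate of it
    (standing in for an imperfect model of human preferences) and picks the world
    with the highest estimate.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_worlds)]
        estimates = [v + rng.gauss(0.0, noise_sd) for v in true_values]
        best = max(range(n_worlds), key=estimates.__getitem__)
        gaps.append(estimates[best] - true_values[best])
    return statistics.mean(gaps)


print(optimizer_gap(n_worlds=1000, noise_sd=1.0, trials=200))
```

This should print a clearly positive number. Setting `noise_sd=0.0` makes the gap vanish, while increasing the noise or the number of worlds searched should widen it, which loosely mirrors the worry about unrestricted search.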
but also have a bunch of other extra-scary features above and beyond other scenarios of people being irrational, just because.
What extra-scary features are you referring to? (Possibly I skipped over the parts you found objectionable since I was already familiar with the basic issue and didn’t read Stuart’s post super carefully.)
Are you arguing that real AIs won’t be making decisions like this?
Yes. I think that probabilistic backwards chaining, aka “planning as inference”, is the more realistic way to plan, and better represented in the current literature.
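As a rough illustration of what planning as inference can look like, here is a sketch under heavy simplifying assumptions: a five-state corridor, three actions with a uniform prior, a made-up 0.8 chance that a move succeeds, and an observed event “at the goal at the final step”. Backward messages are chained from the goal, and the action posterior is read off them; no utility function over whole simulated worlds is evaluated. This is one textbook-style formulation chosen for brevity, not the specific method the commenter had in mind, and every name in it is hypothetical.

```python
from typing import Dict, List

STATES = range(5)          # positions 0..4 on a corridor
GOAL = 4                   # the observed event: being at GOAL at the final step
ACTIONS = {"left": -1, "stay": 0, "right": +1}
PRIOR = {a: 1 / len(ACTIONS) for a in ACTIONS}  # uniform prior over actions
P_SUCCESS = 0.8            # chance an intended move succeeds; otherwise the agent stays put


def transition(s: int, a: str) -> Dict[int, float]:
    """P(s' | s, a), with walls at both ends of the corridor."""
    target = min(max(s + ACTIONS[a], 0), len(STATES) - 1)
    if target == s:
        return {s: 1.0}
    return {target: P_SUCCESS, s: 1 - P_SUCCESS}


def expected_next(s: int, a: str, next_beta: Dict[int, float]) -> float:
    """Sum over s' of P(s' | s, a) * next_beta[s']."""
    return sum(p * next_beta[s2] for s2, p in transition(s, a).items())


def backward_messages(horizon: int) -> List[Dict[int, float]]:
    """beta[t][s] = P(at GOAL at step `horizon` | at s at step t), actions drawn from PRIOR."""
    beta: List[Dict[int, float]] = [{} for _ in range(horizon + 1)]
    beta[horizon] = {s: 1.0 if s == GOAL else 0.0 for s in STATES}
    for t in range(horizon - 1, -1, -1):  # chain backwards from the goal
        beta[t] = {
            s: sum(PRIOR[a] * expected_next(s, a, beta[t + 1]) for a in ACTIONS)
            for s in STATES
        }
    return beta


def action_posterior(s: int, t: int, beta: List[Dict[int, float]]) -> Dict[str, float]:
    """p(a | at s at step t, goal reached), proportional to p(a) * E[beta[t+1][s'] | s, a]."""
    weights = {a: PRIOR[a] * expected_next(s, a, beta[t + 1]) for a in ACTIONS}
    total = sum(weights.values())
    if total == 0:
        raise ValueError(f"goal unreachable from state {s} at step {t}")
    return {a: w / total for a, w in weights.items()}


beta = backward_messages(horizon=2)
posterior = action_posterior(s=3, t=0, beta=beta)
print(max(posterior, key=posterior.get))  # right
```

From state 3 with two steps left, the posterior puts most of its mass on “right”; if the goal cannot be reached in the remaining steps, `action_posterior` raises an error rather than dividing by zero.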
Actually, I’d argue the main problem with “Siren Worlds” is the assumption that you can “envision”, or computationally simulate, an entire possible future country/planet/galaxy all at once, in detail, in such time that any features at all would jump out to a human observer.
That’s not needed for a siren world. Putting human brains into vats and stimulating their pleasure centers doesn’t require much computing power.
Wireheading isn’t a siren world, though. The point of the concept is that it looks like what we want, when we look at it from the outside, but actually, on the inside, something is very wrong. Example: a world full of people who are always smiling and singing about happiness because they will be taken out and shot if they don’t (Lily Weatherwax’s Genua comes to mind). If the “siren world” fails to look appealing to (most) human sensibilities in the first place, as with wireheading, then it’s simply failing at being a siren.
The point is that we’re supposed to worry about what happens when we can let computers do our fantasizing for us in high resolution and real time, and then put those fantasies into action, as if we could ever actually do this, because there’s a danger in letting ourselves get caught up in a badly un-thought-through fantasy’s nice aspects without thinking about what it would really be like.
The problem being, no, we can’t actually do that kind of “automated fantasizing” in any real sense, for the same reason that fantasies don’t resemble reality: to fully simulate some fantasy in high resolution (i.e., such that choosing to put it into action would involve any substantial causal entanglement between the fantasy and the subsequent realized “utopia”) involves amounts of computing power we just won’t have, and which it wouldn’t even be efficient to use that way.
Backwards chaining from “What if I had a Palantir?” does lead to thinking, “What if Sauron used it to overwhelm my will and enthrall me?”, which sounds wise, except that “What if I had a Palantir?” really ought to lead to, “That’s neither possible nor an efficient way to get what I want.”
To be honest, I was assuming we’re not talking about a “contained” UFAI, since that’s, you know, trivially unsafe.