But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup.
Or once you lose your meta-moral urge to reach a self-consistent morality. This may not be the wrong (heh) answer along a path that originally started toward reaching self-consistent morality.
Or, more simply, the system could get hacked. When exploring a potential future world, you could become so enamoured of it that you overwrite any objections you had. It seems very easy for humans to fall into these traps—and again, once you lose something of value in your system, you don’t tend to get it back.
Is it a trap? If the cost of iterating the “find a more self-consistent morality” loop for the next N years is greater than the expected benefit of the next incremental change toward a more consistent morality for those same N years, then perhaps it’s time to stop. Just as an example, if the universe can give us 10^20 years of computation, at some point near that 10^20 years we might as well spend all computation on directly fulfilling our morality instead of improving it. If at 10^20 - M years we discover that, hey, the universe will last another 10^50 years, that tradeoff will change and it makes sense to compute even more self-consistent morality again.
Similarly, if we end up in a siren world it seems like it would be more useful to restart our search for moral complexity by the same criteria; it becomes worthwhile to change our morality again because the cost of continued existence in the current morality outweighs the cost of potentially improving it.
Additionally, I think that losing values is not a feature of reaching a more self-consistent morality. Removing a value from an existing moral system does not make the result consistent with the original morality; the result is incompatible with the original wherever that value applied. Rather, self-consistent morality is approached by better carving reality at its joints in value space: defining existing values in terms of new values that best approximate the old value in the situations where it was valued, while extending morality along the new dimensions into territory the original value did not cover. This should also make it possible to escape from siren worlds by the same mechanism: entering a siren world is possible only if reality was improperly carved, so that the siren world appeared to fulfill values along dimensions it eventually did not, or eventually contradicted some original value because the replacement values were an imperfect approximation. Once this disagreement is noticed, it should be possible to carve reality more accurately, see how the current values have become inconsistent with the previous ones, and fix them.
Or once you lose your meta-moral urge to reach a self-consistent morality. This may not be the wrong (heh) answer along a path that originally started toward reaching self-consistent morality.
The main problem with siren worlds is that humans are very vulnerable to certain types of seduction/trickery, and it’s very possible AIs with certain structures and goals would be equally vulnerable to (different) tricks. Defining what is a legit change and what isn’t is the challenge here.
The problem is that un-self-consistent morality is unstable under general self-improvement (and self-improvement is very general, see http://lesswrong.com/r/discussion/lw/mir/selfimprovement_without_selfmodification/).
Even self-consistent morality is unstable if general self improvement allows for removal of values, even if removal is only a practical side effect of ignoring a value because it is more expensive to satisfy than other values. E.g. we (Westerners) generally no longer value honoring our ancestors (at least not many of them), even though it is a fairly independent value and roughly consistent with our other values. It is expensive to honor ancestors, and ancestors don’t demand that we continue to maintain that value, so it receives less attention. We also put less value on the older definition of honor (as a thing to be defended and fought for and maintained at the expense of convenience) that earlier centuries had, despite its general consistency with other values for honesty, trustworthiness, social status, etc. I think this is probably for the same reason; it’s expensive to maintain honor and most other values can be satisfied without it. In general, if U(more_satisfaction_of_value1) > U(more_satisfaction_of_value2) then maximization should tend to ignore value2 regardless of its consistency. If U(make_values_self_consistent_value) > U(satisfying_any_other_value) then the obvious solution is to drop the other values and be done.
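A toy sketch of that last point, with made-up value names and numbers (an illustration of the U(value1) > U(value2) claim, not anything proposed in the comment): a maximizer with a fixed effort budget that always buys the highest marginal utility spends nothing on the more expensive value, even though that value was never explicitly removed.

    # Toy model only: the value names and numbers are hypothetical.
    def allocate(budget, marginal_utility):
        """Spend each unit of effort on whichever value yields the most
        utility per unit (constant marginal utilities, for simplicity)."""
        spent = {value: 0 for value in marginal_utility}
        for _ in range(budget):
            best = max(marginal_utility, key=marginal_utility.get)
            spent[best] += 1
        return spent

    # "honor_ancestors" is expensive to satisfy, i.e. yields less utility
    # per unit of effort than the cheaper competing value.
    print(allocate(100, {"cheap_value": 3.0, "honor_ancestors": 1.0}))
    # -> {'cheap_value': 100, 'honor_ancestors': 0}: the expensive value is
    #    starved without ever being "removed" from the utility function.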
A sort of opposite approach is “make reality consistent with these pre-existing values” which involves finding a domain in reality state space under which existing values are self-consistent, and then trying to mold reality into that domain. The risk (unless you’re a negative utilitarian) is that the domain is null. Finding the largest domain consistent with all values would make life more complex and interesting, so that would probably be a safe value. If domains form disjoint sets of reality with no continuous physical transitions between them then one would have to choose one physically continuous sub-domain and stick with it forever (or figure out how to switch the entire universe from one set to another). One could also start with preexisting values and compute a possible world where the values are self-consistent, then simulate it.
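A minimal sketch of the “find a domain where the values are jointly satisfiable” search, with hypothetical value predicates over a toy discrete state space (everything below is illustrative, not a proposal): enumerate candidate world-states and keep the ones every value accepts; the “null domain” risk is exactly the case where the resulting set comes back empty.

    from itertools import product

    # Hypothetical value predicates over a toy world-state; real values
    # would of course not reduce to three integer thresholds.
    values = [
        lambda w: w["suffering"] <= 1,   # low suffering
        lambda w: w["freedom"] >= 2,     # high freedom
        lambda w: w["complexity"] >= 1,  # some complexity/interest
    ]

    # Toy state space: every combination of three discrete levels per axis.
    axes = ("suffering", "freedom", "complexity")
    states = [dict(zip(axes, levels)) for levels in product(range(3), repeat=3)]

    domain = [w for w in states if all(value(w) for value in values)]
    print(len(domain), "of", len(states), "states satisfy all values")
    # An empty domain would mean no reality is consistent with the values;
    # the largest non-empty domain is what the comment suggests aiming for.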
It is expensive to honor ancestors, and ancestors don’t demand that we continue to maintain that value, so it receives less attention.
That’s something different—a human trait that makes us want to avoid expensive commitments while paying them lip service. A self-consistent system would not have this trait: it would keep “honor ancestors” in it, and act on it or not depending on the cost and the interaction with other moral values.
If you want to look at even self-consistent systems being unstable, I suggest looking at social situations, where other entities reward value-change. Or a no-free-lunch result of the type “This powerful being will not trade with agents having value V.”
E.g. we (Westerners) generally no longer value honoring our ancestors (at least not many of them), even though it is a fairly independent value and roughly consistent with our other values. It is expensive to honor ancestors, and ancestors don’t demand that we continue to maintain that value, so it receives less attention.
This sweeps the model-dependence of “values” under the rug. The reason we don’t value honoring our ancestors is that we don’t believe they continue to exist after death, and so we don’t believe social relations of any kind can be carried on with them.
The reason we don’t value honoring our ancestors is that we don’t believe they continue to exist after death
This could be a case of typical mind fallacy. I can point to a number of statistical studies that show that a large number of Westerners claim that their ancestors do continue to exist after death.
Anyone who believes that some sort of heaven or hell exists.
And a lot of these people nonetheless don’t accord their ancestors all that much in the way of honour...
They may believe it, but they don’t alieve it.
How do you know?
I can point to a number of statistical studies that show that a large number of Westerners claim that their ancestors do continue to exist after death.
Because the things that people would do if they believed in and acted as though they believe in life after death are profoundly weird, and we don’t see any of that around. Can you imagine the same people who say that the dead “went to a better place” being sad that someone has not died, for instance? (Unless they’re suffering so much or causing so much suffering that death is preferable even without an afterlife.)
You are assuming that human beings are much more altruistic than they actually are. If your wife has the chance of leaving you and having a much better life where you will never hear from her again, you will not be sad if she does not take the chance.
Because the things that people would do if they believed in and acted as though they believe in life after death are profoundly weird, and we don’t see any of that around.
I don’t see why they need to be “profoundly weird”. Remember, this subthread started with “honoring ancestors”. The Chinese culture is probably the most obvious one where honoring ancestors is a big thing. What “profoundly weird” things does it involve?
Given that this is the Chinese we’re talking about, expecting one’s ancestors to improve investment returns in return for a good sacrifice.
Sorry, I don’t know enough about Chinese culture to answer. But I’d guess that either they do have weird beliefs (that I’m not familiar with so I can’t name them), or they don’t and honoring ancestors is an isolated thing they do as a ritual. (The answer may be different for different people, of course.)
Speaking of “profoundly weird” things, does the veneration of saints in Catholicism qualify? :-)
Insofar as anyone expects saints to perform the function of demigods and intervene causally with miracles on behalf of the person praying, yes, it is “profoundly weird” magical thinking.
Why do you ask a site full of atheists if they think religion is irrational?
“Irrational” and “weird” are quite different adjectives.
Okay, now I’m curious. What exactly do you think that people would do if they believed in life after death?
-- Be happy that people have died and sad that they remain alive (same qualifiers as before: person is not suffering so much that even nothingness is preferable, etc.) and the reverse for people who they don’t like
-- Want to kill people to benefit them (certainly, we could improve a lot of third world suffering by nuking places, if they have a bad life but a good afterlife. Note that the objection “their culture would die out” would not be true if there is an afterlife.)
-- In the case of people who oppose abortions because fetuses are people (which I expect overlaps highly with belief in life after death), be in favor of abortions if the fetus gets a good afterlife
-- Be less willing to kill their enemies the worse the enemy is
-- Do extensive scientific research trying to figure out what life after death is like.
-- Genuinely think that having their child die is no worse than having their child move away to a place where the child cannot contact them
-- Drastically reduce how bad they think death is when making public policy decisions; there would be still some effect because death is separation and things that cause death also cause suffering, but we act as though causing death makes some policy uniquely bad and preventing it uniquely good
-- Not oppose suicide
Edit: Support the death penalty as more humane than life imprisonment.
(Some of these might not apply if they believe in life after death but also in Hell, but that has its own bizarre consequences.)
-- Be less willing to kill their enemies the worse the enemy is
Now might I do it pat. Now he is praying.
And now I’ll do ’t. And so he goes to heaven.
And so am I revenged.—That would be scanned.
A villain kills my father, and, for that,
I, his sole son, do this same villain send
To heaven.
Oh, this is hire and salary, not revenge.
He took my father grossly, full of bread,
With all his crimes broad blown, as flush as May.
And how his audit stands who knows save heaven?
But in our circumstance and course of thought
’Tis heavy with him. And am I then revenged
To take him in the purging of his soul
When he is fit and seasoned for his passage?
No.
Up, sword, and know thou a more horrid hent.
When he is drunk asleep, or in his rage,
Or in th’ incestuous pleasure of his bed,
At game a-swearing, or about some act
That has no relish of salvation in ’t—
Then trip him, that his heels may kick at heaven,
And that his soul may be as damned and black
As hell, whereto it goes. My mother stays
This physic but prolongs thy sickly days.
-- Hamlet, Act 3, scene 3.
Beware fictional evidence.
In Christianity, we are as soldiers on duty who cannot desert their post. Suicide and murder are mortal sins, damning one to perdition hereafter. Christians differ on whether this is a causal connection: works → fate, or predestined by grace: grace → works and grace → fate. Either way, the consequences of believing in the Christian conception of life after death add up to practicing Christian virtue in this life.
In Buddhism, you get reincarnated, but only if you have lived a virtuous life do you get a favorable rebirth. Killing, including of yourself, is one of the worst sins and guarantees you a good many aeons in the hell worlds. The consequences of believing in the Buddhist conception of life after death add up to practicing Buddhist virtue in this life.
In Islam, paradise awaits the virtuous and hell the wicked. The consequences of believing in the Islamic conception of life after death add up to practicing Islamic virtue in this life. We can observe these consequences in current affairs.
I don’t think that helps. For instance, if they alieve in an afterlife but their religion says that suicide and murder are mortal sins, they won’t actually commit murder or suicide, but they would still not think it was sad that someone died in the way we think it’s sad, would not insist that public policies should reduce deaths, etc.
You would also expect a lot of people to start thinking of religious prohibitions on murder and suicide like many people think of religious prohibitions on homosexuality—If God really wants that, he’s being a jerk and hurting people for no obvious reason. And you’d expect believers to simply rationalize away religious prohibitions on murder and suicide and say that they don’t apply just like religious believers already do to lots of other religious teachings (of which I’m sure you can name your own examples).
If God really wants that, he’s being a jerk and hurting people for no obvious reason.
Ask a Christian and they’ll give you reasons. Ask a Jew and they’ll give you reasons, except for those among the laws that are to be obeyed because God says so, despite there not being a reason known to Man. Ask a Buddhist, ask a Moslem.
There is no low-hanging fruit here, no instant knock-down arguments against any of these faiths that their educated practitioners do not know already and have answers to.
Yes, but in the real world, when a religious demand conflicts with something people really believe, it can go either way. Some people will find reasons that justify the demand. But some will find reasons that go in the other direction—instead of reasons why the religion demands some absurd thing, they’d give reasons as to why the religion’s obvious demand really isn’t a demand at all. Where are the people saying that the prohibition on murder is meant metaphorically, or only means “you shouldn’t commit murders in this specific situation that only existed thousands of years ago”? For that matter, where are the people saying “sure, my religion says we shouldn’t murder, but I have no right to impose that on nonbelievers by force of law”, in the same way that they might say that about other mortal sins?
Be happy that people have died and sad that they remain alive (same qualifiers as before: person is not suffering so much that even nothingness is preferable, etc.) and the reverse for people who they don’t like
Hmmm.
What is known is that people who go to the afterlife don’t generally come back (or, at least, don’t generally come back with their memories intact). Historical evidence strongly suggests that anyone who remains alive will eventually die… so remaining alive means you have more time to enjoy what is nice here before moving on.
So, I don’t imagine this would be the case unless the afterlife is strongly known to be significantly better than here.
Want to kill people to benefit them (certainly, we could improve a lot of third world suffering by nuking places, if they have a bad life but a good afterlife. Note that the objection “their culture would die out” would not be true if there is an afterlife.)
Is it possible for people in the afterlife to have children? It may be that their culture will quickly run out of new members if they are all killed off. Again, though, this is only true if the afterlife is certain to be better than here.
In the case of people who oppose abortions because fetuses are people (which I expect overlaps highly with belief in life after death), be in favor of abortions if the fetus gets a good afterlife
Be less willing to kill their enemies the worse the enemy is
Both true if and only if the afterlife is known to be better.
Do extensive scientific research trying to figure out what life after death is like.
People have tried various experiments, like asking people who have undergone near-death experiences. However, there is very little data to work with and I know of no experiment that will actually give any sort of unambiguous result.
Genuinely think that having their child die is no worse than having their child move away to a place where the child cannot contact them
And where their child cannot contact anyone else who is still alive, either. Thrown into a strange and unfamiliar place with people who the parent knows nothing about. I can see that making parents nervous...
Drastically reduce how bad they think death is when making public policy decisions; there would be still some effect because death is separation and things that cause death also cause suffering, but we act as though causing death makes some policy uniquely bad and preventing it uniquely good
Exile is also generally considered uniquely bad; and since the dead have never been known to return, death is at the very least a form of exile that can never be revoked.
Not oppose suicide
...depends. Many people who believe in life after death also believe that suicide makes things very difficult for the victim there.
Support the death penalty as more humane than life imprisonment.
Again, this depends; if there is a Hell, then the death penalty kills a person without allowing him much of a chance to try to repent, and could therefore be seen as less humane than life imprisonment.
The worse the afterlife is, the more similar people’s reactions will be to a world where there is no afterlife. In the limit, the afterlife is as bad as or worse than nonexistence and people would be as death-averse as they are now. Except that this is contrary to how people claim to think of the afterlife when they assert belief in it. The afterlife can’t be good enough to be comforting and still bad enough not to lead to any of the conclusions I described. And this includes being bad for reasons such as being like exile, being irreversible, etc.
And I already said that if there is a Hell (a selectively bad afterlife), many of these won’t apply, but the existence of Hell has its own problems.
The worse the afterlife is, the more similar people’s reactions will be to a world where there is no afterlife.
I’d phrase it as “the scarier the afterlife is, the more similar people’s reactions will be to a world where there is no afterlife.” The word “scarier” is important, because something can look scary but be harmless, or even beneficial.
And people’s reactions do not depend on what the afterlife is like; they depend on what people think about the afterlife.
And one of the scariest things to do is to jump into a complete unknown… even if you’re pretty sure it’ll be harmless, or even beneficial, jumping into a complete unknown from which there is no way back is still pretty scary...
But is jumping into a “complete unknown” which you think should be beneficial really going to get the same reaction as jumping into one that you believe to be harmful?
No, it should not.
The knowledge that there’s no return would make people wary about it, but they’d be a lot more wary if they thought it would be harmful.
I can point to a number of statistical studies that show that a large number of Westerners claim that their ancestors do continue to exist after death.
No, they believe-in-the-belief that their ancestors continue to exist after death. They rarely, and doubtingly, if ever, generate the concrete expectation that anything they can do puts them in causal contact with the ghosts of their ancestors, such that they would expect to see something different from their ancestors being permanently gone.
Actually, I’d argue the main problem with “Siren Worlds” is the assumption that you can “envision”, or computationally simulate, an entire possible future country/planet/galaxy all at once, in detail, in such time that any features at all would jump out to a human observer.
That kind of computing power would require, well, something like the mass of a whole country/planet/galaxy and then some. Even if we generously assume a very low fidelity of simulation, comparable with mere weather simulations or even mere video games, we’re still talking whole server/compute farms being turned towards nothing but the task of pretending to possess a magical crystal ball for no sensible reason.
tl;dr: human values are already quite fragile and vulnerable to human-generated siren worlds.
Simulation complexity has not stopped humans from implementing totalitarian dictatorships (based on divine right of kings, fundamentalism, communism, fascism, people’s democracy, what-have-you) due to envisioning a siren world that is ultimately unrealistic.
It doesn’t require detailed simulation of a physical world, it only requires sufficient simulation of human desires, biases, blind spots, etc. that can lead people to abandon previously held values because they believe the siren world values will be necessary and sufficient to achieve what the siren world shows them. It exploits a flaw in human reasoning, not a flaw in accurate physical simulation.
That’s shifting the definition of “siren world” from “something which looks very nice when simulated in high-resolution but has things horrendously wrong on the inside” to a very standard “Human beings imagine things in low-resolution and don’t always think them out clearly.”
You don’t need to pour extra Lovecraft Sauce on your existing irrationalities just for your enjoyment of Lovecraft Sauce.
It depends a lot on how the world is being shown. If the AI is your “guide”, it can show you the seductive features of the world, or choose the fidelity of the simulation in just the right ways in the right places, etc., without needing a full-fledged simulation. You can have a siren world in text, just through the AI’s (technically accurate) descriptions, given your questions.
You’re missing my point, which is that proposing you’ve got “an AI” (with no dissolved understanding of how the thing actually works underneath what you’d get from a Greg Egan novel) which “simulates” possible “worlds” is already engaging in several layers of magical thinking, and you shouldn’t be surprised to draw silly conclusions from magical thinking.
I think I’m not getting your point either. Isn’t Stuart just assuming standard decision theory, where you choose actions by predicting their consequences and then evaluating your utility function over your predictions? Are you arguing that real AIs won’t be making decisions like this?
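For concreteness, the decision procedure being referred to looks roughly like this (a toy sketch, not anyone’s actual design; the prediction and utility functions below are placeholders): predict a distribution over consequences for each action, score the predictions with the utility function, and take the argmax.

    # Placeholder sketch of "predict consequences, evaluate utility, argmax".
    def expected_utility(action, predict, utility):
        # predict(action) -> list of (probability, predicted_world) pairs
        return sum(p * utility(world) for p, world in predict(action))

    def choose(actions, predict, utility):
        return max(actions, key=lambda a: expected_utility(a, predict, utility))

    # Hypothetical example: two actions, each with one certain predicted outcome.
    predictions = {"act_a": [(1.0, {"score": 3})], "act_b": [(1.0, {"score": 5})]}
    print(choose(["act_a", "act_b"], predictions.get, lambda w: w["score"]))  # act_b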
Isn’t Stuart just assuming standard decision theory, where you choose actions by predicting their consequences and then evaluating your utility function over your predictions? Are you arguing that real AIs won’t be making decisions like this?
While I do think that real AIs won’t make decisions in this fashion, that aside, as I had understood Stuart’s article, the point was not to address decision theory, which is a mathematical subject, but instead that he hypothesized a scenario in which “the AI” was used to forecast possible future events, with humans in the loop doing the actual evaluation based on simulations realized in high detail, to the point that the future-world simulation would be as thorough as a film might be today, at which point it could appeal to people on a gut level and bypass their rational faculties, but also have a bunch of other extra-scary features above and beyond other scenarios of people being irrational, just because.
The “But also...” part is the bit I actually object to.
Let’s focus on a simple version, without the metaphors. We’re talking about an AI presenting humans with consequences of a particular decision, with humans then making the final decision to go along with it or not.
So what is happening is that various possible future worlds will be considered by the AI according to its desirability criteria, these worlds will be described to humans according to its description criteria, and humans will choose according to whatever criteria we use. So we have a combination of criteria that result in a final decision. A siren world is a world that ranks very high in these combined criteria but is actually nasty.
If we stick to that scenario and assume the AI is truthful, the main siren world generator is the ability of the AI to describe them in ways that sound very attractive to humans. Since human beliefs and preferences are not clearly distinct, this ranges from misleading (incorrect human beliefs) to actively seductive (influencing human preferences to favour these worlds).
The higher the bandwidth the AI has, the more chance it has of “seduction”, or of exploiting known or unknown human irrationalities (again, there’s often no clear distinction between exploiting irrationalities for beliefs or preferences).
One scenario—Paul Christiano’s—is a bit different but has essentially unlimited bandwidth (or, more precisely, has an AI estimating the result of a setup that has essentially unlimited bandwidth).
but also have a bunch of other extra-scary features above and beyond other scenarios of people being irrational, just because.
This category can include irrationalities we don’t yet know about, better exploitation of irrationalities we do know about, and a host of speculative scenarios about hacking the human brain, which I don’t want to rule out completely at this stage.
We’re talking about an AI presenting humans with consequences of a particular decision, with humans then making the final decision to go along with it or not.
No. We’re not. That’s dumb. Like, sorry to be spiteful, but that is already a bad move. You do not treat any scenario involving “an AI”, without dissolving the concept, as desirable or realistic. You have “an AI”, without having either removed its “an AI”-ness (in the LW sense of “an AI”) entirely or guaranteed Friendliness? You’re already dead.
Can we assume that, since I’ve been working all this time on AI safety, I’m not an idiot? When presenting a scenario (“assume AI contained, and truthful”) I’m investigating whether we have safety within the terms of that scenario. Which here we don’t, so we can reject attempts aimed at that scenario without looking further. If/when we find a safe way to do that within the scenario, then we can investigate whether that scenario is achievable in the first place.
Ah. Then here’s the difference in assumptions: I don’t believe a contained, truthful UFAI is safe in the first place. I just have an incredibly low prior on that. So low, in fact, that I didn’t think anyone would take it seriously enough to imagine scenarios which prove it’s unsafe, because it’s just so bloody obvious that you do not build UFAI for any reason, because it will go wrong in some way you didn’t plan for.
See the point on Paul Christiano’s design. The problem I discussed applies not only to UFAIs but to other designs that seek to get round it, but use potentially unrestricted search.
I’m puzzled. Are you sure that’s your main objection? Because,
you make a different objection (I think) in your response to the sibling, and
it seems to me that since any simulation of this kind will be incomplete, and I assume the AI will seek the most efficient way to achieve its programmed goals, the scenario you describe is in fact horribly dangerous; the AI has an incentive to deceive us. (And somewhat like Wei Dai, I thought we were really talking about an AI goal system that talks about extrapolating human responses to various futures.)
It would be completely unfair of me to focus on the line, “as thorough as a film might be today”. But since it’s funny, I give you Cracked.com on Independence Day.
To be honest, I was assuming we’re not talking about a “contained” UFAI, since that’s, you know, trivially unsafe.
as I had understood Stuart’s article, the point was not to address decision theory, which is a mathematical subject, but instead that he hypothesized a scenario in which “the AI” was used to forecast possible future events, with humans in the loop doing the actual evaluation based on simulations realized in high detail, to the point that the future-world simulation would be as thorough as a film might be today, at which point it could appeal to people on a gut level and bypass their rational faculties
It’s true that Stuart wrote about Oracle AI in his Siren worlds post, but I thought that was mostly just to explain the idea of what a Siren world is. Later on in the post he talks about how Paul Christiano’s take on indirect normativity has a similar problem. Basically the problem can occur if an AI tries to model a human as accurately as possible, then uses the model directly as its utility function and tries to find a feasible future world that maximizes the utility function.
It seems plausible that even if the AI couldn’t produce a high resolution simulation of a Siren world W, it could still infer (using various approximations and heuristics) that with high probability its utility function assigns a high score to W, and choose to realize W on that basis. It also seems plausible that an AI eventually would have enough computing power to produce high resolution simulations of Siren worlds, e.g., after it has colonized the galaxy, so the problem could happen at that point if not before.
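A toy version of that failure mode (entirely made-up worlds and scores, just to pin down the mechanism): the optimizer’s utility function is an imperfect model of human approval, and an unrestricted search over candidate worlds tends to return whichever world best exploits the model’s blind spots rather than the world that is actually best.

    # Hypothetical worlds: (name, features the approval model sees, hidden badness)
    candidate_worlds = [
        ("honest_utopia", {"smiles": 7, "freedom": 8}, 0),
        ("decent_world",  {"smiles": 6, "freedom": 6}, 0),
        ("siren_world",   {"smiles": 10, "freedom": 10}, 9),  # looks perfect
    ]

    # The AI's utility function: a proxy model of human approval, blind to
    # the hidden dimension.
    approval_model = lambda feats: feats["smiles"] + feats["freedom"]
    # What we actually care about (unavailable to the optimizer).
    true_value = lambda feats, hidden_bad: approval_model(feats) - 10 * hidden_bad

    chosen = max(candidate_worlds, key=lambda w: approval_model(w[1]))
    best = max(candidate_worlds, key=lambda w: true_value(w[1], w[2]))
    print("chosen by the modelled-approval search:", chosen[0])  # siren_world
    print("actually best under the real values:", best[0])       # honest_utopia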
but also have a bunch of other extra-scary features above and beyond other scenarios of people being irrational, just because.
What extra-scary features are you referring to? (Possibly I skipped over the parts you found objectionable since I was already familiar with the basic issue and didn’t read Stuart’s post super carefully.)
Are you arguing that real AIs won’t be making decisions like this?
Yes. I think that probabilistic backwards chaining, aka “planning as inference”, is the more realistic way to plan, and better represented in the current literature.
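In its simplest form, planning as inference means conditioning on the goal being achieved and inferring which action that makes probable, rather than searching forward and scoring predicted outcomes with a utility function. A minimal sketch with made-up numbers (real planning-as-inference uses much richer probabilistic models):

    # P(action | success) is proportional to P(success | action) * P(action)
    prior = {"walk": 0.5, "drive": 0.5}       # prior over actions (made up)
    p_success = {"walk": 0.2, "drive": 0.7}   # likelihood of reaching the goal

    joint = {a: prior[a] * p_success[a] for a in prior}
    normaliser = sum(joint.values())
    posterior = {a: joint[a] / normaliser for a in joint}
    print(max(posterior, key=posterior.get), posterior)
    # -> 'drive': the plan is read off as the most probable action given success.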
Actually, I’d argue the main problem with “Siren Worlds” is the assumption that you can “envision”, or computationally simulate, an entire possible future country/planet/galaxy all at once, in detail, in such time that any features at all would jump out to a human observer.
That’s not needed for a siren world. Putting human brains into vats and stimulating their pleasure centers doesn’t require much computing power.
Wireheading isn’t a siren world, though. The point of the concept is that it looks like what we want, when we look at it from the outside, but actually, on the inside, something is very wrong. Example: a world full of people who are always smiling and singing about happiness because they will be taken out and shot if they don’t (Lily Weatherwax’s Genua comes to mind). If the “siren world” fails to look appealing to (most) human sensibilities in the first place, as with wireheading, then it’s simply failing at siren.
The point is that we’re supposed to worry about what happens when we can let computers do our fantasizing for us in high resolution and real time, and then put those fantasies into action, as if we could ever actually do this, because there’s a danger in letting ourselves get caught up in a badly un-thought-through fantasy’s nice aspects without thinking about what it would really be like.
The problem being, no, we can’t actually do that kind of “automated fantasizing” in any real sense, for the same reason that fantasies don’t resemble reality: to fully simulate some fantasy in high resolution (ie: such that choosing to put it into action would involve any substantial causal entanglement between the fantasy and the subsequent realized “utopia”) involves degrees of computing power we just won’t have and which it just wouldn’t even be efficient to use that way.
Backwards chaining from “What if I had a Palantir?” does lead to thinking, “What if Sauron used it to overwhelm my will and enthrall me?”, which sounds wise except that, “What if I had a Palantir?” really ought to lead to, “That’s neither possible nor an efficient way to get what I want.”