“In what direction should we nudge the future, to maximize the chances and impact of a positive Singularity?”
Friendly AI is incredibly hard to get right, and a friendly AI that is not quite friendly could create a living hell for the rest of time, increasing negative utility dramatically.
I vote for antinatalism. We should seriously consider creating a true paperclip maximizer that transforms the universe into an inanimate state devoid of suffering. Friendly AI is simply too risky.
I think that humans are not psychologically equal. Not only are there many outliers, but most humans would turn into abhorrent creatures given their own pocket universe, unlimited power and a genie. And even in our current world, if we were to remove the huge memeplex of western civilization, most people would act like stone age hunter-gatherers. And that would be bad enough. After all, violence is the major cause of death within stone age societies.
Even proposals like CEV (Coherent Extrapolated Volition) can turn out to be a living hell for a percentage of all beings. I don’t expect any amount of knowledge, or intelligence, to cause humans to abandon their horrible preferences.
Eliezer Yudkowsky says that intelligence does not imply benevolence. That an artificial general intelligence won’t turn out to be friendly. That we have to make it friendly. Yet his best proposal is that humanity will do what is right if only we knew more, thought faster, were more the people we wished we were and had grown up farther together. The idea is that knowledge and intelligence imply benevolence for people. I don’t think so.
The problem is that if you extrapolate chaotic systems, e.g. human preferences under real-world influences, small differences in initial conditions are going to yield widely diverging outcomes. That our extrapolated volition converges rather than diverges seems to be a bold prediction.
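As a toy illustration of that sensitivity to initial conditions, here is a minimal sketch using the standard logistic map, a generic chaotic system (not a model of human preferences, just the phenomenon being pointed at):

```python
# Minimal sketch: sensitivity to initial conditions in the logistic map
# (a standard toy chaotic system; not a model of human preferences).

def logistic_map(x0, r=4.0, steps=50):
    """Iterate x -> r * x * (1 - x) and return the trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_map(0.400000000)
b = logistic_map(0.400000001)  # initial condition differs by one part in a billion

for step in (0, 10, 20, 30, 40, 50):
    print(f"step {step:2d}: |a - b| = {abs(a[step] - b[step]):.6f}")

# After a few dozen steps the two trajectories are completely uncorrelated,
# even though the starting points were almost identical.
```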
I just don’t see that a paperclip maximizer burning the cosmic commons is as bad as it is currently portrayed. Sure, it is “bad”. But everything else might be much worse.
Here is a question for those who think that antinatalism is just stupid. Would you be willing to rerun the history of the universe to obtain the current state? Would you be willing to create another Genghis Khan, a new holocaust, allowing intelligent life to evolve?
As Greg Egan wrote: “To get from micro-organisms to intelligent life this way would involve an immense amount of suffering, with billions of sentient creatures living, struggling and dying along the way.”
If you are not willing to do that, then why are you willing to do the same now, just for much longer, by trying to colonize the universe? Are you so sure that the time to come will be much better? How sure are you?
ETA
I expect any friendly AI outcome that fails to be friendly in some respect to increase negative utility, and only a perfectly “friendly” AI (whatever that means; it is still questionable whether the whole idea makes sense) to yield a positive utility outcome.
That is because the closer any given AGI design is to friendliness, the more likely it is that humans will be kept alive but might suffer. Whereas an unfriendly AI in complete ignorance of human values will more likely just see humans as a material resource, without any particular incentive to keep humans around.
Just imagine a friendly AI which fails to “understand” or care about human boredom.
There are several ways in which SIAI could actually cause a direct increase in negative utility.
1) Friendly AI is incredibly hard and complex. Complex systems can fail in complex ways. Agents that are an effect of evolution have complex values. To satisfy complex values you need to meet complex circumstances. Therefore any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways. A half-baked, not quite friendly, AI might create a living hell for the rest of time, increasing negative utility dramatically.
2) Humans are not provably friendly. Given the power to shape the universe, the SIAI might fail to act altruistically and deliberately implement an AI with selfish motives or horrible strategies.
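“Ladies and gentlemen, I believe this machine could create a living hell for the rest of time...”
(audience yawns, people look at their watches)
“...increasing negative utility dramatically!”
(shocked gasps, audience riots)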
Do you actually disagree with anything or are you just trying to ridicule it? Do you think that the possibility that FAI research might increase negative utility is not to be taken seriously? Do you think that world states where faulty FAI designs are implemented have on average higher utility than world states where nobody is alive? If so, what research could I possibly do to come to the same conclusion? What arguments am I missing? Do I just have to think about it longer?
Consider the way Eliezer Yudkowsky argues in favor of FAI research:
Two hundred million years from now, the children’s children’s children of humanity in their galaxy-civilizations, are unlikely to look back and say, “You know, in retrospect, it really would have been worth not colonizing the Hercules supercluster if only we could have saved 80% of species instead of 20%”. I don’t think they’ll spend much time fretting about it at all, really. It is really incredibly hard to make the consequentialist utilitarian case here, as opposed to the warm-fuzzies case.
or
This is crunch time. This is crunch time for the entire human species. … and it’s crunch time not just for us, it’s crunch time for the intergalactic civilization whose existence depends on us. I think that if you’re actually just going to sort of confront it, rationally, full-on, then you can’t really justify trading off any part of that intergalactic civilization for any intrinsic thing that you could get nowadays …
Is his style of argumentation any different from mine except that he promises lots of positive utility?
I was just amused by the anticlimacticness of the quoted sentence (or maybe by how it would be anticlimactic anywhere else but here), the way it explains why a living hell for the rest of time is a bad thing by associating it with something so abstract as a dramatic increase in negative utility. That’s all I meant by that.
We should seriously consider creating a true paperclip maximizer that transforms the universe into an inanimate state devoid of suffering.
Have you considered the many ways something like that could go wrong?
The paperclip maximizer (PM) encounters an alien civilization and causes lots of suffering warring with it
PM decides there’s a chance that it’s in a simulation run by a sadistic being who will punish it (prevent it from making paperclips) unless it creates trillions of conscious beings and tortures them
PM is itself capable of suffering
PM decides to create lots of descendent AIs in order to maximize paperclip production and they happen to be capable of suffering. (Our genes made us to maximize copies of them and we happen to be capable of suffering.)
somebody steals PM’s source code before it’s launched, and makes a sadistic AI
From your perspective, wouldn’t it be better to just build a really big bomb and blow up Earth? Or alternatively, if you want to minimize suffering throughout the universe and maybe throughout the multiverse (e.g., by acausal negotiation with superintelligences in other universes), instead of just our corner of the world, you’d have to solve a lot of the same problems as FAI.
The paperclip maximizer (PM) encounters an alien civilization and causes lots of suffering warring with it
I don’t think it is likely that it will encounter anything that has equal resources, and if it does, I don’t think that suffering would occur (see below).
PM decides there’s a chance that it’s in a simulation run by a sadistic being who will punish it (prevent it from making paperclips) unless it creates trillions of conscious beings and tortures them
That seems like one of the problems that have to be solved in order to build an AI that transforms the universe into an inanimate state. But I think it is much easier to make an AI not simulate any other agents than to create a friendly AI. Much more can go wrong by creating a friendly AI, including the possibility that it tortures trillions of beings. In the case of a transformer you just have to make sure that it values a universe that is as close as possible to a state where no computation takes place, and that it does not engage in any kind of trade, acausal or otherwise.
PM is itself capable of suffering
I believe that any sort of morally significant suffering is an effect of (natural) evolution, and may in fact be dependent on it. I think that the kind of maximizer that SI has in mind is more akin to a transformation process that isn’t conscious, does not have emotions and cannot suffer. If those qualities were necessary requirements, then I don’t think that we will build an artificial general intelligence any time soon, and if we do, it will happen slowly and the AI will not be able to undergo dangerous recursive self-improvement.
somebody steals PM’s source code before it’s launched, and makes a sadistic AI
I think that this is more likely to be the case with friendly AI research because it takes longer.
Have you considered the many ways something like that could go wrong? [...] From your perspective, wouldn’t it be better to [...] minimize suffering throughout the universe and maybe throughout the multiverse (e.g., by acausal negotiation with superintelligences in other universes), instead of just our corner of the world, you’d have to solve a lot of the same problems as FAI.
The reason why I think that working towards FAI might be a bad idea is that it increases the chance of something going horribly wrong.
If I were to accept the framework of beliefs held by SI, then I would assign a low probability to the possibility that the default scenario, in which an AI undergoes recursive self-improvement, will include a lot of blackmailing that leads to a lot of suffering. Here the default is that nobody tries to make the AI friendly.
I believe that any failed attempt at friendly AI is much more likely to 1) engage in blackmail, 2) keep humans alive, and 3) fail in horrible ways:
I think that working towards friendly AI will in most cases lead to negative utility scenarios that vastly outweigh the negative utility of an attempt at creating a simple transformer that turns the universe into an inanimate state.
ETA Not sure why the graph looks so messed up. Does anyone know of a better graphing tool?
I think that working towards friendly AI will in most cases lead to negative utility scenarios that vastly outweigh the negative utility of an attempt at creating a simple transformer that turns the universe into an inanimate state.
I think it’s too early to decide this. There are many questions whose answers will become clearer before we have to make a choice one way or another. If eventually it becomes clear that building an antinatalist AI is the right thing to do, I think the best way to accomplish it would be through an organization that’s like SIAI but isn’t too attached to the idea of FAI and just wants to do whatever is best.
Now you can either try to build an organization like that from scratch, or try to push SIAI in that direction (i.e., make it more strategic and less attached to a specific plan). Of course, being lazy, I’m more tempted to do the latter, but your mileage may vary. :)
If eventually it becomes clear that building an antinatalist AI is the right thing to do, I think the best way to accomplish it would be through an organization that’s like SIAI but isn’t too attached to the idea of FAI and just wants to do whatever is best.
Yes.
I, for one, am ultimately concerned with doing whatever’s best. I’m not wedded to doing FAI, and am certainly not wedded to doing 9-researchers-in-a-basement FAI.
I, for one, am ultimately concerned with doing whatever’s best. I’m not wedded to doing FAI, and am certainly not wedded to doing 9-researchers-in-a-basement FAI.
Well, that’s great. Still, there are quite a few problems.
How do I know
… that SI does not increase existential risk by solving problems that can be used to build AGI earlier?
… that you won’t launch a half-baked friendly AI that will turn the world into a hell?
… that you don’t implement some strategies that will do really bad things to some people, e.g. myself?
Every time I see a video of one of you people I think, “Wow, those seem like really nice people. I am probably wrong. They are going to do the right thing.”
But seriously, is that enough? Can I trust a few people with the power to shape the whole universe? Can I trust them enough to actually give them money? Can I trust them enough with my life until the end of the universe?
You can’t even tell me what “best” or “right” or “winning” stands for. How do I know that it can be or will be defined in a way that those labels will apply to me as well?
I have no idea what your plans are for the day when time runs out. I just hope that you are not going to hope for the best and run some not quite friendly AI that does really crappy things. I hope you consider the possibility of rather blowing everything up than risking even worse outcomes.
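Hell no.
This is an open problem. See “How can we be sure a Friendly AI development team will be altruistic?” on my list of open problems.
Blowing everything up would be pretty bad. Bad enough to not encourage the possibility.
“Would you murder a child, if it’s the right thing to do?”
If FAI is by definition a machine that does whatever is best, this distinction doesn’t seem meaningful.
Ok, let me rephrase that to be clearer.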
an organization that’s like SIAI but isn’t too attached to a specific kind of FAI design (that may be too complex and prone to fail in particularly horrible ways), and just wants to do whatever is best.
Do you think SingInst is too attached to a specific kind of FAI design? This isn’t my impression. (Also, at this point, it might be useful to unpack “SingInst” into particular people constituting it.)
Do you think SingInst is too attached to a specific kind of FAI design?
XiXiDu seems to think so. I guess I’m less certain but I didn’t want to question that particular premise in my response to him.
It does confuse me that Eliezer set his focus so early on CEV. I think “it’s too early to decide this” applies to CEV just as well as XiXiDu’s anti-natalist AI. Why not explore and keep all the plausible options open until the many strategically important questions become clearer? Why did it fall to someone outside SIAI (me, in particular) to write about the normative and meta-philosophical approaches to FAI? (Note that the former covers XiXiDu’s idea as a special case.) Also concerning is that many criticisms have been directed at CEV but Eliezer seems to ignore most of them.
Also, at this point, it might be useful to unpack “SingInst” into particular people constituting it.
I’d be surprised if there weren’t people within SingInst who disagree with the focus on CEV, but if so, they seem reluctant to disagree in public so it’s hard to tell who exactly, or how much say they have in what SingInst actually does.
I guess this could all be due to PR considerations. Maybe Eliezer just wanted to focus public attention on CEV because it’s the politically least objectionable FAI approach, and isn’t really terribly attached to the idea when it comes to actually building an FAI. But you can see how an outsider might get that impression...
Yeah, I thought it was explicitly intended more as a political manifesto than a philosophical treatise. I have no idea why so many smart people, like lukeprog, seem to be interpreting it not only as a philosophical basis but as outlining a technical solution.
Why do you think an unknown maximizer would be worse than a not quite friendly AI? Failed Utopia #4-2 sounds much better than a bunch of paperclips. Orgasmium sounds at least as good as paper clips.
I believe that any failed attempt at friendly AI is much more likely to 1) engage in blackmail, 2) keep humans alive, and 3) fail in horrible ways:
You sound a bit fixated on doom :-(
What do you make of the idea that the world has been consistently getting better for most of the last 3 billion years (give or take the occasional asteroid strike) - and that the progress is likely to continue?
Currently you suspect that there are people, such as yourself, who have some chance of correctly judging whether arguments such as yours are correct, and of attempting to implement the implications if those arguments are correct, and of not implementing the implications if those arguments are not correct.
Do you think it would be possible to design an intelligence which could do this more reliably?
I don’t understand. Is the claim here that you can build a “decide whether the risk of botched Friendly AI is worth taking machine”, and the risk of botching such a machine is much less than the risk of botching a Friendly AI?
A FAI that includes such a “Should I run?” heuristic could pose a lesser risk than a FAI without such a heuristic. If this heuristic works better than human judgment about running a FAI, it should be used instead of human judgment.
This is the same principle as for the AI’s decisions themselves, where we don’t ask the AI’s designers for object-level moral judgments, or encode specific object-level moral judgments into the AI. Not running an AI would then be equivalent to hardcoding the decision “Should the AI run?”, resolved by the designers to “No.”, into the AI, instead of coding the question and letting the AI itself answer it (assuming we can expect it to answer the question more reliably than the programmers can).
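A minimal sketch of that design distinction; the class names, the benefit estimate and the threshold below are all hypothetical placeholders, not anyone's proposed design:

```python
# Toy contrast: hardcoding the designers' answer to "Should the AI run?"
# versus encoding the question and letting the system answer it at runtime.
# All names and numbers are hypothetical illustrations.

class HardcodedAI:
    """Designers resolved "Should the AI run?" to "No" at build time."""
    SHOULD_RUN = False

    def main(self):
        return "pursue goals" if self.SHOULD_RUN else "halt"


class SelfCheckingAI:
    """The question itself is encoded; the system answers it when started."""

    def __init__(self, estimated_benefit, threshold):
        self.estimated_benefit = estimated_benefit  # the AI's own estimate (placeholder)
        self.threshold = threshold                  # required confidence (placeholder)

    def should_run(self):
        # Only useful if this judgment is better calibrated than the designers'.
        return self.estimated_benefit > self.threshold

    def main(self):
        return "pursue goals" if self.should_run() else "halt"


print(HardcodedAI().main())             # halt
print(SelfCheckingAI(0.2, 0.9).main())  # halt
print(SelfCheckingAI(0.95, 0.9).main()) # pursue goals
```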
Yes, and if it tosses a coin, it has a 50% chance of being right. The question is calibration: how much trust such measures should buy compared to their absence, given what is known about a given design.
The machine Steve proposes might not bear as much risk of creating “living hell” by attempting to get the human utility function right, but missing in such a way that humans are still alive, just living very unpleasantly. To me, this seems by far the biggest of XiXiDu’s concerns.
Expected utility maximization is rational and feasible.
We should be extremely conservative about not implementing a half-baked friendly AI.
If you believe that self-improving AI is inevitable and that creating friendly AI is more difficult than creating unfriendly AI, then launching an AI that simply destroys everything as quickly as possible has a higher expected utility than doing nothing or trying to implement an AI that is not completely friendly.
The negative utility of an AI that transforms the universe into an inanimate state is simply the foregone positive utility of a perfectly friendly AI. But most outcomes are expected to yield unfriendly AI, or not quite friendly AI, which will actively increase negative utility by possibly keeping living beings alive indefinitely under abhorrent circumstances. To prevent those scenarios one can try to solve friendly AI, which will most likely fail (or even increase the chances of a negative singularity), or try to launch a destructive singleton with simple goals to prevent further suffering and the evolution of life elsewhere in the universe. Creating a destructive self-improving AI that kills all life should be the easiest of all possibilities with a high probability of success.
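A toy sketch of the expected-utility comparison this argument relies on; every probability and utility below is made up purely for illustration, so it only shows the structure of the claim, not whether it is true:

```python
# Toy expected-utility comparison with entirely made-up numbers,
# illustrating the structure of the argument above (not its truth).

# Hypothetical utilities of final outcomes.
U_FRIENDLY   = 100.0   # perfectly friendly AI
U_INANIMATE  = 0.0     # universe transformed into an inanimate state
U_UNFRIENDLY = -10.0   # humans treated as raw material
U_NOT_QUITE  = -500.0  # "not quite friendly" AI keeping humans alive in misery

def expected_utility(dist):
    """dist maps an outcome's utility to its probability."""
    return sum(p * u for u, p in dist.items())

# Hypothetical outcome distributions for each strategy.
attempt_fai = {U_FRIENDLY: 0.05, U_NOT_QUITE: 0.25, U_UNFRIENDLY: 0.70}
do_nothing  = {U_UNFRIENDLY: 0.90, U_NOT_QUITE: 0.10}
destroyer   = {U_INANIMATE: 0.95, U_UNFRIENDLY: 0.05}

for name, dist in [("attempt FAI", attempt_fai),
                   ("do nothing", do_nothing),
                   ("launch destroyer", destroyer)]:
    print(f"{name:16s} EU = {expected_utility(dist):7.2f}")

# With these (disputable) numbers the "destroyer" strategy dominates;
# with different numbers it does not. The whole disagreement is about
# which probabilities and utilities are reasonable, not the arithmetic.
```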
Assuming your argument is correct, wouldn’t it make more sense to blow ourselves up with nukes rather than pollute the universe with UFAI? There may be other intelligent civilizations out there leading worthwhile lives that we threaten unfairly by unleashing UFAI.
I’m skeptical that friendly AI is as difficult as all that because, to take an example, humans are generally considered pretty “wicked” by traditional writers and armchair philosophers, but lately we haven’t been murdering each other or deliberately going out of our way to make each other’s lives miserable very often. For instance, say I were invincible. I could theoretically stab everyone I meet without any consequences, but I doubt I would do that. And I’m just human. Goodness may seem mystical and amazingly complex from our current viewpoint, but is it really as complex as all that? There were a lot of things in history and science that seemed mystically complex but turned out to be formalizable in compressed ways, such as the mathematics of Darwinian population genetics. Who would have imagined that the “Secrets of Life and Creation” would be revealed like that? But they were. Could “sufficient goodness that we can be convinced the agent won’t put us through hell” also have a compact description that was clearly tractable in retrospect?
Assuming your argument is correct, wouldn’t it make more sense to blow ourselves up with nukes rather than pollute the universe with UFAI? There may be other intelligent civilizations out there leading worthwhile lives that we threaten unfairly by unleashing UFAI.
There might be countless planets that are about to undergo an evolutionary arms race for the next few billion years, resulting in a lot of suffering. It is very unlikely that there is a single source of life that is at exactly the right stage of evolution, with exactly the right mind design, to not only lead worthwhile lives but also get their AI technology exactly right so as not to turn everything into a living hell.
If you assign negative utility to suffering, which is likely to be universally accepted to have negative utility, then, given that you are an expected utility maximizer, ending all life should be a serious consideration. Because 1) agents that are an effect of evolution have complex values, 2) to satisfy complex values you need to meet complex circumstances, 3) complex systems can fail in complex ways, and 4) any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways.
For instance, say I were invincible. I could theoretically stab everyone I meet without any consequences, but I doubt I would do that. And I’m just human.
To name just one example where things could go horribly wrong: humans are by their very nature interested in domination and sex. Our aversion to sexual exploitation is largely dependent on the memeplex of our cultural and societal circumstances. If you knew more, were smarter and could think faster, you might very well realize that such an aversion is an unnecessary remnant that you can easily extinguish to open up new pathways to gain utility. The claim that Gandhi would not agree to have his brain modified into a baby-eater is incredibly naive. Given the technology, people will alter their preferences and personality. Many people actually perceive their moral reservations to be limiting. It only takes some amount of insight to overcome such limitations.
You simply can’t be sure that the future won’t hold vast amounts of negative utility. It is much easier for things to go horribly wrong than to be barely acceptable.
Goodness may seem mystical and amazingly complex from our current viewpoint, but is it really as complex as all that?
Maybe not, but betting on the possibility that goodness can be easily achieved is like pulling a random AI from mind design space hoping that it turns out to be friendly.
You simply can’t be sure that the future won’t hold vast amounts of negative utility. It is much easier for things to go horribly wrong than to be barely acceptable.
Similarly, it is easier to make piles of rubble than skyscrapers. Yet—amazingly—there are plenty of skyscrapers out there. Obviously something funny is going on...
The negative utility of an AI that transforms the universe into an inanimate state is simply the foregone positive utility of a perfectly friendly AI. But most outcomes are expected to yield unfriendly AI, or not quite friendly AI, which will actively increase negative utility by possibly keeping living beings alive indefinitely under abhorrent circumstances.
Hang on, though. That’s still normally better than not existing at all! Hell has to be at least bad enough for the folk in it to want to commit suicide for utility to count as “below zero”. Most plausible futures just aren’t likely to be that bad for the creatures in them.
Hell has to be at least bad enough for the folk in it to want to commit suicide for utility to count as “below zero”. Most plausible futures just aren’t likely to be that bad for the creatures in them.
The present is already bad enough. There is more evil than good. You are more often worried than optimistic. You are more often hurt than happy. That’s the case for most people. We just tend to remember the good moments more than the rest of our life.
It is generally easier to arrive at bad world states than good world states. Because to satisfy complex values you need to meet complex circumstances. And even given simple values and goals, the laws of physics are grim and remorseless. In the end you’re going to lose the fight against the general decay. Any temporary success is just a statistical fluke.
The present is already bad enough. There is more evil than good. You are more often worried than optimistic. You are more often hurt than happy.
No, I’m not!
That’s the case for most people. We just tend to remember the good moments more than the rest of our life.
Yet most creatures would rather live than die—and they show that by choosing to live. Dying is an option—they choose not to take it.
It is generally easier to arrive at bad world states than good world states. Because to satisfy complex values you need to meet complex circumstances. And even given simple values and goals, the laws of physics are grim and remorseless. In the end you’re going to lose the fight against the general decay. Any temporary success is just a statistical fluke.
It sounds as though by now there should be nothing left but dust and decay! Evidently something is wrong with this reasoning. Evolution produces marvellous wonders—as well as entropy. Your existence is an enormous statistical fluke—but you still exist. There’s no need to be “down” about it.
Creating a destructive self-improving AI that kills all life should be the easiest of all possibilities with a high probability of success.
Where “success” refers to obliterating yourself and all your descendants. That’s not how most Darwinian creatures usually define success. Natural selection does build creatures that want to die—but only rarely and by mistake.
Personally I don’t want to contribute anything to an organisation which admits to exploring strategies that most people would find unacceptable. And I wouldn’t suggest that anyone else do so.
Surely building an anti-natalist AI that turns the universe into inert matter would be considered unacceptable by most people. So I’m confused. Do you intend to denounce SIAI if they do seriously consider this strategy, and also if they don’t?
Surely building an anti-natalist AI that turns the universe into inert matter would be considered unacceptable by most people. So I’m confused.
Yet I am not secretive about it and I believe that it is one of the less horrible strategies. Given that SI is strongly attached to decision theoretic ideas, which I believe are not the default outcome due to practically intractable problems, I fear that their strategies might turn out to be much worse than the default case.
I think that it is naive to simply trust SI because they seem like nice people. Although I don’t doubt that they are nice people. But I think that any niceness is easily drowned by their eagerness to take rationality to its logical extreme without noticing that they have reached a point where the consequences constitute a reductio ad absurdum. If game and decision theoretic conjectures show that you can maximize expected utility by torturing lots of people, or by voluntarily walking into death camps, then that’s the right thing to do. I don’t think that they are psychopathic personalities per se though. Those people are simply held captive by their idea of rationality. And that is what makes them extremely dangerous.
Do you intend to denounce SIAI if they do seriously consider this strategy, and also if they don’t?
I would denounce myself if I were to seriously consider that strategy. But I would also admire them for doing so, because I believe that it is the right thing to do given their own framework of beliefs. What they are doing right now seems just hypocritical. Researching FAI will almost certainly lead to worse outcomes than researching how to create an anti-natalist AI as soon as possible.
What I really believe is that there is not enough data to come to any definitive conclusion about the whole idea of a technological singularity and dangerous recursive self-improvement in particular and that it would be stupid to act on any conclusion that one could possibly come up with at this point.
I believe that SI/lesswrong mainly produces science fiction and interesting, although practically completely useless, thought-experiments. The only danger I see is that some people associated with SI/lesswrong might run rampant once someone demonstrates certain AI capabilities.
All in all I think they are just fooling themselves. They collected massive amounts of speculative math and logic and combined it into a framework of beliefs that can be used to squash any common sense. They have seduced themselves with formulas and lost any ability to discern scribbles on paper from real world rationality. They managed to give a whole new meaning to the idea of model uncertainty by making it reach new dramatic heights.
Bayes’ Theorem, the expected utility formula, and Solomonoff induction are unusable in all but a few limited situations where you have a well-defined, testable and falsifiable hypothesis or empirical data. In most situations those heuristics are computationally intractable, some more than others.
There is simply no way to assign utility to world states without deluding yourself into believing that your decisions are more rational than just trusting your intuition. There is no definition of “utility” that’s precise enough to figure out what a being that maximizes it would do. There can’t be, not without unlimited resources. Any finite list of actions maximizes infinitely many different quantities. Utility only becomes well-defined if we add limitations on what sort of quantities we consider. And even then...
Preferences are a nice concept. But they are just as elusive as the idea of a “self”. Preferences are not just malleable but they keep changing as we make more observations, and so does the definition of utility. Which makes it impossible to act in a time-consistent way.
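A small back-of-the-envelope illustration of the combinatorial point being made here, assuming nothing beyond a crude hypothetical binary encoding of "world states":

```python
# A crude count: describe a "world state" with 300 binary features
# (a hypothetical, very coarse encoding) and compare the number of
# distinct states to the rough number of atoms in the observable universe.

FEATURES = 300
world_states = 2 ** FEATURES
atoms = 10 ** 80  # standard rough estimate

print(f"world states under this encoding: about 10^{len(str(world_states)) - 1}")
print(f"atoms in the observable universe: about 10^{len(str(atoms)) - 1}")
print(f"world states per atom:            about 10^{len(str(world_states // atoms)) - 1}")

# No agent can score more than a vanishing, biased sample of such states,
# which is the intractability point being argued above.
```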
What I really believe is that there is not enough data to come to any definitive conclusion about the whole idea of a technological singularity and dangerous recursive self-improvement in particular and that it would be stupid to act on any conclusion that one could possibly come up with at this point.
I agree with the “not enough data to come to any definitive conclusion” part, but think we could prepare for the Singularity by building an organization that is not attached to any particular plan but is ready to act when there is enough data to come to definitive conclusions (and tries to gather more data in the mean time). Do you agree with this, or do you think we should literally do nothing?
I believe that SI/lesswrong mainly produces science fiction and interesting, although practically completely useless, thought-experiments.
I guess I have a higher opinion of SIAI than that. Just a few months ago you were saying:
I also fear that, at some point, I might need the money. Otherwise I would have already donated a lot more to the Singularity Institute years ago.
I also fear that, at some point, I might need the money. Otherwise I would have already donated a lot more to the Singularity Institute years ago.
What made you change your mind since then?
I did not change my mind. All I am saying is that I wouldn’t suggest that anyone who fully believes what they believe contribute money to SI. Because that would be counterproductive. If I accepted all of their ideas then I would make the same suggestion as you, to build “an organization that is not attached to any particular plan”.
But I do not share all of their beliefs. In particular, I do not currently believe that there is a strong case that uncontrollable recursive self-improvement is possible. And if it is possible, I do not think that it is feasible. And even if it is feasible, I believe that it won’t happen any time soon. And if it will happen soon, I do not think that SI will have anything to do with it.
I believe that SI is an important organisation that deserves money. Although if I shared their idea of rationality and their technological optimism, then the risks would outweigh the benefits.
Why I believe SI deserves money:
It makes people think by confronting them with the logical consequences of state of the art ideas from the field of rationality.
It explores topics and fringe theories that are neglected or worthy of consideration.
It challenges the conventional foundations of charitable giving, causing organisations like GiveWell to reassess and possibly improve their position.
It creates a lot of exciting and fun content and discussions.
All in all I believe that SI will have a valuable influence. I believe that the world needs people and organisations that explore crazy ideas, that try to treat rare diseases in cute kittens and challenge conventional wisdom. And SI is such an organisation. Just like Roger Penrose and Stuart Hameroff. Just like all the creationists who caused evolutionary biologists to hone their arguments. SI will influence lots of fields and make people contemplate their beliefs.
To fully understand why my criticism of SI and my willingness to donate do not contradict each other, you also have to realize that I do not accept the usual idea of charitable giving that is being voiced here. I think that the reasons why people like me contribute money to charities and causes are complex and can’t be reduced to something as simple as wanting to do the most good. It is not just about wanting to do good, signaling or warm fuzzies. It is all of it and much more. I also believe that it is practically impossible to figure out how to maximize good deeds. And even if you were to do it for selfish reasons, you’d have to figure out what you want in the first place. An idea which is probably “not even wrong”.
I also fear that, at some point, I might need the money. Otherwise I would have already donated a lot more to the Singularity Institute years ago.
What made you change your mind since then?
Before you throw more of what I wrote in the past at me:
I sometimes take different positions just to explore an argument, because it is fun to discuss and because I am curious what reactions I might provoke.
I don’t have a firm opinion on many issues.
There are a lot of issues for which there are as many arguments that oppose a certain position as there are arguments that support it.
Most of what I write is not thought-out. I most often do not consciously contemplate what I write.
I find it very easy to argue for whatever position.
I don’t really care too much about most issues but write as if I do, to evoke feedback. I just do it for fun.
I am sometimes not completely honest to exploit the karma system. Although I don’t do that deliberately.
If I believe that SI/lesswrong could benefit from criticism I voice it if nobody else does.
The above is just some quick and dirty introspection that might hint at the reason for some seemingly contradictory statements. The real reasons are much more complex of course, but I haven’t thought about that either :-)
I just don’t have the time right now to think hard about all the issues discussed here. I am still busy improving my education. At some point I will try to tackle the issues with due respect and in all seriousness.
Before you throw more of what I wrote in the past at me:
I have quoted everything XiXiDu said here so that it is not lost in any future edits.
Many of XiXi’s contributions consist of persuasive denunciations. As he points out in the parent (and quoted below), often these are based on little research, without much contemplation and are done to provoke reactions rather than because they are correct. Since XiXiDu is rather experienced at this mode of communication—and the arguments he uses have been selected for persuasiveness through trial and error—there is a risk that he will be taken more seriously than is warranted.
The parent should be used to keep things in perspective when XiXiDu is rabble rousing.
I sometimes take different positions just to explore an argument, because it is fun to discuss and because I am curious what reactions I might provoke.
I don’t have a firm opinion on many issues.
There are a lot of issues for which there are as many arguments that oppose a certain position as there are arguments that support it.
Most of what I write is not thought-out. I most often do not consciously contemplate what I write.
I find it very easy to argue for whatever position.
I don’t really care too much about most issues but write as if I do, to evoke feedback. I just do it for fun.
I am sometimes not completely honest to exploit the karma system. Although I don’t do that deliberately.
If I believe that SI/lesswrong could benefit from criticism I voice it if nobody else does.
The above is just some quick and dirty introspection that might hint at the reason for some seemingly contradictory statements. The real reasons are much more complex of course, but I haven’t thought about that either :-)
I just don’t have the time right now to think hard about all the issues discussed here. I am still busy improving my education. At some point I will try to tackle the issues with due respect and in all seriousness.
That said, I think his fear of culpability (for being potentially passively involved in an existential catastrophe) is very real. I suspect he is continually driven, at a level beneath what anyone’s remonstrations could easily affect, to try anything that might somehow succeed in removing all the culpability from him. This would be a double negative form of “something to protect”: “something to not be culpable for failure to protect”.
If this is true, then if you try to make him feel culpability for his communication acts as usual, this will only make his fear stronger and make him more desperate to find a way out, and make him even more willing to break normal conversational rules.
I don’t think he has full introspective access to his decision calculus for how he should let his drive affect his communication practices or the resulting level of discourse. So his above explanations for why he argues the way he does are probably partly confabulated, to match an underlying constraining intuition of “whatever I did, it was less indefensible than the alternative”.
(I feel like there has to be some kind of third alternative I’m missing here, that would derail the ongoing damage from this sort of desperate effort by him to compel someone or something to magically generate a way out for him. I think the underlying phenomenon is worth developing some insight into. Alex wouldn’t be the only person with some amount of this kind of psychology going on—just the most visible.)
If this is true, then if you try to make him feel culpability for his communication acts as usual, this will only make his fear stronger and make him more desperate to find a way out, and make him even more willing to break normal conversational rules.
I certainly wouldn’t try to make him feel culpability. Or, for that matter, “try to make him” anything at all. I don’t believe I have the ability to influence XiXi significantly and I don’t believe it would be useful to try (any more). It is for this reason that I rather explicitly spoke in the third person to any prospective future readers that it may be appropriate to refer here in the future. Pretending that I was actually talking to XiXiDu when I was clearly speaking to others would just be insulting to him.
There are possible future cases (and plenty of past cases) where a reply to one of XiXiDu’s fallacious denunciations that consists of simply a link here is more useful than ignoring the comment entirely and hoping that the damage done is minimal.
What is your suggestion then? How do I get out? Delete all of my posts, comments and website like Roko?
Seriously, if it wasn’t for assholes like wedrifid I wouldn’t even bother anymore and would just quit. The grandparent was an attempt at honesty, at trying to leave. Then that guy comes along claiming that most of my submissions consisted of “persuasive denunciations”. Someone like him, who does nothing else all the time. Someone who never argues for his case.
ETA Ah fuck it all. I’ll make another attempt to log out now and not get involved anymore. Happy self-adulation.
You’re suggesting that he might be making arguments that are taken more seriously than they warrant. Unless an argument is based on incorrect facts, it should be taken exactly as seriously as it warrants on its own merits. Why does the source matter?
Even if the audience is assumed to be perfect at evaluating evidence on its merits, the source matters to the extent that the authority of the author and the authority of the presentation are considered evidence. Knowing how pieces of evidence were selected also gives information, so knowing about the source can provide significant information.
And the above assumption definitely doesn’t hold—people are not perfect at evaluating evidence on its merits. Considerations about arguments optimized through trial and error for persuasiveness become rather important when all recipients have known biases and you are actively trying to reduce the damage said biases cause.
Finally, considerations about how active provocation may have an undesirable influence on the community are qualitatively different from considerations about whether a denunciation is accurate. Just because I evaluate XiXiDu’s typical ‘arguments’ as terribly nonsensical thinking, that does not mean I should be similarly dismissive of the potential damage that can be done by them, given the expressed intent and tactics. I can evaluate the threat that the quoted agenda has as significant even when I don’t personally take the output of that agenda seriously at all.
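A minimal Bayesian sketch of that point about selection for persuasiveness; the likelihood numbers are hypothetical and only illustrate why knowing how an argument was produced changes its evidential weight:

```python
# Minimal Bayes sketch: how much a persuasive argument should move you
# depends on how it was produced. All numbers below are hypothetical.

def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """P(H | E) via Bayes' theorem for a binary hypothesis H."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

prior = 0.5  # start undecided about the hypothesis

# Case 1: an argument sampled honestly, where persuasiveness tracks truth:
# persuasive arguments are much easier to find when H is actually true.
print(posterior(prior, p_evidence_if_true=0.8, p_evidence_if_false=0.2))  # 0.80

# Case 2: an argument optimized for persuasiveness by trial and error:
# a persuasive argument can be produced almost regardless of whether H is true.
print(posterior(prior, p_evidence_if_true=0.9, p_evidence_if_false=0.7))  # ~0.56
```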
...without much contemplation and are done to provoke reactions rather than because they are correct.
Here is how I see it. I am just an uneducated, below average IQ individual and don’t spend more time on my submissions than it takes to write them. If people are swayed by my ramblings then how firm could their beliefs possibly be in the first place?
Many of XiXi’s contributions consist of persuasive denunciations. [...] there is a risk that he will be taken more seriously than is warranted.
I could have as easily argued in favor of SI. If I was to start now and put some extra effort into it I believe I could actually become more persuasiveness than SI itself. Do you believe that in a world where I did that you would tell people that my arguments are based on little research and that there is a risk that I am taken more seriously than is warranted?
For sure. XiXiDu uses grammar correctly! (Well, enough so that “become more persuasiveness” struck me as an editing error rather than typical.)
If someone uses grammar correctly it is an overwhelmingly strong indicator that either they are significantly educated (self or otherwise) or have enough intelligence to compensate!
Has anyone constructed even a vaguely plausible outline, let alone a definition, of what would constitute a “human-friendly intelligence”, defined in terms other than effects you don’t want it to have? As you note, humans aren’t human-friendly intelligences, or we wouldn’t have internal existential risk.
The CEV proposal seems to attempt to move the hard bit to technological magic (a superintelligence scanning human brains and working out a solution to human desires that is possible, is coherent and won’t destroy us all) - this is saying “then a miracle occurs” in more words.
Has anyone constructed even a vaguely plausible outline, let alone a definition, of what would constitute a “human-friendly intelligence”, defined in terms other than effects you don’t want it to have?
Er, that’s how it is defined—at least by Yudkowsky. You want to argue definitions? Without even offering one of your own? How will that help?
No, I’m pointing out that a purely negative definition isn’t actually a useful definition that describes the thing the label is supposed to be pointing at. How does one work toward a negative? We can say a few things it isn’t—what is it?
The term “Friendly AI” refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals.
That isn’t a “purely negative” definition in the first place.
Even if it was—would you object to the definition of “hole” on similar grounds?
What exactly is wrong with defining some things in terms of what they are not?
If I say a “safe car” is one that doesn’t kill or hurt people, that seems just fine to me.
The word “artificial” there makes it look like it means more than it does. And humans are just as made of atoms. Let’s try it without that:
The term “friendly intelligence” refers to the production of human-benefiting, non-human-harming actions in intelligences that have advanced to the point of making real-world plans in pursuit of goals.
It’s only described in terms of its effects, and then only vaguely. We have no idea what it would actually be. The CEV plan doesn’t include what it would actually be, it just includes a technological magic step where it’s worked out.
This may be better than nothing, but it’s not enough to say it’s talking about anything that’s actually understood in even the vaguest terms.
For an analogy, what would a gorilla-friendly human-level intelligence be like? How would you reasonably make sure it wasn’t harmful to the future of gorillas? (Humans out the box do pretty badly at this.) What steps would the human take to ascertain the CEV of gorillas, assuming tremendous technological resources?
It seems like an attempt at Oracle AI, which simply strives to answer all questions accurately while otherwise exerting as little influence on the world as possible, would be strictly better than a paperclip maximizer, no? At the very least you wouldn’t see any of the risks of “almost friendly AI”.
You might see some humans getting power over other humans, but to be honest I don’t think that would be worse than humans existing, period. Keep in mind that historically, the humans that were put in power over others were the ones who had the ruthlessness necessary to get to the top – they might not be representative. Can you name any female evil dictators?
Do you think that it is possible to build an AI that does the moral thing even without being directly contingent on human preferences? Conditional on its possibility, do you think we should attempt to create such an AI?
I share your trepidation about humans and their values, but I see that as implying that we have to be meta enough such that even if humans are wrong, our AI will still do what is right. It seems to me that this is still a real possibility. For an example of an FAI architecture that is more in this direction, check out CFAI.
Do you think that it is possible to build an AI that does the moral thing even without being directly contingent on human preferences?
No. I believe that it is practically impossible to systematically and consistently assign utility to world states. I believe that utility can not even be grounded and therefore defined. I don’t think that there exists anything like “human preferences” and therefore human utility functions, apart from purely theoretical highly complex and therefore computationally intractable approximations. I don’t think that there is anything like a “self” that can be used to define what constitutes a human being, not practically anyway. I don’t believe that it is practically possible to decide what is morally right and wrong in the long term, not even for a superintelligence.
I believe that stable goals are impossible and that any attempt at extrapolating the volition of people will alter it.
Besides I believe that we won’t be able to figure out any of the following in time:
The nature of consciousness and its moral significance.
The relation and moral significance of suffering/pain/fun/happiness.
I further believe that the following problems are impossible to solve, respectively constitute a reductio ad absurdum of certain ideas:
I believe that it is practically impossible to systematically and consistently assign utility to world states. I believe that utility can not even be grounded and therefore defined. I don’t think that there exists anything like “human preferences” and therefore human utility functions, apart from purely theoretical highly complex and therefore computationally intractable approximations. I don’t think that there is anything like a “self” that can be used to define what constitutes a human being, not practically anyway. I don’t believe that it is practically possible to decide what is morally right and wrong in the long term, not even for a superintelligence.
Strange stuff.
Surely “right” and “wrong” make the most sense in the context of a specified moral system.
If you are using those terms outside such a context, it usually implies some kind of moral realism—in which case, one wonders what sort of moral realism you have in mind.
Do you actually disagree with anything or are you just trying to ridicule it? Do you think that the possibility that FAI research might increase negative utility is not to be taken seriously? Do you think that world states where faulty FAI designs are implemented have on average higher utility than world states where nobody is alive? If so, what research could I possible do to come to the same conclusion? What arguments do I miss? Do I just have to think about it longer?
Consider the way Eliezer Yudkowsky agrues in favor of FAI research:
or
Is his style of argumentation any different from mine except that he promises lots of positive utility?
I was just amused by the anticlimacticness of the quoted sentence (or maybe by how it would be anticlimactic anywhere else but here), the way it explains why a living hell for the rest of time is a bad thing by associating it with something so abstract as a dramatic increase in negative utility. That’s all I meant by that.
Have you considered the many ways something like that could go wrong?
The paperclip maximizer (PM) encounters an alien civilization and causes lots of suffering warring with it
PM decides there’s a chance that it’s in a simulation run by a sadistic being who will punish it (prevent it from making paperclips) unless it creates trillions of conscious beings and tortures them
PM is itself capable of suffering
PM decides to create lots of descendent AIs in order to maximize paperclip production and they happen to be capable of suffering. (Our genes made us to maximize copies of them and we happen to be capable of suffering.)
somebody steals PM’s source code before it’s launched, and makes a sadistic AI
From your perspective, wouldn’t it be better to just build a really big bomb and blow up Earth? Or alternatively, if you want to minimize suffering throughout the universe and maybe throughout the multiverse (e.g., by acausal negotiation with superintelligences in other universes), instead of just our corner of the world, you’d have to solve a lot of the same problems as FAI.
I don’t think that it is likely that it will encounter anything that has equal resources and if it does that suffering would occur (see below).
That seems like one of the problems that have to be solved in order to build an AI that transforms the universe into an inanimate state. But I think it is much easier to make an AI not simulate any other agents than to create a friendly AI. Much more can go wrong by creating a friendly AI, including the possibility that it tortures trillions of beings. In the case of a transformer you just have to make sure that it values an universe that is as close as possible to a state where no computation takes place and that does not engage in any kind of trade, acausal or otherwise.
I believe that any sort of morally significant suffering is an effect of (natural) evolution, and may in fact be dependent on that. I think that the kind of maximizer that SI has in mind is more akin to a transformation process that isn’t consciousness, does not have emotions and cannot suffer. If those qualities would be necessary requirements then I don’t think that we will build an artificial general intelligence any time soon and that if we do it will happen slowly and not be able to undergo dangerous recursive self-improvement.
I think that this is more likely to be the case with friendly AI research because it takes longer.
The reason I think that working towards FAI might be a bad idea is that it increases the chance of something going horribly wrong.
If I were to accept the framework of beliefs held by SI, then I would assign a low probability to the possibility that the default scenario, in which an AI undergoes recursive self-improvement, will include a lot of blackmailing that leads to a lot of suffering. Here the default is that nobody tries to make the AI friendly.
I believe that any failed attempt at friendly AI is much more likely to 1) engage in blackmailing 2) keep humans alive 3) fail in horrible ways:
I think that working towards friendly AI will in most cases lead to negative utility scenarios that vastly outweigh the negative utility of an attempt at creating a simple transformer that turns the universe into an inanimate state.
ETA Not sure why the graph looks so messed up. Does anyone know of a better graphing tool?
I think it’s too early to decide this. There are many questions whose answers will become clearer before we have to make a choice one way or another. If eventually it becomes clear that building an antinatalist AI is the right thing to do, I think the best way to accomplish it would be through an organization that’s like SIAI but isn’t too attached to the idea of FAI and just wants to do whatever is best.
Now you can either try to build an organization like that from scratch, or try to push SIAI in that direction (i.e., make it more strategic and less attached to a specific plan). Of course, being lazy, I’m more tempted to do the latter, but your mileage may vary. :)
Yes.
I, for one, am ultimately concerned with doing whatever’s best. I’m not wedded to doing FAI, and am certainly not wedded to doing 9-researchers-in-a-basement FAI.
Well, that’s great. Still, there are quite a few problems.
How do I know
… that SI does not increase existential risk by solving problems that can be used to build AGI earlier?
… that you won’t launch a half-baked friendly AI that will turn the world into a hell?
… that you don’t implement some strategies that will do really bad things to some people, e.g. myself?
Every time I see a video of one of you people I think, “Wow, those seem like really nice people. I am probably wrong. They are going to do the right thing.”
But seriously, is that enough? Can I trust a few people with the power to shape the whole universe? Can I trust them enough to actually give them money? Can I trust them enough with my life until the end of the universe?
You can’t even tell me what “best” or “right” or “winning” stands for. How do I know that it can be or will be defined in a way that those labels will apply to me as well?
I have no idea what your plans are for the day when time runs out. I just hope that you are not going to hope for the best and run some not quite friendly AI that does really crappy things. I hope you consider the possibility of rather blowing everything up than risking even worse outcomes.
Hell no.
This is an open problem. See “How can we be sure a Friendly AI development team will be altruistic?” on my list of open problems.
Blowing everything up would be pretty bad. Bad enough not to encourage the possibility.
“Would you murder a child, if it’s the right thing to do?”
If FAI is by definition a machine that does whatever is best, this distinction doesn’t seem meaningful.
Ok, let me rephrase that to be clearer.
Do you think SingInst is too attached to a specific kind of FAI design? This isn’t my impression. (Also, at this point, it might be useful to unpack “SingInst” into particular people constituting it.)
XiXiDu seems to think so. I guess I’m less certain but I didn’t want to question that particular premise in my response to him.
It does confuse me that Eliezer set his focus so early on CEV. I think “it’s too early to decide this” applies to CEV just as well as XiXiDu’s anti-natalist AI. Why not explore and keep all the plausible options open until the many strategically important questions become clearer? Why did it fall to someone outside SIAI (me, in particular) to write about the normative and meta-philosophical approaches to FAI? (Note that the former covers XiXiDu’s idea as a special case.) Also concerning is that many criticisms have been directed at CEV but Eliezer seems to ignore most of them.
I’d be surprised if there weren’t people within SingInst who disagree with the focus on CEV, but if so, they seem reluctant to disagree in public so it’s hard to tell who exactly, or how much say they have in what SingInst actually does.
I guess this could all be due to PR considerations. Maybe Eliezer just wanted to focus public attention on CEV because it’s the politically least objectionable FAI approach, and isn’t really terribly attached to the idea when it comes to actually building an FAI. But you can see how an outsider might get that impression...
I always thought CEV was half-baked as a technical solution, but as a PR tactic it is...genius.
Yeah, I thought it was explicitly intended more as a political manifesto than a philosophical treatise. I have no idea why so many smart people, like lukeprog, seem to be interpreting it not only as a philosophical basis but as outlining a technical solution.
Why do you think an unknown maximizer would be worse than a not quite friendly AI? Failed Utopia #4-2 sounds much better than a bunch of paperclips. Orgasmium sounds at least as good as paper clips.
Graphs make your case more convincing—even when they are drawn wrong and don’t make sense!
...but seriously: where are you getting the figures in the first graph from?
Are you one of these “negative utilitarians” who think that any form of suffering is terrible?
You sound a bit fixated on doom :-(
What do you make of the idea that the world has been consistently getting better for most of the last 3 billion years (give or take the occasional asteroid strike) - and that the progress is likely to continue?
I have previously mentioned my antipathy regarding the FAI concept. I think FAI is a very dangerous concept and it should be dropped. See this article of mine for more on my views: http://hplusmagazine.com/2012/01/16/my-hostility-towards-the-concept-of-friendly-ai/
Before anyone mentions it, hplusmagazine.com is temporarily down, and someone is in the process of fixing it.
Currently you suspect that there are people, such as yourself, who have some chance of correctly judging whether arguments such as yours are correct, and of attempting to implement the implications if those arguments are correct, and of not implementing the implications if those arguments are not correct.
Do you think it would be possible to design an intelligence which could do this more reliably?
I don’t get it. Design a Friendly AI that can better judge whether it’s worth the risk of botching the design of a Friendly AI?
ETA: I suppose your point applies to some of XiXiDu’s concerns but not others?
A lens that sees its flaws.
I don’t understand. Is the claim here that you can build a “decide whether the risk of botched Friendly AI is worth taking machine”, and the risk of botching such a machine is much less than the risk of botching a Friendly AI?
An FAI that includes such a “Should I run?” heuristic could pose a lesser risk than an FAI without it. If this heuristic works better than human judgment about running an FAI, it should be used instead of human judgment.
This is the same principle as for the AI’s decisions themselves, where we don’t ask the AI’s designers for object-level moral judgments, or encode specific object-level moral judgments into the AI. Not running an AI would then be equivalent to hardcoding the designers’ answer “No” to the question “Should the AI run?” into the AI, instead of encoding the question and letting the AI itself answer it (assuming we can expect it to answer the question more reliably than the programmers can).
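A minimal sketch of that principle, with every name and number hypothetical (this is not an actual SI design, just an illustration of encoding the question rather than hardcoding the designers’ answer):

```python
# Hypothetical sketch: encode "Should the AI run?" as a question the system
# answers, instead of hardcoding the designers' answer into it.

def designers_judgment() -> float:
    """Designers' estimated probability that running the system is safe."""
    return 0.6  # illustrative number only


def system_self_assessment() -> float:
    """The system's own estimated probability that running itself is safe.

    Here it is just a placeholder constant; whether such a self-assessment
    can be trusted more than human judgment is exactly what is in dispute.
    """
    return 0.2  # illustrative number only


def should_run(trust_in_self_assessment: float) -> bool:
    """Blend the two judgments according to how well calibrated we believe
    the self-assessment to be (0 = ignore it, 1 = rely on it entirely)."""
    p_safe = (trust_in_self_assessment * system_self_assessment()
              + (1 - trust_in_self_assessment) * designers_judgment())
    return p_safe > 0.95  # only launch if the blended estimate is very high


if __name__ == "__main__":
    for trust in (0.0, 0.5, 1.0):
        print(trust, should_run(trust))
```

The reply below is then essentially a question about the `trust_in_self_assessment` parameter: if the design itself is botched, the self-check may deserve very little weight.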
If we botched the FAI, wouldn’t we also probably have botched its ability to decide whether it should run?
Yes, and if it tosses a coin, it has a 50% chance of being right. The question is calibration: how much trust should such measures buy compared to their absence, given what is known about a given design?
The machine Steve proposes might not carry as much risk of creating a “living hell”: the risk of attempting to get the human utility function right but missing in such a way that humans are still alive, just living very unpleasantly. To me, that seems by far the biggest of XiXiDu’s concerns.
Here are a few premises:
Complex systems can fail in complex ways.
Destruction is easier than creation.
Expected utility maximization is rational and feasible.
We should be extremely careful not to implement a half-baked friendly AI.
If you believe that self-improving AI is inevitable and that creating friendly AI is more difficult than creating unfriendly AI, then launching an AI that simply destroys everything as quickly as possible has a higher expected utility than doing nothing or trying to implement an AI that is not completely friendly.
The negative utility of an AI that transforms the universe into an inanimate state is simply the forgone positive utility of a perfectly friendly AI. But most outcomes are expected to yield an unfriendly AI, or a not quite friendly AI, which will actively increase negative utility by possibly keeping living beings alive indefinitely under abhorrent circumstances. To prevent those scenarios one can try to solve friendly AI, which will most likely fail (or even increase the chances of a negative singularity), or try to launch a destructive singleton with simple goals to prevent further suffering and the evolution of life elsewhere in the universe. Creating a destructive self-improving AI that kills all life should be the easiest of all possibilities, with a high probability of success.
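To make the structure of that argument explicit, here is a toy expected-utility comparison. Every probability and utility below is an invented illustrative number, not a claim about the real values, and the ranking flips if you change them.

```python
# Toy expected-utility comparison of the three strategies discussed above.
# All probabilities and utilities are made-up illustrative numbers.

strategies = {
    # (probability of outcome, utility of outcome) pairs per strategy
    "do nothing": [
        (0.5, -1_000),    # unfriendly AI built by someone else: everyone dies
        (0.5, -10_000),   # "not quite friendly" AI: indefinite suffering
    ],
    "attempt friendly AI": [
        (0.1, +10_000),   # perfectly friendly AI
        (0.3, -1_000),    # outright failure: everyone dies
        (0.6, -10_000),   # near-miss: humans kept alive in bad conditions
    ],
    "launch inanimate-state 'transformer'": [
        (0.95, -1_000),   # works as intended: everyone dies, no suffering
        (0.05, -10_000),  # fails in a way that still produces suffering
    ],
}

for name, outcomes in strategies.items():
    expected_utility = sum(p * u for p, u in outcomes)
    print(f"{name:40s} EU = {expected_utility:8.1f}")
```

Under these assumptions the “transformer” comes out least bad; with a noticeably higher probability of getting friendliness right, attempting friendly AI wins instead, which is where the actual disagreement lies.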
Assuming your argument is correct, wouldn’t it make more sense to blow ourselves up with nukes rather than pollute the universe with UFAI? There may be other intelligent civilizations out there leading worthwhile lives that we threaten unfairly by unleashing UFAI.
I’m skeptical that friendly AI is as difficult as all that because, to take an example, humans are generally considered pretty “wicked” by traditional writers and armchair philosophers, but lately we haven’t been murdering each other or deliberately going out of our way to make each other’s lives miserable very often. For instance, say I were invincible. I could theoretically stab everyone I meet without any consequences, but I doubt I would do that. And I’m just human. Goodness may seem mystical and amazingly complex from our current viewpoint, but is it really as complex as all that? There were a lot of things in history and science that seemed mystically complex but turned out to be formalizable in compressed ways, such as the mathematics of Darwinian population genetics. Who would have imagined that the “Secrets of Life and Creation” would be revealed like that? But they were. Could “sufficient goodness that we can be convinced the agent won’t put us through hell” also have a compact description that was clearly tractable in retrospect?
There might be countless planets that are about to undergo an evolutionary arms race for the next few billion years, resulting in a lot of suffering. It is very unlikely that there is a single source of life that is at exactly the right stage of evolution, with exactly the right mind design, to not only lead worthwhile lives but also get its AI technology exactly right and not turn everything into a living hell.
If you assign negative utility to suffering, which is likely to be universally accepted to have negative utility, then given that you are an expected utility maximizer it should be a serious consideration to end all life. Because 1) agents that are an effect of evolution have complex values, 2) to satisfy complex values you need to meet complex circumstances, 3) complex systems can fail in complex ways, and 4) any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways.
To name just one example where things could go horribly wrong: humans are by their very nature interested in domination and sex. Our aversion to sexual exploitation is largely dependent on the memeplex of our cultural and societal circumstances. If you knew more, were smarter and could think faster, you might very well realize that such an aversion is an unnecessary remnant that you can easily extinguish to open up new pathways to gain utility. The idea that Gandhi would not agree to have his brain modified into a baby-eater is incredibly naive. Given the technology, people will alter their preferences and personality. Many people actually perceive their moral reservations to be limiting. It only takes some amount of insight to overcome such limitations.
You simply can’t be sure that the future won’t hold vast amounts of negative utility. It is much easier for things to go horribly wrong than to turn out barely acceptable.
Maybe not, but betting on the possibility that goodness can be easily achieved is like pulling a random AI from mind design space hoping that it turns out to be friendly.
Similarly, it is easier to make piles of rubble than skyscrapers. Yet—amazingly—there are plenty of skyscrapers out there. Obviously something funny is going on...
Hang on, though. That’s still normally better than not existing at all! Hell has to be at least bad enough for the folk in it to want to commit suicide for utility to count as “below zero”. Most plausible futures just aren’t likely to be that bad for the creatures in them.
The present is already bad enough. There is more evil than good. You are more often worried than optimistic. You are more often hurt than happy. That’s the case for most people. We just tend to remember the good moments more than the rest of our life.
It is generally easier to arrive at bad world states than good world states. Because to satisfy complex values you need to meet complex circumstances. And even given simple values and goals, the laws of physics are grim and remorseless. In the end you’re going to lose the fight against the general decay. Any temporary success is just a statistical fluke.
No, I’m not!
Yet most creatures would rather live than die—and they show that by choosing to live. Dying is an option—they choose not to take it.
It sounds as though by now there should be nothing left but dust and decay! Evidently something is wrong with this reasoning. Evolution produces marvellous wonders—as well as entropy. Your existence is an enormous statistical fluke—but you still exist. There’s no need to be “down” about it.
For some people, this is a solved problem.
Where “success” refers to obliterating yourself and all your descendants. That’s not how most Darwinian creatures usually define success. Natural selection does build creatures that want to die—but only rarely and by mistake.
Earlier, you wrote
Surely building an anti-natalist AI that turns the universe into inert matter would be considered unacceptable by most people. So I’m confused. Do you intend to denounce SIAI if they do seriously consider this strategy, and also if they don’t?
Yet I am not secretive about it, and I believe that it is one of the less horrible strategies. Given that SI is strongly attached to decision-theoretic ideas, which I believe are not the default outcome due to practically intractable problems, I fear that their strategies might turn out to be much worse than the default case.
I think that it is naive to simply trust SI because they seem like nice people. Although I don’t doubt that they are nice people. But I think that any niceness is easily drowned by their eagerness to take rationality to its logical extreme without noticing that they have reached a point where the consequences constitute a reductio ad absurdum. If game- and decision-theoretic conjectures show that you can maximize expected utility by torturing lots of people, or by voluntarily walking into death camps, then that’s the right thing to do. I don’t think that they are psychopathic personalities per se, though. Those people are simply held captive by their idea of rationality. And that is what makes them extremely dangerous.
I would denounce myself if I were to seriously consider that strategy. But I would also admire them for doing so, because I believe that it is the right thing to do given their own framework of beliefs. What they are doing right now seems just hypocritical. Researching FAI will almost certainly lead to worse outcomes than researching how to create an anti-natalist AI as soon as possible.
What I really believe is that there is not enough data to come to any definitive conclusion about the whole idea of a technological singularity, and dangerous recursive self-improvement in particular, and that it would be stupid to act on any conclusion that one could possibly come up with at this point.
I believe that SI/lesswrong mainly produces science fiction and interesting, although practically completely useless, thought experiments. The only danger I see is that some people associated with SI/lesswrong might run rampant once someone demonstrates certain AI capabilities.
All in all I think they are just fooling themselves. They collected massive amounts of speculative math and logic and combined it into a framework of beliefs that can be used to squash any common sense. They have seduced themselves with formulas and lost any ability to discern scribbles on paper from real world rationality. They managed to give a whole new meaning to the idea of model uncertainty by making it reach new dramatic heights.
Bayes’ Theorem, the expected utility formula, and Solomonoff induction are unusable in all but a few limited situations where you have a well-defined, testable and falsifiable hypothesis or empirical data. In most situations these heuristics are computationally intractable, some more than others.
There is simply no way to assign utility to world states without deluding yourself into believing that your decisions are more rational than just trusting your intuition. There is no definition of “utility” that’s precise enough to figure out what a being that maximizes it would do. There can’t be, not without unlimited resources. Any finite list of actions maximizes infinitely many different quantities. Utility only becomes well-defined if we add limitations on what sort of quantities we consider. And even then...
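As a small illustration of the intractability claim, assume, purely for the sake of the example, that a world state is described by n binary features; exact expected-utility maximization then has to sum over all 2^n states, which stops being feasible very quickly.

```python
# Toy illustration: the number of distinct world states (and hence the cost of
# exact expected-utility calculation over them) grows as 2**n when a state is
# described by n independent binary features.

from itertools import product


def exact_expected_utility(n_features, utility, probability):
    """Brute-force E[U] over all 2**n world states (only feasible for tiny n)."""
    return sum(probability(state) * utility(state)
               for state in product((0, 1), repeat=n_features))


# A made-up utility function and a uniform distribution, purely for illustration.
utility = lambda state: sum(state)
probability = lambda state: 1.0 / 2 ** len(state)

print(exact_expected_utility(10, utility, probability))  # 1,024 states: fine
print(2 ** 100)  # number of states for 100 features: hopeless to enumerate
```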
Preferences are a nice concept, but they are just as elusive as the idea of a “self”. Preferences are not just malleable; they keep changing as we make more observations, and so does the definition of utility. That makes it impossible to act in a time-consistent way.
I agree with the “not enough data to come to any definitive conclusion” part, but think we could prepare for the Singularity by building an organization that is not attached to any particular plan but is ready to act when there is enough data to come to definitive conclusions (and tries to gather more data in the mean time). Do you agree with this, or do you think we should literally do nothing?
I guess I have a higher opinion of SIAI than that. Just a few months ago you were saying:
What made you change your mind since then?
I did not change my mind. All I am saying is that I wouldn’t suggest that anyone who fully believes what they believe contribute money to SI, because that would be counterproductive. If I accepted all of their ideas then I would make the same suggestion as you: to build “an organization that is not attached to any particular plan”.
But I do not share all of their beliefs. In particular, I do not currently believe that there is a strong case that uncontrollable recursive self-improvement is possible. And if it is possible, I do not think that it is feasible. And even if it is feasible, I believe that it won’t happen any time soon. And if it will happen soon, I do not think that SI will have anything to do with it.
I believe that SI is an important organisation that deserves money. Although, if I shared their idea of rationality and their technological optimism, then the risks would outweigh the benefits.
Why I believe SI deserves money:
It makes people think by confronting them with the logical consequences of state of the art ideas from the field of rationality.
It explores topics and fringe theories that are neglected or worthy of consideration.
It challenges the conventional foundations of charitable giving, causing organisations like GiveWell to reassess and possibly improve their position.
It creates a lot of exciting and fun content and discussions.
All in all I believe that SI will have a valuable influence. I believe that the world needs people and organisations that explore crazy ideas, that try to treat rare diseases in cute kittens and challenge conventional wisdom. And SI is such an organisation. Just like Roger Penrose and Stuart Hameroff. Just like all the creationists who caused evolutionary biologists to hone their arguments. SI will influence lots of fields and make people contemplate their beliefs.
To fully understand why my criticism of SI and my willingness to donate do not contradict each other, you also have to realize that I do not accept the usual idea of charitable giving that is being voiced here. I think that the reasons why people like me contribute money to charities and causes are complex and can’t be reduced to something as simple as wanting to do the most good. It is not just about wanting to do good, signaling or warm fuzzies. It is all of it and much more. I also believe that it is practically impossible to figure out how to maximize good deeds. And even if you were to do it for selfish reasons, you’d have to figure out what you want in the first place. An idea which is probably “not even wrong”.
Before you throw more of what I wrote in the past at me:
I sometimes take different positions just to explore an argument, because it is fun to discuss and because I am curious what reactions I might provoke.
I don’t have a firm opinion on many issues.
There are a lot of issues for which there are as many arguments that oppose a certain position as there are arguments that support it.
Most of what I write is not thought-out. I most often do not consciously contemplate what I write.
I find it very easy to argue for whatever position.
I don’t really care too much about most issues but write as if I do, to evoke feedback. I just do it for fun.
I am sometimes not completely honest to exploit the karma system. Although I don’t do that deliberately.
If I believe that SI/lesswrong could benefit from criticism I voice it if nobody else does.
The above is just some quick and dirty introspection that might hint at the reasons for some seemingly contradictory statements. The real reasons are much more complex of course, but I haven’t thought about that either :-)
I just don’t have the time right now to think hard about all the issues discussed here. I am still busy improving my education. At some point I will try to tackle the issues with due respect and in all seriousness.
I have quoted everything XiXiDu said here so that it is not lost in any future edits.
Many of XiXiDu’s contributions consist of persuasive denunciations. As he points out in the parent (and quoted below), these are often based on little research, written without much contemplation, and made to provoke reactions rather than because they are correct. Since XiXiDu is rather experienced at this mode of communication, and the arguments he uses have been selected for persuasiveness through trial and error, there is a risk that he will be taken more seriously than is warranted.
The parent should be used to keep things in perspective when XiXiDu is rabble rousing.
That said, I think his fear of culpability (for being potentially passively involved in an existential catastrophe) is very real. I suspect he is continually driven, at a level beneath what anyone’s remonstrations could easily affect, to try anything that might somehow succeed in removing all the culpability from him. This would be a double negative form of “something to protect”: “something to not be culpable for failure to protect”.
If this is true, then if you try to make him feel culpability for his communication acts as usual, this will only make his fear stronger and make him more desperate to find a way out, and make him even more willing to break normal conversational rules.
I don’t think he has full introspective access to his decision calculus for how he should let his drive affect his communication practices or the resulting level of discourse. So his above explanations for why he argues the way he does are probably partly confabulated, to match an underlying constraining intuition of “whatever I did, it was less indefensible than the alternative”.
(I feel like there has to be some kind of third alternative I’m missing here, that would derail the ongoing damage from this sort of desperate effort by him to compel someone or something to magically generate a way out for him. I think the underlying phenomenon is worth developing some insight into. Alex wouldn’t be the only person with some amount of this kind of psychology going on—just the most visible.)
I certainly wouldn’t try to make him feel culpability. Or, for that matter, try to make him anything at all. I don’t believe I have the ability to influence XiXi significantly, and I don’t believe it would be useful to try (any more). It is for this reason that I rather explicitly spoke in the third person, to any prospective future readers whom it may be appropriate to refer here in the future. Pretending that I was actually talking to XiXiDu when I was clearly speaking to others would just be insulting to him.
There are possible future cases (and plenty of past cases) where a reply to one of XiXiDu’s fallacious denunciations that consists of simply a link here is more useful than ignoring the comment entirely and hoping that the damage done is minimal.
Show me just one.
You could easily influence me with actual arguments.
What is your suggestion then? How do I get out? Delete all of my posts, comments and website like Roko?
Seriously, if it wasn’t for assholes like wedrifid I wouldn’t even bother anymore and would just quit. The grandparent was an attempt at honesty, at trying to leave. Then that guy comes along claiming that most of my submissions consisted of “persuasive denunciations”. Someone like him, who does nothing else all the time. Someone who never argues for his case.
ETA Ah fuck it all. I’ll take another attempt and log out now and not get involved anymore. Happy self-adulation.
If a denunciation is accurate, does it really matter what the source is? Sometimes, putting pin to balloon is its own reward.
The rhetorical implication appears to be non-sequitur. Again. Please read more carefully.
You’re suggesting that he might be making arguments that are taken more seriously than they warrant. Unless an argument is based on incorrect facts, it should be taken exactly as seriously as it warrants on its own merits. Why does the source matter?
Even if the audience is assumed to be perfect at evaluating evidence on its merits, the source matters to the extent that the authority of the author and the authority of the presentation are considered evidence. Knowing how pieces of evidence were selected also gives information, so knowing about the source can provide significant information.
And the above assumption definitely doesn’t hold: people are not perfect at evaluating evidence on its merits. Considerations about arguments optimized for persuasiveness through trial and error become rather important when all recipients have known biases and you are actively trying to reduce the damage said biases cause.
Finally, considerations about how active provocation may have an undesirable influence on the community are qualitatively different from considerations about whether a denunciation is accurate. Just because I evaluate XiXiDu’s typical ‘arguments’ as terribly nonsensical thinking, that does not mean I should be similarly dismissive of the potential damage that can be done by them, given the expressed intent and tactics. I can evaluate the threat posed by the quoted agenda as significant even when I don’t personally take the output of that agenda seriously at all.
You might want to save this as well.
Here is how I see it. I am just an uneducated below-average-IQ individual and don’t spend more time on my submissions than it takes to write them. If people are swayed by my ramblings then how firm could their beliefs possibly be in the first place?
I could have as easily argued in favor of SI. If I was to start now and put some extra effort into it I believe I could actually become more persuasiveness than SI itself. Do you believe that in a world where I did that you would tell people that my arguments are based on little research and that there is a risk that I am taken more seriously than is warranted?
Don’t self-deprecate too much. Have you taken a (somewhat recent) IQ test, say an online matrix test or the Mensa one? (If so, personal prediction.)
Even though LW over-estimates its own IQ, don’t forget how stupid IQ 100 really is.
Don’t be ridiculous.
Yesterday I took an IQ test suggested by muflax and scored 78.
Yeah, I took it too and scored 37 - because my eyes were closed.
Do you really believe that you’re dumber than 90% of all people? (~ IQ of 78; I suppose the SD was 15)
Seriously, do you know just how stupid most humans are?
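For what it’s worth, the percentile implied by such a score can be checked directly, assuming a normal distribution with mean 100 and the SD of 15 supposed above (the online test’s actual norms are unknown):

```python
# Percentile implied by an IQ score of 78, assuming a normal distribution with
# mean 100 and SD 15 (the actual norms of the online test are not known).

from statistics import NormalDist

percentile = NormalDist(mu=100, sigma=15).cdf(78)
print(f"{percentile:.3f}")  # ~0.071, i.e. roughly the 7th percentile
```

On those assumptions a score of 78 sits at roughly the 7th percentile, so the “90%” figure above is, if anything, an underestimate.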
I deny the data.
For sure. XiXiDu uses grammar correctly! (Well, enough so that “become more persuasiveness” struck me as an editing error rather than typical.)
If someone uses grammar correctly it is an overwhelmingly strong indicator that either they are significantly educated (self or otherwise) or have enough intelligence to compensate!
Given all these facts, it’s pretty hard to take what you say seriously...
As pessimistic as this sounds, I’m not sure if I actually disagree with any of it.
Has anyone constructed even a vaguely plausible outline, let alone a definition, of what would constitute a “human-friendly intelligence”, defined in terms other than effects you don’t want it to have? As you note, humans aren’t human-friendly intelligences, or we wouldn’t have internal existential risk.
The CEV proposal seems to attempt to move the hard bit to technological magic (a superintelligence scanning human brains and working out a solution to human desires that is possible, is coherent and won’t destroy us all) - this is saying “then a miracle occurs” in more words.
It’s possible that particular humans might approximate human friendly intelligences.
Assuming it’s not impossible, how would you know? What constitutes a human-friendly intelligence, in other than negative terms?
Er, that’s how it is defined—at least by Yudkowsky. You want to argue definitions? Without even offering one of your own? How will that help?
No, I’m pointing out that a purely negative definition isn’t actually a useful definition that describes the thing the label is supposed to be pointing at. How does one work toward a negative? We can say a few things it isn’t—what is it?
Yudkowsky says:
That isn’t a “purely negative” definition in the first place.
Even if it was—would you object to the definition of “hole” on similar grounds?
What exactly is wrong with defining some things in terms of what they are not?
If I say a “safe car” is one that doesn’t kill or hurt people, that seems just fine to me.
The word “artificial” there makes it look like it means more than it does. And humans are just as made of atoms. Let’s try it without that:
It’s only described in terms of its effects, and then only vaguely. We have no idea what it would actually be. The CEV plan doesn’t include what it would actually be, it just includes a technological magic step where it’s worked out.
This may be better than nothing, but it’s not enough to say it’s talking about anything that’s actually understood in even the vaguest terms.
For an analogy, what would a gorilla-friendly human-level intelligence be like? How would you reasonably make sure it wasn’t harmful to the future of gorillas? (Humans out the box do pretty badly at this.) What steps would the human take to ascertain the CEV of gorillas, assuming tremendous technological resources?
We can’t answer the “how can you do this?” questions today. If we could we would be done.
It’s true that CEV is an 8-year-old, moon-on-a-stick wish list, apparently created without much thought about how to implement it. C’est la vie.
Interesting thoughts.
It seems like an attempt at Oracle AI, which simply strives to answer all questions accurately while otherwise exerting as little influence on the world as possible, would be strictly better than a paperclip maximizer, no? At the very least you wouldn’t see any of the risks of “almost friendly AI”.
You might see some humans getting power over other humans, but to be honest I don’t think that would be worse than humans existing, period. Keep in mind that historically, the humans that were put in power over others were the ones who had the ruthlessness necessary to get to the top – they might not be representative. Can you name any female evil dictators?
[ignore; was off-topic]
Do you think that it is possible to build an AI that does the moral thing even without being directly contingent on human preferences? Conditional on its possibility, do you think we should attempt to create such an AI?
I share your trepidation about humans and their values, but I see that as implying that we have to be meta enough such that even if humans are wrong, our AI will still do what is right. It seems to me that this is still a real possibility. For an example of an FAI architecture that is more in this direction, check out CFAI.
No. I believe that it is practically impossible to systematically and consistently assign utility to world states. I believe that utility cannot even be grounded, and therefore cannot be defined. I don’t think that there exists anything like “human preferences”, and therefore human utility functions, apart from purely theoretical, highly complex and therefore computationally intractable approximations. I don’t think that there is anything like a “self” that can be used to define what constitutes a human being, not practically anyway. I don’t believe that it is practically possible to decide what is morally right and wrong in the long term, not even for a superintelligence.
I believe that stable goals are impossible and that any attempt at extrapolating the volition of people will alter it.
Besides I believe that we won’t be able to figure out any of the following in time:
The nature of consciousness and its moral significance.
The relation and moral significance of suffering/pain/fun/happiness.
I further believe that the following problems are either impossible to solve or constitute a reductio ad absurdum of certain ideas:
Utility monsters
Pascal’s Mugging
The Lifespan Dilemma
Strange stuff.
Surely “right” and “wrong” make the most sense in the context of a specified moral system.
If you are using those terms outside such a context, it usually implies some kind of moral realism—in which case, one wonders what sort of moral realism you have in mind.