You claim that favoring the “organismal” over the “evolutionary” fails to accurately identify our values in four cases, but I fail to see any problem with these cases.
I find no problem with upholding the human preference for foods which taste fatty, sugary and salty. (Note that, consistently applied, the “organismal” preference would be for the fatty, sugary and salty taste and not foods that are actually fatty, sugary and salty. For example, we like drinking Diet Pepsi with Splenda almost as much as Pepsi, roughly in proportion to how well Splenda mimics the taste of sugar. We could even go one step further and drop the actual food part, valuing just the experience of [seemingly] eating fatty, sugary and salty foods.) This doesn’t necessarily commit me to valuing an unhealthy diet, all things considered, because we also have many other preferences, e.g. for our health, which may outweigh this true human value.
The next two cases (fear of snakes and enjoying violence) can be dealt with similarly.
The last one is a little trickier but I think it can be addressed by a similar principle in which one value gets outweighed by a different value. In this case, it would be some higher-order value such as treating like cases alike. The difference here is that rather than being a competing value that outweighs the initial value, it is more like a constitutive value which nullifies the initial value. (Technically, I would prefer to talk here of principles which govern our values rather than necessarily higher-order values.)
I thought your arguments throughout this post were similarly shallow and uncharitable to the side you were arguing against. For instance, you go on at length about how disagreements about value are present and intuitions are not consistent across cultures and history, but I don’t see how this is supposed to be any more convincing than talking about how many people in history have believed the earth is flat.
Okay, you’ve defeated the view that ethics is about the values all humans throughout history unanimously agree on. Now what about views that extrapolate not from perfectly consistent, unanimous and foundational intuitions or preferences, but from dynamics in human psychology that tend to shape initially inconsistent and incoherent intuitions to be more consistent and coherent—dynamics, the end result of which can be hard to predict when iteratively applied and which can be misapplied in any given instance in a way analogous to applications of the dynamic over beliefs of favoring the simplest hypothesis consistent with the evidence.
By the way, I don’t mean to claim that your conclusion is obviously wrong. I think someone favoring my type of view about ethics has a heavy burden of proof that you hint at, perhaps even one that has been underappreciated here. I just don’t think your arguments here provide any support for your conclusion.
It seems to me that when you try to provide illustrative examples of how opposing views fail, you end up merely attacking straw men. Perhaps you’d do better if you tried to establish that any opposing views must have some property in common and that such a property dooms those views to failure. Or that opposing views must take one of two mutually exclusive and exhaustive routes in response to some central dilemma, and that both routes doom them to failure.
I really would like to see the most precise and cogent version of your argument here as I think it could prompt some important progress in filling in the gaps present in the sort of ethical view I favor.
Voted up for thought and effort. BTW, when I started writing this last week, I thought I always preferred organismal preferences.
the “organismal” preference would be for the fatty, sugary and salty taste and not foods that are actually fatty, sugary and salty.
That’s a good point. But in the context of designing a Friendly AI that implements human values, it means we have to design the AI to like fatty, sugary, and salty tastes. Doesn’t that seem odd to you? Maybe not the sort of thing we should be fighting to preserve?
The next two cases (fear of snakes and enjoying violence) can be dealt with similarly.
I don’t see how. Are you going to kill the snakes, or not? Do you mean that you can use technology to let people experience simulated violence without actually hurting anybody? Doesn’t that seem like building an inconsistency into your utopia? Wouldn’t having a large number of such inconsistencies make utopia unstable, or lacking in integrity?
The last one is a little trickier but I think it can be addressed by a similar principle in which one value gets outweighed by a different value.
That’s how I said we resolve all of these cases. Only it doesn’t get outweighed by a single different value (the Prime Mover model); it gets outweighed by an entire, consistent, locally optimal, energy-minimizing set of values.
… but from dynamics, the end result of which can be hard to predict when iteratively applied and which can be misapplied in any given instance in a way analogous to applications of the dynamic over beliefs of favoring the simplest hypothesis consistent with the evidence.
This seems to be at the core of your comment, but I can’t parse that sentence.
Perhaps you’d do better if you tried to establish that any opposing views must have some property in common and that such a property dooms those views to failure.
My emphasis is not on defeating opposing views (except the initial “preferences are propositions” / ethics-as-geometry view), but on setting out my view, and overcoming the objections to it that I came up with. For instance, when I talked about the intuitions of humans over time not being consistent, I wasn’t attacking the view that human values are universal. I was overcoming the objection that we must have an algorithm for choosing evolutionary or organismal preferences, if we seem to agree on the right conclusion in most cases.
I just don’t think your arguments here provide any support for your conclusion.
Which conclusion did you have in mind? The key conclusion is that value can’t be unambiguously analyzed at a finer level of detail than the behavior, in the way that communication can’t be unambiguously analyzed at a finer level of detail than the proposition. You haven’t said anything about that.
(I just realized this makes me a structuralist above some level of detail, but a post-structuralist below it. Damn.)
I really would like to see the most precise and cogent version of your argument here as I think it could prompt some important progress in filling in the gaps present in the sort of ethical view I favor.
I don’t think I will be any more precise or cogent (at least not as long as I’m not getting paid for it), nor that most readers would have preferred an even longer post. It took me two days to write this. If you don’t think my arguments provide any support for my conclusions, the gap between us is too wide for further elaboration to be worthwhile.
The FAI shouldn’t like sugary tastes, sex, violence, bad arguments, whatever. It should like us to experience sugary tastes, sex, violence, bad arguments, whatever.
“I don’t see how. Are you going to kill the snakes, or not?”
Presumably you act out a weighted balance of the voting power of possible human preferences extrapolated over different possible environments which they might create for themselves.
“Do you mean that you can use technology to let people experience simulated violence without actually hurting anybody? Doesn’t that seem like building an inconsistency into your utopia? Wouldn’t having a large number of such inconsistencies make utopia unstable, or lacking in integrity?”
I don’t understand the problem here. I don’t mean that this is the correct solution, though it is the obvious solution, but rather that I don’t see what the problem is. Ancients, who endorsed violence, generally didn’t understand or believe in personal death anyway.
The FAI shouldn’t like sugary tastes, sex, violence, bad arguments, whatever. It should like us to experience sugary tastes, sex, violence, bad arguments, whatever.
You’re going back to Eliezer’s plan to build a single OS FAI. I should have clarified that I’m speaking of a plan to make AIs that have human values, for the sake of simplicity. (Which IMHO is a much, much better and safer plan.) Yes, if your goal is to build an OS FAI, that’s correct. It doesn’t get around the problem. Why should we design an AI to ensure that everyone for the rest of history is so much like us, and enjoys fat, sugar, salt, and the other things we do? That’s a tragic waste of a universe.
Presumably you act out a weighted balance of the voting power of possible human preferences extrapolated over different possible environments which they might create for themselves.
Why extrapolate over different possible environments to make a decision in this environment? What does that buy you? Do you do that today?
EDIT: I think I see what you mean. You mean construct a distribution of possible extensions of existing preferences into different environments, and weigh each one according to some function. Such as internal consistency / energy minimization. Which, I would guess, is a preferred Bayesian method of doing CEV.
My intuition is that this won’t work, because what you need to make it work is prior odds over events that have never been observed. I think we need to figure out a way to do the math to settle this.
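A toy sketch of the weighting scheme described in this comment, in Python. Everything in it is a hypothetical assumption: the candidate extrapolations, their “energy” (inconsistency) scores, and the Boltzmann-style weighting exp(-E/T) are made up for illustration and are not part of any published CEV proposal. The temperature T stands in for the prior odds the comment worries about.

```python
import math

# Hypothetical candidate extrapolations of present preferences into a new
# environment. Each has an assumed "energy" (inconsistency; lower is more
# self-consistent) and a support score per option that sums to 1.
CANDIDATES = [
    {"energy": 1.0, "support": {"keep_snakes": 0.2, "remove_snakes": 0.8}},
    {"energy": 2.0, "support": {"keep_snakes": 0.9, "remove_snakes": 0.1}},
    {"energy": 4.0, "support": {"keep_snakes": 0.5, "remove_snakes": 0.5}},
]


def boltzmann_weights(energies, temperature=1.0):
    """Weight each candidate by exp(-E/T), normalized to sum to 1."""
    raw = [math.exp(-e / temperature) for e in energies]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_vote(candidates, temperature=1.0):
    """Sum each option's support across candidates, scaled by their weights."""
    weights = boltzmann_weights([c["energy"] for c in candidates], temperature)
    totals = {}
    for weight, candidate in zip(weights, candidates):
        for option, score in candidate["support"].items():
            totals[option] = totals.get(option, 0.0) + weight * score
    return totals


tally = weighted_vote(CANDIDATES)
print(max(tally, key=tally.get), tally)
```

With the default temperature, the lowest-energy extrapolation dominates and “remove_snakes” wins; raising the temperature toward uniform weights (e.g. `weighted_vote(CANDIDATES, temperature=100.0)`) flips the result to “keep_snakes”. The answer hinges on a parameter that no observation pins down.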
I don’t understand the problem here.
It seems irrational, and wasteful, to deliberately construct a utopia where you give people impulses, and work to ensure that the mental and physical effort consumed by acting on those impulses is wasted. It also seems like a recipe for unrest. And, from an engineering perspective, it’s an ugly design. It’s like building a car with extra controls that don’t do anything.
Why should we design an AI to ensure that everyone for the rest of history is so much like us, and enjoys fat, sugar, salt, and the other things we do? That’s a tragic waste of a universe.
Well, a key hard problem is: which features of ourselves that we like should we try to ensure endure into the future? Yes, some features seem hopelessly provincial, while others seem more universally good, but how can we systematically judge this?
It seems irrational, and wasteful, to deliberately construct a utopia where you give people impulses, and work to ensure that the mental and physical effort consumed by acting on those impulses is wasted.
I think you’re dancing around a bigger problem: once we have a sufficiently powerful AI, you and I are just a bunch of extra meat and buggy programming. Our physical and mental effort is just not needed or relevant. The purpose of FAI is to make sure that we get put out to pasture in a Friendly way. Or, depending on your mood, you could phrase it as living on in true immortality to watch the glory that we have created unfold.
It’s like building a car with extra controls that don’t do anything.
I think the more important question is what, in this analogy, does the car do?
I get the impression that’s part of the SIAI plan, but it seems to me that the plan entails that that’s all there is, from then on, for the universe. The FAI needs control of all resources to prevent other AIs from being made; and the FAI has no other goals than its human-value-fulfilling goals; so it turns the universe into a rest home for humans.
That’s just another variety of paperclipper.
If I’m wrong, and SIAI wants to allocate some resources to the human preserve, while letting the rest of the universe develop in interesting ways, please correct me, and explain how this is possible.
I’m not talking about what I want to do, I’m talking about what SIAI wants to do. What I want to do is incompatible with constructing a singleton and telling it to extrapolate human values and run the universe according to them; as I have explained before.
If you think the future would be less than it could be if the universe was tiled with “rest homes for humans”, why do you expect that an AI which was maximizing human utility would do that?
It depends how far meta you want to go when you say “human utility”. Does that mean sex and chocolate, or complexity and continual novelty?
That’s an ambiguity in CEV—the AI extrapolates human volition, but what’s happening to the humans in the meantime? Do they stay the way they are now? Are they continuing to develop? If we suppose that human volition is incompatible with trilobite volition, that means we should expect the humans to evolve/develop new values that are incompatible with the AI’s values extrapolated from humans.
If for some reason humans who liked to torture toddlers became very fit, future humans would evolve to possess values that resulted in many toddlers being tortured. I don’t want that to happen, and am perfectly happy constraining future intelligences (even if they “evolve” from humans or even me) so they don’t. And as always, if you think that you want the future to contain some value shifting, why don’t you believe that an AI designed to fulfill the desires of humanity will cause/let that happen?
I think your article successfully argued that we’re not going to find some “ultimate” set of values that is correct or can be proven. In the end, the programmers of an FAI are going to choose a set of values that they like.
The good news is that human values can include things like generosity, non-interference, personal development, and exploration. “Human values” could even include tolerance of existential risk in return for not destroying other species. Any way that you want an FAI to be is a human value. We can program an FAI with ambitions and curiosity of its own; they will be rooted in our own values and anthropomorphism.
But no matter how noble and farsighted the programmers are, to those who don’t share the programmers’ values, the FAI will be a paperclipper.
We’re all paperclippers, and in the true prisoners’ dilemma, we always defect.
We can program an FAI with ambitions and curiosity of its own; they will be rooted in our own values and anthropomorphism.
Eliezer needs to say whether he wants to do this, or to save humans. I don’t think you can have it both ways. The OS FAI does not have ambitions or curiosity of its own.
But no matter how noble and farsighted the programmers are, to those who don’t share the programmers’ values, the FAI will be a paperclipper.
I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals. This is not logically necessary for an AI. Nor is the plan to build a singleton, rather than an ecology of AI, the only possible plan.
I notice that some of my comment wars with other people arise because they automatically assume that whenever we’re talking about a superintelligence, there’s only one of them. This is in danger of becoming a LW communal assumption. It’s not even likely. (More generally, there’s a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about—even if he doesn’t insist that they are likely.)
I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals.
It is widely expected that this will arise as an important instrumental goal; nothing more than that. I can’t tell if this is what you mean. (When you point out that “trying to take over the universe isn’t utility-maximizing under many circumstances”, it sounds like you’re thinking of taking over the universe as a separate terminal goal, which would indeed be terrible design; an AI without that terminal goal, that can reason the same way you can, can decide not to try to take over the universe if that looks best.)
I notice that some of my comment wars with other people arise because they automatically assume that whenever we’re talking about a superintelligence, there’s only one of them. This is in danger of becoming a LW communal assumption. It’s not even likely.
I probably missed it in some other comment, but which of these do you not buy: (a) huge first-mover advantages from self-improvement (b) preventing other superintelligences as a convergent subgoal (c) that the conjunction of these implies that a singleton superintelligence is likely?
(More generally, there’s a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about—even if he doesn’t insist that they are likely.)
This sounds plausible and bad. Can you think of some other examples?
(More generally, there’s a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about—even if he doesn’t insist that they are likely.)
This is probably just availability bias. These scenarios are easy to recall because we’ve read about them, and we’re psychologically primed for them just by coming to this website.
The assumption of a single AI comes from an assumption that an AI will have zero risk tolerance. It follows from that assumption that the most powerful AI will destroy or limit all other sentient beings within reach.
There’s no reason that an AI couldn’t be programmed to have tolerance for risk. Pursuing a lot of the more noble human values may require it.
I make no claim that Eliezer and/or the SIAI have anything like this in mind. It seems that they would like to build an absolutist AI. I find that very troubling.
I make no claim that Eliezer and/or the SIAI have anything like this in mind. It seems that they would like to build an absolutist AI. I find that very troubling.
If I thought they had settled on this and that they were likely to succeed, I would probably feel it was very important to work to destroy them. I’m currently not sure about the first and think the second is highly unlikely, so it is not a pressing concern.
I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals. This is not logically necessary for an AI. Nor is the plan to build a singleton, rather than an ecology of AI, the only possible plan.
It is, however, necessary for an AI to do something of the sort if it’s trying to maximize any sort of utility. Otherwise, risk / waste / competition will cause the universe to be less than optimal.
Trying to take over the universe isn’t utility-maximizing under many circumstances: if you have a small chance of succeeding, if the battle to do so will destroy most of the resources, if you discount the future at all (remember, computation speed keeps increasing while the speed of light stays constant), or if your values require other independent agents.
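A back-of-the-envelope Python sketch of the first three conditions. The model and all numbers are hypothetical: a takeover attempt succeeds with probability `p_success`, the conflict destroys a fraction of the resources, and the prize arrives after a delay and is discounted geometrically, compared with accepting a fixed cooperative share now. The fourth condition (values that require other independent agents) is not modeled.

```python
def discounted(value, delay_periods, discount_rate):
    """Present value of `value` received after `delay_periods` periods,
    using geometric discounting at `discount_rate` per period."""
    return value / (1.0 + discount_rate) ** delay_periods


def takeover_value(p_success, fraction_destroyed, delay_periods, discount_rate,
                   total_resources=1.0):
    """Hypothetical model: with probability p_success you win whatever
    resources survive the conflict, later; otherwise you get nothing."""
    surviving = total_resources * (1.0 - fraction_destroyed)
    return p_success * discounted(surviving, delay_periods, discount_rate)


def cooperation_value(share, total_resources=1.0):
    """Present value of accepting a fixed share of undamaged resources now."""
    return share * total_resources


# Made-up numbers, chosen only to illustrate the conditions listed above.
grab = takeover_value(p_success=0.3, fraction_destroyed=0.5,
                      delay_periods=10, discount_rate=0.05)
cooperate = cooperation_value(share=0.25)
print(f"takeover: {grab:.3f}, cooperate: {cooperate:.3f}")
```

With these made-up numbers the takeover is worth about 0.092 against 0.25 for cooperating; with certain success, no destruction and no discounting it is worth 1.0 and wins. The sketch only shows that the answer depends on the parameters, which is the point at issue.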
By your logic, it is necessary for SIAI to try to take over the world. Is that true? The US probably has enough military strength to take over the world—is it purely stupidity that it doesn’t?
The modern world is more peaceful, more enjoyable, and richer because we’ve learned that utility is better maximized by cooperation than by everyone trying to rule the world. Why does this lesson not apply to AIs?
Just what do you think “controlling the universe” means? My cat controls the universe. It probably doesn’t exert this control in a way anywhere near optimal for most sensible preferences, but it does have an impact on everything. How do we decide that a superintelligence “controls the universe” while my cat “doesn’t”? The only difference is in what kind of universe we end up with, that is, which preference it is optimized for. Whatever you truly want roughly amounts to preferring some states of the universe to other states, and making the universe better for you means controlling it towards your preference. The better the universe, the more specifically its state is specified, and the stronger the control. These concepts are just different aspects of the same phenomenon.
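One way to make “the more specifically its state is specified, the stronger the control” quantitative is to count bits of optimization: take the fraction of possible states ranked at least as highly as the achieved state under a given preference, and report -log2 of that fraction. This measure is an assumption borrowed for illustration, not necessarily what the comment has in mind, and the toy state space and the “cat” and “superintelligence” outcomes below are hypothetical.

```python
import math


def optimization_bits(states, preference, achieved):
    """Bits of control toward `preference`: -log2 of the fraction of states
    that the preference ranks at least as high as the achieved state."""
    achieved_score = preference(achieved)
    at_least_as_good = sum(1 for s in states if preference(s) >= achieved_score)
    return -math.log2(at_least_as_good / len(states))


def prefer_high(state):
    """Hypothetical preference: larger state indices are better."""
    return state


# Toy universe with 1024 possible states.
STATES = list(range(1024))

cat_outcome = 600   # hypothetical: a weak push in roughly the right direction
asi_outcome = 1023  # hypothetical: the single best state under this preference

print(optimization_bits(STATES, prefer_high, cat_outcome))
print(optimization_bits(STATES, prefer_high, asi_outcome))
```

Under this measure the cat’s outcome is worth about 1.3 bits and the best possible state exactly 10 bits: the same kind of quantity, differing only in degree.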
By your logic, it is necessary for SIAI to try to take over the world. Is that true? The US probably has enough military strength to take over the world—is it purely stupidity that it doesn’t?
For one, the U.S. doesn’t have the military strength. Russia still has enough nuclear warheads and ICBMs to prevent that. (And we suck at being occupying forces.)
I think the situation of the US is similar to a hypothesized AI. Sure, Russia could kill a lot of Americans. But we would probably “win” in the end. By all the logic I’ve heard in this thread, and in others lately about paperclippers, the US should rationally do whatever it has to to be the last man standing.
Well, also the US isn’t a single entity that agrees on all its goals. Some of us for example place a high value on human life. And we vote. Even if the leadership of the United States wanted to wipe out the rest of the planet, there would be limits to how much they could do before others would step in.
Also, most forms of modern human morality strongly disfavor large-scale wars waged simply to impose one’s views. If our AI doesn’t have that sort of belief, then that’s not an issue. And if we restrict ourselves to just the issue of other AIs, I’m not sure that, if I gave a smart AI my morals and preferences, it would necessarily see anything wrong with making sure that no other general smart AIs were created.
Well, also the US isn’t a single entity that agrees on all its goals.
I think it is quite plausible that an AI structured with a central unitary authority would be at a competitive disadvantage with an AI that granted some autonomy to sub systems. This at least raises the possibility of goal conflicts between different sub-modules of an efficient AI. There are many examples in nature and in human societies of a tension between efficiency and centralization. It is not clear that an AI could maintain a fully centralized and unified goal structure and out-compete less centralized designs.
An AI that wanted to control even a relatively small region of space like the Earth would still run into issues with the speed of light when it comes to projecting force through geographically dispersed physical presences. The turnaround time is such that decision-making autonomy would have to be dispersed to local processing clusters in order to be effective. Hell, even today’s high-end processors run into issues with the time it takes a signal to get from one side of the die to the other. It is not obvious that the optimum efficiency balance between local decision making autonomy and a centralized unitary goal system will always favour a singleton type AI.
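The latency argument can be checked with rough arithmetic in Python. The distances are approximate and the on-chip propagation speed (half of c) is an assumption for illustration; real long on-chip wires are generally slower than that.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact value in vacuum


def one_way_latency_s(distance_m, fraction_of_c=1.0):
    """Minimum one-way signal time over `distance_m` at `fraction_of_c` times c."""
    return distance_m / (SPEED_OF_LIGHT_M_PER_S * fraction_of_c)


# Approximate distances, for illustration only.
EARTH_HALF_CIRCUMFERENCE_M = 20_037_500  # half of ~40,075 km at the equator
DIE_WIDTH_M = 0.02                       # assumed ~2 cm processor die

earth_round_trip = 2 * one_way_latency_s(EARTH_HALF_CIRCUMFERENCE_M)
die_crossing = one_way_latency_s(DIE_WIDTH_M, fraction_of_c=0.5)  # assumed speed
clock_period_4ghz = 1 / 4e9

print(f"Antipodal round trip: {earth_round_trip * 1e3:.0f} ms "
      f"({earth_round_trip / clock_period_4ghz:.2e} cycles at 4 GHz)")
print(f"Die crossing: {die_crossing * 1e12:.0f} ps "
      f"vs 4 GHz cycle {clock_period_4ghz * 1e12:.0f} ps")
```

Even at the vacuum speed of light, an antipodal round trip takes about 134 ms, which is more than 500 million cycles of a 4 GHz clock; a local controller can act many times before a central one can reply.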
There is some evidence of evolutionary competition between different cell lines within a single organism. Human history is full of examples of the tension between centralized planning and less centrally coordinated but more efficient systems of delegated authority. We do not see a clear unidirectional trend towards more centralized control or towards larger conglomerations of purely co-operating units (whether they be cells, organisms, humans or genes) in nature or in human societies. It seems to me that the burden of proof is on those who would propose that a system with a unitary goal structure has an unbounded upper physical extent of influence where it can outcompete less unitary arrangements (or even that it can do so over volumes exceeding a few meters to a side).
There is a natural tendency for humans to think of themselves as having a unitary centralized consciousness with a unified goal system. It is pretty clear that this is not the case. It is also natural for programmers trained on single-threaded von Neumann architectures, or those with a mathematical bent, to ignore the physical constraints of the speed of light when imagining what an AI might look like. If a human can’t even catch a ball without delegating authority to a semi-autonomous sub-unit, I don’t see why we should be confident that non-human intelligences subject to the same laws of physics should be immune to such problems.
This at least raises the possibility of goal conflicts between different sub-modules of an efficient AI.
A well designed AI should have an alignment of goals between sub modules that is not achieved in modern decentralized societies. A distributed AI would be like multiple TDT/UDT agents with mutual knowledge that they are maximizing the same utility function, not a bunch of middle managers engaging in empire building at the expense of the corporation they work for.
This is not even something that human AI designers have to figure out how to implement, the seed can be single agent, and it will figure out the multiple sub agent architecture when it needs it over the course of self improvement.
Even if this is possible (which I believe is still an open problem, if you think otherwise I’m sure Eliezer would love to hear from you) you are assuming no competition. The question is not whether this AI can outcompete humans but whether it can outcompete other AIs that are less rigid.
It is not obvious that the optimum efficiency balance between local decision making autonomy and a centralized unitary goal system will always favor a singleton type AI.
I agree that it would probably make a lot of sense for an AI that wished to control any large area of territory to create other AIs to manage local issues. However, AIs, unlike humans or evolution, can create other AIs that perfectly share their values and interests. There is no reason to assume that an AI would create another AI, one to which it intends to delegate substantial power, with which it could come into value disagreements.
However, AIs, unlike humans or evolution, can create other AIs that perfectly share their values and interests.
This is mere supposition. You are assuming the FAI problem is solvable. I think both evolutionary and economic arguments weigh against this belief. Even if this is possible in theory it may take far longer for a singleton AI to craft its faultlessly loyal minions than for a more… entrepreneurial… AI to churn out ‘good enough’ foot soldiers to wipe out the careful AI.
So if you cloned yourself you would be 100% confident you would never find yourself in a situation where your interests conflicted with your clone? Again, you are assuming the FAI problem is solvable and that the idea of an AI with unchanging values is even coherent.
I am not an AI. I am not an optimization process with an explicit utility function. A copy of an AI that undertook actions which appeared to work against another copy, would be found, on reflection, to have been furthering the terms of their shared utility function.
I am not an optimization process with an explicit utility function.
You are still assuming that such an optimization process is a) possible (on the scale of greater than human intelligence) and b) efficient compared to other alternatives. a) is the very hard problem that Eliezer is working on. Whether it is possible is still an open question I believe. I am claiming b) is non-obvious (and even unlikely), you need to explain why you think otherwise rather than repeatedly stating the same unsupported claim if you want to continue this conversation.
Human experience so far indicates that imperfect/heuristic optimization processes are often more efficient (in use of computational resources) than provably perfect optimization processes. Human experience also suggests that it is easier to generate an algorithm to solve a problem satisfactorily than it is to rigorously prove that the algorithm does what it is supposed to or generates an optimal solution. The gap between these two difficulties seems to increase more than linearly with increasing problem complexity. There are mathematical reasons to suspect that this is a general principle and not simply due to human failings. If you disagree with this you need to provide some reasoning—the burden of proof is on those who would claim otherwise it seems to me.
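A toy Python illustration of the heuristic-versus-exact tradeoff, using 0/1 knapsack (an NP-hard problem). The item lists and capacity are hypothetical, and the greedy value-per-weight rule is a standard heuristic swapped in for illustration; this is an example, not a proof of the general claim.

```python
from itertools import combinations

# Hypothetical 0/1 knapsack instance: (value, weight) pairs.
ITEMS = [(60, 10), (100, 20), (120, 30), (80, 15), (30, 5), (90, 25)]
CAPACITY = 50


def exhaustive_best(items, capacity):
    """Provably optimal: check every subset (2**n of them).
    Returns (best value, number of subsets examined)."""
    best_value, evaluated = 0, 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            evaluated += 1
            if sum(w for _, w in subset) <= capacity:
                best_value = max(best_value, sum(v for v, _ in subset))
    return best_value, evaluated


def greedy_by_ratio(items, capacity):
    """Heuristic: after sorting by value/weight, take each item that still fits.
    Returns (value obtained, number of items examined after the sort)."""
    total_value, remaining, evaluated = 0, capacity, 0
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        evaluated += 1
        if weight <= remaining:
            total_value += value
            remaining -= weight
    return total_value, evaluated


best, n_exact = exhaustive_best(ITEMS, CAPACITY)
approx, n_greedy = greedy_by_ratio(ITEMS, CAPACITY)
print(f"exhaustive: value {best} after {n_exact} subsets")
print(f"greedy: value {approx} after {n_greedy} items")
```

Here the greedy heuristic examines 6 items (plus a sort) and happens to match the optimum of 270 found by checking all 64 subsets; on the classic three-item instance `[(60, 10), (100, 20), (120, 30)]` with capacity 50 it gets 160 against an optimum of 220. Exhaustive search doubles in cost with each added item, while the heuristic grows only with the sort; that is one concrete sense in which heuristic optimizers can be more efficient than provably optimal ones.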
You are still assuming that such an optimization process is a) possible (on the scale of greater than human intelligence) and b) efficient compared to other alternatives. a) is the very hard problem that Eliezer is working on. Whether it is possible is still an open question I believe. I am claiming b) is non-obvious (and even unlikely), you need to explain why you think otherwise rather than repeatedly stating the same unsupported claim if you want to continue this conversation.
I certainly agree that creating an optimization process which provably advances a set of values under a wide variety of taxing circumstances is hard. I further agree that it is quite likely that the first powerful optimization process created does not have this property, because of the difficulty involved, even if this is the goal that all AI creators have. I will however state that if the first such powerful optimization process is not of the sort I specified, we will all die.
Human experience so far indicates that imperfect/heuristic optimization processes are often more efficient (in use of computational resources) than provably perfect optimization processes. Human experience also suggests that it is easier to generate an algorithm to solve a problem satisfactorily than it is to rigorously prove that the algorithm does what it is supposed to or generates an optimal solution. The gap between these two difficulties seems to increase more than linearly with increasing problem complexity. There are mathematical reasons to suspect that this is a general principle and not simply due to human failings.
I also agree that the vast majority of mind-design space consists of sloppy approximations that will break down outside their intended environments. This means that most AI designs will kill us. When I use the word AI, I don’t mean a randomly selected mind, I mean a reflectively consistent, effective optimizer of a specific utility function.
In many environments, quick and dirty heuristics will confer an enormous advantage, so long as those environments can be expected to continue in the same way for the length of time the heuristic will be in operation. This means that if you have two minds with equal resources (i.e., the situation you describe), the one willing to use slapdash heuristics will win, as long as the conditions facing it don’t change. But the situation where two minds are created with equal resources is unlikely to occur, given that one of them is an AI (as I use the term), even if that AI is not one that maximizes human utility (i.e., not an FAI). The intelligence explosion means that a properly designed AI will be able to quickly take control of its immediate environment. Why would an AI with a stable goal allow another mind to be created, gain resources and threaten it? It wouldn’t. It would crush all possible rivals, because to do otherwise is to invite disaster.
In short: The AI problem is hard. Sloppy minds are quite possibly going to be made before proper AIs. But true AIs, of the type that Eliezer wants to build, will not run into the problems you would expect from the majority of minds.
I certainly agree that creating an optimization process which provably advances a set of values under a wide variety of taxing circumstances is hard.
We know that perfect solutions to even quite simple optimization problems are a different kind of hard. We have quite good reason to suspect that this is an essential property of reality and that we will never be able to solve such problems simply. The kinds of problems we are talking about seem likely to be more complex to solve. In other words if (and it is a big if) it is possible to create an optimization process that provably advances a set of values (let’s call it ‘friendly’) it is unlikely to be a perfect optimization process. It seems likely to me that such ‘friendly’ optimization processes will represent a subset of all possible optimization processes and that it is quite likely that some ‘non-friendly’ optimization processes will be better optimizers. I see no reason to suppose that the most effective optimizers will happily fall into the ‘friendly’ subset.
The intelligence explosion means that a properly designed AI will be able to quickly take control of its immediate environment.
I don’t consider this hypothesis proved or self-evident. It is at least plausible, but I can think of lots of reasons why it might not be true. Taking an outside view, we do not see much evidence from evolution or human societies of ‘winner takes all’ being a common outcome (we see much diversity in nature and human society), nor of first-mover advantage always leading to an insurmountable lead. And yes, I know there are lots of reasons why ‘self-improving AI is different’, but I don’t consider the matter settled. It is a realistic enough concern for me to broadly support SIAI’s efforts, but it is by no means the only possible outcome.
Why would an AI with a stable goal allow another mind to be created, gain resources and threaten it? It wouldn’t. It would crush all possible rivals, because to do otherwise is to invite disaster.
Why does any goal-directed agent ‘allow’ other agents to conflict with its goals? Because it isn’t strong enough to prevent them. We know of no counterexamples in all of history to the hypothesis that all goal-directed agents have limits. This does not rule out the possibility that a self-improving AI would be the first counterexample, but neither does it make me as sure of that claim as many here seem to be.
But true AIs, of the type that Eliezer wants to build, will not run into the problems you would expect from the majority of minds.
I understand the claim. I am not yet convinced it is possible or likely.
It seems likely to me that such ‘friendly’ optimization processes will represent a subset of all possible optimization processes and that it is quite likely that some ‘non-friendly’ optimization processes will be better optimizers.
I agree that human values are unlikely to be the easiest to maximize. However, for another mind to optimize our universe, it needs to be created. This is why SIAI advocates creating an AI friendly to humans before other optimization processes are created.
It seems to me that your true objection to what I am saying is contained within the statement that “it is at the very least possible for an intelligence to not take over its immediate environment before another, with possibly inimical goals, is created.” Does this agree with your assessment? Would a convincing argument for the intelligence explosion cause you to change your mind?
It seems to me that your true objection to what I am saying is contained within the statement that “it is at the very least possible for an intelligence to not take over its immediate environment before another, with possibly inimical goals, is created.” Does this agree with your assessment?
More or less, though I actually lean towards it being likely rather than merely possible. I am also making the related claim that a widely spatially dispersed entity with a single coherent goal system may be a highly unstable configuration.
Would a convincing argument for the intelligence explosion cause you to change your mind?
On the first point, yes. I don’t believe I’ve seen my points addressed in detail, though it sounds like Eliezer’s debate with Robin Hanson that was linked earlier might cover the same ground. I will take some time to follow up on that later.
it sounds like Eliezer’s debate with Robin Hanson that was linked earlier might cover the same ground.
I’m working my way through it and indeed it does. Robin Hanson’s post Dreams of Autarky is close to my position. I think there are other computational, economic and physical arguments in this direction as well.
It certainly does if the utility function doesn’t refer to anything indexical; and an agent with an indexical utility function can build another agent (not a copy of itself, though) with a differently-represented (non-indexical) utility function that represents the same third-person preference ordering.
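The indexical point can be made concrete. A minimal sketch, with an invented two-agent world and illustrative names (`indexical`, `pinned_to`, `Agent` are not from any source): a copy that reuses an indexical utility function ranks worlds differently from its original, while a successor given the non-indexical translation ranks them identically.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical toy world: agent name -> resources that agent controls.
World = Dict[str, int]
# A utility function sees the evaluating agent's own name plus the world.
Utility = Callable[[str, World], int]


def indexical(me: str, world: World) -> int:
    """'Resources for me': the referent changes with whoever holds this function."""
    return world.get(me, 0)


def pinned_to(original: str) -> Utility:
    """Non-indexical translation: 'resources for <original>', fixed by name."""
    def utility(_me: str, world: World) -> int:
        return world.get(original, 0)
    return utility


@dataclass
class Agent:
    name: str
    utility: Utility

    def ranking(self, worlds: List[World]) -> List[int]:
        """Indices of worlds, most preferred first, as judged by this agent."""
        return sorted(range(len(worlds)),
                      key=lambda i: self.utility(self.name, worlds[i]),
                      reverse=True)


worlds = [{"A": 5, "B": 1}, {"A": 1, "B": 9}, {"A": 3, "B": 3}]
original = Agent("A", indexical)
naive_copy = Agent("B", indexical)       # same code, different referent
successor = Agent("B", pinned_to("A"))   # same third-person ordering as A

print(original.ranking(worlds))    # [0, 2, 1]
print(naive_copy.ranking(worlds))  # [1, 2, 0]
print(successor.ranking(worlds))   # [0, 2, 1]
```

The successor is not a copy (its utility function is written differently), yet it prefers exactly what the original prefers.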
It should apply to AIs if you think that there will be multiple AIs at roughly the same capability level. A common assumption here is that as soon as there is a single general AI, it will quickly improve to the point where it is so far beyond everything else in capability that their capabilities won’t matter. Frankly, I find this assumption highly questionable (among other problems, it is very optimistic about potential fooming rates), but if one accepts the idea it makes some sense. The analogy might be to a hypothetical situation in which the US has not just the strongest military but also monopolies on cheap fusion power and an immortality pill, plus a bunch of superheroes on its side. The distinction between the US controlling everything and the US having direct military control might quickly become irrelevant.
Edit: On the question of fooming rates, I’d be really interested if a fast-foom proponent would be willing to put together a top-level post outlining why fooming will happen so quickly.
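How much a head start matters depends on the returns to self-improvement, which is one way to frame the disagreement over fooming rates. A minimal sketch, assuming (purely hypothetically) that capability I grows as dI/dt = k·I^a, and that a rival with the same starting capability begins `lag` time units later; `capability` and `lead_ratio` are illustrative names:

```python
import math


def capability(t: float, i0: float = 1.0, k: float = 1.0, a: float = 1.0) -> float:
    """Closed-form solution of dI/dt = k * I**a with I(0) = i0.

    Returns math.inf once a > 1 growth has passed its finite-time singularity.
    """
    if a == 1.0:
        return i0 * math.exp(k * t)
    base = i0 ** (1 - a) + (1 - a) * k * t
    if base <= 0:  # only reachable when a > 1
        return math.inf
    return base ** (1 / (1 - a))


def lead_ratio(t: float, lag: float, a: float, k: float = 1.0) -> float:
    """Leader's capability divided by that of a rival that started `lag` later."""
    if t < lag:
        raise ValueError("the rival has not started yet")
    return capability(t, k=k, a=a) / capability(t - lag, k=k, a=a)


print(lead_ratio(10, 1, a=1.0))     # constant lead: e, about 2.718, at every t
print(lead_ratio(1000, 1, a=0.5))   # diminishing returns: ratio drifts toward 1
print(lead_ratio(1.0, 0.5, a=2.0))  # increasing returns: leader hits the singularity first
```

With a = 1 the leader keeps a constant multiplicative edge; with a < 1 the edge fades toward parity; with a > 1 the leader reaches a finite-time singularity first. The sketch says nothing about which exponent, if any, describes real AI.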
Eliezer and Robin had a lengthy debate on this perhaps a year ago. I don’t remember if it’s on OB or LW. Robin believes in no foom, using economic arguments.
The people who design the first AI could build a large number of AIs in different locations and turn them on at the same time. This plan would have a high probability of leading to disaster; but so do all the other plans that I’ve heard.
Trying to take over the universe isn’t utility-maximizing under many circumstances: if you have a small chance of succeeding, or if the battle to do so will destroy most of the resources
Obviously, if you can’t take over the world, then trying is stupid. If you can (for example, if you’re the first SAI to go foom) then it’s a different story.
or if you discount the future at all (remember, computation speed increases as speed of light stays constant), or if your values require other independent agents.
Taking over the world does not require you to destroy all other life if that is contrary to your utility function. I’m not sure what you mean regarding future-discounting; if reorganizing the whole damn universe isn’t worth it, then I doubt anything else will be in any case.
If Michael was responding to the problem that human preference systems can’t be unambiguously extended into new environments, then my chronologically first response applies, but needs more thought; and I’m embarrassed that I didn’t anticipate that particular response.
If he was responding to the problem that human preferences as described by their actions, and as described by their beliefs, are not the same, then my second response applies.
Presumably you act out a weighted balance of the voting power of possible human preferences extrapolated over different possible environments which they might create for themselves.
If a person could label each preference system “evolutionary” or “organismal”, meaning which value they preferred, then you could use that to help you extrapolate their values into novel environments.
The problem is that the person is reasoning only over the propositional part of their values. They don’t know what their values are; they know only what the contribution within the propositional part is. That’s one of the main points of my post. The values they come up with will not always be the values they actually implement.
If you define a person’s values as being what they believe their values are, then, sure, most of what I posted will not be a problem. I think you’re missing the point of the post, and are using the geometry-based definition of identity.
If you can’t say whether the right value to choose in each case is evolutionary or organismal, then extrapolating into future environments isn’t going to help. You can’t gain information to make a decision in your current environment by hypothesizing an extension to your environment, making observations in that imagined environment, and using them to refine your current-environment estimates. That’s like trying to refine your estimate of an asteroid’s current position by simulating its movement into the future, and then tracking backwards along that projected trajectory to the present. It’s trying to get information for free. You can’t do that.
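The asteroid analogy can be checked directly with the prediction step of a Kalman filter. A minimal sketch, assuming a noise-free, one-dimensional constant-velocity model with arbitrary numbers (`predict`, `F_forward` and `F_backward` are illustrative names, not from any library): projecting the estimate forward and then tracking back along the same trajectory returns exactly the starting estimate and covariance. Without a new measurement, no uncertainty is removed.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(row) for row in zip(*A)]


def predict(x: List[float], P: Matrix, F: Matrix):
    """Kalman prediction step with no process noise and no measurement."""
    x_new = [sum(F[i][k] * x[k] for k in range(len(x))) for i in range(len(F))]
    return x_new, matmul(matmul(F, P), transpose(F))


dt = 10.0
F_forward = [[1.0, dt], [0.0, 1.0]]    # constant velocity: position += velocity * dt
F_backward = [[1.0, -dt], [0.0, 1.0]]  # exact inverse of F_forward

x0 = [100.0, 2.0]              # current estimate: position, velocity
P0 = [[4.0, 0.0], [0.0, 1.0]]  # covariance of that estimate

x1, P1 = predict(x0, P0, F_forward)   # imagined future: uncertainty grows
x2, P2 = predict(x1, P1, F_backward)  # track back to the present

print(P1[0][0])            # 104.0: the projected position is less certain
print(x2 == x0, P2 == P0)  # True True: the round trip taught us nothing
```

Adding process noise to each step would make the round trip strictly worse: the covariance would come back larger than it started.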
(I think what I said under “Fuzzy values and fancy math don’t help” is also relevant.)
I suppose I might count as someone who favors “organismal” preferences over confusing the metaphorical “preferences” of our genes with those of the individual. I think your argument against this is pretty weak.
You claim that favoring the “organismal” over the “evolutionary” fails to accurately identify our values in four cases, but I fail to see any problem with these cases.
I find no problem with upholding the human preference for foods which taste fatty, sugary and salty. (Note that, consistently applied, the “organismal” preference would be for the fatty, sugary and salty taste and not for foods that are actually fatty, sugary and salty. E.g., we like drinking Diet Pepsi with Splenda almost as much as Pepsi, and in a way roughly proportional to the success with which Splenda mimics the taste of sugar. We could even go one step further and drop the actual food part, valuing just the experience of [seemingly] eating fatty, sugary and salty foods.) This doesn’t necessarily commit me to valuing an unhealthy diet all things considered, because we also have many other preferences, e.g. for our health, which may outweigh this true human value.
The next two cases (fear of snakes and enjoying violence) can be dealt with similarly.
The last one is a little trickier but I think it can be addressed by a similar principle in which one value gets outweighed by a different value. In this case, it would be some higher-order value such as treating like cases alike. The difference here is that rather than being a competing value that outweighs the initial value, it is more like a constitutive value which nullifies the initial value. (Technically, I would prefer to talk here of principles which govern our values rather than necessarily higher order values.)
I thought your arguments throughout this post were similarly shallow and uncharitable to the side you were arguing against. For instance, you go on at length about how disagreements about value are present and intuitions are not consistent across cultures and history, but I don’t see how this is supposed to be any more convincing than talking about how many people in history have believed the earth is flat.
Okay, you’ve defeated the view that ethics is about the values all humans throughout history unanimously agree on. Now, what about views that extrapolate not from perfectly consistent, unanimous and foundational intuitions or preferences, but from dynamics in human psychology that tend to make initially inconsistent and incoherent intuitions more consistent and coherent? The end result of such dynamics can be hard to predict when they are applied iteratively, and they can be misapplied in any given instance, much as the belief-forming dynamic of favoring the simplest hypothesis consistent with the evidence can be misapplied.
By the way, I don’t mean to claim that your conclusion is obviously wrong. I think someone favoring my type of view about ethics has a heavy burden of proof that you hint at, perhaps even one that has been underappreciated here. I just don’t think your arguments here provide any support for your conclusion.
It seems to me that when you try to provide illustrative examples of how opposing views fail, you end up merely attacking straw men. Perhaps you’d do better if you tried to establish that any opposing views must have some property in common and that such a property dooms those views to failure. Or that opposing views must go one of two mutually exclusive and exhaustive routes in response to some central dilemma and both routes doom them to failure.
I really would like to see the most precise and cogent version of your argument here as I think it could prompt some important progress in filling in the gaps present in the sort of ethical view I favor.
Voted up for thought and effort. BTW, when I started writing this last week, I thought I always preferred organismal preferences.
That’s a good point. But in the context of designing a Friendly AI that implements human values, it means we have to design the AI to like fatty, sugary, and salty tastes. Doesn’t that seem odd to you? Maybe not the sort of thing we should be fighting to preserve?
I don’t see how. Are you going to kill the snakes, or not? Do you mean that you can use technology to let people experience simulated violence without actually hurting anybody? Doesn’t that seem like building an inconsistency into your utopia? Wouldn’t having a large number of such inconsistencies make utopia unstable, or lacking in integrity?
That’s how I said we resolve all of these cases. Only it doesn’t get outweighed by a single different value (the Prime Mover model); it gets outweighed by an entire, consistent, locally-optimal energy-minimizing set of values.
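The phrase “locally-optimal energy-minimizing set of values” maps naturally onto a Hopfield-style network. A minimal sketch, assuming invented value names, impulse strengths `h`, and pairwise compatibilities `J` (none of which come from the post): start with every impulse endorsed and repeatedly flip whichever single value most lowers the total energy, stopping at a local minimum.

```python
from itertools import combinations
from typing import Dict

# Hypothetical numbers. h: strength of each raw impulse or value.
h = {"like_sugar": 1.0, "value_health": 1.0,
     "enjoy_violence": 0.5, "value_non_harm": 1.0}
# J: pairwise compatibility; negative means the two values pull against each other.
J = {frozenset({"enjoy_violence", "value_non_harm"}): -2.0,
     frozenset({"value_non_harm", "value_health"}): 0.5}

State = Dict[str, int]  # value name -> +1 (endorsed) or -1 (rejected)


def energy(state: State) -> float:
    """Hopfield/Ising-style energy: lower means a more mutually consistent set."""
    field = -sum(h[v] * s for v, s in state.items())
    pairs = -sum(J.get(frozenset({a, b}), 0.0) * state[a] * state[b]
                 for a, b in combinations(state, 2))
    return field + pairs


def settle(state: State) -> State:
    """Best-improvement single-flip descent until no flip lowers the energy."""
    state = dict(state)
    while True:
        best_flip, best_energy = None, energy(state)
        for v in state:
            state[v] = -state[v]  # try flipping v
            e = energy(state)
            state[v] = -state[v]  # undo the trial flip
            if e < best_energy:
                best_flip, best_energy = v, e
        if best_flip is None:  # local minimum reached
            return state
        state[best_flip] = -state[best_flip]


start = {v: 1 for v in h}  # every impulse initially endorsed
final = settle(start)
print(final)  # {'like_sugar': 1, 'value_health': 1, 'enjoy_violence': -1, 'value_non_harm': 1}
print(energy(start), energy(final))  # -2.0 -5.0
```

With these made-up numbers, rejecting non-harm would also lower the energy, but less, because health reinforces it; the violence impulse loses because the rest of the set is more consistent without it.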
This seems to be at the core of your comment, but I can’t parse that sentence.
My emphasis is not on defeating opposing views (except the initial “preferences are propositions” / ethics-as-geometry view), but on setting out my view, and overcoming the objections to it that I came up with. For instance, when I talked about the intuitions of humans over time not being consistent, I wasn’t attacking the view that human values are universal. I was overcoming the objection that we must have an algorithm for choosing evolutionary or organismal preferences, if we seem to agree on the right conclusion in most cases.
Which conclusion did you have in mind? The key conclusion is that value can’t be unambiguously analyzed at a finer level of detail than the behavior, in the way that communication can’t be unambiguously analyzed at a finer level of detail than the proposition. You haven’t said anything about that.
(I just realized this makes me a structuralist above some level of detail, but a post-structuralist below it. Damn.)
I don’t think I will be any more precise or cogent (at least not as long as I’m not getting paid for it), nor that most readers would have preferred an even longer post. It took me two days to write this. If you don’t think my arguments provide any support for my conclusions, the gap between us is too wide for further elaboration to be worthwhile.
What is the ethical view you favor?
The FAI shouldn’t like sugary tastes, sex, violence, bad arguments, whatever. It should like us to experience sugary tastes, sex, violence, bad arguments, whatever.
“I don’t see how. Are you going to kill the snakes, or not?”
Presumably you act out a weighted balance of the voting power of possible human preferences extrapolated over different possible environments which they might create for themselves.
“Do you mean that you can use technology to let people experience simulated violence without actually hurting anybody? Doesn’t that seem like building an inconsistency into your utopia? Wouldn’t having a large number of such inconsistencies make utopia unstable, or lacking in integrity?”
I don’t understand the problem here. I don’t mean that this is the correct solution, though it is the obvious solution, but rather that I don’t see what the problem is. Ancients, who endorsed violence, generally didn’t understand or believe in personal death anyway.
You’re going back to Eliezer’s plan to build a single OS FAI. I should have clarified that I’m speaking of a plan to make AIs that have human values, for the sake of simplicity. (Which IMHO is a much, much better and safer plan.) Yes, if your goal is to build an OS FAI, that’s correct. It doesn’t get around the problem. Why should we design an AI to ensure that everyone for the rest of history is so much like us, and enjoys fat, sugar, salt, and the other things we do? That’s a tragic waste of a universe.
Why extrapolate over different possible environments to make a decision in this environment? What does that buy you? Do you do that today?
EDIT: I think I see what you mean. You mean construct a distribution of possible extensions of existing preferences into different environments, and weigh each one according to some function, such as internal consistency or energy minimization. That, I would guess, is a preferred Bayesian method of doing CEV.
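One way to read this proposal, as a sketch with loudly hypothetical inputs (the three candidates, their energies, the priors and the temperature are all invented): weight each candidate extension of present preferences by a prior times a Boltzmann factor exp(-E/T), so that more internally consistent candidates (lower E) count for more. Changing only the prior flips which candidate dominates, which is where prior odds over never-observed events enter.

```python
import math
from typing import List


def weights(energies: List[float], prior: List[float], temperature: float = 1.0) -> List[float]:
    """Normalized weights proportional to prior * exp(-energy / temperature)."""
    raw = [p * math.exp(-e / temperature) for e, p in zip(energies, prior)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical: three candidate extensions of today's preferences into one new environment.
energies = [1.0, 2.0, 2.0]  # lower energy = more internally consistent

uniform = weights(energies, prior=[1 / 3, 1 / 3, 1 / 3])
skewed = weights(energies, prior=[0.05, 0.05, 0.90])

print([round(w, 3) for w in uniform])  # [0.576, 0.212, 0.212]
print([round(w, 3) for w in skewed])   # candidate 2 now dominates
```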
My intuition is that this won’t work, because what you need to make it work is prior odds over events that have never been observed. I think we need to figure out a way to do the math to settle this.
It seems irrational, and wasteful, to deliberately construct a utopia where you give people impulses, and work to ensure that the mental and physical effort consumed by acting on those impulses is wasted. It also seems like a recipe for unrest. And, from an engineering perspective, it’s an ugly design. It’s like building a car with extra controls that don’t do anything.
Well a key hard problem is: what features about ourselves that we like should we try to ensure endure into the future? Yes some features seem hopelessly provincial, while others seem more universally good, but how can we systematically judge this?
It seems irrational, and wasteful, to deliberately construct a utopia where you give people impulses, and work to ensure that the mental and physical effort consumed by acting on those impulses is wasted.
I think you’re dancing around a bigger problem: once we have a sufficiently powerful AI, you and I are just a bunch of extra meat and buggy programming. Our physical and mental effort is just not needed or relevant. The purpose of FAI is to make sure that we get put out to pasture in a Friendly way. Or, depending on your mood, you could phrase it as living on in true immortality to watch the glory that we have created unfold.
It’s like building a car with extra controls that don’t do anything.
I think the more important question is what, in this analogy, does the car do?
I get the impression that’s part of the SIAI plan, but it seems to me that the plan entails that that’s all there is, from then on, for the universe. The FAI needs control of all resources to prevent other AIs from being made; and the FAI has no other goals than its human-value-fulfilling goals; so it turns the universe into a rest home for humans.
That’s just another variety of paperclipper.
If I’m wrong, and SIAI wants to allocate some resources to the human preserve, while letting the rest of the universe develop in interesting ways, please correct me, and explain how this is possible.
If you want the universe to develop in interesting ways, then why not explicitly optimize it for interestingness, however you define that?
I’m not talking about what I want to do, I’m talking about what SIAI wants to do. What I want to do is incompatible with constructing a singleton and telling it to extrapolate human values and run the universe according to them; as I have explained before.
If you think the future would be less than it could be if the universe was tiled with “rest homes for humans”, why do you expect that an AI which was maximizing human utility would do that?
It depends how far meta you want to go when you say “human utility”. Does that mean sex and chocolate, or complexity and continual novelty?
That’s an ambiguity in CEV—the AI extrapolates human volition, but what’s happening to the humans in the meanwhile? Do they stay the way they are now? Are they continuing to develop? If we suppose that human volition is incompatible with trilobite volition, that means we should expect the humans to evolve/develop new values that are incompatible with the AI’s values extrapolated from humans.
If for some reason humans who liked to torture toddlers became very fit, future humans would evolve to possess values that resulted in many toddlers being tortured. I don’t want that to happen, and am perfectly happy constraining future intelligences (even if they “evolve” from humans or even me) so they don’t. And as always, if you think that you want the future to contain some value shifting, why don’t you believe that an AI designed to fulfill the desires of humanity will cause/let that happen?
I think your article successfully argued that we’re not going to find some “ultimate” set of values that is correct or can be proven. In the end, the programmers of an FAI are going to choose a set of values that they like.
The good news is that human values can include things like generosity, non-interference, personal development, and exploration. “Human values” could even include tolerance of existential risk in return for not destroying other species. Any way that you want an FAI to be is a human value. We can program an FAI with ambitions and curiosity of its own, but they will be rooted in our own values and anthropomorphism.
But no matter how noble and farsighted the programmers are, to those who don’t share the programmers’ values, the FAI will be a paperclipper.
We’re all paperclippers, and in the true prisoners’ dilemma, we always defect.
Upvoted, but -
Eliezer needs to say whether he wants to do this, or to save humans. I don’t think you can have it both ways. The OS FAI does not have ambitions or curiosity of its own.
I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals. This is not logically necessary for an AI. Nor is the plan to build a singleton, rather than an ecology of AI, the only possible plan.
I notice that some of my comment wars with other people arise because they automatically assume that whenever we’re talking about a superintelligence, there’s only one of them. This is in danger of becoming a LW communal assumption. It’s not even likely. (More generally, there’s a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about—even if he doesn’t insist that they are likely.)
It is widely expected that this will arise as an important instrumental goal; nothing more than that. I can’t tell if this is what you mean. (When you point out that “trying to take over the universe isn’t utility-maximizing under many circumstances”, it sounds like you’re thinking of taking over the universe as a separate terminal goal, which would indeed be terrible design; an AI without that terminal goal, that can reason the same way you can, can decide not to try to take over the universe if that looks best.)
I probably missed it in some other comment, but which of these do you not buy: (a) huge first-mover advantages from self-improvement (b) preventing other superintelligences as a convergent subgoal (c) that the conjunction of these implies that a singleton superintelligence is likely?
This sounds plausible and bad. Can you think of some other examples?
This is probably just availability bias. These scenarios are easy to recall because we’ve read about them, and we’re psychologically primed for them just by coming to this website.
He did. FAI should not be a person—it’s just an optimization process.
ETA: link
Thanks! I’ll take that as definitive.
The assumption of a single AI comes from an assumption that an AI will have zero risk tolerance. It follows from that assumption that the most powerful AI will destroy or limit all other sentient beings within reach.
There’s no reason that an AI couldn’t be programmed to have tolerance for risk. Pursuing a lot of the more noble human values may require it.
I make no claim that Eliezer and/or the SIAI have anything like this in mind. It seems that they would like to build an absolutist AI. I find that very troubling.
If I thought they had settled on this and that they were likely to succeed, I would probably feel it was very important to work to destroy them. I’m currently not sure about the first and think the second is highly unlikely, so it is not a pressing concern.
It is, however, necessary for an AI to do something of the sort if it’s trying to maximize any sort of utility. Otherwise, risk / waste / competition will cause the universe to be less than optimal.
Trying to take over the universe isn’t utility-maximizing under many circumstances: if you have a small chance of succeeding, or if the battle to do so will destroy most of the resources, or if you discount the future at all (remember, computation speed increases as speed of light stays constant), or if your values require other independent agents.
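The circumstances listed above fit a crude expected-value comparison. A minimal sketch, assuming (all hypothetically) risk-neutral utility linear in resources, zero payoff if a takeover attempt fails, per-period discounting, and a fixed guaranteed share for cooperating; the function names and numbers are illustrative:

```python
def eu_takeover(p_success: float, fraction_destroyed: float,
                discount: float, periods_to_win: int, total: float = 1.0) -> float:
    """Expected value of fighting for everything: win with probability p_success,
    keep whatever survives the fight, and receive it only after periods_to_win."""
    return p_success * (1 - fraction_destroyed) * discount ** periods_to_win * total


def eu_cooperate(share: float, total: float = 1.0) -> float:
    """Expected value of accepting a guaranteed share of the resources now."""
    return share * total


# Hypothetical numbers; the cooperative alternative is a guaranteed 20% share.
weak = eu_takeover(p_success=0.3, fraction_destroyed=0.5, discount=0.9, periods_to_win=5)
dominant = eu_takeover(p_success=0.99, fraction_destroyed=0.01, discount=0.99, periods_to_win=1)

print(weak < eu_cooperate(0.2))      # True: fighting is worth ~0.089 vs 0.2
print(dominant > eu_cooperate(0.2))  # True: ~0.970 vs 0.2, the "first SAI to foom" case
```

Both sides of the exchange come out of the same formula: with a small chance of success, heavy destruction or steep discounting, cooperation wins; for a sufficiently dominant first mover, it does not.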
By your logic, it is necessary for SIAI to try to take over the world. Is that true? The US probably has enough military strength to take over the world—is it purely stupidity that it doesn’t?
The modern world is more peaceful, more enjoyable, and richer because we’ve learned that utility is better maximized by cooperation than by everyone trying to rule the world. Why does this lesson not apply to AIs?
Just what do you think “controlling the universe” means? My cat controls the universe. It probably doesn’t exert this control in a way anywhere near optimal to most sensible preferences, but it does have an impact on everything. How do we decide that a superintelligence “controls the universe”, while my cat “doesn’t”? The only difference is in what kind of universe we have: which preference it is optimized for. Whatever you truly want roughly amounts to preferring some states of the universe to other states, and making the universe better for you means controlling it towards your preference. The better the universe, the more specifically its state is specified, and the stronger the control. These concepts are just different aspects of the same phenomenon.
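One way to quantify “the more specifically its state is specified, the stronger the control” is to measure optimization in bits: minus the base-2 logarithm of the fraction of possible states that the preference ranks at least as high as the state actually achieved. A minimal sketch, assuming a toy universe of 10 binary features and a preference that simply counts set features (both invented for illustration; `optimization_bits` is a hypothetical name):

```python
import math
from itertools import product
from typing import Callable, Sequence, Tuple

State = Tuple[int, ...]


def optimization_bits(achieved: State, states: Sequence[State],
                      utility: Callable[[State], float]) -> float:
    """-log2 of the fraction of possible states ranked at least as high as `achieved`."""
    target = utility(achieved)
    at_least_as_good = sum(1 for s in states if utility(s) >= target)
    return -math.log2(at_least_as_good / len(states))


# Hypothetical toy universe: 10 binary features; the preference counts set features.
states = list(product((0, 1), repeat=10))
utility = sum

superintelligence = (1,) * 10         # steers into the single best state
cat = (1, 1, 1, 1, 1, 0, 0, 0, 0, 0)  # a middling state

print(optimization_bits(superintelligence, states, utility))  # 10.0 (1 state in 1024)
print(round(optimization_bits(cat, states, utility), 2))      # 0.68 (638 of 1024)
```

On this measure the cat and the superintelligence differ in degree, not in kind: both exert some control, measured against the same preference.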
For one, the U.S. doesn’t have the military strength. Russia still has enough nuclear warheads and ICBMs to prevent that. (And we suck at being occupying forces.)
I think the situation of the US is similar to a hypothesized AI. Sure, Russia could kill a lot of Americans. But we would probably “win” in the end. By all the logic I’ve heard in this thread, and in others lately about paperclippers, the US should rationally do whatever it has to to be the last man standing.
Well, also the US isn’t a single entity that agrees on all its goals. Some of us for example place a high value on human life. And we vote. Even if the leadership of the United States wanted to wipe out the rest of the planet, there would be limits to how much they could do before others would step in.
Also, most forms of modern human morality strongly disfavor large-scale wars simply to impose one’s views. If our AI doesn’t have that sort of belief, then that’s not an issue. And if we restrict ourselves to just the issue of other AIs, I’m not sure that, if I gave a smart AI my morals and preferences, it would necessarily see anything wrong with making sure that no other general smart AIs were created.
I think it is quite plausible that an AI structured with a central unitary authority would be at a competitive disadvantage with an AI that granted some autonomy to subsystems. This at least raises the possibility of goal conflicts between different sub-modules of an efficient AI. There are many examples in nature and in human societies of a tension between efficiency and centralization. It is not clear that an AI could maintain a fully centralized and unified goal structure and out-compete less centralized designs.
An AI that wanted to control even a relatively small region of space like the Earth would still run into issues with the speed of light when it comes to projecting force through geographically dispersed physical presences. The turnaround time is such that decision-making autonomy would have to be dispersed to local processing clusters in order to be effective. Hell, even today’s high-end processors run into issues with the time it takes a signal to get from one side of the die to the other. It is not obvious that the optimum efficiency balance between local decision-making autonomy and a centralized unitary goal system will always favour a singleton-type AI.
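The latency argument is simple arithmetic. A minimal sketch, assuming a 3 GHz clock, a 2 cm die, on-chip signals at half of c, and a path of half the Earth’s equatorial circumference (all illustrative round numbers; real on-chip wires and long-haul fibre are slower than assumed here):

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum (exact by definition of the metre)


def one_way_delay_s(distance_m: float, fraction_of_c: float = 1.0) -> float:
    """Best-case signal delay over a distance, at a given fraction of c."""
    return distance_m / (C_M_PER_S * fraction_of_c)


CLOCK_HZ = 3e9             # hypothetical 3 GHz processor clock
HALF_EARTH_M = 20_037_500  # roughly half of Earth's ~40,075 km equatorial circumference
DIE_M = 0.02               # hypothetical 2 cm chip die

# Round trip to the far side of the planet, even at full vacuum light speed.
earth_rtt_s = 2 * one_way_delay_s(HALF_EARTH_M)
# Crossing a die at an optimistic half of c.
die_crossing_s = one_way_delay_s(DIE_M, fraction_of_c=0.5)

print(f"Earth round trip: {earth_rtt_s * 1e3:.1f} ms = {earth_rtt_s * CLOCK_HZ:.2e} cycles")
print(f"Die crossing: {die_crossing_s * CLOCK_HZ:.2f} cycles")
```

Under these assumptions a round trip to the antipode costs about 134 ms, roughly 4 × 10^8 cycles at 3 GHz, during which a purely centralized controller could not react to anything happening there.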
There is some evidence of evolutionary competition between different cell lines within a single organism. Human history is full of examples of the tension between centralized planning and less centrally coordinated but more efficient systems of delegated authority. We do not see a clear unidirectional trend towards more centralized control or towards larger conglomerations of purely co-operating units (whether they be cells, organisms, humans or genes) in nature or in human societies. It seems to me that the burden of proof is on those who would propose that a system with a unitary goal structure has an unbounded upper physical extent of influence where it can outcompete less unitary arrangements (or even that it can do so over volumes exceeding a few meters to a side).
There is a natural tendency for humans to think of themselves as having a unitary centralized consciousness with a unified goal system. It is pretty clear that this is not the case. It is also natural for programmers trained on single-threaded von Neumann architectures, or those with a mathematical bent, to ignore the physical constraints of the speed of light when imagining what an AI might look like. If a human can’t even catch a ball without delegating authority to a semi-autonomous sub-unit, I don’t see why we should be confident that non-human intelligences subject to the same laws of physics should be immune to such problems.
A well designed AI should have an alignment of goals between sub modules that is not achieved in modern decentralized societies. A distributed AI would be like multiple TDT/UDT agents with mutual knowledge that they are maximizing the same utility function, not a bunch of middle managers engaging in empire building at the expense of the corporation they work for.
This is not even something that human AI designers have to figure out how to implement, the seed can be single agent, and it will figure out the multiple sub agent architecture when it needs it over the course of self improvement.
Even if this is possible (which I believe is still an open problem, if you think otherwise I’m sure Eliezer would love to hear from you) you are assuming no competition. The question is not whether this AI can outcompete humans but whether it can outcompete other AIs that are less rigid.
I agree that it would probably make a lot of sense for an AI that wished to control any large area of territory to create other AIs to manage local issues. However, AIs, unlike humans or evolution, can create other AIs that perfectly share their values and interests. There is no reason to assume that an AI would create another AI, to which it intends to delegate substantial power, with which it could get into disagreements about values.
This is mere supposition. You are assuming the FAI problem is solvable. I think both evolutionary and economic arguments weigh against this belief. Even if this is possible in theory it may take far longer for a singleton AI to craft its faultlessly loyal minions than for a more… entrepreneurial… AI to churn out ‘good enough’ foot soldiers to wipe out the careful AI.
No. All an AI needs to do to create another AI which shares its values is to copy itself.
So if you cloned yourself, you would be 100% confident you would never find yourself in a situation where your interests conflicted with your clone’s? Again, you are assuming the FAI problem is solvable and that the idea of an AI with unchanging values is even coherent.
I am not an AI. I am not an optimization process with an explicit utility function. A copy of an AI that undertook actions which appeared to work against another copy would be found, on reflection, to have been furthering the terms of their shared utility function.
You are still assuming that such an optimization process is a) possible (on the scale of greater than human intelligence) and b) efficient compared to other alternatives. a) is the very hard problem that Eliezer is working on. Whether it is possible is still an open question, I believe. I am claiming that b) is non-obvious (and even unlikely); if you want to continue this conversation, you need to explain why you think otherwise rather than repeatedly stating the same unsupported claim.
Human experience so far indicates that imperfect/heuristic optimization processes are often more efficient (in their use of computational resources) than provably perfect optimization processes. Human experience also suggests that it is easier to generate an algorithm that solves a problem satisfactorily than it is to rigorously prove that the algorithm does what it is supposed to or generates an optimal solution. The gap between these two difficulties seems to grow more than linearly with increasing problem complexity. There are mathematical reasons to suspect that this is a general principle and not simply due to human failings. If you disagree with this, you need to provide some reasoning; the burden of proof, it seems to me, is on those who would claim otherwise.
I certainly agree that creating an optimization process which provably advances a set of values under a wide variety of taxing circumstances is hard. I further agree that it is quite likely that the first powerful optimization process created will not have this property, because of the difficulty involved, even if this is the goal that all AI creators have. I will, however, state that if the first such powerful optimization process is not of the sort I specified, we will all die.
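As a toy illustration of the trade-off described above (the knapsack problem, the item values, and the function names are hypothetical choices for this sketch, not anything from the discussion), compare an exhaustive 0/1 knapsack solver, which is trivially guaranteed to be optimal but examines all 2**n subsets, with a greedy value-per-weight heuristic that runs in O(n log n) but carries no optimality guarantee.

```python
from itertools import combinations

def exact_knapsack(items, capacity):
    """Exhaustive search: provably optimal, but checks every one of the 2**n subsets."""
    best_value, checked = 0, 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            checked += 1
            weight = sum(w for w, _ in subset)
            value = sum(v for _, v in subset)
            if weight <= capacity and value > best_value:
                best_value = value
    return best_value, checked

def greedy_knapsack(items, capacity):
    """Heuristic: take items in order of value/weight ratio; fast, but may be suboptimal."""
    total_weight, total_value = 0, 0
    for w, v in sorted(items, key=lambda item: item[1] / item[0], reverse=True):
        if total_weight + w <= capacity:
            total_weight += w
            total_value += v
    return total_value

items = [(6, 30), (5, 20), (5, 20)]  # (weight, value) pairs
print(exact_knapsack(items, capacity=10))   # (40, 8)
print(greedy_knapsack(items, capacity=10))  # 30
```

On this three-item instance the exhaustive search checks 8 subsets and finds 40, while the greedy pass settles for 30. Each added item doubles the exhaustive count but barely changes the greedy cost. The sketch only shows that a cheap heuristic can be suboptimal on some inputs; it says nothing about which approach wins in general.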
I also agree that the vast majority of mind-design space consists of sloppy approximations that will break down outside their intended environments. This means that most AI designs will kill us. When I use the word AI, I don’t mean a randomly selected mind, I mean a reflectively consistent, effective optimizer of a specific utility function.
In many environments, quick and dirty heuristics will confer an enormous advantage, so long as those environments can be expected to continue in the same way for the length of time the heuristic will be in operation. This means that if you have two minds with equal resources (i.e., the situation you describe), the one willing to use slapdash heuristics will win, as long as the conditions facing it don’t change. But the situation where two minds are created with equal resources is unlikely to occur, given that one of them is an AI (as I use the term), even if that AI is not maximizing human utility (i.e., is not an FAI). The intelligence explosion means that a properly designed AI will be able to quickly take control of its immediate environment. Why would an AI with a stable goal allow another mind to be created, gain resources and threaten it? It wouldn’t. It would crush all possible rivals, because to do otherwise is to invite disaster.
In short: The AI problem is hard. Sloppy minds are quite possibly going to be made before proper AIs. But true AIs, of the type that Eliezer wants to build, will not run into the problems you would expect from the majority of minds.
We know that perfect solutions to even quite simple optimization problems are a different kind of hard. We have quite good reason to suspect that this is an essential property of reality and that we will never be able to solve such problems simply. The kinds of problems we are talking about seem likely to be more complex to solve. In other words, if (and it is a big if) it is possible to create an optimization process that provably advances a set of values (let’s call it ‘friendly’), it is unlikely to be a perfect optimization process. It seems likely to me that such ‘friendly’ optimization processes will represent a subset of all possible optimization processes and that some ‘non-friendly’ optimization processes will quite likely be better optimizers. I see no reason to suppose that the most effective optimizers will happily fall into the ‘friendly’ subset.
I don’t consider this hypothesis proved or self-evident. It is at least plausible, but I can think of lots of reasons why it might not be true. Taking an outside view, we do not see much evidence from evolution or human societies of ‘winner takes all’ being a common outcome (we see much diversity in nature and human society), nor of first-mover advantage always leading to an insurmountable lead. And yes, I know there are lots of reasons why ‘self-improving AI is different’, but I don’t consider the matter settled. It is a realistic enough concern for me to broadly support SIAI’s efforts, but it is by no means the only possible outcome.
Why does any goal-directed agent ‘allow’ other agents to conflict with its goals? Because it isn’t strong enough to prevent them. We know of no counterexamples in all of history to the hypothesis that all goal-directed agents have limits. This does not rule out the possibility that a self-improving AI would be the first counterexample, but neither does it make me as sure of that claim as many here seem to be.
I understand the claim. I am not yet convinced it is possible or likely.
I agree that human values are unlikely to be the easiest to maximize. However, for another mind to optimize our universe, it needs to be created. This is why SIAI advocates creating an AI friendly to humans before other optimization processes are created.
It seems to me that your true objection to what I am saying is contained within the statement that “it is at the very least possible for an intelligence to not take over its immediate environment before another, with possibly inimical goals, is created.” Does this agree with your assessment? Would a convincing argument for the intelligence explosion cause you to change your mind?
More or less, though I actually lean towards it being likely rather than merely possible. I am also making the related claim that a widely spatially dispersed entity with a single coherent goal system may be a highly unstable configuration.
On the first point, yes. I don’t believe I’ve seen my points addressed in detail, though it sounds like Eliezer’s debate with Robin Hanson that was linked earlier might cover the same ground. I will take some time to follow up on that later.
I’m working my way through it and indeed it does. Robin Hanson’s post “Dreams of Autarky” is close to my position. I think there are other computational, economic and physical arguments in this direction as well.
It’s not obvious that “shared utility function” means something definite, though.
It certainly does if the utility function doesn’t refer to anything indexical; and an agent with an indexical utility function can build another agent (not a copy of itself, though) with a differently-represented (non-indexical) utility function that represents the same third-person preference ordering.
It should apply to AIs if you think that there will be multiple AIs at roughly the same capability level. A common assumption here is that as soon as there is a single general AI, it will quickly improve to the point where it is so far beyond everything else in capability that the others’ capabilities won’t matter. Frankly, I find this assumption highly questionable and, among other problems, very optimistic about potential fooming rates, but if one accepts the idea it makes some sense. The analogy might be to a hypothetical situation in which the US has not just the strongest military but also monopolies on cheap fusion power and an immortality pill, plus a bunch of superheroes on its side. The distinction between the US controlling everything and the US having direct military control might quickly become irrelevant.
Edit: On the rate-of-fooming issue, I’d be really interested if a fast-foom proponent would be willing to put together a top-level post outlining why fooming will happen so quickly.
Eliezer and Robin had a lengthy debate on this perhaps a year ago. I don’t remember if it’s on OB or LW. Robin believes in no foom, using economic arguments.
The people who design the first AI could build a large number of AIs in different locations and turn them on at the same time. This plan would have a high probability of leading to disaster; but so do all the other plans that I’ve heard.
http://wiki.lesswrong.com/wiki/The_Hanson-Yudkowsky_AI-Foom_Debate
Reading now. Looks very interesting.
Obviously, if you can’t take over the world, then trying is stupid. If you can (for example, if you’re the first SAI to go foom) then it’s a different story.
Taking over the world does not require you to destroy all other life if that is contrary to your utility function. I’m not sure what you mean regarding future-discounting; if reorganizing the whole damn universe isn’t worth it, then I doubt anything else will be in any case.
I’m getting lost in my own argument.
If Michael was responding to the problem that human preference systems can’t be unambiguously extended into new environments, then my chronologically first response applies, but needs more thought; and I’m embarrassed that I didn’t anticipate that particular response.
If he was responding to the problem that human preferences as described by their actions, and as described by their beliefs, are not the same, then my second response applies.
If a person could label each preference system “evolutionary” or “organismal”, meaning which value they preferred, then you could use that to help you extrapolate their values into novel environments.
The problem is that the person is reasoning only over the propositional part of their values. They don’t know what their values are; they know only what the contribution within the propositional part is. That’s one of the main points of my post. The values they come up with will not always be the values they actually implement.
If you define a person’s values as being what they believe their values are, then, sure, most of what I posted will not be a problem. I think you’re missing the point of the post, and are using the geometry-based definition of identity.
If you can’t say whether the right value to choose in each case is evolutionary or organismal, then extrapolating into future environments isn’t going to help. You can’t gain information to make a decision in your current environment by hypothesizing an extension to your environment, making observations in that imagined environment, and using them to refine your current-environment estimates. That’s like trying to refine your estimate of an asteroid’s current position by simulating its movement into the future, and then tracking backwards along that projected trajectory to the present. It’s trying to get information for free. You can’t do that.
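A toy version of the asteroid analogy may make the point concrete. Constant velocity, an exactly known velocity, an interval-shaped position estimate, and the function names are all simplifying assumptions made for this sketch. Projecting the estimate forward and then retracing the projected path back to the present returns exactly the estimate you started with, no narrower than before, because no observation entered anywhere along the way. `Fraction` is used only to rule out floating-point drift.

```python
from fractions import Fraction

def propagate(interval, velocity, dt):
    """Shift a position interval along a constant, exactly known velocity by dt.

    A negative dt runs the model backwards in time.
    """
    lo, hi = interval
    return lo + velocity * dt, hi + velocity * dt

def forward_then_back(interval, velocity, dt, steps):
    """Project the estimate into the future, then trace it back to the present.

    No new observation is used at any point, so nothing can be learned.
    """
    current = interval
    for _ in range(steps):
        current = propagate(current, velocity, dt)
    for _ in range(steps):
        current = propagate(current, velocity, -dt)
    return current

estimate = (Fraction(99), Fraction(101))  # position known only to within +/- 1
result = forward_then_back(estimate, Fraction(3, 2), Fraction(1, 10), steps=1000)
print(result == estimate)     # True
print(result[1] - result[0])  # 2
```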
(I think what I said under “Fuzzy values and fancy math don’t help” is also relevant.)