> However, AIs, unlike humans or evolution, can create other AIs which perfectly share their values and interests.
This is mere supposition. You are assuming the FAI problem is solvable. I think both evolutionary and economic arguments weigh against this belief. Even if this is possible in theory, it may take far longer for a singleton AI to craft its faultlessly loyal minions than for a more… entrepreneurial… AI to churn out ‘good enough’ foot soldiers to wipe out the careful AI.
So if you cloned yourself you would be 100% confident you would never find yourself in a situation where your interests conflicted with your clone’s? Again, you are assuming the FAI problem is solvable and that the idea of an AI with unchanging values is even coherent.
I am not an AI. I am not an optimization process with an explicit utility function. A copy of an AI that undertook actions which appeared to work against another copy would be found, on reflection, to have been furthering the terms of their shared utility function.
> I am not an optimization process with an explicit utility function.
You are still assuming that such an optimization process is a) possible (on the scale of greater than human intelligence) and b) efficient compared to other alternatives. a) is the very hard problem that Eliezer is working on. Whether it is possible is still an open question, I believe. I am claiming b) is non-obvious (and even unlikely); you need to explain why you think otherwise, rather than repeatedly stating the same unsupported claim, if you want to continue this conversation.
Human experience so far indicates that imperfect/heuristic optimization processes are often more efficient (in use of computational resources) than provably perfect optimization processes. Human experience also suggests that it is easier to generate an algorithm to solve a problem satisfactorily than it is to rigorously prove that the algorithm does what it is supposed to or generates an optimal solution. The gap between these two difficulties seems to increase more than linearly with increasing problem complexity. There are mathematical reasons to suspect that this is a general principle and not simply due to human failings. If you disagree with this, you need to provide some reasoning; the burden of proof, it seems to me, is on those who would claim otherwise.
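To make the efficiency gap concrete, here is a toy sketch of my own (not something from the thread): a greedy heuristic versus exhaustive search on a tiny 0/1 knapsack problem. The item values, the capacity and the function names are all made up for illustration, and exhaustive search stands in for a "provably perfect" optimizer only in the crudest sense.

```python
from itertools import combinations

# Hypothetical toy instance: (value, weight) pairs and a weight limit.
ITEMS = [(60, 10), (100, 20), (120, 30)]
CAPACITY = 50

def greedy_knapsack(items, capacity):
    """Heuristic: take items in order of value per unit weight while they fit."""
    total_value = 0
    remaining = capacity
    by_density = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    for value, weight in by_density:
        if weight <= remaining:
            total_value += value
            remaining -= weight
    return total_value

def exact_knapsack(items, capacity):
    """Exhaustive search: checks every subset, so the work doubles per added item."""
    best = 0
    subsets_checked = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            subsets_checked += 1
            if sum(w for _, w in subset) <= capacity:
                best = max(best, sum(v for v, _ in subset))
    return best, subsets_checked

print(greedy_knapsack(ITEMS, CAPACITY))  # 160
print(exact_knapsack(ITEMS, CAPACITY))   # (220, 8)
```

The heuristic does one sort and one pass and settles for 160; the exhaustive search finds 220 but has to look at all 2^n subsets, which is the kind of cost that grows much faster than the problem size.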
> You are still assuming that such an optimization process is a) possible (on the scale of greater than human intelligence) and b) efficient compared to other alternatives. a) is the very hard problem that Eliezer is working on. Whether it is possible is still an open question, I believe. I am claiming b) is non-obvious (and even unlikely); you need to explain why you think otherwise, rather than repeatedly stating the same unsupported claim, if you want to continue this conversation.
I certainly agree that creating an optimization process which provably advances a set of values under a wide variety of taxing circumstances is hard. I further agree that it is quite likely that the first powerful optimization process created will not have this property, because of the difficulty involved, even if this is the goal that all AI creators have. I will, however, state that if the first such powerful optimization process is not of the sort I specified, we will all die.
> Human experience so far indicates that imperfect/heuristic optimization processes are often more efficient (in use of computational resources) than provably perfect optimization processes. Human experience also suggests that it is easier to generate an algorithm to solve a problem satisfactorily than it is to rigorously prove that the algorithm does what it is supposed to or generates an optimal solution. The gap between these two difficulties seems to increase more than linearly with increasing problem complexity. There are mathematical reasons to suspect that this is a general principle and not simply due to human failings.
I also agree that the vast majority of mind-design space consists of sloppy approximations that will break down outside their intended environments. This means that most AI designs will kill us. When I use the word AI, I don’t mean a randomly selected mind; I mean a reflectively consistent, effective optimizer of a specific utility function.
In many environments, quick and dirty heuristics will confer an enormous advantage, so long as those environments can be expected to continue in the same way for the length of time the heuristic will be in operation. This means that if you have two minds with equal resources (i.e., the situation you describe), the one willing to use slapdash heuristics will win, as long as the conditions facing it don’t change. But the situation where two minds are created with equal resources is unlikely to occur, given that one of them is an AI (as I use the term), even if that AI is not maximizing human utility (i.e., is not an FAI). The intelligence explosion means that a properly designed AI will be able to quickly take control of its immediate environment. Why would an AI with a stable goal allow another mind to be created, gain resources and threaten it? It wouldn’t. It would crush all possible rivals, because to do otherwise is to invite disaster.
In short: The AI problem is hard. Sloppy minds are quite possibly going to be made before proper AIs. But true AIs, of the type that Eliezer wants to build, will not run into the problems you would expect from the majority of minds.
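The point about heuristics winning only while conditions hold can be sketched with a small deterministic simulation. This is an illustration under invented assumptions, not a model of real minds: the payoff tables, the `planning_cost` parameter and the function names are all hypothetical. One agent picks the best action once and never reconsiders; the other re-evaluates every round and pays a fixed cost for doing so.

```python
def run(env_payoffs, steps, planning_cost):
    """Return (heuristic_score, planner_score) over `steps` rounds.

    env_payoffs(t) returns a dict mapping each action name to its payoff at time t.
    """
    initial = env_payoffs(0)
    cached_action = max(initial, key=initial.get)  # heuristic is tuned once, at t=0
    heuristic = 0.0
    planner = 0.0
    for t in range(steps):
        payoffs = env_payoffs(t)
        heuristic += payoffs[cached_action]               # no deliberation cost
        planner += max(payoffs.values()) - planning_cost  # re-plans every round
    return heuristic, planner

def stable(t):
    """Environment that never changes."""
    return {"a": 1.0, "b": 0.5}

def shifting(t):
    """Environment in which action 'a' stops paying off from round 3 onward."""
    return {"a": 1.0, "b": 0.5} if t < 3 else {"a": 0.0, "b": 0.5}

print(run(stable, steps=10, planning_cost=0.25))    # (10.0, 7.5)
print(run(shifting, steps=10, planning_cost=0.25))  # (3.0, 4.0)
```

In the stable environment the cached heuristic comes out ahead because it never pays for deliberation; once the payoffs shift, it keeps choosing an action that no longer pays and falls behind.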
> I certainly agree that creating an optimization process which provably advances a set of values under a wide variety of taxing circumstances is hard.
We know that finding perfect solutions to even quite simple optimization problems is a different kind of hard: hard in the computational-complexity sense. We have quite good reason to suspect that this is an essential property of reality and that we will never be able to solve such problems simply. The kinds of problems we are talking about seem likely to be more complex to solve. In other words, if (and it is a big if) it is possible to create an optimization process that provably advances a set of values (let’s call it ‘friendly’), it is unlikely to be a perfect optimization process. It seems likely to me that such ‘friendly’ optimization processes will represent a subset of all possible optimization processes and that it is quite likely that some ‘non-friendly’ optimization processes will be better optimizers. I see no reason to suppose that the most effective optimizers will happily fall into the ‘friendly’ subset.
> The intelligence explosion means that a properly designed AI will be able to quickly take control of its immediate environment.
I don’t consider this hypothesis proved or self-evident. It is at least plausible, but I can think of lots of reasons why it might not be true. Taking an outside view, we do not see much evidence from evolution or human societies of ‘winner takes all’ being a common outcome (we see much diversity in nature and human society), nor of first-mover advantage always leading to an insurmountable lead. And yes, I know there are lots of reasons why ‘self-improving AI is different’, but I don’t consider the matter settled. It is a realistic enough concern for me to broadly support SIAI’s efforts, but it is by no means the only possible outcome.
> Why would an AI with a stable goal allow another mind to be created, gain resources and threaten it? It wouldn’t. It would crush all possible rivals, because to do otherwise is to invite disaster.
Why does any goal-directed agent ‘allow’ other agents to conflict with its goals? Because it isn’t strong enough to prevent them. We know of no counterexamples in all of history to the hypothesis that all goal-directed agents have limits. This does not rule out the possibility that a self-improving AI would be the first counterexample, but neither does it make me as sure of that claim as many here seem to be.
> But true AIs, of the type that Eliezer wants to build, will not run into the problems you would expect from the majority of minds.
I understand the claim. I am not yet convinced it is possible or likely.
> It seems likely to me that such ‘friendly’ optimization processes will represent a subset of all possible optimization processes and that it is quite likely that some ‘non-friendly’ optimization processes will be better optimizers.
I agree that human values are unlikely to be the easiest to maximize. However, for another mind to optimize our universe, it needs to be created. This is why SIAI advocates creating an AI friendly to humans before other optimization processes are created.
It seems to me that your true objection to what I am saying is contained within the statement that “it is at the very least possible for an intelligence to not take over its immediate environment before another, with possibly inimical goals, is created.” Does this agree with your assessment? Would a convincing argument for the intelligence explosion cause you to change your mind?
> It seems to me that your true objection to what I am saying is contained within the statement that “it is at the very least possible for an intelligence to not take over its immediate environment before another, with possibly inimical goals, is created.” Does this agree with your assessment?
More or less, though I actually lean towards it being likely rather than merely possible. I am also making the related claim that a widely spatially dispersed entity with a single coherent goal system may be a highly unstable configuration.
> Would a convincing argument for the intelligence explosion cause you to change your mind?
On the first point, yes. I don’t believe I’ve seen my points addressed in detail, though it sounds like Eliezer’s debate with Robin Hanson that was linked earlier might cover the same ground. I will take some time to follow up on that later.
> it sounds like Eliezer’s debate with Robin Hanson that was linked earlier might cover the same ground.
I’m working my way through it and indeed it does. Robin Hanson’s post ‘Dreams of Autarky’ is close to my position. I think there are other computational, economic and physical arguments in this direction as well.
No. All an AI needs to do to create another AI which shares its values is to copy itself.
It’s not obvious that “shared utility function” means something definite, though.
It certainly does if the utility function doesn’t refer to anything indexical; and an agent with an indexical utility function can build another agent (not a copy of itself, though) with a differently-represented (non-indexical) utility function that represents the same third-person preference ordering.
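A minimal sketch of the indexical versus non-indexical distinction, under heavy simplifying assumptions of my own: worlds are just dicts of resources per agent, and the agent names `"A"` and `"B"` and both function names are hypothetical. An indexical utility depends on who is evaluating it, so a verbatim copy running as a different agent ranks worlds differently. A successor built with the referent frozen reproduces the original's third-person ranking.

```python
def indexical_utility(world, me):
    """Utility that refers to 'me': the evaluator values its own resources."""
    return world[me]

def make_non_indexical(referent):
    """Freeze the referent so the result no longer depends on who evaluates it."""
    def utility(world):
        return world[referent]
    return utility

# Toy worlds: resources held by each agent.
worlds = [{"A": 3, "B": 1}, {"A": 1, "B": 5}, {"A": 2, "B": 2}]

# Ranking of worlds (worst to best) as seen by the original agent A.
original_ranking = sorted(worlds, key=lambda w: indexical_utility(w, "A"))

# A verbatim copy that runs as agent B evaluates the same code with me="B".
copy_ranking = sorted(worlds, key=lambda w: indexical_utility(w, "B"))

# A successor built by A with the referent fixed to "A".
successor_ranking = sorted(worlds, key=make_non_indexical("A"))

print(successor_ranking == original_ranking)  # True
print(copy_ranking == original_ranking)       # False
```

Here the naive copy ends up with opposed preferences, while the rewritten successor shares the original's ordering even though its utility function is represented differently.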