Well, also the US isn’t a single entity that agrees on all its goals. Some of us, for example, place a high value on human life. And we vote. Even if the leadership of the United States wanted to wipe out the rest of the planet, there would be limits to how much they could do before others would step in.
Also, most forms of modern human morality strongly disfavor large-scale wars waged simply to impose one’s views. If our AI doesn’t hold that sort of belief, then that’s not an issue. And if we restrict ourselves to the issue of other AIs, I’m not sure that, if I gave a smart AI my morals and preferences, it would necessarily see anything wrong with making sure that no other general smart AIs were created.
I think it is quite plausible that an AI structured around a central unitary authority would be at a competitive disadvantage against an AI that granted some autonomy to subsystems. This at least raises the possibility of goal conflicts between different sub-modules of an efficient AI. There are many examples in nature and in human societies of a tension between efficiency and centralization. It is not clear that an AI could maintain a fully centralized and unified goal structure and still out-compete less centralized designs.
An AI that wanted to control even a relatively small region of space like the Earth would still run into speed-of-light limits when projecting force through geographically dispersed physical presences. The round-trip time is long enough that decision-making autonomy would have to be delegated to local processing clusters in order to be effective. Hell, even today’s high-end processors run into trouble with the time it takes a signal to get from one side of the die to the other. It is not obvious that the optimum efficiency balance between local decision-making autonomy and a centralized unitary goal system will always favour a singleton-type AI.
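A rough sketch of the arithmetic behind the speed-of-light point, assuming signals travel at the vacuum speed of light. Real links are slower (optical fibre carries light at roughly two-thirds of c, and on-chip wires are slower still), so these figures are lower bounds. The distances and the 4 GHz clock are illustrative assumptions, not numbers from the thread.

```python
# Lower-bound signal latencies, assuming propagation at the vacuum speed of
# light. The distances and clock rate below are illustrative assumptions.
C_M_PER_S = 299_792_458  # speed of light in vacuum, metres per second


def one_way_latency_s(distance_m: float, fraction_of_c: float = 1.0) -> float:
    """Minimum one-way signal time over distance_m at the given fraction of c."""
    return distance_m / (C_M_PER_S * fraction_of_c)


# Half of Earth's equatorial circumference (~40,075 km): roughly the farthest
# two points on the surface can be from each other along the ground.
antipodal_m = 20_037_500
round_trip_ms = 2 * one_way_latency_s(antipodal_m) * 1e3

# A 2 cm die edge compared with one clock period at 4 GHz.
die_m = 0.02
clock_period_s = 1 / 4e9
die_cycles = one_way_latency_s(die_m) / clock_period_s

print(f"antipodal round trip at c: {round_trip_ms:.0f} ms")
print(f"die crossing at c: {die_cycles:.2f} of a 4 GHz clock cycle")
```

Even in vacuum, an antipodal round trip costs about 134 ms, which is over half a billion cycles of a 4 GHz clock, while crossing a 2 cm die at full light speed already takes about a quarter of a cycle.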
There is some evidence of evolutionary competition between different cell lines within a single organism. Human history is full of examples of the tension between centralized planning and less centrally coordinated but more efficient systems of delegated authority. We do not see a clear unidirectional trend towards more centralized control, or towards larger conglomerations of purely co-operating units (whether they be cells, organisms, humans or genes), in nature or in human societies. It seems to me that the burden of proof is on those who would propose that a system with a unitary goal structure has no upper bound on the physical extent over which it can outcompete less unitary arrangements (or even that it can do so over volumes more than a few meters on a side).
There is a natural tendency for humans to think of themselves as having a unitary, centralized consciousness with a unified goal system. It is pretty clear that this is not the case. It is also natural for programmers trained on single-threaded von Neumann architectures, or for those with a mathematical bent, to ignore the physical constraints of the speed of light when imagining what an AI might look like. If a human can’t even catch a ball without delegating authority to a semi-autonomous sub-unit, I don’t see why we should be confident that non-human intelligences subject to the same laws of physics would be immune to such problems.
A well-designed AI should have an alignment of goals between submodules that is not achieved in modern decentralized societies. A distributed AI would be like multiple TDT/UDT agents with mutual knowledge that they are maximizing the same utility function, not a bunch of middle managers engaging in empire building at the expense of the corporation they work for.
This is not even something that human AI designers have to figure out how to implement: the seed can be a single agent, and it will work out the multiple-sub-agent architecture when it needs it over the course of self-improvement.
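A toy sketch of the shared-utility sub-agent idea, under one strong assumption the thread does not state: that the shared utility function is additive across regions. Under that assumption, identical sub-agents that each see only their own region reach the same plan a centralized search would, with no round trips. The regions, actions and payoffs are all hypothetical.

```python
from itertools import product

# Hypothetical payoffs: payoff[region][action] is that region's contribution
# to a single shared utility, assumed here to be additive over regions.
payoff = {
    "north": {"build": 3, "mine": 5, "wait": 0},
    "south": {"build": 4, "mine": 1, "wait": 0},
    "east": {"build": 2, "mine": 2, "wait": 6},
}


def shared_utility(plan: dict[str, str]) -> int:
    """The one utility function every sub-agent carries."""
    return sum(payoff[region][action] for region, action in plan.items())


def local_decision(region: str) -> str:
    """A sub-agent sees only its own region and picks that region's best action."""
    return max(payoff[region], key=payoff[region].get)


decentralized = {region: local_decision(region) for region in payoff}

# Centralized check: search every joint plan (exponential in the number of regions).
centralized = max(
    (dict(zip(payoff, actions)) for actions in product(*payoff.values())),
    key=shared_utility,
)
print(decentralized, shared_utility(decentralized))
print(centralized, shared_utility(centralized))
```

If the utility were not additive (say, two regions’ actions only pay off together), purely local choices could miss the joint optimum and the sub-agents would have to communicate, which brings the speed-of-light objection back into play.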
Even if this is possible (which I believe is still an open problem; if you think otherwise, I’m sure Eliezer would love to hear from you), you are assuming no competition. The question is not whether this AI can outcompete humans but whether it can outcompete other AIs that are less rigid.
I agree that it would probably make a lot of sense for an AI that wished to control any large territory to create other AIs to manage local issues. However, AIs, unlike humans or evolution, can create other AIs that perfectly share their values and interests. There is no reason to assume that an AI would create another AI, one to which it intends to delegate substantial power, with which it could come into disagreement over values.
This is mere supposition. You are assuming the FAI problem is solvable. I think both evolutionary and economic arguments weigh against this belief. Even if this is possible in theory, it may take far longer for a singleton AI to craft its faultlessly loyal minions than for a more… entrepreneurial… AI to churn out ‘good enough’ foot soldiers to wipe out the careful AI.
No. All an AI needs to do to create another AI which shares its values is to copy itself.
So if you cloned yourself you would be 100% confident you would never find yourself in a situation where your interests conflicted with your clone? Again, you are assuming the FAI problem is solvable and that the idea of an AI with unchanging values is even coherent.
I am not an AI. I am not an optimization process with an explicit utility function. A copy of an AI that undertook actions which appeared to work against another copy would be found, on reflection, to have been furthering the terms of their shared utility function.
You are still assuming that such an optimization process is a) possible (at greater than human intelligence) and b) efficient compared to the alternatives. a) is the very hard problem that Eliezer is working on, and whether it is possible is still, I believe, an open question. I am claiming that b) is non-obvious (and even unlikely); if you want to continue this conversation, you need to explain why you think otherwise rather than repeatedly stating the same unsupported claim.
Human experience so far indicates that imperfect/heuristic optimization processes are often more efficient (in their use of computational resources) than provably perfect optimization processes. Human experience also suggests that it is easier to write an algorithm that solves a problem satisfactorily than it is to rigorously prove that the algorithm does what it is supposed to, or that it generates an optimal solution. The gap between these two difficulties seems to grow faster than linearly with problem complexity. There are mathematical reasons to suspect that this is a general principle and not simply due to human failings. If you disagree, you need to provide some reasoning; it seems to me that the burden of proof is on those who would claim otherwise.
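A small illustration of the heuristic-versus-exact gap, using the 0/1 knapsack problem (NP-hard in general) as a stand-in. The greedy heuristic and the exhaustive search are textbook techniques chosen for illustration, and the three-item instance is made up. The greedy version is fast but can miss the optimum; the exhaustive version is guaranteed optimal but examines 2^n subsets.

```python
from itertools import combinations

Item = tuple[int, int]  # (value, weight)


def greedy_knapsack(items: list[Item], capacity: int) -> int:
    """Heuristic: take items by value per unit weight while they still fit.
    O(n log n), but not guaranteed to find the best total value."""
    total_value, remaining = 0, capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= remaining:
            total_value += value
            remaining -= weight
    return total_value


def exhaustive_knapsack(items: list[Item], capacity: int) -> tuple[int, int]:
    """Exact: check every subset. Returns (best value, subsets examined)."""
    best, examined = 0, 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            examined += 1
            if sum(w for _, w in subset) <= capacity:
                best = max(best, sum(v for v, _ in subset))
    return best, examined


items = [(60, 10), (100, 20), (120, 30)]
print(greedy_knapsack(items, 50))      # 160
print(exhaustive_knapsack(items, 50))  # (220, 8)
```

With 40 items the exhaustive search would need about 1.1 × 10^12 subsets. Cleverer exact methods such as dynamic programming do much better in practice, but no known exact algorithm runs in time polynomial in the input size.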
I certainly agree that creating an optimization process which provably advances a set of values under a wide variety of taxing circumstances is hard. I further agree that it is quite likely that the first powerful optimization process created will not have this property, because of the difficulty involved, even if this is the goal of every AI creator. I will, however, state that if the first such powerful optimization process is not of the sort I specified, we will all die.
I also agree that the vast majority of mind-design space consists of sloppy approximations that will break down outside their intended environments. This means that most AI designs will kill us. When I use the word AI, I don’t mean a randomly selected mind, I mean a reflectively consistent, effective optimizer of a specific utility function.
In many environments, quick-and-dirty heuristics will confer an enormous advantage, so long as those environments can be expected to stay the same for as long as the heuristic is in operation. This means that if you have two minds with equal resources (i.e. the situation you describe), the one willing to use slapdash heuristics will win, as long as the conditions facing it don’t change. But the situation where two minds are created with equal resources is unlikely to occur if one of them is an AI (as I use the term), even if that AI is not maximizing human utility (i.e. is not an FAI). The intelligence explosion means that a properly designed AI will be able to quickly take control of its immediate environment. Why would an AI with a stable goal allow another mind to be created, gain resources and threaten it? It wouldn’t. It would crush all possible rivals, because to do otherwise is to invite disaster.
In short: The AI problem is hard. Sloppy minds are quite possibly going to be made before proper AIs. But true AIs, of the type that Eliezer wants to build, will not run into the problems you would expect from the majority of minds.
We know that perfect solutions to even quite simple optimization problems are a different kind of hard. We have quite good reason to suspect that this is an essential property of reality and that we will never be able to solve such problems simply. The kinds of problems we are talking about seem likely to be more complex to solve. In other words if (and it is a big if) it is possible to create an optimization process that provably advances a set of values (let’s call it ‘friendly’) it is unlikely to be a perfect optimization process. It seems likely to me that such ‘friendly’ optimization processes will represent a subset of all possible optimization processes and that it is quite likely that some ‘non-friendly’ optimization processes will be better optimizers. I see no reason to suppose that the most effective optimizers will happily fall into the ‘friendly’ subset.
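The subset step of this argument can be made concrete: restricting a search to a subset of candidates can never raise the best achievable score, and it lowers it whenever every best candidate lies outside the subset. In this sketch the designs, their scores and the restriction are hypothetical stand-ins; nothing here models friendliness itself.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def best_score(candidates: Iterable[T], score: Callable[[T], float],
               allowed: Callable[[T], bool] = lambda _: True) -> Optional[float]:
    """Highest score among candidates passing `allowed`, or None if none pass."""
    return max((score(c) for c in candidates if allowed(c)), default=None)


def effectiveness(design: int) -> int:
    """Hypothetical optimizing power of design number `design`; peaks at 7."""
    return -(design - 7) ** 2


def in_restricted_subset(design: int) -> bool:
    """Hypothetical restriction standing in for 'friendly': designs 0..4 only."""
    return design <= 4


designs = range(10)
print(best_score(designs, effectiveness))                        # 0
print(best_score(designs, effectiveness, in_restricted_subset))  # -9
```

Whether a real restriction actually excludes the most effective optimizers is exactly the open question; the sketch only shows which direction a restriction can push.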
I don’t consider this hypothesis proved or self-evident. It is at least plausible but I can think of lots of reasons why it might not be true. Taking an outside view, we do not see much evidence from evolution or human societies of ‘winner takes all’ being a common outcome (we see much diversity in nature and human society), nor of first mover advantage always leading to an insurmountable lead. And yes, I know there are lots of reasons why ‘self improving AI is different’ but I don’t consider the matter settled. It is a realistic enough concern for me to broadly support SIAI’s efforts but it is by no means the only possible outcome.
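A toy model, not taken from either commenter, of why the disagreement over first-mover advantage may turn on returns to capability. Assume capability x grows each step by rate * x**exponent and one agent gets a head start. With proportional returns (exponent 1) the leader stays ahead by a fixed factor that never grows; with increasing returns (exponent above 1) the same head start widens at every step. Whether self-improving AI looks more like the second case is the open question, and the parameters below are arbitrary.

```python
def lead_ratios(head_start: int, steps: int, rate: float,
                exponent: float) -> list[float]:
    """Toy growth model (an assumption, not from the thread).

    Each step, capability x grows by rate * x**exponent. Both agents start at
    1.0; the leader first gets `head_start` solo steps. Returns the
    leader/follower capability ratio after each of `steps` shared steps.
    """
    def grow(x: float) -> float:
        return x + rate * x ** exponent

    leader = follower = 1.0
    for _ in range(head_start):
        leader = grow(leader)

    ratios = []
    for _ in range(steps):
        leader, follower = grow(leader), grow(follower)
        ratios.append(leader / follower)
    return ratios


# Proportional returns: the head start stays a fixed multiple (about 1.1**5).
print([round(r, 3) for r in lead_ratios(5, 6, 0.1, 1.0)])
# Increasing returns: the same head start widens at every step.
print([round(r, 3) for r in lead_ratios(5, 6, 0.1, 1.2)])
```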
Why does any goal-directed agent ‘allow’ other agents to conflict with its goals? Because it isn’t strong enough to prevent them. We know of no counterexamples in all of history to the hypothesis that all goal-directed agents have limits. This does not rule out the possibility that a self-improving AI would be the first counterexample, but neither does it make me as sure of that claim as many here seem to be.
I understand the claim. I am not yet convinced it is possible or likely.
I agree that human values are unlikely to be the easiest to maximize. However, for another mind to optimize our universe, it needs to be created. This is why SIAI advocates creating an AI friendly to humans before other optimization processes are created.
It seems to me that your true objection to what I am saying is contained within the statement that “it is at the very least possible for an intelligence to not take over its immediate environment before another, with possibly inimical goals, is created.” Does this agree with your assessment? Would a convincing argument for the intelligence explosion cause you to change your mind?
More or less, though I actually lean towards it being likely rather than merely possible. I am also making the related claim that a widely spatially dispersed entity with a single coherent goal system may be a highly unstable configuration.
On the first point, yes. I don’t believe I’ve seen my points addressed in detail, though it sounds like Eliezer’s debate with Robin Hanson that was linked earlier might cover the same ground. I will take some time to follow up on that later.
I’m working my way through it, and indeed it does. Robin Hanson’s post “Dreams of Autarky” is close to my position. I think there are other computational, economic and physical arguments in this direction as well.
It’s not obvious that “shared utility function” means something definite, though.
It certainly does if the utility function doesn’t refer to anything indexical; and an agent with an indexical utility function can build another agent (not a copy of itself, though) with a differently-represented (non-indexical) utility function that represents the same third-person preference ordering.
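A minimal sketch of the distinction drawn here, using a hypothetical world model (a map from agent names to the resources each holds). An indexical goal, “maximize the resources I hold”, is modelled as a function that gets bound to whoever runs it, so a literal copy ends up ranking outcomes differently from the original. A non-indexical goal that names the original agent explicitly ranks outcomes the same way whichever agent evaluates it. The names and the binding scheme are illustrative assumptions, not anyone’s proposed design.

```python
from typing import Callable

World = dict[str, int]  # hypothetical world state: agent name -> resources held
Utility = Callable[[World], int]


def indexical(self_name: str) -> Utility:
    """'Maximize the resources I hold': 'I' is bound to whoever runs it."""
    return lambda world: world[self_name]


def third_person(target: str) -> Utility:
    """'Maximize the resources `target` holds': fixed for every evaluator."""
    return lambda world: world[target]


def prefers(u: Utility, a: World, b: World) -> bool:
    return u(a) > u(b)


w1 = {"original": 10, "copy": 0}
w2 = {"original": 0, "copy": 10}
worlds = (w1, w2)

original_u = indexical("original")   # the original agent's own goal
copy_u = indexical("copy")           # literal copy: 'I' now means the copy
helper_u = third_person("original")  # helper built with a non-indexical goal

print(prefers(original_u, w1, w2), prefers(copy_u, w1, w2))  # True False
print(all(prefers(original_u, a, b) == prefers(helper_u, a, b)
          for a in worlds for b in worlds))  # True
```

The two constructors compute the same thing once a name is bound; what differs is which name a newly built agent ends up carrying.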