Suppose your initial optimizer is an AGI which knows the experimental setup and has some arbitrary values: for example, a crude simulation of a human brain that is trying to take over the world and is aware of the experimental setup. What will happen?
I would suggest against creating a seed AI that has drives related to the outside world. I don’t see why optimizers for mathematical functions necessarily need such drives.
So clearly your argument needs to depend somehow on the nature of the seed AI. How much extra do you need to ask of it? The answer seems to be “quite a lot,” if it is a powerful enough optimization process to get this sort of thing going.
I think the only “extra” is that it’s a program meant to do well on the sample problems and that, like most machine learning techniques, doesn’t have drives related to the external world.
Most machine learning techniques cannot be used to drive the sort of self-improvement process you are describing here. It may be that no techniques can drive this sort of self-improvement—in this case, we are not really worried about the possibility of an uncontrolled takeoff, because there is not likely to be a takeoff. Instead, we are worried about the potentially unstable situation which ensues once you have human-level AI, and you are using it to do science and cure disease, and hoping no one else uses a human-level AI to kill everyone.
If general intelligence does first come from recursive self-improvement, it won’t be starting from contemporary machine learning techniques or anything that looks like them.
As an intuition pump, consider an algorithm which uses local search to find good strategies for optimizing, perhaps using its current strategy to make predictions and guide the local search. Does this seem safe for use as your seed AI? This is colorful, but with a gooey center of wisdom.
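A minimal sketch of the kind of algorithm being gestured at, purely as an illustration: here the “strategy” is nothing more than the step size of a naive hill climber, the sample problems are two toy functions, and the self-referential part is that the current strategy is used to propose nearby strategies. None of this is meant to pin down how the actual system would work.

```python
import random

# Illustrative only: a "strategy" is just the step size of a naive hill
# climber, and the sample problems are two toy functions to maximize.

SAMPLE_PROBLEMS = [
    lambda x: -(x - 3.0) ** 2,   # maximized at x = 3
    lambda x: -abs(x + 1.5),     # maximized at x = -1.5
]

def hill_climb(problem, step_size, iters=200):
    """The object-level optimizer: climb on one sample problem."""
    x = 0.0
    for _ in range(iters):
        candidate = x + random.uniform(-step_size, step_size)
        if problem(candidate) > problem(x):
            x = candidate
    return problem(x)

def score(step_size):
    """A strategy's score is only its current performance on the samples."""
    return sum(hill_climb(p, step_size) for p in SAMPLE_PROBLEMS)

def meta_search(initial_step=1.0, rounds=50):
    """Local search over strategies, using the current strategy itself
    (hill climbing with the current step size) to propose nearby strategies,
    which is the self-referential part of the intuition pump."""
    current = initial_step
    for _ in range(rounds):
        candidate = abs(current + random.uniform(-current, current))
        if candidate == 0.0:
            candidate = current
        if score(candidate) > score(current):
            current = candidate
    return current

if __name__ == "__main__":
    best = meta_search()
    print("tuned step size:", best, "score on samples:", score(best))
```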
Instead, we are worried about the potentially unstable situation which ensues once you have human-level AI, and you are using it to do science and cure disease, and hoping no one else uses a human-level AI to kill everyone.
The purpose of this system is to give you a way to do science and cure disease without making human-level AI that has a utility function/drives related to the external world.
As an intuition pump, consider an algorithm which uses local search to find good strategies for optimizing, perhaps using its current strategy to make predictions and guide the local search. Does this seem safe for use as your seed AI?
Yes, it does. I’m assuming what you mean is that it will use something similar to genetic algorithms or hill climbing to find solutions; that is, it comes up with one solution, then looks for similar ones that have higher scores. I think this will be safe because it’s still not doing anything long-term. All this local search does is find an immediate solution. There’s no benefit to be gained by returning, say, a software program that hacks into computers and runs the optimizer on all of them. In other words, the “utility function” emphasizes current ability to solve optimization problems above all else.
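To make “comes up with one solution, then looks for similar ones that have higher scores” concrete, here is a toy genetic-algorithm sketch; the bit-string target and fitness function are invented for illustration. The only thing the fitness rewards is a candidate’s immediate score on the fixed sample problem, which is the sense in which the search is not doing anything long-term.

```python
import random

# Toy genetic algorithm: candidates are bit strings, and the fitness is
# only their immediate match to a fixed sample problem (the target below).
# Nothing in the objective refers to future performance or the outside world.

TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # illustrative sample problem

def fitness(candidate):
    """Immediate score on the sample problem -- the whole 'utility function'."""
    return sum(1 for a, b in zip(candidate, TARGET) if a == b)

def mutate(candidate, rate=0.1):
    return [bit ^ 1 if random.random() < rate else bit for bit in candidate]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def genetic_search(pop_size=20, generations=100):
    population = [[random.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

if __name__ == "__main__":
    best = genetic_search()
    print(best, fitness(best))
```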
The purpose of this system is to give you a way to do science and cure disease without making human-level AI that has a utility function/drives related to the external world.
If such a system were around, it would be straightforward to create a human-level AI that has a utility function—just ask the optimizer to build a good approximate model for its observations in the real world, and then ask the optimizer to come up with a good plan for achieving some goal with respect to that model. Cutting humans out of the loop seems to radically increase the effectiveness of the system (are you disagreeing with that?), so the situation is only stable insofar as a very safety-aware project maintains a monopoly on the technology. (The amount of time they need to maintain a monopoly depends on how quickly they are able to build a singleton with this technology, or build up infrastructure to weather less cautious projects.)
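A hedged sketch of that two-step construction, assuming only a generic optimizer interface (the stand-in below is random search) and a toy one-dimensional world; every name and number is invented. The point is just that model-building and planning are both single calls to the same black-box optimizer.

```python
import random

# Toy stand-ins throughout: the "optimizer" is random search, the "world"
# is a one-dimensional action/outcome toy, and the goal is to maximize the
# outcome. This is only meant to show the shape of the construction.

def optimize(score_fn, sample_candidate, tries=2000):
    """Stand-in for the powerful optimizer: best of many random samples."""
    best = sample_candidate()
    best_score = score_fn(best)
    for _ in range(tries):
        c = sample_candidate()
        s = score_fn(c)
        if s > best_score:
            best, best_score = c, s
    return best

# Toy "observations": (action, outcome) pairs generated by an unknown rule.
observations = [(a, 2.0 * a - a * a) for a in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]]

# Step 1: ask the optimizer for a model (quadratic coefficients) that
# explains the observations well.
def model_fit(coeffs):
    b, c = coeffs
    return -sum((b * a + c * a * a - y) ** 2 for a, y in observations)

model = optimize(model_fit, lambda: (random.uniform(-3, 3), random.uniform(-3, 3)))

# Step 2: ask the optimizer for a plan (an action) that the fitted model
# predicts will best achieve the goal.
def plan_value(action):
    b, c = model
    return b * action + c * action * action

plan = optimize(plan_value, lambda: random.uniform(0.0, 3.0))
print("fitted model:", model, "chosen action:", plan)
```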
There’s no benefit to be gained by returning, say, a software program that hacks into computers and runs the optimizer on all of them.
There are two obvious ways this fails. One is that partially self-directed hill-climbing can do many odd and unpredictable things, as in human evolution. Another is that there is a benefit to be gained by building an AI that has a good model for mathematics, available computational resources, other programs it instantiates, and so on. It seems to be easier to give general-purpose modeling and goal-orientation than to hack in a bunch of particular behaviors (especially if you are penalizing for complexity). The “explicit” self-modification step in your scheme will probably not be used (in worlds where takeoff is possible); instead the system will just directly produce a self-improving optimizer early on.
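As a purely illustrative bit of arithmetic on the complexity-penalty parenthetical (the numbers and penalty weight are invented): if candidates are scored as performance minus a description-length penalty, one compact general-purpose candidate can outscore a long list of hand-coded special cases even when the special cases do slightly better on raw performance.

```python
# Illustrative only: invented numbers showing how a description-length
# penalty favors one compact general-purpose candidate over a pile of
# hand-hacked special-case behaviors.

PENALTY = 0.01  # hypothetical penalty per unit of description length

def penalized_score(performance, description_length):
    return performance - PENALTY * description_length

general_purpose = penalized_score(performance=0.90, description_length=50)
special_cases = penalized_score(performance=0.95, description_length=5000)

print(general_purpose, special_cases)  # 0.40 vs. -49.05: the compact candidate wins
```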