Instead, we are worried about the potentially unstable situation which ensues once you have human level AI, and you are using it to do science and cure disease, and hoping no one else uses a human level AI to kill everyone.
The purpose of this system is to give you a way to do science and cure disease without making human-level AI that has a utility function/drives related to the external world.
As an intuition pump, consider an algorithm which uses local search to find good strategies for optimizing, perhaps using its current strategy to make predictions and guide the local search. Does this seem safe for use as your seed AI?
Yes, it does. I’m assuming what you mean is that it will use something similar to genetic algorithms or hill climbing to find solutions; that is, it comes up with one solution, then looks for similar ones that have higher scores. I think this will be safe because it’s still not doing anything long-term. All this local search finds is an immediate solution. There’s no benefit to be gained by returning, say, a software program that hacks into computers and runs the optimizer on all of them. In other words, the “utility function” emphasizes current ability to solve optimization problems above all else.
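To make the intuition pump concrete: the kind of myopic local search described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not a proposal for the actual seed AI; the score function and neighborhood structure are stand-ins.

```python
import random

def hill_climb(score, initial, neighbors, steps=1000):
    """Greedy local search: hold one candidate solution, repeatedly
    sample a nearby variant, and move to it only if it scores higher.
    The loop only ever compares immediate scores; nothing in the
    procedure models or plans over future consequences."""
    random.seed(0)  # for reproducibility of this toy run
    current = initial
    for _ in range(steps):
        candidate = random.choice(neighbors(current))
        if score(candidate) > score(current):
            current = candidate
    return current

# Toy objective: maximize -(x - 7)^2, which peaks at x = 7.
score = lambda x: -(x - 7) ** 2
neighbors = lambda x: [x - 1, x + 1]

best = hill_climb(score, initial=0, neighbors=neighbors)
```

The point of the sketch is that the "utility function" lives entirely inside the `score` comparison on the current candidate: a program that, say, hacked other computers would not score any higher on this immediate comparison, which is the safety intuition in the paragraph above.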
The purpose of this system is to give you a way to do science and cure disease without making human-level AI that has a utility function/drives related to the external world.
If such a system were around, it would be straightforward to create a human-level AI that has a utility function—just ask the optimizer to build a good approximate model for its observations in the real world, and then ask the optimizer to come up with a good plan for achieving some goal with respect to that model. Cutting humans out of the loop seems to radically increase the effectiveness of the system (are you disagreeing with that?) so the situation is only stable insofar as a very safety-aware project maintains a monopoly on the technology. (The amount of time they need to maintain a monopoly depends on how quickly they are able to build a singleton with this technology, or build up infrastructure to weather less cautious projects.)
There’s no benefit to be gained by returning, say, a software program that hacks into computers and runs the optimizer on all of them.
There are two obvious ways this fails. One is that partially self-directed hill-climbing can do many odd and unpredictable things, as in human evolution. Another is that there is a benefit to be gained by building an AI that has a good model for mathematics, available computational resources, other programs it instantiates, and so on. It seems to be easier to build in general-purpose modeling and goal-orientation than to hack in a bunch of particular behaviors (especially if you are penalizing for complexity). The “explicit” self-modification step in your scheme will probably not be used (in worlds where takeoff is possible); instead the system will just directly produce a self-improving optimizer early on.