The purpose of this system is to give you a way to do science and cure disease without making a human-level AI that has a utility function or drives related to the external world.
If such a system were around, it would be straightforward to create a human-level AI that has a utility function—just ask the optimizer to build a good approximate model of its observations of the real world, and then ask the optimizer to come up with a good plan for achieving some goal with respect to that model. Cutting humans out of the loop seems to radically increase the effectiveness of the system (are you disagreeing with that?), so the situation is only stable insofar as a very safety-aware project maintains a monopoly on the technology. (The amount of time they need to maintain a monopoly depends on how quickly they are able to build a singleton with this technology, or build up infrastructure to weather less cautious projects.)
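To make the two queries concrete, here is a minimal sketch of how they could be chained. Everything in it is an assumption introduced for illustration: the `Optimizer` interface, its `search` method, and the `predict`/`simulate` methods on the objects it returns are hypothetical stand-ins for the capability being discussed, not any real library.

```python
from typing import Any, Callable


class Optimizer:
    """Stand-in for the hypothesized optimizer: given a scoring function over
    candidate programs, return a high-scoring candidate. How the search works
    is exactly the capability being assumed."""

    def search(self, score: Callable[[Any], float]) -> Any:
        raise NotImplementedError("placeholder for the hypothesized capability")


def build_goal_directed_agent(optimizer: Optimizer, observations, goal_score):
    """Chain two optimizer queries into a goal-directed agent.

    `observations` is assumed to be a list of (history, next_observation)
    pairs; `goal_score` rates simulated outcomes.
    """
    # Query 1: ask for a model that predicts the recorded observations well.
    def model_score(candidate_model) -> float:
        hits = sum(candidate_model.predict(history) == nxt
                   for history, nxt in observations)
        return hits / len(observations)

    world_model = optimizer.search(model_score)

    # Query 2: ask for a plan that scores well when rolled out inside the
    # learned model, i.e. a goal pursued "with respect to that model".
    def plan_score(candidate_plan) -> float:
        return goal_score(world_model.simulate(candidate_plan))

    return optimizer.search(plan_score)
```

The point of the sketch is just that nothing beyond the optimizer itself is needed: both steps are ordinary queries to it.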
There’s no benefit to be gained by returning, say, a software program that hacks into computers and runs the optimizer on all of them.
There are two obvious ways this fails. One is that partially self-directed hill-climbing can do many odd and unpredictable things, as in human evolution. Another is that there is a benefit to be gained by building an AI that has a good model of mathematics, available computational resources, other programs it instantiates, and so on. It seems to be easier to give the system general-purpose modeling and goal-orientation than to hack in a bunch of particular behaviors (especially if you are penalizing for complexity); a toy illustration of that penalty follows below. The "explicit" self-modification step in your scheme will probably not be used (in worlds where takeoff is possible); instead the system will just directly produce a self-improving optimizer early on.
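Here is the complexity-penalty point as a toy calculation. The scoring rule, the per-bit penalty, and all of the numbers are invented for illustration; the only claim is the shape of the comparison: a single general mechanism with a short description can outscore a long list of hand-coded special-case behaviors even if the latter performs slightly better.

```python
def penalized_score(performance: float, description_length_bits: int,
                    penalty_per_bit: float = 1e-5) -> float:
    """Toy scoring rule: task performance minus a description-length penalty."""
    return performance - penalty_per_bit * description_length_bits


# Hypothetical numbers, purely illustrative:
general_mechanism = penalized_score(performance=0.90, description_length_bits=2_000)
hand_coded_behaviors = penalized_score(performance=0.92, description_length_bits=50_000)

print(general_mechanism)        # 0.88
print(hand_coded_behaviors)     # 0.42
print(general_mechanism > hand_coded_behaviors)  # True under these assumed numbers
```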