Not a utility function, but rather a (quite resource-intensive) technique for generating one:
Rather than building one AI, build about five hundred of them, with a rudimentary utility function template and the ability to learn and revise it. Give them a simulated universe to live in, unaware of the existence of our universe. (You may need to supplement the population of 500 with some human operators, but they should have an interface which makes them appear to be inhabiting the simulated world.) Keep track of which ones act most pathologically, delete them, and recombine the remaining AIs with mutation to get a second generation of 500. Keep doing this until you have an AI that consistently minds its manners, and then create a new copy of that AI to live in our world.
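A minimal sketch of the selection loop this describes, in the style of a standard genetic algorithm. Everything here is illustrative rather than part of the proposal: the utility-function template is reduced to a parameter vector, and the `score_pathology` judge stands in for actually running an agent in the simulated universe and rating its behavior, which is the expensive, unsolved part.

```python
import random

POPULATION_SIZE = 500      # "about five hundred of them"
SURVIVOR_FRACTION = 0.5    # fraction kept each generation (illustrative choice)
MUTATION_RATE = 0.05       # per-parameter mutation probability (illustrative)

def random_utility_template():
    # Stand-in for a rudimentary utility-function template: here just a
    # parameter vector the agent would learn and revise inside the simulation.
    return [random.uniform(-1.0, 1.0) for _ in range(64)]

def score_pathology(agent):
    # Hypothetical judge: run the agent in the simulated universe and return
    # how pathologically it behaved (higher = worse). Not specified in the
    # original proposal, and by far the hardest piece to supply.
    raise NotImplementedError("requires the simulated universe and its evaluators")

def recombine(parent_a, parent_b):
    # Uniform crossover of the parents' utility parameters, plus mutation.
    child = [random.choice(pair) for pair in zip(parent_a, parent_b)]
    return [
        gene + random.gauss(0.0, 0.1) if random.random() < MUTATION_RATE else gene
        for gene in child
    ]

def evolve(generations):
    population = [random_utility_template() for _ in range(POPULATION_SIZE)]
    for _ in range(generations):
        # Delete the agents that act most pathologically...
        ranked = sorted(population, key=score_pathology)
        survivors = ranked[: int(POPULATION_SIZE * SURVIVOR_FRACTION)]
        # ...and recombine the rest, with mutation, to refill the population.
        population = [
            recombine(*random.sample(survivors, 2))
            for _ in range(POPULATION_SIZE)
        ]
    # Return the best-behaved agent found so far.
    return min(population, key=score_pathology)
```

In the proposal the loop would run until some agent "consistently minds its manners" across generations, rather than for a fixed number of generations as sketched here.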
After one round of self-improvement, it’s pathological again. You can’t test for stability under self-improvement by using a simulated universe which lacks the resources necessary to self-improve.
If it’s possible to self-improve in our universe, it’s possible to self-improve in the simulated universe. The only thing stopping us from putting together a reasonable simulation of the laws of physics, at this point, is raw computing power. Developing AGI is a problem of an entirely different sort: we simply don’t know how to do it yet, even in principle.
You’re right, but let me revise that slightly. In a simulated universe, some forms of self-improvement are possible, but others are cut off: specifically, every form of self-improvement that requires more resources than the simulated universe provides. The problem is that this includes most of the interesting ones, and it’s entirely possible that the AI will self-modify into something bad only once you give it more hardware.