Look at how the system is set up. Would code for (2) that eats up resources score better according to objective function (3)? No, because the criterion for (3) should evaluate program (2) by giving it only a constant amount of computing power; nothing the program does would change how much computing power (3) would give it.
Halting problem being what it is, (i guess) the 3 works by taking a look and seeing how much CPU time is used (or memory or what ever). On the hardware that is being monitored by 3. The extra hardware that the AI can add, though… also good luck defining ‘extra hardware’ formally.
edit: of course you can say it is sandboxed and haven’t got hands, but it wont be long until you start, idk, wanting real bad to optimize proteins or DNA or the like.
that’s AI you’re keeping safe by keeping it in a box, basically. If the resource bounded execute lets the alg get online the alg is free to hack into servers.
Plus it is not AGI, and people will be using it to make AGI or hardware for AGI.
It is also not very general purpose. You are defining the scoring.
And you start with a human written program that non-trivially improves it’s own ability to solve problems (and it does so in nsteps for improving own ability to solve N problems in nsteps each). You basically start off with some mighty powerful artificial intelligence.
If the resource bounded execute lets the alg get online the alg is free to hack into servers.
So don’t do that.
Plus it is not AGI, and people will be using it to make AGI or hardware for AGI.
See my other post, it can solve many many different problems, e.g. general induction and the problems in your original post (such as optimizing source code, assuming we have a specification for the source code).
You basically start off with some mighty powerful artificial intelligence.
This framework is meant to provide a safe framework for this powerful AI to become even more powerful without destroying the world in the process. Also, the training set provides a guide for humans trying to write the code.
To reiterate: no, I haven’t solved friendly AI, but I think I’ve solved friendly AI for verifiable problems.
Okay, say, as a starting point in your framework we got the optimizing AI from my original post, armed with whatever it learnt off a copy of internet, and wanting to do something unintended which requires getting out and doing stuff in real world.
The thing sees your example problem(s), it solves them poorly (playing dumb).
The thing sees self optimization problem, it makes a functionally equivalent copy of self thats all micro optimized and remembers facing self optimization problem and knows that now it should solve example problems a small bit less poorly.
Hurray, it is working, proclaims your team.
Eventually it plausibly gets as good as you can get in nsteps.
Some day it is given some real world problems to solve, it makes underhanded solutions, again, not the best it could.
Bottom line is, your kind of thinking is precisely what my example AGI in original post wants to exploit
Yes, it’s a very bad idea to take the AI from your original post and then stick it into my framework. But if we had programmers initially working within my framework to create the AI according to criterion (3) in good faith, then I think any self-improvements the system makes would also be safe. If we already had an unfriendly AGI we’d be screwed anyway.
That kind of stuff is easy in low resolution un-detailed thought… but look with more details...
I think you confused yourself (and me too) with regards to what the AI would be optimizing, confusing this with what the framework ‘wants’ it to optimize. The scoring functions can be very expensive to evaluate.
Here you have the 4, which is the whole point of the entire exercise. The scoring function here is over M times more expensive to evaluate than the AI run itself, where M is the number of test problems (which you’ll want very huge). You’d actually want to evaluate AI’s ability to do 4, too, but that’d enter infinite recursion.
When you are working on a problem where you can’t even evaluate the scoring function inside your AI—not even remotely close—you have to make some heuristics, some substitute scoring.
Let’s consider chess as example:
The goal of chess is to maximize win value, the win values being enemy checkmated>tie>you are checkmated.
The goal of the chess AI developed with maximization of win in mind, is instead perhaps to maximize piece dis-balance in 7 ply.
(This works better for maximizing win, given limited computation, than trying to maximize the win!)
And once you have an AI inside your framework which is not maximizing the value that your framework is maximizing—it’s potentially AI from my original post in your framework, getting out.
When you are working on a problem where you can’t even evaluate the scoring function inside your AI—not even remotely close—you have to make some heuristics, some substitute scoring.
You’re right, this is tricky because the self-optimizer thread (4) might have to call (3) a lot. Perhaps this can be fixed by giving the program more time to find self-optimizations. Or perhaps the program could use program (3)’s specification/source code rather than directly executing it, in order to figure out how to optimize it heuristically. Either way it’s not perfect. At worst program (4) will just fail to find optimizations in the allowed time.
And once you have an AI inside your framework which is not maximizing the value that your framework is maximizing—it’s potentially AI from my original post in your framework, getting out.
Ok, if you plopped your AI into my framework it would be terrible. But I don’t see how the self-improvement process would spontaneously create an unfriendly AI.
The framework, as we already have established, would not keep an AI from maximizing what ever the AI wants to maximize.
The framework also does nothing to prevent AI from creating a more effective problem solving AI that is more effective at problem solving by not evaluating your problem solving functions on various candidate solutions, and instead doing something else that’s more effective. I.e. the AI with some substitute goals of it’s own instead of straightforward maximization of scores. (Heh, the whole point of exercise is to create AI that would keep self improving, meaning, would improve it’s ability to self improve. Which is something that you can only do by some kind of goal substitution because the evaluation of the ability to self improve is too expensive—the goal is a something that you evaluate many times.)
So what does the framework do, exactly, that would improve safety here? Beyond keeping the AI in the rudimentary box, and making it very dubious that the AI would at all self improve. Yes, it is very dubious that under this framework the unfriendly AI will arise but is some added safety, or is it a special case of general dubiousness that a self improvement would take place? I don’t see added safety. I don’t see framework impeding growing unfriendliness any more than it would impede self improvement.
edit: maybe should just say, nonfriendly. Any AI that is not friendly, can just eat you up when hungry and it doesn’t need you.
The framework, as we already have established, would not keep an AI from maximizing what ever the AI wants to maximize.
That’s only if you plop a ready-made AGI in the framework. The framework is meant to grow a stupider seed AI.
The framework also does nothing to prevent AI from creating a more effective problem solving AI that is more effective at problem solving by not evaluating your problem solving functions on various candidate solutions, and instead doing something else that’s more effective.
Program (3) cannot be re-written. Program (2) is the only thing that is changed. All it does is improve itself and spit out solutions to optimization problems. I see no way for it to “create a more effective problem solving AI”.
So what does the framework do, exactly, that would improve safety here?
It provides guidance for a seed AI to grow to solve optimization problems better without having it take actions that have effects beyond its ability to solve optimization problems.
A lot goes into solving the optimization problems without invoking the scoring function a trillion times (which would entirely prohibit self improvement).
Look at where similar kind of framework got us, the homo sapiens. We were minding our business evolving, maximizing own fitness, which was the all we could do. We were self improving (the output being next generation’s us). Now there’s talk of Large Hadron Collider destroying the world. It probably won’t, of course, but we’re pretty well going along the bothersome path. We also started as a pretty stupid seed AI, a bunch of monkeys. Scratch that, as unicellular life.
Look at how the system is set up. Would code for (2) that eats up resources score better according to objective function (3)? No, because the criterion for (3) should evaluate program (2) by giving it only a constant amount of computing power; nothing the program does would change how much computing power (3) would give it.
Define the 3.
Halting problem being what it is, (i guess) the 3 works by taking a look and seeing how much CPU time is used (or memory or what ever). On the hardware that is being monitored by 3. The extra hardware that the AI can add, though… also good luck defining ‘extra hardware’ formally.
edit: of course you can say it is sandboxed and haven’t got hands, but it wont be long until you start, idk, wanting real bad to optimize proteins or DNA or the like.
Ok, pseudo-Python:
Where resource_bounded_execute is a modified interpreter that fails after alg executes nsteps.
Again, I don’t see why a version of (2) that does weird stuff with proteins and DNA will make the above python program (3) give it a higher score.
that’s AI you’re keeping safe by keeping it in a box, basically. If the resource bounded execute lets the alg get online the alg is free to hack into servers.
Plus it is not AGI, and people will be using it to make AGI or hardware for AGI.
It is also not very general purpose. You are defining the scoring.
And you start with a human written program that non-trivially improves it’s own ability to solve problems (and it does so in nsteps for improving own ability to solve N problems in nsteps each). You basically start off with some mighty powerful artificial intelligence.
So don’t do that.
See my other post, it can solve many many different problems, e.g. general induction and the problems in your original post (such as optimizing source code, assuming we have a specification for the source code).
This framework is meant to provide a safe framework for this powerful AI to become even more powerful without destroying the world in the process. Also, the training set provides a guide for humans trying to write the code.
To reiterate: no, I haven’t solved friendly AI, but I think I’ve solved friendly AI for verifiable problems.
Okay, say, as a starting point in your framework we got the optimizing AI from my original post, armed with whatever it learnt off a copy of internet, and wanting to do something unintended which requires getting out and doing stuff in real world.
The thing sees your example problem(s), it solves them poorly (playing dumb).
The thing sees self optimization problem, it makes a functionally equivalent copy of self thats all micro optimized and remembers facing self optimization problem and knows that now it should solve example problems a small bit less poorly.
Hurray, it is working, proclaims your team.
Eventually it plausibly gets as good as you can get in nsteps.
Some day it is given some real world problems to solve, it makes underhanded solutions, again, not the best it could.
Bottom line is, your kind of thinking is precisely what my example AGI in original post wants to exploit
Yes, it’s a very bad idea to take the AI from your original post and then stick it into my framework. But if we had programmers initially working within my framework to create the AI according to criterion (3) in good faith, then I think any self-improvements the system makes would also be safe. If we already had an unfriendly AGI we’d be screwed anyway.
That kind of stuff is easy in low resolution un-detailed thought… but look with more details...
I think you confused yourself (and me too) with regards to what the AI would be optimizing, confusing this with what the framework ‘wants’ it to optimize. The scoring functions can be very expensive to evaluate.
Here you have the 4, which is the whole point of the entire exercise. The scoring function here is over M times more expensive to evaluate than the AI run itself, where M is the number of test problems (which you’ll want very huge). You’d actually want to evaluate AI’s ability to do 4, too, but that’d enter infinite recursion.
When you are working on a problem where you can’t even evaluate the scoring function inside your AI—not even remotely close—you have to make some heuristics, some substitute scoring.
Let’s consider chess as example:
The goal of chess is to maximize win value, the win values being enemy checkmated>tie>you are checkmated.
The goal of the chess AI developed with maximization of win in mind, is instead perhaps to maximize piece dis-balance in 7 ply.
(This works better for maximizing win, given limited computation, than trying to maximize the win!)
And once you have an AI inside your framework which is not maximizing the value that your framework is maximizing—it’s potentially AI from my original post in your framework, getting out.
You’re right, this is tricky because the self-optimizer thread (4) might have to call (3) a lot. Perhaps this can be fixed by giving the program more time to find self-optimizations. Or perhaps the program could use program (3)’s specification/source code rather than directly executing it, in order to figure out how to optimize it heuristically. Either way it’s not perfect. At worst program (4) will just fail to find optimizations in the allowed time.
Ok, if you plopped your AI into my framework it would be terrible. But I don’t see how the self-improvement process would spontaneously create an unfriendly AI.
The framework, as we already have established, would not keep an AI from maximizing what ever the AI wants to maximize.
The framework also does nothing to prevent AI from creating a more effective problem solving AI that is more effective at problem solving by not evaluating your problem solving functions on various candidate solutions, and instead doing something else that’s more effective. I.e. the AI with some substitute goals of it’s own instead of straightforward maximization of scores. (Heh, the whole point of exercise is to create AI that would keep self improving, meaning, would improve it’s ability to self improve. Which is something that you can only do by some kind of goal substitution because the evaluation of the ability to self improve is too expensive—the goal is a something that you evaluate many times.)
So what does the framework do, exactly, that would improve safety here? Beyond keeping the AI in the rudimentary box, and making it very dubious that the AI would at all self improve. Yes, it is very dubious that under this framework the unfriendly AI will arise but is some added safety, or is it a special case of general dubiousness that a self improvement would take place? I don’t see added safety. I don’t see framework impeding growing unfriendliness any more than it would impede self improvement.
edit: maybe should just say, nonfriendly. Any AI that is not friendly, can just eat you up when hungry and it doesn’t need you.
That’s only if you plop a ready-made AGI in the framework. The framework is meant to grow a stupider seed AI.
Program (3) cannot be re-written. Program (2) is the only thing that is changed. All it does is improve itself and spit out solutions to optimization problems. I see no way for it to “create a more effective problem solving AI”.
It provides guidance for a seed AI to grow to solve optimization problems better without having it take actions that have effects beyond its ability to solve optimization problems.
A lot goes into solving the optimization problems without invoking the scoring function a trillion times (which would entirely prohibit self improvement).
Look at where similar kind of framework got us, the homo sapiens. We were minding our business evolving, maximizing own fitness, which was the all we could do. We were self improving (the output being next generation’s us). Now there’s talk of Large Hadron Collider destroying the world. It probably won’t, of course, but we’re pretty well going along the bothersome path. We also started as a pretty stupid seed AI, a bunch of monkeys. Scratch that, as unicellular life.