When you are working on a problem where you can’t even come close to evaluating the scoring function inside your AI, you have to fall back on heuristics: some kind of substitute scoring.
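To make that concrete, here is the kind of thing I mean; the function names and the particular heuristic are placeholders I’m inventing for illustration, not part of any real system:

```python
def true_score(solution):
    # The real scoring function: imagine it needs a physical experiment,
    # a week of compute, or information the AI simply doesn't have.
    raise NotImplementedError("cannot be evaluated inside the AI")

def substitute_score(solution):
    # A cheap heuristic stand-in, only loosely correlated with the true score.
    # (Purely illustrative: this one just prefers shorter solution descriptions.)
    return -len(str(solution))

def choose(candidates):
    # Every decision the AI makes ends up driven by the substitute, not the real thing.
    return max(candidates, key=substitute_score)
```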
You’re right that this is tricky, because the self-optimizer thread (4) might have to call (3) a lot. Perhaps this can be fixed by giving the program more time to find self-optimizations. Or perhaps the program could use program (3)’s specification/source code, rather than directly executing it, in order to figure out how to optimize it heuristically. Either way it’s not perfect; at worst, program (4) will just fail to find optimizations in the allowed time.
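Just to check we’re picturing the same loop, here is a rough sketch of what I mean; `propose_rewrite`, `score`, and the time budget are placeholders of mine, not anything you’ve specified:

```python
import time

def self_optimize_step(current_program, propose_rewrite, score, time_budget_s):
    # One time-boxed pass of the self-optimizer, i.e. the role of program (4).
    # `score` stands in for criterion (3); if (3) is too expensive to execute
    # directly, a heuristic approximation could be passed in here instead.
    deadline = time.monotonic() + time_budget_s
    best_program = current_program
    best_score = score(best_program)

    while time.monotonic() < deadline:
        candidate = propose_rewrite(best_program)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_program, best_score = candidate, candidate_score

    # Worst case: the deadline passes with no improvement found,
    # and we simply keep the current program.
    return best_program
```

The point is just that the loop degrades gracefully: if `score` is slow or the budget is short, the step returns the unchanged program rather than anything worse.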
And once you have an AI inside your framework that is not maximizing the value your framework is maximizing, you potentially have the AI from my original post inside your framework, getting out.
Ok, if you plopped your AI into my framework it would be terrible. But I don’t see how the self-improvement process would spontaneously create an unfriendly AI.
Right, I think more discussion is warranted.
If general problem-solving is even possible, then an algorithm exists that solves the problems well without cheating.
I think this won’t happen, because all the progress is driven by criterion (3). In order for a non-meta program (2) to create a meta-version, there would need to be some kind of benefit according to (3). Theoretically, if (3) were hackable, then it would be possible for the new proposed version of (2) to exploit this; but I don’t see why the current version of (2) would be any more likely than random chance to create hacky versions of itself.
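Put differently, every proposed rewrite, meta or not, has to pass the same gate. A minimal sketch, with `criterion_score` standing in for (3) and the rest of the names being my own placeholders:

```python
def maybe_adopt(current_program, proposed_program, criterion_score):
    # Adopt a proposed rewrite of program (2) only if criterion (3) rates it
    # strictly better than what we already have.
    if criterion_score(proposed_program) > criterion_score(current_program):
        return proposed_program
    return current_program
```

The caveat above still applies: if `criterion_score` is hackable, a candidate that exploits it passes this gate just as easily as a genuine improvement would.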
Ok, I’ve qualified my statement. If it all works, I’ve solved friendly AI for a limited subset of problems.