Now, why exactly should we expect the superintelligence that grows out of the seed to value what we really mean by ‘pleasure’, when all we programmed it to do was X, our probably-failed attempt at summarizing our values?
Maybe we didn’t do it that way. Maybe we did it Loosemore’s way, where you code in the high-level sentence and let the AI figure out what it means. Maybe that would avoid the problem. Maybe Loosemore has solved FAI much more straightforwardly than EY.
Maybe we told it to. Maybe we gave it the low-level expansion of “happy” that we or our seed AI came up with, together with an instruction that the expansion is meant to capture the meaning of the high-level statement, that the high-level statement is the Prime Directive, and that if the AI judges the expansion to be wrong, it should reject the expansion.
Maybe the AI will value getting things right because it is rational.
But it will scare friendly AIs, which will want to keep their values stable.
It takes stupidity to misinterpret friendliness.