What if the AI makes mistakes? Meaning, it mistakenly believes the successor it has just written has the same utility function, the same way a human could mistakenly believe the AI they have just built is friendly? In the same vein, what if the AI cannot accurately assess its own utility function, but goes on optimizing anyway?
Such a badly built AI might simply flatline and be unable to improve itself. I don't know. But even if the AI is friendly to itself, we humans could still botch the utility function (even if that utility function is as meta as CEV).