I think this may be of interest to you.
If you have an intelligence in a box, it will have access to its source code. If it modifies itself enough, you will not be able to predict what it wants.
I’ve read that.
Eliezer thinks he can write a self-modifying AI that will self-modify to want the same things its original self wanted. I’m proposing that he choose a different thing for the AI to want that will be easier to code, as an intermediate step to building a truly friendly AI.
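To make the goal-preservation idea concrete, here is a minimal toy sketch, purely my own illustration and not Eliezer's actual proposal: an agent that only adopts a rewrite of itself when the rewrite still scores at least as well under the original utility function on sampled inputs. Every name here (Agent, utility, consider_rewrite) is hypothetical, and the check itself is deliberately weak to show where the difficulty lives.

```python
# Toy sketch of goal-preserving self-modification (hypothetical illustration,
# not anyone's actual design). The agent keeps its original utility function
# fixed and refuses any rewrite that appears to want something else.

import random


def utility(outcome: float) -> float:
    """The original goal: prefer outcomes close to 10 (an arbitrary toy goal)."""
    return -abs(outcome - 10.0)


class Agent:
    def __init__(self, policy):
        self.policy = policy  # maps an observation to an outcome

    def consider_rewrite(self, new_policy, trials: int = 1000) -> bool:
        """Adopt new_policy only if it never does worse than the current policy
        under the *original* utility function on sampled observations.

        Passing sampled tests is far weaker than a proof that the rewritten
        agent wants the same things -- which is exactly the hard part of the
        real problem."""
        for _ in range(trials):
            obs = random.uniform(0.0, 20.0)
            if utility(new_policy(obs)) < utility(self.policy(obs)):
                return False  # rewrite looks goal-divergent; reject it
        self.policy = new_policy  # rewrite appears to preserve the goal
        return True


if __name__ == "__main__":
    agent = Agent(policy=lambda obs: obs * 0.5 + 5.0)  # crude optimizer
    better = lambda obs: 10.0   # always achieves the original goal
    worse = lambda obs: 0.0     # ignores the original goal
    print(agent.consider_rewrite(better))  # True: accepted, goal preserved
    print(agent.consider_rewrite(worse))   # False: rejected as goal-divergent
```

The sketch only illustrates the shape of the requirement; a real self-modifying system would need something far stronger than sampled spot checks to guarantee its successor wants what it wants.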