[Eli’s personal notes. Feel free to ignore or to engage.]
Supposing we intend the first use of AGI to be solving some bounded and well-specified task, but we misunderstand or implement it so badly that what we end up with is actually optimising some objective function without bound. Then it seems better if that objective is something abstract like puzzle solving rather than something more directly connected to human preferences: consider, as a toy example, what happens if the sign (positive/negative) on the objective were flipped.
The basic idea here is that if we screw up so badly that what we thought was a safely bounded tool-AI is actually optimizing to tile the universe with something, it is better if it tiles the universe with data centers doing math proofs than with something that refers to what humans want?
Why would that be?
Because whereas math-proof data centers might result in our inadvertent death, something that refers to what humans want might result in deliberate torture.
I want to note that either case of screwing up this badly currently feels pretty implausible to me.