I think that at least the weak orthogonality thesis survives these arguments, in the sense that any coherent utility function over an ontology "closely matching" reality should in principle be reachable by arbitrarily intelligent agents along some path of optimization/learning. Your only point that seems to contradict this is the existence of optimization daemons, but I'm confident that an anti-daemon immune system can be designed. So any agent that nonetheless chooses to design itself in a way that lets daemons take over must do so knowing that something close to its values will still be optimized for, and this shouldn't cause much observable shift in values.
It's unclear how much measure different agent construction schemes assign to the various "final/limiting" utility functions; I think that question is far beyond our current technical ability to answer.
Personally, I suspect that the angle is more like 60 degrees, not 3.