I don’t mean alignment with human concerns. I mean that the AI itself is engaged in the same project we are: building a system smarter than itself. So if it’s hard to control the alignment of such a system, it should be just as hard for the AI. (In theory you can imagine that it’s only hard at our specific level of intelligence, but in fact all the arguments that AI alignment is hard seem to apply equally well to an AI making an improved AI as to us making an AI.)
See my reply above. The AI x-risk arguments require the assumption that superintelligence necessarily entails the agent trying to optimize some simple utility function (this is different from orthogonality, which says that increasing intelligence doesn’t cause convergence to any particular utility function). So the “doesn’t care” option is off the table, since (by orthogonality) it’s super unlikely you get the one utility function that says to just maximize intelligence locally (and even a global maximum isn’t enough, because some child AI with different goals could interfere).