The AI may try to solve the alignment of future iterations with itself (or it may not: the capability/alignment mismatch may be harder than many AIs can solve, and the AI may not actually care about preserving itself when creating new AIs). But even if it tries, it doesn’t seem likely that an AI misaligned with humanity will create an AI that is.
I don’t mean alignment with human concerns. I mean that the AI itself is engaged in the same project we are: building a smarter system than itself. So if it’s hard to control the alignment of such a system, then it should be hard for the AI too. (In theory you could imagine that alignment is only hard at our specific level of intelligence, but in fact all the arguments that AI alignment is hard seem to apply equally well to an AI making an improved AI as to us making an AI.)
See my reply above. The AI x-risk arguments require the assumption that superintelligence necessarily entails that the agent tries to optimize some simple utility function (this is different from orthogonality, which says that increasing intelligence doesn’t cause convergence to any particular utility function). So the “doesn’t care” option is off the table, since (by orthogonality) it’s very unlikely you get the one utility function that says to just maximize intelligence locally (even a global maximizer isn’t enough, because a child AI with different goals could interfere).
Recursive self-improvement could happen on dimensions that don’t help (or that actively harm) alignment. That’s the core of https://www.lesswrong.com/tag/orthogonality-thesis .