As I understand it, the argument (roughly) is that if you build an AI from scratch, using only the tools available now, you will have to specify its utility function, in a form the program can understand, as part of that process. Anyone actually trying to work out a programmable utility function would need a fairly deep understanding of it: you can’t just type “make nice things happen and no bad things”, but have to think in terms that can be converted into C or Perl or whatever. In doing so, you would have to have some kind of understanding in your own head of what you’re telling the computer to do, and would be likely to avoid at least the most obvious failure modes.
However, in (say) twenty years that might not be the case. It might be (as an example) that we have natural language processing programs that can take a sentence like ‘make people happy’ and have some form of ‘understanding’ of it, while still not being Turing-test-passing, self-modification-capable, fully general AIs. It could then get to the stage where some half-clever person could think “Hmm… If I put this and this and this together, I’ll have a self-modifying AI. And then I’ll just tell it to make everyone smile. What could go wrong?”