Let me clarify why I asked. I think the “multiple layers of abstraction” idea is essentially “build in a lot of ‘manual’ checks that the AI isn’t misbehaving”, and I don’t think that is a desirable or even possible solution. You can write n layers of checks, but how do you know that you don’t need n+1?
The idea, as has been pointed out here on LW, is that what you really want and need is a mathematical model of morality, one that the AI will implement and from which moral behaviour will fall out without you having to specify it explicitly. This is what MIRI are working on with CEV & co.
Whether CEV, or whatever emerges as the best model to use, is gameable is itself a mathematical question,[1] and one central to the FAI problem.
[1] There are also implementation details to consider, e.g. “can I mess with the substrate” or “can I trust my substrate”.