Let me clarify why I asked. I think the “multiple layers of abstraction” idea is essentially “build in a lot of ‘manual’ checks that the AI isn’t misbehaving”, and I don’t think that is a desirable or even possible solution. You can write n layer of checks, but how do you know that you don’t need n+1?
The idea being—as has been pointed out here on LW—that what you really want and need is a mathematical model of morality, which the AI will implement and which moral behaviour will fall out of without you having to specify it explicitly. This is what MIRI are working on with CEV & co.
Whether or not CEV or whatever emerges as the best model to use are gameable is itself a mathematical question,[1] central to the FAI problem.
[1] There are also implementation details to consider, e.g. “can I mess with the substrate” or “can I trust my substrate”.
What happens if an AI manages to game the system despite the n layers of abstraction?
This is the fundamental problem that is being researched—the top layer of abstraction would be that difficult to define one called “Be Friendly”.
Instead of friendly AI maybe we should look at “dont be an asshole” AI (DBAAAI) - this may be simpler to test and monitor.
Let me clarify why I asked. I think the “multiple layers of abstraction” idea is essentially “build in a lot of ‘manual’ checks that the AI isn’t misbehaving”, and I don’t think that is a desirable or even possible solution. You can write n layer of checks, but how do you know that you don’t need n+1?
The idea being—as has been pointed out here on LW—that what you really want and need is a mathematical model of morality, which the AI will implement and which moral behaviour will fall out of without you having to specify it explicitly. This is what MIRI are working on with CEV & co.
Whether or not CEV or whatever emerges as the best model to use are gameable is itself a mathematical question,[1] central to the FAI problem.
[1] There are also implementation details to consider, e.g. “can I mess with the substrate” or “can I trust my substrate”.