MIRI's argument, which I agree with for once, is that a safe goal can have dangerous subgoals.
The tool-AI proponents' argument, as I understand it, is that a system that defaults to doing nothing is safer.
I think MIRI types are persistently mishearing that, because they have an entirely different set of presuppositions: that safety is all-or-nothing, not a series of mitigations; that safety is a matter of mathematical proof, not engineering; never mind that you can't usefully prove anything beyond the point where the uncertainty within the system is less than the uncertainty about the system.