I also disagree with that false conclusion, but I would probably say that ‘goals are dangerous’ is the false premise. Goals are dangerous when, well, they actually are dangerous (to my life or yours), and when they are attached to sufficient optimising power, as you get at in your last paragraph.
I think the line of argumentation Bostrom is taking here is that a superintelligence, by definition, has a huge amount of optimisation power, so whether it is dangerous to us reduces to whether its goals are dangerous to us.
MIRI’s argument, which I agree with for once, is that a safe goal can have dangerous subgoals.
The tool AI proponents’ argument, as I understand it, is that a system that defaults to doing nothing is safer.
I think MIRI types are persistently mishearing that, because they have an entirely different set of presuppositions: that safety is all-or-nothing, not a series of mitigations, and that safety is a matter of mathematical proof rather than engineering (not that you can prove anything beyond the point where the uncertainty within the system is less than the uncertainty about the system).
You’d have to clarify what you mean by “a huge amount of optimization power.” I can imagine plenty of better-than-human intelligences which nevertheless would not have the capability to pose a significant threat to humanity.