The claim here is that either (a) the AI in question doesn’t achieve the main value prop of AI (i.e. reasoning about systems too complicated for humans), or (b) the system itself has to do the work of making sure it’s safe.
I see the intuitive appeal of this claim, but it seems too strong. I suspect that if we look at rates of accidents, they'll have been going down over time, at least for the last few centuries. It seems like this can continue going down toward an asymptote of zero, in the same way it has so far: we become better at understanding how accidents happen and more careful in how we use dangerous technologies. We already use tools for this (in software: debuggers, profilers, type systems, etc.) or delegate to other humans (as in a large company). We can continue to do so with AI systems.
I buy that eventually “most of the work” has to be done by the AI system, but it seems plausible that this won’t happen until well after advanced AI arrives, and that advanced AI will help us get there. And so, from a what-should-we-do perspective, it’s fine to rely on humans for some aspects of safety in the short term (though of course it would be preferable to delegate entirely to a system we knew was safe and beneficial).
(Why bother relying on humans? If you want to build a goal-directed AI system, it sure seems better if it’s under the control of some human, rather than not. It’s not clear what the plausible alternative is if you can’t put the AI system under the control of some human.)
In the die-roll analogy, the hope is that the rate at which you roll dice decays approximately exponentially, so that you only roll an asymptotically constant number of dice.
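To make the asymptote concrete, here is a minimal sketch, assuming die rolls arrive as a Poisson process with rate $r(t) = r_0 e^{-\lambda t}$ and each roll independently comes up catastrophic with probability $p$ (these symbols are just for illustration):

$$\mathbb{E}[\text{total rolls}] = \int_0^\infty r_0 e^{-\lambda t}\,dt = \frac{r_0}{\lambda}, \qquad \Pr[\text{any catastrophe}] = 1 - \exp\!\left(-\frac{p\,r_0}{\lambda}\right) \le \frac{p\,r_0}{\lambda}.$$

So as long as the decay rate $\lambda$ stays bounded away from zero, the expected number of rolls is a constant and the total chance of catastrophe stays bounded rather than approaching 1.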
Yeah, this makes much more sense.