Seems like this assumes an actual superintelligence, rather than a near-term, scarily capable successor to current ML systems.
Yup. The point is just that there's a level of intelligence past which, it seems, we can't do anything at all to get things back on track. Even if we had some theoretically perfect tools for dealing with such a system, those tools' software and physical implementations are guaranteed to have flaws that a sufficiently smart adversary could exploit arbitrarily, given any high-bandwidth access to them. At that point, even a very benign-seeming interaction with the system would be fatal.
This transition point may also be easy to miss; consider, e.g., the hypothesized sharp capability gains. A minimally dangerous system may scale to a maximally dangerous one in the blink of an eye, relatively speaking, and we need to be mindful of that.
Besides, some of the currently proposed alignment techniques may be able to handle a minimally dangerous system, so long as it doesn't scale. But if it does, and we let any degree of misalignment persist into that stage, not even the most ambitious solutions will help us then.