Right, this sounds somewhat less like un-unpluggability and more like (reasoning?) capabilities, or like the instrumental incorrigibility motives I pointed to at the start as a complementary insight. In particular, as applied to unboxing/escape, perhaps tied to expansionism (replication) by a system that is not intended to replicate.
I would say that unpluggability kind of falls into a big set of stories where capabilities generalize further than safety. Having a “plug” is just another type of safety feature. An alternative communication strategy might be to literally build a text world where the AI is told that the human can pull a plug, but where it can find some alternative way to power itself if it uses reasoning and planning. I’m not sure, but there may be some people who would be more convinced by this than by your take on it.
I agree that concrete toy demonstrations are one good communication tool! I also agree that demonstrating the capability to act on unpluggability, and discussing/demonstrating the motive to do so, are also useful.
unpluggability kind of falls into a big set of stories where capabilities generalize further than safety
Interesting, I think I see what you mean. This applies to, e.g., some kinds of control over active defenses (weapons, propaganda, etc.) and many paths to replication. But foundationality (dependence), imperceptibility (of harmful ends), and robustness don’t seem to me to fit this pattern. They’re properties which a capable system might aim towards, but not capabilities per se, and they can obviously arise through other means too (e.g. accidental or deliberate human activity).
Put simply, the properties I’m pointing at here have in common that they’re mechanisms of un-unpluggability. They can arise through exertion of capability, and they can be appreciated by intelligent and situationally-aware systems, but they are not intrinsically tied to those. They’re systemic properties which one thing has in relation to its context (e.g. which an AI system could have in relation to society).