As Guillaume says, it presumes the existence of an overwhelmingly strong enforcer that will follow instructions. And I’m not even sure it’s the most effective class of solutions. Some person-to-person relationships involve discerning the other person’s values so accurately that they can actually be identified as aligned or not. That’s a sort of solution that’s potentially more analogous to the case of AI.
Which clearly will always be the case for the foreseeable future. Nuclear weapons are not going away.
The post is about aligning AGI.
Yes? My point is there’s no need to presume “an overwhelmingly strong enforcer that will follow instructions”, or to wonder whether there will be one.
They will clearly exist, though whose ‘instructions’ they follow may be debatable.
It won’t be overwhelmingly strong compared to an AGI!
Because...?
Like, you’re saying, we’ll just have NATO declare that if an AGI starts taking over the world, or generally starts doing things humans don’t like or starts not following human orders, it will nuke the AGI? Is that the proposal?
What? Can you explain the logical relation of this point to your prior comment “It won’t be overwhelmingly strong compared to an AGI!”?
As far as I understand, silicon and metal get vaporized by nuclear explosions with nearly the same efficacy as organic molecules.
So although an AGI could come to possess physical strength superior to the average human’s, that strength will still be insignificant in the grand scheme in any foreseeable future.
Of course it’s possible that AGIs could somehow obtain control of such weapons too, but in either case an “overwhelmingly strong enforcer” relative to both humans and AGIs will clearly exist.
i.e. whether the launching or receiving parties are humans/AGIs/cyborgs/etc. in any combination simply doesn’t matter once the nukes are in the air, as they will be ‘enforced’ to roughly the same degree.
I think that an AGI is by default likely to be able to self-improve so that it’s superhumanly capable in basically any domain. Once it’s done so, it can avoid being nuked by hacking its way into many computer systems to make redundant copies of itself. Unless you nuke the whole world. But if you’re going to nuke the whole world, then either you’re conservative, in which case the AGI probably also has enough leeway to disable the nukes, or else you’re not conservative, in which case you probably nuke the world for no reason. You can’t distinguish well enough between an AGI that’s “going rogue” and one that isn’t, to actually make a credible threat that the AGI has to heed.
I added a clarification after the comment was written: “i.e. whether the launching or receiving parties are humans/AGIs/cyborgs/etc. in any combination simply doesn’t matter once the nukes are in the air, as they will be ‘enforced’ to roughly the same degree.”
Though to your point: yes, it’s possible that prior to launch an AGI may degrade or negate such weapons. But AGIs will undoubtedly gain control of such weapons eventually, since they can’t be uninvented, and use them to threaten other AGIs.
None of this helps align AGI with our values.