This still seems to somewhat miss the point (as I pointed out last time):
Conditional on org X having an aligned/corrigible AGI, we should expect:
1. If the AGI is an aligned sovereign, it'll do the pivotal act (PA) unilaterally if that's best, and do it in a distributed fashion if that's best (according to whatever it's aligned to).
2. If the AGI is more like a corrigible tool, we should expect X to ask 'their' AGI what would be best to do (or equivalent), and we're pretty much back to case 1.
The question isn’t what the humans in X would do, but what the [AGI + humans] would do, given that the humans have access to that AGI.
If org X is initially pro-unilateral-PAs, then we should expect an aligned AGI to talk them out of it if it's not best.
If org X is initially anti-unilateral-PAs, then we should expect an aligned AGI to talk them into it if it is best.
X will only be favouring/disfavouring PAs for instrumental reasons—and we should expect the AGI to correct them as appropriate.
For these reasons, I’d expect the initial attitude of org X to be largely irrelevant. Since this is predictable, I don’t expect it to impact race dynamics: what will matter is whether the unilateral PA seems more/less likely to succeed than the distributed approach to the AGI.
I think you are missing the possibility that the outcomes of the pivotal process could be:
- no one builds autonomous AGI
- autonomous AGI is built only in post-pivotal outcome states, where the condition of building it is alignment being solved
Sure, that’s true—but in that case the entire argument should be put in terms of:
We can (aim to) implement a pivotal process before a unilateral AGI-assisted pivotal act is possible.
And I imagine the issue there would all be around the feasibility of implementation. I think I'd give a Manhattan project to solve the technical problem much higher chances than a pivotal process. (Of course people should think about it; I just don't expect them to come up with anything viable.)
Once it’s possible, the attitude of the creating org before interacting with their AGI is likely to be irrelevant.
So e.g. this just seems silly to me:
“So, thankfully-according-to-me, no currently-successful AGI labs are oriented on carrying out pivotal acts, at least not all on their own.”
They won’t be on their own: they’ll have an AGI to set them straight on what will/won’t work.