I’m not sure I understand your weighting argument. Some capabilities are “convergently instrumental” because they are useful for achieving a lot of purposes. I agree that AIs construction techniques will target obtaining such capabilities, precisely because they are useful.
But if you gain a certain convergently instrumental capability, it then automatically allows you to do a lot of random stuff. That’s what the words mean. And most of that random stuff will not be safe.
I don’t get what the difference is between “the AI will get convergently instrumental capabilities, and we’ll point those at AI alignment” and “the AI will get very powerful and we’ll just ask it to be aligned”, other than a bit of technical jargon.
As soon as the AI it gets sufficiently powerful [convergently instrumental capabilities], it is already dangerous. You need to point it precisely at a safe target in outcomes-space or you’re in trouble. Just vaguely pointing it “towards AI alignment” is almost certainly not enough; specifying that outcome safely is the problem we started with.
(And you still have the problem that while it’s working on that someone else can point it at something much worse.)
You might want to know that I took a look through the site, and was curious, but I just closed the page the moment the “Calculate your contribution” form refused to show me the pricing options unless I gave it an email address.