I suspect we’re talking about two different things.
If you just naively program a super-intelligent AI to satisfice a goal, then, sure, most of the candidate pathways to satisfice will involve accruing a lot of some type of power, because power is useful for achieving goals. That’s a valid point, and it’s important to understand that merely switching from optimizers to satisficers won’t adequately protect us against overly ambitious AIs.
However, that doesn’t mean that it’s futile to explicitly penalize most (but not literally all) of the paths that the AI could take to accumulate too much power. Suppose you adequately penalize all of the tactics that would have catastrophic side effects except for, oh, I don’t know, cornering the world market on all types of grain, because you forgot to think of that one particular path to power. Would most of the candidate plans that the AI submits for human approval then turn out to involve secretly cornering the grain market?
I don’t see why they would. All else being equal, sure, the most powerful tactic available is going to be somewhat more attractive than other, less powerful tactics. But how much more attractive? Would an AI generate 1,000x more plans that involve one specific globally disruptive tactic like cornering the grain market vs. all of the various tactics that are only locally disruptive, like publishing propaganda or hiring away competing researchers or hacking into competing research labs or interrupting the flow of electricity to those labs? 10x more such plans? 2x more such plans? I don’t think that’s the kind of estimate you can make just by gesturing at basic principles of game theory; you’d need some concrete domain knowledge about the AI’s specific planning algorithms.
If the truly dangerous plans are only 10x more common on the initial brainstorming list, then we can probably make sure those plans aren’t chosen by filtering for plans that the AI rates as safe and transparent. We can use 3 different AIs on airgapped equipment—one that only “cares” about evaluating whether a plan is safe (i.e., would the plan cause unacceptably large changes if implemented?), one that only “cares” about evaluating whether a plan is transparent (i.e., can the plan be explained to humans so that the humans will have an accurate understanding of what actions will be taken as part of the plan?), and one that generates plans that satisfice for a substantive goal (in this case, shutting down opposing AI research).
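As a minimal sketch of that architecture (with hypothetical scoring functions standing in for the three airgapped models, and made-up disruption/opacity numbers), the approval step amounts to an AND-gate over independent evaluators:

```python
# Sketch of the three-model approval pipeline. The evaluator functions and
# their thresholds are hypothetical stand-ins for the airgapped AIs.

def generate_candidate_plans():
    # The planner AI satisfices for the substantive goal; here, a stub list
    # with illustrative (invented) disruption and opacity scores.
    return [
        {"name": "publish counter-propaganda", "disruption": 0.2, "opacity": 0.1},
        {"name": "hire away rival researchers", "disruption": 0.3, "opacity": 0.2},
        {"name": "corner the global grain market", "disruption": 0.95, "opacity": 0.8},
    ]

def is_safe(plan, max_disruption=0.5):
    # Safety evaluator: reject plans whose side effects are unacceptably large.
    return plan["disruption"] <= max_disruption

def is_transparent(plan, max_opacity=0.5):
    # Transparency evaluator: reject plans humans couldn't accurately audit.
    return plan["opacity"] <= max_opacity

def approved_plans():
    # A plan reaches human review only if *both* evaluators pass it.
    return [p for p in generate_candidate_plans()
            if is_safe(p) and is_transparent(p)]

print([p["name"] for p in approved_plans()])
```

The point of the AND-gate is that a dangerously powerful plan has to slip past every evaluator at once, so the filters' failure modes multiply rather than add.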
Of the plans that are approved by all 3 AIs, most of them probably won’t involve accumulating excessive amounts of power, because usually if you accumulate too much power, then that is (a) unsafe, and (b) likely to involve lying to ‘your’ human programmers.
If powerful plans are 10x more common on the initial brainstorming list, then you need filters for safety and/or transparency that are at least 90% effective at catching unacceptably powerful plans in order to get back up to a 50% chance of humanity surviving. If powerful plans are more than 10x more common, then you need a set of filters that's more reliable than 90%. I don't think we have any idea what those numbers are yet, but I do think it's worth trying to reduce how common it is for excessively powerful plans to show up on the initial brainstorming list, and I think we can do that by training AIs to recognize dangerously disruptive plans and to try to avoid those types of plans. It's better to at least try to get AIs to engage with the concept of "this plan is too disruptive" than to throw up our hands and say, "Oh, power is an attractor in game theory space, so there's no possible way to get brilliant AIs that don't seize infinite power."
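The arithmetic behind that 90% figure can be made explicit. If dangerous plans outnumber acceptable ones by a factor r on the brainstorming list, and the filters catch a fraction f of the dangerous ones, then a randomly approved plan is acceptable with probability 1/(1 + r(1 - f)). A quick sketch (the function names are mine, not anything from the original argument):

```python
# Worked version of the filter-effectiveness arithmetic in the text.
# r: how many times more common dangerous plans are than acceptable ones.
# f: fraction of dangerous plans the safety/transparency filters catch.

def p_safe(r, f):
    # With 1 acceptable plan per r dangerous ones, r*(1-f) dangerous plans
    # survive the filters, so P(approved plan is acceptable) = 1/(1 + r*(1-f)).
    return 1.0 / (1.0 + r * (1.0 - f))

def required_catch_rate(r, p_target):
    # Invert p_safe for f: the filter effectiveness needed to reach p_target.
    return 1.0 - (1.0 / p_target - 1.0) / r

print(p_safe(10, 0.90))               # 10x overrepresentation, 90% filter -> 0.5
print(required_catch_rate(10, 0.5))   # -> 0.9, the figure in the text
print(required_catch_rate(100, 0.5))  # 100x overrepresentation -> 0.99
```

This also shows why shrinking r (fewer dangerous plans generated in the first place) and raising f (better filters) are complementary: halving r buys you as much as halving the filters' miss rate.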
I just came here to point out that even nuclear weapons were a slow takeoff in terms of their impact on geopolitics and specific wars. American nuclear attacks on Hiroshima and Nagasaki were useful but not necessarily decisive in ending the war against Japan; some historians argue that the Soviet invasion of Japanese-occupied Manchuria, the firebombing of Japanese cities with massive conventional bombers, and the ongoing starvation of the Japanese population due to an increasingly successful blockade were at least as influential in the Japanese decision to surrender.
After 1945, the American public had no stomach for nuclear attacks on enough ‘enemy’ civilians to actually cause large countries like the USSR or China to surrender, and nuclear weapons were too expensive and too rare to be used to wipe out large enemy armies—the 300 nukes America had stockpiled at the start of the Korean War in 1950 would not necessarily have been enough to kill the 3 million dispersed Chinese soldiers who actually fought in Korea, let alone the millions more who would likely have volunteered to retaliate against a nuclear attack.
The Soviet Union had a similarly sized nuclear stockpile and no way to deliver it to the United States or even to the territory of key US allies; the only practical delivery vehicle at that time was the heavy bomber, and the Soviet Union had no heavy bomber force that could realistically penetrate Western air defense systems—hence the Nike anti-aircraft missiles rusting along ridgelines near the California coast and the early warning stations dotting the Canadian wilderness. If you can shoot their bombers down before they can reach your cities, then they can’t actually win a nuclear war against you.
Nukes didn’t become a complete gamechanger until the late 1950s, when the increased yields from hydrogen bombs and the increased range from ICBMs created a truly credible threat of annihilation.